posts/lc3: added

dogeystamp 2024-01-31 20:21:32 -05:00
parent ef8d87b716
commit dd764615cf
Signed by: dogeystamp
GPG Key ID: 7225FE3592EFFA38
2 changed files with 773 additions and 0 deletions


@ -33,6 +33,12 @@ The backend is written in Python using the Flask framework,
while the frontend is written in TypeScript with Mithril.js.
There are neat features like dark mode and responsive, mobile-friendly design.
## [lc-3 vm](/lc3)
Rust implementation of the Little Computer-3 educational architecture as a virtual machine.
LC-3 is a simplified 16-bit computer that can run programs and games like 2048.
Concepts found in LC-3, like the fetch-execute loop and assembly, are foundations of modern computing.
## [bitmask](https://github.com/dogeystamp/bitmask)
A Python library that helps with manipulating bit flags. It provides

posts/lc3.md Normal file

@ -0,0 +1,767 @@
# making a virtual machine from scratch (in rust)
2024-01-20
Computers are wonderful machines that can do many things.
However, even though they are very complex, at their core they are surprisingly simple.
In fact, with only basic programming knowledge,
it is possible to simulate a fully-functioning computer akin to modern ones.
In this post, I document how I built a virtual machine for Little Computer 3 (LC-3),
an educational computer model.
LC-3 may be simple, but it works the same way any modern computer does:
through implementing it, you get a glimpse of how real architectures like x86 or ARM work.
Besides that, once you're finished, you can play 2048 in the VM you created from scratch.
Is there anything more gratifying?
Writing a virtual machine is also a great test of your programming ability.
I decided to do this project in Rust, in order to learn how the language works hands-on.
I've mostly had experience with C and Python before this, and this was my first Rust project.
However, this article will focus more on the virtual machine aspect, rather than the Rust.
If you're just looking for the source code, check it out on GitHub: [dogeystamp/lc3-vm](https://github.com/dogeystamp/lc3-vm).
## prerequisite reading
I assume you have a certain amount of fundamental knowledge before reading this article:
- **Programming knowledge** (preferably in Rust or C).
I assume you know programming decently well, but are not familiar with virtual machines.
I'll give code analogies in C, and I assume you can figure out what the Rust means with prior knowledge in other languages.
- **Binary arithmetic**.
Computers, at their fundamental level, deal exclusively with binary.
You should know [bitwise operations](https://www.hackerearth.com/practice/basic-programming/bit-manipulation/basics-of-bit-manipulation/tutorial/),
like left/right shift, AND, NOT, OR, etc. to see how binary is manipulated.
You should also be familiar with hexadecimal and binary number representation.
Besides that, I aim for this article to be shorter and more of an overview of topics that were interesting to me.
For all the context and implementation details, see Justin Meiners' and Ryan Pendleton's [blog post](https://www.jmeiners.com/lc3-vm/)
which meticulously explains LC-3,
as well as the [LC-3 ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf).
## what's inside a VM?
Starting off, here is some theory to give you context.
Skim this if you are familiar with how everything works.
LC-3 can be modeled with three simple components: the processor and its registers (together, the CPU), and memory (RAM).
```
registers
[ ][ ][ ][ ][ ]
[ ][ ][ ][ ][ ]
+-------------+ +------------------------+
| | | |
| processor |-----| memory |
| | | |
+-------------+ | |
+------------------------+
```
### memory
First, *memory* is where most temporary data lives in the computer.
The easiest way for me to visualize it is a huge C-style array:
```
uint16_t mem[65536];
```
Instead of an index, we use an "address" for each element.
Just like an array, we can read and write (get or set) the data at each address.
When the computer shuts down, all data in memory is lost.
(For data to persist, we use storage like hard drives, but LC-3 doesn't have this.)
In LC-3, each element in memory is 16 bits, or 2 bytes.
At this level, memory does not have any types like the ones in languages like C:
it's all raw binary.
Typed variables, malloc and the stack are all abstractions on top of memory.
### registers
Second, *registers* are where data that is immediately useful is stored.
They are also 16-bit just like memory elements.
Think of them as a fixed set of variables.
LC-3 has 10 registers:
- the general purpose R0 to R7,
- Program Counter (PC),
- and the Processor Status Register (PSR).
I'll explain the last two later.
Just like variables, you can assign values to the general purpose registers, and read from them:
```
uint16_t r0 = 0;
...
uint16_t r7 = 0;
r6 = r0 + r7;
r3 -= 3;
```
The reason registers exist when we already have memory is that they're way more convenient to use.
Physically, registers are closer to the processor than memory is,
and are therefore much faster.
Besides that, reading/writing from memory usually takes more instructions than just using registers.
However, in exchange for convenience and speed, you have to deal with a limited amount of registers.
### processor
Finally, the *processor* is where the real computing happens.
The processor reads *instructions* one by one, and executes them.
Instructions are like statements in higher level code, but much simpler.
LC-3 only has 15 different types of instructions,
that do things like "read memory at this address and load the value into R4".
Instructions are just 16-bit binary data.
Again, read the [ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf)
for detailed information about this.
For example, the spec says this is how to ADD two registers and store the result in a third register:
```
ADD
| 0 0 0 1 | DR | SR1 | 0 0 0 | SR2 |
```
DR (Destination Register), SR1 (Source Register) and SR2 are all placeholders for general registers.
For example, if we wanted to do `r1 = r2 + r3`:
```
ADD r1 r2 (...) r3
| 0 0 0 1 | 0 0 1 | 0 1 0 | 0 0 0 | 0 1 1 |
or in hex:
0x1283
```
As you can see, this is pure binary data,
and as such can be read by the processor.
When you compile a C program, machine code like this is what comes out.
There is also *assembly*, which is a human-readable version of this binary.
Unlike regular code, it corresponds one-to-one to the binary.
For example, the above ADD instruction would be
```
ADD R1, R2, R3 ; you can put comments using semicolon
```
Using an assembler, a programmer can convert the assembly code into the pure binary program that LC-3 can read.
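To make the encoding concrete, here is a minimal Rust sketch that pulls the fields back out of the `0x1283` word from above. `AddRegs` and `decode_add` are hypothetical names made up for this illustration, not code from my VM, and the sketch only handles the register form of ADD shown here.

```rust
/// Fields of a register-mode ADD instruction (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct AddRegs {
    pub dr: u16,
    pub sr1: u16,
    pub sr2: u16,
}

/// Decode a 16-bit word as a register-mode ADD. Returns None if the
/// opcode is not 0001, or if bits [5:3] are not all zero.
pub fn decode_add(instr: u16) -> Option<AddRegs> {
    let opcode = instr >> 12; // top 4 bits
    let mode_bits = (instr >> 3) & 0b111; // the "0 0 0" in the diagram
    if opcode != 0b0001 || mode_bits != 0 {
        return None;
    }
    Some(AddRegs {
        dr: (instr >> 9) & 0b111,  // bits [11:9]
        sr1: (instr >> 6) & 0b111, // bits [8:6]
        sr2: instr & 0b111,        // bits [2:0]
    })
}
```

Calling `decode_add(0x1283)` gives back DR = 1, SR1 = 2 and SR2 = 3, which is exactly what an assembler would have packed in from `ADD R1, R2, R3`.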
### fetch-execute loop
Now that the individual parts are explained, we move on to how it all works together.
So far, I haven't explained where the instructions actually come from.
Since instructions are just binary data, we actually just place a series of them (a program) in memory:
```
address value (hex) equivalent assembly code
0x3000: [e002] (LEA R0, HELLO_WORLD)
0x3001: [f022] (PUTS)
0x3002: [f025] (HALT)
...
```
The Program Counter (PC) register we talked about earlier is a pointer to the instructions inside a program in memory.
In LC-3, as seen above, PC starts at the address `0x3000`.
The processor will perform a *fetch-execute* loop:
- Fetch the instruction in memory using the PC (`instr = mem[PC]`)
- Increment PC (`PC += 1`)
- Execute the instruction
- Repeat on the next instruction
Until it reaches an instruction to stop (HALT), the processor will continue this loop.
Remember that **PC points to the next instruction**, not the current one!
This often tripped me up implementing the virtual machine.
The reason it doesn't point to the current instruction is that it makes the mechanisms in the next section easier to understand.
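As a rough illustration of the loop, here is a stripped-down sketch in Rust. It is not my VM's actual code: the `run` function is hypothetical, memory is a plain slice, and only register-mode ADD and the HALT trap (`0xF025`) are understood.

```rust
/// Run a program already loaded in `mem`, starting at address 0x3000.
/// Returns the general purpose registers and the final value of PC.
/// `mem` is assumed to hold all 65,536 words.
pub fn run(mem: &[u16], mut regs: [u16; 8]) -> ([u16; 8], u16) {
    let mut pc: u16 = 0x3000;
    loop {
        // Fetch, then increment: PC now points to the *next* instruction.
        let instr = mem[pc as usize];
        pc = pc.wrapping_add(1);

        // Execute.
        match instr >> 12 {
            // ADD, register mode only (bits [5:3] must be zero)
            0b0001 if (instr & 0b11_1000) == 0 => {
                let dr = ((instr >> 9) & 0b111) as usize;
                let sr1 = ((instr >> 6) & 0b111) as usize;
                let sr2 = (instr & 0b111) as usize;
                regs[dr] = regs[sr1].wrapping_add(regs[sr2]);
            }
            // HALT: leave the loop
            0b1111 if instr == 0xF025 => break,
            _ => panic!("unsupported instruction {:#06x} in this sketch", instr),
        }
    }
    (regs, pc)
}
```

With `0x1283` at `0x3000` and `0xF025` at `0x3001`, the loop stops with PC at `0x3002`: one past the HALT it just executed.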
### control flow
If statements, switches, and loops are all implemented using two instructions, `JMP` and `BR`,
which alter PC.
`JMP` (jump) directly sets the value of PC,
which means on the next fetch-execute loop, the processor will not execute the next instruction,
but rather the instruction at the new PC.
This is what `goto` in C does under the hood.
Meanwhile, `BR` (branch) conditionally sets the value of PC.
Remember the PSR register mentioned earlier?
The bottom few bits in PSR contain *condition flags*.
Condition flags essentially represent the sign of the result of an operation like AND or ADD.
Respectively, they're P (Positive), Z (Zero), N (Negative).
`BR` works with these flags:
for example `BRnz` is "jump if result is negative or zero".
Let's see how a for loop would be implemented this way:
```
AND R1, R1, #0 ; set R1 to 0 (anything bitwise AND 0 is 0)
ADD R1, R1, #5 ; set R1 = R1 + 5
LOOP_START ; this is a label
... (for loop body)
ADD R1, R1, #-1 ; R1 = R1 + (-1) (decrement)
BRp LOOP_START ; loop back. once assembled, the label becomes a numerical address offset
HALT ; we're done
```
The equivalent in C:
```
for (int i = 5; i > 0; i--) {
// for loop body
}
```
It's also common practice to AND a register with itself to check if it is positive/negative/zero.
A register ANDed with itself gives the same value, which allows testing without altering the register's contents.
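Here is how the flag logic could look inside a VM, as a small Rust sketch. The function names are hypothetical and this is not lifted from my implementation. The flag values follow the PSR layout (N is bit 2, Z is bit 1, P is bit 0), and `count_loop_iterations` replays the countdown loop from this section.

```rust
pub const FLAG_P: u16 = 0b001;
pub const FLAG_Z: u16 = 0b010;
pub const FLAG_N: u16 = 0b100;

/// Compute the condition flag for a result, treating it as a signed
/// 16-bit number in two's complement (bit 15 set means negative).
pub fn flags_for(result: u16) -> u16 {
    if result == 0 {
        FLAG_Z
    } else if (result >> 15) == 1 {
        FLAG_N
    } else {
        FLAG_P
    }
}

/// Decide whether a BR is taken. `wanted` is the n/z/p mask from the
/// instruction, for example FLAG_N | FLAG_Z for BRnz.
pub fn branch_taken(wanted: u16, current_flags: u16) -> bool {
    (wanted & current_flags) != 0
}

/// Count how many times the body of the countdown loop would run.
pub fn count_loop_iterations() -> u32 {
    let mut r1: u16 = 5; // R1 = 5
    let mut iterations = 0;
    loop {
        iterations += 1; // (for loop body)
        r1 = r1.wrapping_add(0xFFFF); // R1 = R1 + (-1)
        if !branch_taken(FLAG_P, flags_for(r1)) {
            break; // BRp not taken: fall through to HALT
        }
    }
    iterations
}
```

The loop body runs five times, matching the C version: the branch stops being taken as soon as R1 hits zero and the Z flag replaces P.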
### subroutine calls
LC-3 has support for "subroutines", which are like functions but less convenient.
Practically, LC-3 can jump into a subroutine, then at the end of the subroutine, jump back to the place where the subroutine was called.
This works using instructions like JSR, and RET.
JSR means "jump subroutine", and it is essentially the same as JMP, however it also saves the current PC into the register R7.
At the end of the subroutine, we put the instruction RET, which is actually a disguised JMP.
RET means "JMP to the address given in R7", which means return to the place where we originally used JSR.
An example usage:
```
AND R1, R1, #0 ; random code
JSR SOME_ROUTINE ; call subroutine
HALT
SOME_ROUTINE
ADD R1, R1, #1 ; random subroutine code
RET ; return
```
This code will first set R1 to 0, jump into the subroutine, increment R1, then return to the main part and stop the program.
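From the VM's point of view, both instructions are tiny. The following Rust sketch uses a hypothetical `Cpu` struct (not my VM's types), and simplifies JSR by taking an absolute target address instead of decoding an offset from the instruction.

```rust
/// Minimal CPU state for this sketch.
pub struct Cpu {
    pub pc: u16,
    pub regs: [u16; 8],
}

impl Cpu {
    /// JSR: remember where to come back to, then jump.
    /// `self.pc` already points to the instruction after the JSR,
    /// because the fetch step incremented it.
    pub fn jsr(&mut self, target: u16) {
        self.regs[7] = self.pc;
        self.pc = target;
    }

    /// RET: jump to the address saved in R7.
    pub fn ret(&mut self) {
        self.pc = self.regs[7];
    }
}
```

If the example program were loaded at `0x3000`, the JSR would sit at `0x3001`, so R7 receives `0x3002` (the HALT), and RET brings PC right back there.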
### memory mapped i/o
Earlier, when I described how the LC-3 virtual machine works, I omitted a pretty significant component:
input and output.
I/O is the only way the virtual machine can communicate with the outside world, whether it's receiving user input,
or sending output.
LC-3 uses a terminal for I/O, which means it can work with standard input (stdin) and standard output (stdout).
Input is done via keyboard, and output via display.
The way this works is *memory-mapped I/O*.
In LC-3's memory, there are specific addresses that connect to external I/O devices.
```
+----------------+ +----------+
··· -----| memory |-----| terminal |
+----------------+ +----------+
```
This is useful because LC-3 can reuse the existing instructions for loading/storing from memory to talk to the terminal.
When it manipulates the special memory-mapped locations (device registers), it doesn't actually store or load data from memory,
but it communicates with the peripherals like the display or keyboard.
Here is a full list of memory-mapped addresses in LC-3:
```
addr name short description
0xFE00 keyboard status register (KBSR) has a key been pressed?
0xFE02 keyboard data register (KBDR) what key was pressed?
0xFE04 display status register (DSR) can the display receive a character?
0xFE06 display data register (DDR) character to send to the display
0xFFFE machine control register (MCR) power button
```
For example, to read keyboard input, a program would first poll bit 15 (the ready bit) of KBSR to wait for the user to press a key.
If the bit is `0`, then the program keeps waiting.
Otherwise, the bit is `1`, and that means the user pressed a key.
Then, the program reads bits [7:0] from KBDR into its registers.
This contains the key that was pressed, encoded as ASCII.
To read more characters, it would continue this loop.
As an aside, remember that by convention, the least significant bit (the right-most units bit) is bit 0,
and the other bits are numbered right-to-left as 1, 2, 3, and so on.
Not realizing this caused me issues while implementing this VM,
as the program was reading from bit 15, while my VM was providing information on bit 0 (which if numbered left to right, is bit 15).
Displaying data is similar: the program first polls bit 15 of DSR (the ready bit) until the display is ready to receive a character.
Then, the program stores a character in DDR encoded as ASCII.
This character is finally sent to the display.
The program can also halt the computer (shut it down) by clearing bit 15 of MCR, for example by setting MCR to all zeroes.
For detailed information, again, consult the [LC-3 ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf),
specifically the Device Register Assignments.
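On the VM side, memory-mapped input boils down to intercepting reads at the special addresses. Here is a minimal sketch of that idea in Rust. `MappedMemory` and `press_key` are hypothetical names for illustration, and only the two keyboard registers are handled.

```rust
pub const KBSR: u16 = 0xFE00;
pub const KBDR: u16 = 0xFE02;

/// Hypothetical memory with one keyboard attached.
pub struct MappedMemory {
    cells: Vec<u16>,
    /// A key that was pressed but not yet read by the program, if any.
    pending_key: Option<u8>,
}

impl MappedMemory {
    pub fn new() -> Self {
        MappedMemory { cells: vec![0; 1 << 16], pending_key: None }
    }

    /// Simulate the user pressing a key.
    pub fn press_key(&mut self, key: u8) {
        self.pending_key = Some(key);
    }

    /// Load from memory, intercepting the keyboard device registers.
    pub fn read(&mut self, addr: u16) -> u16 {
        match addr {
            // The ready bit is bit 15, the most significant bit.
            KBSR => if self.pending_key.is_some() { 1 << 15 } else { 0 },
            // Reading the data register consumes the key.
            KBDR => self.pending_key.take().map_or(0, u16::from),
            // Everything else is ordinary memory.
            _ => self.cells[addr as usize],
        }
    }

    pub fn write(&mut self, addr: u16, value: u16) {
        self.cells[addr as usize] = value;
    }
}
```

The LC-3 program never notices the difference: it uses the same load instructions for `0xFE00` as for any other address.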
## assorted implementation details
Now that we've gone through the basics of how LC-3 works, I'll go through some interesting details that I encountered during implementation.
If you came here to follow along implementing yourself, I recommend you read [Meiners' and Pendleton's](https://www.jmeiners.com/lc3-vm/) LC-3 blog post,
which is actually a tutorial.
For a Rust version, see [Rodrigo Araujo's implementation](https://www.rodrigoaraujo.me/posts/lets-build-an-lc-3-virtual-machine/).
### endianness
Endianness is the order in which bytes are stored in memory within a word (a single piece of data).
There's big endian, and little endian.
By definition, big endian starts with the most significant byte,
and little endian starts with the least significant byte.
By "most significant", it means in numbers like decimal 123, the hundreds position is "more significant" than the units position.
However, I find it more comprehensible to think that big endian is the "natural" order,
while little endian is the "reversed" order.
For example, take the number `0x12345678`.
On a big endian system, it would be stored in memory like this:
```
address value (hex)
0x0001: 12
0x0002: 34
0x0003: 56
0x0004: 78
```
However, on a little endian system, it would be like this:
```
address value (hex)
0x0001: 78
0x0002: 56
0x0003: 34
0x0004: 12
```
[Supposedly](https://softwareengineering.stackexchange.com/questions/95556),
it is easier to deal with little endian on processors,
which is why it is used in many popular CPU architectures.
However, LC-3 uses big endian.
This is an issue to consider during implementation.
For example, if you use `hexdump` on a program file, you may see this:
```
0000000 0030 02e0 22f0 25f0 7900 6f00 7500 2000
0000010 6c00 6900 6b00 6500 2000 7600 6900 7200
0000020 7400 7500 6100 6c00 6900 7a00 6900 6e00
0000030 6700 2000 6200 6f00 7900 7300 2000 6400
0000040 6f00 6e00 7400 2000 7900 6f00 7500 0000
```
This is actually incorrect output!
`hexdump` assumes each group of two bytes is a single little-endian word,
so it swaps the two bytes to show the word in what it thinks is the proper order.
However, LC-3 data is in big endian order.
`hexdump -C` prints bytes as they are on disk, which produces the proper ordering:
```
00000000 30 00 e0 02 f0 22 f0 25 00 79 00 6f 00 75 00 20
00000010 00 6c 00 69 00 6b 00 65 00 20 00 76 00 69 00 72
00000020 00 74 00 75 00 61 00 6c 00 69 00 7a 00 69 00 6e
00000030 00 67 00 20 00 62 00 6f 00 79 00 73 00 20 00 64
00000040 00 6f 00 6e 00 74 00 20 00 79 00 6f 00 75 00 00
```
It is important to flip bytes or specify that the data is big-endian when reading programs into memory from a file.
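In Rust, the standard library can do the flipping for us. Below is a minimal sketch, not my VM's loader: `parse_image` is a hypothetical helper, and it assumes the usual LC-3 program file layout where the first word is the address to load the program at (the `30 00` in the dump above).

```rust
/// Convert the raw bytes of an LC-3 program file into (origin, words).
/// Returns None if the input is empty or has an odd number of bytes.
pub fn parse_image(bytes: &[u8]) -> Option<(u16, Vec<u16>)> {
    if bytes.len() < 2 || bytes.len() % 2 != 0 {
        return None;
    }
    let mut words = bytes
        .chunks_exact(2)
        // from_be_bytes treats the first byte as the most significant one.
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]));
    // The first word says where in memory the program should be placed.
    let origin = words.next()?;
    Some((origin, words.collect()))
}
```

Feeding it the first eight bytes of the `hexdump -C` output gives an origin of `0x3000` followed by the words `0xe002`, `0xf022` and `0xf025`, the same program shown in the fetch-execute section.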
### integer overflow
You may know that because integers are represented by a finite amount of bits,
it is possible for them to overflow when they get too big.
For LC-3, we usually implement registers and memory using unsigned 16-bit integers,
which gives us a range of 0-65535.
This is also the limit of our memory's size, since we can not represent an address bigger than that.
The same issue makes 32-bit computers unable to address more than around 4GB of memory (the highest address is `(1 << 32) - 1`).
When integers overflow, they often wrap around back to 0 or the lowest number possible.
This is necessary behaviour on the LC-3, as it makes it possible to use signed numbers in 2's complement.
There is no subtract operation; we just add negative numbers, and it magically wraps around to the correct value.
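To see the "magic" in action, here is a tiny Rust sketch of subtraction done the LC-3 way. `sub_via_add` is a made-up helper for illustration only.

```rust
/// Subtract by adding the two's complement negation of `b`.
pub fn sub_via_add(a: u16, b: u16) -> u16 {
    // Negate b: flip all the bits (NOT), then add 1.
    let neg_b = (!b).wrapping_add(1);
    // The carry out of bit 15 is thrown away, which is the wrap-around.
    a.wrapping_add(neg_b)
}
```

For example, `sub_via_add(5, 3)` is 2, and `sub_via_add(3, 5)` is `0xFFFE`, which is -2 when read as a signed 16-bit number.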
However, we usually do not want integer overflow, so Rust complains when it happens:
```
error: this arithmetic operation will overflow
--> test.rs:2:20
|
2 | println!("{}", 65535u16+1u16)
| ^^^^^^^^^^^^^ attempt to compute `u16::MAX + 1_u16`, which would overflow
|
```
Earlier, I mentioned [Rodrigo Araujo's VM](https://www.rodrigoaraujo.me/posts/lets-build-an-lc-3-virtual-machine/),
which was also written in Rust.
This implementation served as a Rust reference for me.
In his instruction implementations, he uses casts to perform wrapping arithmetic:
```
let addr = (vm.registers.get_reg(base_r) as u32 + offset as u32) as u16;
```
First, he adds parameters as 32-bit unsigned ints, then casts it back to 16-bit unsigned.
I personally was not sure what the cast would do with the extra bits;
upon further experimentation, it turned out that it truncates them, which for unsigned integers is the same as a modulo 65536 operation.
This means that if the value exceeds the u16 limit, it wraps back around from 0.
Personally, I found this to be a very janky, implicit way to perform wrapping arithmetic.
After all, it took me multiple Google searches and a bit of testing to be sure of what the code was doing.
In my own code, I use explicit syntax for a wrapping addition:
```
let addr = vm.registers.get_reg(base_r).wrapping_add(offset);
```
This code is much clearer, and, quoting the Zen of Python, `Explicit is better than implicit.`
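If you want to convince yourself that the two styles agree, a quick side-by-side sketch helps. Both function names here are hypothetical and neither comes from the VMs mentioned above.

```rust
/// Wrapping addition written with a widening cast, then a narrowing cast.
pub fn add_via_cast(a: u16, b: u16) -> u16 {
    // `as u16` keeps only the low 16 bits of the 32-bit sum.
    (a as u32 + b as u32) as u16
}

/// The same addition using the explicit standard library method.
pub fn add_explicit(a: u16, b: u16) -> u16 {
    a.wrapping_add(b)
}
```

Both return 0 for `65535 + 1`, and they agree on every other pair of inputs too, since keeping the low 16 bits is the same as wrapping.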
### traps
Earlier, in the memory-mapped I/O section, we saw how LC-3 can use memory-mapped I/O to talk to external peripherals.
You may have noticed that all of this is a very tedious process just to get some user input.
To simplify things, LC-3 implements *traps*.
Traps are essentially utility subroutines that make life easier.
These can be accessed with the TRAP instruction, along with a code to specify which subroutine the program wants (the trap vector).
However, instead of the programmer writing these subroutines, traps are part of an operating system on the LC-3.
The operating system is also a program (it also consists of a bunch of instructions in memory), however it runs with higher privileges than the user program.
The OS is stored in a special location in memory, earlier than the user program memory.
When a TRAP instruction is called, LC-3 takes the trap vector, looks up a corresponding address in the trap vector table (a section in memory before the operating system),
then calls that address as if using the JSR instruction on a subroutine.
These addresses all lead to subroutines within the OS.
For a C analogy, it's like the trap vector table is an array of function pointers,
where the functions are part of the operating system.
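The function pointer analogy translates almost directly into Rust. This is only a sketch under simplifying assumptions: `Vm`, `lookup_trap` and `execute_trap` are hypothetical names, the "terminal" is a `String` buffer, and only two routines (OUT at vector `0x21` and HALT at `0x25`) are filled in.

```rust
/// Hypothetical VM state: R0, an output buffer standing in for the
/// terminal, and a flag saying whether the machine is still on.
pub struct Vm {
    pub r0: u16,
    pub output: String,
    pub running: bool,
}

/// A trap routine is just a function that works on the VM.
pub type TrapRoutine = fn(&mut Vm);

fn trap_out(vm: &mut Vm) {
    // Print the character in the low 8 bits of R0.
    vm.output.push((vm.r0 & 0xFF) as u8 as char);
}

fn trap_halt(vm: &mut Vm) {
    vm.running = false;
}

/// Look up the routine for a trap vector, like indexing the table.
pub fn lookup_trap(vector: u8) -> Option<TrapRoutine> {
    let routine: TrapRoutine = match vector {
        0x21 => trap_out,
        0x25 => trap_halt,
        _ => return None, // other routines are left out of this sketch
    };
    Some(routine)
}

/// Execute a TRAP instruction: the vector is in bits [7:0].
pub fn execute_trap(vm: &mut Vm, instr: u16) {
    let vector = (instr & 0xFF) as u8;
    match lookup_trap(vector) {
        Some(routine) => routine(vm),
        None => panic!("unimplemented trap vector {:#04x}", vector),
    }
}
```

On a real LC-3, the table holds addresses of OS routines written in LC-3 code rather than host functions, but the lookup-then-call shape is the same.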
Here is a list of trap subroutines in LC-3.
Consult the [LC-3 ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf) for a detailed explanation.
```
trap vector name description
0x20 GETC get a single character from keyboard (like C's getchar())
0x21 OUT put a single character to terminal
0x22 PUTS put a string to terminal
0x23 IN get a single character with echo (show the character typed)
0x24 PUTSP put a string to terminal (two characters packed per memory address)
0x25 HALT shut down the computer
```
As you can see, these are high-level wrappers for the memory-mapped I/O seen earlier,
and are also much friendlier to work with in general.
Importantly, these routines can all be implemented in LC-3 code.
For my own virtual machine though, and Justin Meiners' VM that inspired it, we do not actually write assembly code for trap routines.
Instead, in the VM itself, we intercept these trap calls, and perform the tasks in high-level C or Rust code, instead of LC-3 assembly.
This is generally simpler, although less faithful to the specification.
Because of this, it is also not necessary to implement some of the memory-mapped registers, like the display registers, and the machine control register.
For example, here is my code that performs GETC:
```
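fn trap_getc(vm: &mut VM) {
    // busy-wait until bit 15 (the ready bit) of KBSR is set
    while vm.mem.get_mem(0xFE00) & (1 << 15) == 0 {}
    // bits [7:0] of KBDR hold the ASCII code of the key
    vm.registers.r0 = vm.mem.get_mem(0xFE02) & 0xFF;
}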
```
First, we poll the Keyboard Ready bit, then we load the keypress into the VM's registers.
This type of implementation is much more convenient than writing raw assembly.
My own GETC is not really efficient; using a blocking read such as C's `getchar()` or an alternative would avoid polling the ready bit constantly.
Right now, with the polling loop, we use up a lot of CPU on the host machine running the VM,
when we are doing nothing but waiting.
However, this is the only choice if you actually implement the trap routines in assembly.
### terminal input/output
We've seen in the last section the interface LC-3 provides for I/O, but in this section, I'll explain concretely *how* the terminal interface works in my own implementation.
In my LC-3 VM, only the keyboard device registers and the output-related trap routines are directly implemented.
The input-related trap routines are based on the keyboard device registers.
#### output
First, output is relatively simple: we just use the built-in print functions.
```
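use std::io::{self, Write};

fn trap_puts(vm: &mut VM) {
    // R0 holds the address of the first character
    let mut idx = vm.registers.r0;
    loop {
        let c = vm.mem.get_mem(idx) as u8 as char;
        if c == '\0' {
            break;
        }
        print!("{}", c);
        // wrap instead of panicking if the string runs past 0xFFFF
        idx = idx.wrapping_add(1);
    }
    let _ = io::stdout().flush();
}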
```
For example, to output a null-terminated string, we loop through it and print each character, breaking when we see a null.
Importantly, *remember to flush stdout*.
This makes sure the output actually appears when needed, and fixes some visual glitches.
#### input
Input is more difficult.
There are a few problems we need to fix:
- **Blocking input**: Normal standard library input functions block,
which means that your code will stop and wait until the user types their input.
LC-3 requires that the CPU be able to keep running and periodically check if input comes in,
instead of pausing everything to wait for input.
- **Buffered input**: In a terminal, input is buffered, which means that input is only sent to the program when you press Enter.
This behaviour is called "canonical mode".
This is not what we want: we want to get raw keypresses.
It would not be fun to have to press Enter after each keypress for it to register.
- **Echo**: In a terminal, when you type, the letters you type show up.
This behaviour is called echo.
We do not want this: we want the program to silently read user input to avoid visual clutter.
We'll first get rid of canonical mode and echo.
This can be done on Linux using termios:
```
fn setup_termios() {
    let mut term: Termios = Termios::from_fd(STDIN_FILENO).unwrap();
    term.c_lflag &= !(ICANON | ECHO);
    // TCSANOW: "the change occurs immediately"
    tcsetattr(STDIN_FILENO, TCSANOW, &term).unwrap();
    // when leaving the program we want to be polite and undo the above changes
    ctrlc::set_handler(|| {
        restore_terminal();
        // typical CTRL-C exit code
        std::process::exit(130);
    })
    .expect("Failed to set CTRL-C handler");
}
```
Here, we disable the `ICANON` and `ECHO` bit-flags.
We also set a Ctrl-C handler:
if we exit the LC-3 VM unexpectedly,
we don't want to be stuck with weird terminal settings.
All `restore_terminal` does is flip on the flags we disabled.
Now, we have instant, silent input.
However, we still block on input.
This means that with code that deals with user input,
the program freezes up between keypresses.
To fix this, we need *non-blocking input*.
There are libraries to do this, however I decided to use standard Rust features to do it instead.
We first create a thread dedicated to managing stdin.
This thread will block until the user presses a key, however it does not block the main thread.
There is a "channel" between this thread and the main thread that allows one-way communication.
This channel is like a queue data structure: the input thread can send information about key-presses,
and the main thread can receive this information whenever it is ready.
In Rust, I use a `TerminalIO` struct to implement this:
```
impl TerminalIO {
    pub fn new() -> TerminalIO {
        setup_termios();
        TerminalIO {
            stdin_channel: Self::spawn_stdin_channel(),
            char: None,
        }
    }
    fn spawn_stdin_channel() -> Receiver<u8> {
        // https://stackoverflow.com/questions/30012995
        let (tx, rx) = mpsc::channel::<u8>();
        let mut buffer: [u8; 1] = [0];
        thread::spawn(move || loop {
            let _ = io::stdin().lock().read_exact(&mut buffer);
            let _ = tx.send(buffer[0]);
        });
        rx
    }
}
```
Here, we use a closure (the `move` here means that the closure takes ownership of the variables it uses from the outside scope)
that runs in a new thread.
It is an infinite loop that waits for a single byte of input from the user,
then transmits it over the channel back to the main thread.
Back in the main thread, I then implement the KBSR and KBDR registers:
```
impl KeyboardIO for TerminalIO {
    fn get_key(&mut self) -> Option<u8> {
        let c = self.char;
        self.char = None;
        c
    }
    fn check_key(&mut self) -> bool {
        match self.char {
            // a key is already waiting to be read
            Some(_) => true,
            None => match self.stdin_channel.try_recv() {
                Ok(key) => {
                    self.char = Some(key);
                    true
                }
                Err(mpsc::TryRecvError::Empty) => false,
                Err(mpsc::TryRecvError::Disconnected) => panic!("terminal keyboard stream broke"),
            },
        }
    }
}
```
In the main thread, when the VM checks if there is a keypress ready through the Keyboard Ready bit,
we attempt to receive a keystroke over the channel from the input thread.
If the channel is empty, return that there is no keypress ready.
Otherwise, store the character we received.
Then, when the VM gets a key through the Keyboard Data register, we give it this character.
I personally find that this solution is elegant:
it allows for the VM to keep working while waiting on user input,
and it also doesn't take a bunch of boilerplate and working with obscure options like file descriptors.
The input thread just uses normal input functions, and passes it over to the main thread to be read later.
## debugging
Implementing a virtual machine can often introduce hard-to-find bugs.
Indeed, there's no such thing as a syntax error or a type error when you're dealing with assembly.
When something goes wrong, you'll have absolutely no indication of where the issue stems from:
you'll just see weird behaviour.
With LC-3, though, you can be reasonably sure that the programs you're running (like 2048, Rogue),
can be trusted to be bug-free, given that they've existed for years.
Therefore, any bug most certainly stems from you, the virtual machine author.
To find the source of these bugs in your virtual machine implementation,
I recommend that you read over the code implementing all the instructions,
and compare it to the ISA specification.
I find that this is in general great advice for programming anything that involves logic.
It may not seem like reading will do much,
but you will be able to catch many, many dumb mistakes with this method.
Let's talk about my own experience debugging LC-3.
Rogue worked perfectly,
but when running the 2048 program, the game started with an empty grid.
(Usually, 2048 has tiles in the grid.)
Pressing keys would not do anything either.
To fix this, I tried applying my earlier advice about re-reading your code.
I read 70% of the instructions implementation file,
then decided that it was probably not worth the effort to continue.
(We will see later this was a bad decision.)
I then did run-time debugging of what was happening in the VM.
First, I wrote some debug print statements (these are still available with the `--debug` flag of the VM).
These had the following format:
```
PC: 0x3312, op: ADD, params: 0x261
R0: 0x0
R1: 0x29a
R2: 0x0
R3: 0x0
R4: 0x0
R5: 0x3017
R6: 0x3ffc
R7: 0x32db
COND: 0x2 (Z)
```
All the registers' contents are displayed, as well as information about the current instruction.
Every cycle, this information is printed to stderr, which allows the debug stream to be piped into a log file separate from regular output.
The log is useful, but is too fast to be read while the VM is running, and doesn't show any information about memory.
Most importantly though, it doesn't have one of the best creature comforts that you'd expect from a debugger: breakpoints.
At this point, I figured it was probably best to use a real debugger.
For those who have used C or C++ on Linux before, you probably have experience with using GDB to debug.
GDB is a venerable debugger with a command-line interface.
The user experience is quite unfriendly, but it's efficient, and fast.
It turns out that GDB also works in Rust, with some slight modifications.
This debugger, `rust-gdb`, comes packaged with the Rust compiler.
We can't just directly use `rust-gdb` to debug our virtual machine software though.
The debugger doesn't understand LC-3 assembly;
we can't just tell it to, for example, break on a given line in the LC-3 code.
First, I set up a breakpoint in the fetch-execute loop.
This means that entering `c` (continue) in GDB will step through a single LC-3 instruction.
With the assembly source code in a separate window,
it is possible to step through the execution of the program
and examine the instructions and how they affect the registers.
However, stepping through instructions individually gets tedious eventually.
Monitoring the PC register, I wrote down all the addresses of a few instructions as comments in the assembly source.
It's also possible to get addresses by counting how many instructions there are in the source code.
I then used GDB's conditional breakpoints to break in the VM only when PC reaches that address.
In essence, this is a breakpoint within the LC-3 code.
To make this process faster, I made a GDB macro to automate it:
```
define vmb
    # set a breakpoint at VM address $arg0
    # the + 1 accounts for PC already pointing to the next instruction
    break lc3::vm::instruction::execute_instruction if vm.registers.pc == $arg0 + 1
    set $vmb_break = $bpnum
end
```
This creates a new command that can be used to set breakpoints within LC-3.
Using this, I narrowed down the source of the bug in 2048 to the `RAND_MOD` subroutine,
which is supposed to provide a random number within a given range.
This is used to determine where a new tile is placed in the grid.
When tested, it was giving a garbage number entirely outside the range argument.
Then, I further narrowed the issue down to the division/modulo subroutine, `MOD_DIV`,
giving the wrong answer.
Stepping through this function,
I finally found a single instruction that was behaving oddly: `NOT`.
As it turns out, I had made a very subtle typo in the implementation of this instruction:
```
let res = !vm.registers.get_reg(sr);
- vm.registers.set_reg_with_cond(sr, res);
+ vm.registers.set_reg_with_cond(dr, res);
```
Instead of performing the operation on the source register (`sr`) and storing the result in the destination register (`dr`),
I had stored the result back in the source.
In Rogue, this had not caused any issues, since that program only uses this instruction "in-place" (`NOT R0, R0`).
However, this behaviour is obviously incorrect when the source and destination are different (`NOT R2, R1`), like in 2048.
This spread corrupted data everywhere, and was hard to diagnose.
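A small test that uses *different* source and destination registers would have caught this right away. Here is a sketch of what that check could look like. `execute_not` is a hypothetical stand-in working on a bare register array, not my VM's real instruction code.

```rust
/// Stand-in for the NOT instruction: DR = NOT SR.
pub fn execute_not(regs: &mut [u16; 8], dr: usize, sr: usize) {
    regs[dr] = !regs[sr];
}

/// Check NOT with different source and destination registers.
/// Returns true when the destination holds the result and the source
/// register is left untouched.
pub fn not_leaves_source_alone() -> bool {
    let mut regs = [0u16; 8];
    regs[1] = 0x00FF;
    execute_not(&mut regs, 2, 1); // NOT R2, R1
    regs[2] == 0xFF00 && regs[1] == 0x00FF
}
```

With the buggy version (writing to `sr` instead of `dr`), R1 would end up as `0xFF00` and R2 would stay at zero, so the check fails, whereas an in-place test like `NOT R0, R0` passes either way.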
In the end, had I followed my earlier advice about re-reading the code and not given up midway,
I would've noticed this much quicker.
Indeed, this will serve as a lesson for me to properly check over code I write in the future,
and make sure that it is not sloppy.
## conclusion
While LC-3 is still a relatively simple architecture,
implementing it as a virtual machine is still quite educational.
I learned about the core functionality of computers,
as well as how assembly works.
Through LC-3, I also learned the basics of Rust
and got more experience finding subtle bugs too.
If you're looking to sharpen your skills in a language,
consider the idea of writing a simple virtual machine.
Or, maybe, a disassembler or assembler for LC-3.
Either way, the fact that just this small, contrived system can be so interesting is surprising.
This is just a microcosm of computing,
and there is so, so much more to discover.
Again, if you're interested in further reading about LC-3,
read the [original blog post](https://www.jmeiners.com/lc3-vm/) that inspired this one.
Also, thank you for bearing with me this far.
This post is longer than any that I've written up to now.
I hope you enjoyed my journey with LC-3 as much as I did.