diff --git a/pages/projects.md b/pages/projects.md
index 6ef2bb7..4a192ae 100644
--- a/pages/projects.md
+++ b/pages/projects.md
@@ -33,6 +33,12 @@
 The backend is written in Python using the Flask framework, while the frontend is written in TypeScript with Mithril.js.
 There are neat features like dark mode and responsive, mobile-friendly design.
 
+## [lc-3 vm](/lc3)
+
+Rust implementation of the Little Computer-3 educational architecture as a virtual machine.
+LC-3 is a simplified 16-bit computer that can run programs and games like 2048.
+Concepts found in LC-3, like the fetch-execute loop, and assembly, are foundations of modern computing.
+
 ## [bitmask](https://github.com/dogeystamp/bitmask)
 
 A Python library that helps with manipulating bit flags. It provides
diff --git a/posts/lc3.md b/posts/lc3.md
new file mode 100644
index 0000000..99e3c74
--- /dev/null
+++ b/posts/lc3.md
@@ -0,0 +1,767 @@
+# making a virtual machine from scratch (in rust)
+
+2024-01-20
+
+Computers are wonderful machines that can do many things.
+However, even though they are very complex, at their core they are surprisingly simple.
+In fact, with only basic programming knowledge,
+it is possible to simulate a fully-functioning computer akin to modern ones.
+
+In this post, I document how I built a virtual machine for Little Computer 3 (LC-3),
+an educational computer model.
+LC-3 may be simple, but it works the same way any modern computer does:
+through implementing it, you get a glimpse of how real architectures like x86 or ARM work.
+Besides that, once you're finished, you can play 2048 in the VM you created from scratch.
+Is there anything more gratifying?
+
+Writing a virtual machine is also a great test of your programming ability.
+I decided to do this project in Rust, in order to learn how the language works hands-on.
+I've mostly had experience with C and Python before this, and this was my first Rust project.
+However, this article will focus more on the virtual machine aspect, rather than the Rust.
+
+If you're just looking for the source code, check it out on GitHub: [dogeystamp/lc3-vm](https://github.com/dogeystamp/lc3-vm).
+
+## prerequisite reading
+
+I assume you have a certain amount of fundamental knowledge before reading this article:
+
+- **Programming knowledge** (preferably in Rust or C).
+  I assume you know programming decently well, but are not familiar with virtual machines.
+  I'll give code analogies in C, and I assume you can figure out what the Rust means with prior knowledge in other languages.
+- **Binary arithmetic**.
+  Computers, at their fundamental level, deal exclusively with binary.
+  You should know [bitwise operations](https://www.hackerearth.com/practice/basic-programming/bit-manipulation/basics-of-bit-manipulation/tutorial/),
+  like left/right shift, AND, NOT, OR, etc. to see how binary is manipulated.
+  You should also be familiar with hexadecimal and binary number representation.
+
+Besides that, I aim for this article to be shorter and more of an overview of topics that were interesting to me.
+For all the context and implementation details, see Justin Meiners' and Ryan Pendleton's [blog post](https://www.jmeiners.com/lc3-vm/)
+which meticulously explains LC-3,
+as well as the [LC-3 ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf).
+
+## what's inside a VM?
+
+Starting off, here is some theory to get you in context.
+Skim this if you are familiar with how everything works.
+
+LC-3 can be modeled with three simple components: the processor and registers (CPU), and memory (RAM).
+
+```
+
+         registers
+      [ ][ ][ ][ ][ ]
+      [ ][ ][ ][ ][ ]
+   +-------------+     +------------------------+
+   |             |     |                        |
+   |  processor  |-----|         memory         |
+   |             |     |                        |
+   +-------------+     |                        |
+                       +------------------------+
+
+```
+
+### memory
+
+First, *memory* is where most temporary data lives in the computer.
+The easiest way for me to visualize it is a huge C-style array:
+
+```
+uint16_t mem[65536];
+```
+
+Instead of an index, we use an "address" for each element.
+Just like an array, we can read and write (get or set) the data at each address.
+When the computer shuts down, all data in memory is lost.
+(For data to persist, we use storage like hard drives, but LC-3 doesn't have this.)
+
+In LC-3, each element in memory is 16 bits, or 2 bytes.
+At this level, memory does not have any types the way variables do in languages like C:
+it's all raw binary.
+Typed variables, malloc and the stack are all abstractions on top of memory.
+
+### registers
+
+Second, *registers* are where data that is immediately useful is stored.
+They are also 16-bit just like memory elements.
+Think of it as having fixed variables.
+LC-3 has 10 registers:
+
+- the general purpose R0 to R7,
+- Program Counter (PC),
+- and the Processor Status Register (PSR).
+
+I'll explain the last two later.
+Just like variables, you can assign values to the general purpose registers, and read from them:
+
+```
+uint16_t r0 = 0;
+...
+uint16_t r7 = 0;
+
+r6 = r0 + r7;
+r3 -= 3;
+```
+
+The reason registers exist when we already have memory is that they're way more convenient to use.
+Physically, registers are closer to the processor than memory is,
+and are therefore much faster.
+Besides that, reading/writing from memory usually takes more instructions than just using registers.
+However, in exchange for convenience and speed, you have to deal with a limited number of registers.
+
+### processor
+
+Finally, the *processor* is where the real computing happens.
+The processor reads *instructions* one by one, and executes them.
+Instructions are like statements in higher level code, but much simpler.
+
+LC-3 only has 15 different types of instructions
+that do things like "read memory at this address and load the value into R4".
+Instructions are just 16-bit binary data.
+Again, read the [ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf)
+for detailed information about this.
+For example, the spec says this is how to ADD two registers and store the result in a third register:
+
+```
+ADD
+| 0 0 0 1 | DR | SR1 | 0 0 0 | SR2 |
+```
+
+DR (Destination Register), SR1 (Source Register) and SR2 are all placeholders for general registers.
+For example, if we wanted to do `r1 = r2 + r3`:
+
+```
+  ADD       r1      r2     (...)    r3
+| 0 0 0 1 | 0 0 1 | 0 1 0 | 0 0 0 | 0 1 1 |
+
+or in hex:
+0x1283
+```
+
+As you can see, this is pure binary data,
+and as such can be read by the processor.
+When you compile a C program, machine code like this is what comes out.
+
+There exists *assembly* which is a human-readable version of this binary.
+Unlike regular code, it has a 1-to-1 correspondence with the binary.
+For example, the above ADD instruction would be
+
+```
+ADD R1, R2, R3 ; you can put comments using semicolon
+```
+
+Using an assembler, a programmer can convert the assembly code into the pure binary program that LC-3 can read.
+
+### fetch-execute loop
+
+Now that the individual parts are explained, we move on to how it all works together.
+So far, I haven't explained where the instructions actually come from.
+Since instructions are just binary data, we actually just place a series of them (a program) in memory:
+
+```
+address  value (hex)  equivalent assembly code
+0x3000:  [e002]       (LEA R0, HELLO_WORLD)
+0x3001:  [f022]       (PUTS)
+0x3002:  [f025]       (HALT)
+...
+```
+
+The Program Counter (PC) register we talked about earlier is a pointer to the instructions inside a program in memory.
+In LC-3, as seen above, PC starts at the address `0x3000`.
+The processor will perform a *fetch-execute* loop:
+
+- Fetch the instruction in memory using the PC (`instr = mem[PC]`)
+- Increment PC (`PC += 1`)
+- Execute the instruction
+- Repeat on the next instruction
+
+Until it reaches an instruction to stop (HALT), the processor will continue this loop.
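
A minimal sketch of this loop in Rust, assuming a hypothetical `Machine` struct that is not taken from the lc3-vm source. The only instruction it understands is HALT (`0xf025`, from the listing above); a real VM would decode the top four bits of every instruction inside `execute`.

```rust
/// Hypothetical, stripped-down machine: memory, PC and a running flag only.
struct Machine {
    mem: Vec<u16>,      // 65536 words of 16 bits each
    pc: u16,            // program counter
    running: bool,      // cleared when HALT is executed
    executed: Vec<u16>, // log of executed instructions, for illustration only
}

/// TRAP x25 (HALT), as it appears in the program listing.
const HALT: u16 = 0xF025;

impl Machine {
    /// Load `program` at 0x3000, where PC starts.
    fn new(program: &[u16]) -> Machine {
        let mut mem = vec![0u16; 1 << 16];
        let origin = 0x3000usize;
        mem[origin..origin + program.len()].copy_from_slice(program);
        Machine { mem, pc: 0x3000, running: true, executed: Vec::new() }
    }

    /// One iteration of the fetch-execute loop.
    fn step(&mut self) {
        let instr = self.mem[self.pc as usize]; // fetch
        self.pc = self.pc.wrapping_add(1);      // increment PC first
        self.execute(instr);                    // then execute
    }

    /// Placeholder: a real VM would match on the opcode (instr >> 12) here.
    fn execute(&mut self, instr: u16) {
        self.executed.push(instr);
        if instr == HALT {
            self.running = false;
        }
    }

    /// Repeat until HALT.
    fn run(&mut self) {
        while self.running {
            self.step();
        }
    }
}
```

Running this on the three-word program from the listing leaves PC at `0x3003`, one past the HALT at `0x3002`, because PC is incremented before the instruction executes.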
+Remember that **PC points to the next instruction**, not the current one!
+This often tripped me up implementing the virtual machine.
+The reason it doesn't point to the current instruction is that it makes the mechanisms in the next section easier to understand.
+
+### control flow
+
+If statements, switches, and loops are all implemented using two instructions, `JMP` and `BR`,
+which alter PC.
+
+`JMP` (jump) directly sets the value of PC,
+which means on the next fetch-execute loop, the processor will not execute the next instruction,
+but rather the instruction at the new PC.
+This is what `goto` in C does under the hood.
+
+Meanwhile, `BR` (branch) conditionally sets the value of PC.
+Remember the PSR register mentioned earlier?
+The bottom few bits in PSR contain *condition flags*.
+Condition flags essentially represent the sign of the result of an operation like AND or ADD.
+They're P (Positive), Z (Zero), and N (Negative).
+`BR` works with these flags:
+for example `BRnz` is "jump if result is negative or zero".
+
+Let's see how a for loop would be implemented this way:
+
+```
+AND R1, R1, #0    ; set R1 to 0 (anything bitwise and 0 is 0)
+ADD R1, R1, #5    ; set R1 = R1 + 5
+
+LOOP_START        ; this is a label
+
+... (for loop body)
+
+ADD R1, R1, #-1   ; R1 = R1 + (-1) (decrement)
+BRp LOOP_START    ; loop back. once assembled, the label becomes a numerical address offset
+
+HALT              ; we're done
+```
+
+The equivalent in C:
+
+```
+for (int i = 5; i > 0; i--) {
+    // for loop body
+}
+```
+
+It's also common practice to AND a register with itself to check if it is positive/negative/zero.
+A register ANDed with itself keeps the same value, which allows testing without altering the register's contents.
+
+### subroutine calls
+
+LC-3 has support for "subroutines", which are like functions but less convenient.
+Practically, LC-3 can jump into a subroutine, then at the end of the subroutine, jump back to the place where the subroutine was called.
+This works using instructions like JSR and RET.
+
+JSR means "jump subroutine", and it is essentially the same as JMP, however it also saves the current PC into the register R7.
+At the end of the subroutine, we put the instruction RET, which is actually a disguised JMP.
+RET means "JMP to the address given in R7", which means return to the place where we originally used JSR.
+
+An example usage:
+
+```
+    AND R1, R1, #0    ; random code
+    JSR SOME_ROUTINE  ; call subroutine
+    HALT
+
+SOME_ROUTINE
+    ADD R1, R1, #1    ; random subroutine code
+    RET               ; return
+```
+
+This code will first set R1 to 0, jump into the subroutine, increment R1, then return to the main part and stop the program.
+
+### memory mapped i/o
+
+Earlier, when I described how the LC-3 virtual machine works, I omitted a pretty significant component:
+input and output.
+I/O is the sole way the virtual machine can communicate with the outside world, whether it's receiving user input,
+or sending output.
+
+LC-3 uses a terminal for I/O, which means it can work with standard input (stdin) and standard output (stdout).
+Input is done via keyboard, and output via display.
+The way this works is *memory-mapped I/O*.
+In LC-3's memory, there are specific addresses that connect to external I/O devices.
+
+```
+          +----------------+     +----------+
+ ··· -----|     memory     |-----| terminal |
+          +----------------+     +----------+
+```
+
+This is useful because LC-3 can reuse the existing instructions for loading/storing from memory to talk to the terminal.
+When it manipulates the special memory-mapped locations (device registers), it doesn't actually store or load data from memory,
+but it communicates with the peripherals like the display or keyboard.
+
+Here is a full list of memory-mapped addresses in LC-3:
+
+```
+addr    name                              short description
+
+0xFE00  keyboard status register (KBSR)   has a key been pressed?
+0xFE02  keyboard data register (KBDR)     what key was pressed?
+0xFE04  display status register (DSR)     can the display receive a character?
+0xFE06  display data register (DDR)       character to send to the display
+0xFFFE  machine control register (MCR)    power button
+```
+
+For example, to read keyboard input, a program would first poll bit 15 (the ready bit) of KBSR to wait for the user to press a key.
+If the bit is `0`, then the program keeps waiting.
+Otherwise, the bit is `1`, and that means the user pressed a key.
+Then, the program reads bits [7:0] from KBDR into its registers.
+This contains the key that was pressed, encoded as ASCII.
+To read more characters, it would continue this loop.
+
+As an aside, remember that by convention, the least significant bit (the right-most units bit) is bit 0,
+and the other bits are numbered right-to-left as 1, 2, 3, and so on.
+Not realizing this has caused me issues while implementing this VM,
+as the program was reading from bit 15, while my VM was providing information on bit 0 (which if numbered left to right, is bit 15).
+
+Displaying data is similar: the program first polls bit 15 of DSR (the ready bit) until the display is ready to receive a character.
+Then, the program stores a character in DDR encoded as ASCII.
+This character is finally sent to the display.
+
+The program can also halt the computer (shut it down) by setting MCR to all zeroes.
+
+For detailed information, again, consult the [LC-3 ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf),
+specifically the Device Register Assignments.
+
+## assorted implementation details
+
+Now that we've gone through the basics of how LC-3 works, I'll go through some interesting details that I encountered during implementation.
+If you came here to follow along implementing yourself, I recommend you read [Meiners' and Pendleton's](https://www.jmeiners.com/lc3-vm/) LC-3 blog post,
+which is actually a tutorial.
+For a Rust version, see [Rodrigo Araujo's implementation](https://www.rodrigoaraujo.me/posts/lets-build-an-lc-3-virtual-machine/).
+
+### endianness
+
+Endianness is the order in which bytes are stored in memory within a word (a single piece of data).
+There's big endian, and little endian.
+By definition, big endian starts with the most significant byte,
+and little endian starts with the least significant byte.
+By "most significant", it means in numbers like decimal 123, the hundreds position is "more significant" than the units position.
+
+However, I find it more comprehensible to think that big endian is the "natural" order,
+while little endian is the "reversed" order.
+For example, take the number `0x12345678`.
+On a big endian system, it would be stored in memory like this:
+
+```
+address  value (hex)
+0x0001:  12
+0x0002:  34
+0x0003:  56
+0x0004:  78
+```
+
+However, on a little endian system, it would be like this:
+
+```
+address  value (hex)
+0x0001:  78
+0x0002:  56
+0x0003:  34
+0x0004:  12
+```
+
+[Supposedly, ](https://softwareengineering.stackexchange.com/questions/95556)
+it is easier to deal with little endian on processors,
+which is why it is used in many popular CPU architectures.
+However, LC-3 uses big endian.
+This is an issue to consider during implementation.
+
+For example, if you use `hexdump` on a program file, you may see this:
+
+```
+0000000 0030 02e0 22f0 25f0 7900 6f00 7500 2000
+0000010 6c00 6900 6b00 6500 2000 7600 6900 7200
+0000020 7400 7500 6100 6c00 6900 7a00 6900 6e00
+0000030 6700 2000 6200 6f00 7900 7300 2000 6400
+0000040 6f00 6e00 7400 2000 7900 6f00 7500 0000
+```
+
+This is actually incorrect output!
+`hexdump` assumes groups of two bytes are a single little-endian word,
+so it flips it to make it the proper order.
+However, LC-3 data is in big endian order.
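
To make the flip concrete, here is a small Rust sketch using the standard `u16::from_be_bytes` and `u16::from_le_bytes`. The function names `words_be` and `words_le` are hypothetical, chosen only for illustration. On disk, the first eight bytes of the file are `30 00 e0 02 f0 22 f0 25`.

```rust
/// Interpret a byte stream as big-endian 16-bit words
/// (the order LC-3 program files use).
fn words_be(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_be_bytes([pair[0], pair[1]]))
        .collect()
}

/// Interpret the same bytes as little-endian words
/// (what plain `hexdump` prints on a little-endian host).
fn words_le(bytes: &[u8]) -> Vec<u16> {
    bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect()
}
```

Fed those eight bytes, `words_be` yields `0x3000, 0xe002, 0xf022, 0xf025`, while `words_le` yields `0x0030, 0x02e0, 0x22f0, 0x25f0`, which is exactly the first row of the plain `hexdump` output.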
+
+`hexdump -C` prints bytes as they are on disk, which produces the proper ordering:
+
+```
+00000000  30 00 e0 02 f0 22 f0 25  00 79 00 6f 00 75 00 20
+00000010  00 6c 00 69 00 6b 00 65  00 20 00 76 00 69 00 72
+00000020  00 74 00 75 00 61 00 6c  00 69 00 7a 00 69 00 6e
+00000030  00 67 00 20 00 62 00 6f  00 79 00 73 00 20 00 64
+00000040  00 6f 00 6e 00 74 00 20  00 79 00 6f 00 75 00 00
+```
+
+It is important to flip bytes or specify that the data is big-endian when reading programs into memory from a file.
+
+### integer overflow
+
+You may know that because integers are represented by a finite number of bits,
+it is possible for them to overflow when they get too big.
+For LC-3, we usually implement registers and memory using unsigned 16-bit integers,
+which gives us a range of 0-65535.
+This is also the limit of our memory's size, since we can not represent an address bigger than that.
+The same issue makes 32-bit computers unable to have more than around 4GB of memory (`(1 << 32) - 1`).
+
+When integers overflow, they often wrap around back to 0 or the lowest number possible.
+This is necessary behaviour on the LC-3, as it makes it possible to use signed numbers in 2's complement.
+There is no subtract operation; we just add negative numbers, and it magically wraps around to the correct value.
+However, we usually do not want integer overflow, so Rust complains when it happens:
+
+```
+error: this arithmetic operation will overflow
+ --> test.rs:2:20
+  |
+2 |     println!("{}", 65535u16+1u16)
+  |                    ^^^^^^^^^^^^^ attempt to compute `u16::MAX + 1_u16`, which would overflow
+  |
+```
+
+Earlier, I mentioned [Rodrigo Araujo's VM](https://www.rodrigoaraujo.me/posts/lets-build-an-lc-3-virtual-machine/),
+which was also written in Rust.
+This implementation served as a Rust reference for me.
+In his instruction implementations, he uses casts to perform wrapping arithmetic:
+
+```
+let addr = (vm.registers.get_reg(base_r) as u32 + offset as u32) as u16;
+```
+
+First, he adds the parameters as 32-bit unsigned ints, then casts the result back to 16-bit unsigned.
+I was not sure at first what the cast does when the sum does not fit in 16 bits,
+but upon further experimentation it turned out that it truncates the extra bits, which is the same as a modulo 65536 operation.
+This means that if the value exceeds the u16 limit, it wraps back around to 0.
+
+Personally, I found this to be a very janky, implicit way to perform wrapping arithmetic.
+After all, it took me multiple Google searches and a bit of testing to be sure of what the code was doing.
+In my own code, I use explicit syntax for a wrapping addition:
+
+```
+let addr = vm.registers.get_reg(base_r).wrapping_add(offset);
+```
+
+This code is much clearer, and, quoting the Zen of Python, `Explicit is better than implicit.`
+
+### traps
+
+Earlier, in the memory-mapped I/O section, we saw how LC-3 can use memory-mapped I/O to talk to external peripherals.
+You may have noticed that all of this is a very tedious process just to get some user input.
+To simplify things, LC-3 implements *traps*.
+Traps are essentially utility subroutines that make life easier.
+These can be accessed with the TRAP instruction, along with a code to specify which subroutine the program wants (the trap vector).
+
+However, instead of the programmer writing these subroutines, traps are part of an operating system on the LC-3.
+The operating system is also a program (it is also composed of a bunch of instructions in memory), however it runs with higher privileges than the user program.
+The OS is stored in a special location in memory, earlier than the user program memory.
+
+When a TRAP instruction is called, LC-3 takes the trap vector, looks up a corresponding address in the trap vector table (a section in memory before the operating system),
+then calls that address as if using the JSR instruction on a subroutine.
+These addresses all lead to subroutines within the OS.
+For a C analogy, it's like the trap vector table is an array of function pointers,
+where the functions are part of the operating system.
+
+Here is a list of trap subroutines in LC-3.
+Consult the [LC-3 ISA specification](https://www.jmeiners.com/lc3-vm/supplies/lc3-isa.pdf) for a detailed explanation.
+
+```
+trap vector  name   description
+
+0x20         GETC   get a single character from keyboard (like C's getchar())
+0x21         OUT    put a single character to terminal
+0x22         PUTS   put a string to terminal
+0x23         IN     get a single character with echo (show the character typed)
+0x24         PUTSP  put a string to terminal (two characters packed per memory address)
+0x25         HALT   shut down the computer
+```
+
+As you can see, these are high level wrappers for the memory-mapped I/O seen in the last section,
+and are also much friendlier to work with in general.
+Importantly, these routines can all be implemented in LC-3 code.
+
+For my own virtual machine though, and Justin Meiners' VM that inspired it, we do not actually write assembly code for trap routines.
+Instead, in the VM itself, we intercept these trap calls, and perform the tasks in high-level C or Rust code, instead of LC-3 assembly.
+This is generally simpler, although less faithful to the specification.
+Because of this, it is also not necessary to implement some of the memory-mapped registers, like the display registers, and the machine control register.
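
As a rough sketch of what intercepting could look like, the VM can read the trap vector from the low 8 bits of a TRAP instruction (opcode `1111`) and map it to a host-side routine. The `Trap` enum and `decode_trap` function below are hypothetical names for illustration, not the actual lc3-vm code.

```rust
/// The six trap routines from the table above.
#[derive(Debug, PartialEq)]
enum Trap {
    Getc,
    Out,
    Puts,
    In,
    Putsp,
    Halt,
}

/// Decode a 16-bit instruction into a trap routine, if it is a known TRAP.
/// Layout: | 1 1 1 1 | 0 0 0 0 | trapvect8 |
fn decode_trap(instr: u16) -> Option<Trap> {
    // The top four bits hold the opcode; TRAP is 1111.
    if instr >> 12 != 0xF {
        return None;
    }
    // The low eight bits hold the trap vector.
    match instr & 0xFF {
        0x20 => Some(Trap::Getc),
        0x21 => Some(Trap::Out),
        0x22 => Some(Trap::Puts),
        0x23 => Some(Trap::In),
        0x24 => Some(Trap::Putsp),
        0x25 => Some(Trap::Halt),
        _ => None, // unknown vector: left to the caller to handle
    }
}
```

A `match` on the returned `Trap` can then call an ordinary host function in place of the OS subroutine, skipping the trap vector table lookup entirely.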
+
+For example, here is my code that performs GETC:
+
+```
+fn trap_getc(vm: &mut VM) {
+    while vm.mem.get_mem(0xFE00) & (1 << 15) == 0 {}
+    vm.registers.r0 = vm.mem.get_mem(0xFE02) & 0xFF;
+}
+```
+
+First, we poll the Keyboard Ready bit, then we load the keypress into the VM's registers.
+This type of implementation is much more convenient than writing raw assembly.
+My own GETC is not really efficient, but using standard library `getchar()` or an alternative would avoid polling the ready bit constantly.
+Right now, with the polling loop, we use up a lot of CPU on the host machine running the VM,
+when we are doing nothing but waiting.
+However, this is the only choice if you actually implement the trap routines in assembly.
+
+### terminal input/output
+
+We've seen in the last section the interface LC-3 provides for I/O, but in this section, I'll explain concretely *how* the terminal interface works in my own implementation.
+
+In my LC-3 VM, only the keyboard device registers and the output-related trap routines are directly implemented.
+The input-related trap routines are based on the keyboard device registers.
+
+#### output
+
+First, output is relatively simple: we just use the built-in print functions.
+
+```
+fn trap_puts(vm: &mut VM) {
+    let mut idx = vm.registers.r0;
+    loop {
+        let c = vm.mem.get_mem(idx) as u8 as char;
+        if c == '\0' {
+            break;
+        }
+
+        print!("{}", c);
+        idx += 1;
+    }
+    let _ = io::stdout().flush();
+}
+```
+
+For example, to output a null-terminated string, we loop through it and print each character, breaking when we see a null.
+Importantly, *remember to flush stdout*.
+This makes sure the output actually appears when needed, and fixes some visual glitches.
+
+#### input
+
+Input is more difficult.
+There are a few problems we need to fix:
+
+- **Blocking input**: Normal standard library input functions block,
+  which means that your code will stop and wait until the user types their input.
+  LC-3 requires that the CPU be able to keep running and periodically check if input comes in,
+  instead of pausing everything to wait for input.
+- **Buffered input**: In a terminal, input is buffered, which means that input is only sent to the program when you press Enter.
+  This behaviour is called "canonical mode".
+  This is not what we want: we want to get raw keypresses.
+  It would not be fun to have to press Enter after each keypress for it to register.
+- **Echo**: In a terminal, when you type, the letters you type show up.
+  This behaviour is called echo.
+  We do not want this: we want the program to silently read user input to avoid visual clutter.
+
+We'll first get rid of canonical mode and echo.
+This can be done on Linux using termios:
+
+```
+fn setup_termios() {
+    let mut term: Termios = Termios::from_fd(STDIN_FILENO).unwrap();
+    term.c_lflag &= !(ICANON | ECHO);
+    // TCSANOW: "the change occurs immediately"
+    tcsetattr(STDIN_FILENO, TCSANOW, &term).unwrap();
+
+    // when leaving the program we want to be polite and undo the above changes
+    ctrlc::set_handler(|| {
+        restore_terminal();
+        // typical CTRL-C exit code
+        std::process::exit(130);
+    })
+    .expect("Failed to set CTRL-C handler");
+}
+```
+
+Here, we disable the `ICANON` and `ECHO` bit-flags.
+We also set a Ctrl-C handler:
+if we exit the LC-3 VM unexpectedly,
+we don't want to be stuck with weird terminal settings.
+All `restore_terminal` does is flip on the flags we disabled.
+
+Now, we have instant, silent input.
+However, we still block on input.
+This means that with code that deals with user input,
+the program freezes up between keypresses.
+
+To fix this, we need *non-blocking input*.
+There are libraries to do this, however I decided to use standard Rust features to do it instead.
+
+We first create a thread dedicated to managing stdin.
+This thread will block until the user presses a key, however it does not block the main thread.
+There is a "channel" between this thread and the main thread that allows one-way communication.
+This channel is like a queue data structure: the input thread can send information about key-presses,
+and the main thread can receive this information whenever it is ready.
+
+In Rust, I use a `TerminalIO` struct to implement this:
+
+```
+impl TerminalIO {
+    pub fn new() -> TerminalIO {
+        setup_termios();
+        TerminalIO {
+            stdin_channel: Self::spawn_stdin_channel(),
+            char: None,
+        }
+    }
+
+    fn spawn_stdin_channel() -> Receiver<u8> {
+        // https://stackoverflow.com/questions/30012995
+        let (tx, rx) = mpsc::channel::<u8>();
+        let mut buffer: [u8; 1] = [0];
+        thread::spawn(move || loop {
+            let _ = io::stdin().lock().read_exact(&mut buffer);
+            let _ = tx.send(buffer[0]);
+        });
+        rx
+    }
+}
+```
+
+Here, we use a closure (the `move` here means that the closure takes ownership of the variables it uses from the outside scope)
+that runs in a new thread.
+It is an infinite loop that waits for a single byte of input from the user,
+then transmits it over the channel back to the main thread.
+
+Back in the main thread, I then implement the KBSR and KBDR registers:
+
+```
+impl KeyboardIO for TerminalIO {
+    fn get_key(&mut self) -> Option<u8> {
+        let c = self.char;
+        self.char = None;
+        c
+    }
+
+    fn check_key(&mut self) -> bool {
+        match self.char {
+            Some(_) => true,
+            None => match self.stdin_channel.try_recv() {
+                Ok(key) => {
+                    self.char = Some(key);
+                    true
+                }
+                Err(mpsc::TryRecvError::Empty) => false,
+                Err(mpsc::TryRecvError::Disconnected) => panic!("terminal keyboard stream broke"),
+            },
+        }
+    }
+}
+```
+
+In the main thread, when the VM checks if there is a keypress ready through the Keyboard Ready bit,
+we attempt to receive a keystroke over the channel from the input thread.
+If the channel is empty, return that there is no keypress ready.
+Otherwise, store the character we received.
+Then, when the VM gets a key through the Keyboard Data register, we give it this character.
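
One way to picture how these two methods connect to the device registers is a memory read that special-cases the keyboard addresses. This is a sketch under assumptions: the `Memory` struct, its fields, and the `ScriptedKeyboard` stand-in are invented for illustration, and the trait signatures are inferred from the code above rather than copied from the VM.

```rust
use std::collections::VecDeque;

const KBSR: u16 = 0xFE00; // keyboard status register
const KBDR: u16 = 0xFE02; // keyboard data register

/// Trait shape inferred from the `impl KeyboardIO for TerminalIO` block.
trait KeyboardIO {
    fn get_key(&mut self) -> Option<u8>;
    fn check_key(&mut self) -> bool;
}

/// Hypothetical memory wrapper that owns a keyboard device.
struct Memory<K: KeyboardIO> {
    data: Vec<u16>,
    keyboard: K,
}

impl<K: KeyboardIO> Memory<K> {
    fn new(keyboard: K) -> Self {
        Memory { data: vec![0; 1 << 16], keyboard }
    }

    /// Reads of KBSR/KBDR are routed to the keyboard instead of the array.
    fn get_mem(&mut self, addr: u16) -> u16 {
        match addr {
            // Bit 15 is the ready bit.
            KBSR => {
                if self.keyboard.check_key() {
                    1 << 15
                } else {
                    0
                }
            }
            // The key sits in bits [7:0]; 0 if nothing is pending.
            KBDR => self.keyboard.get_key().map_or(0, u16::from),
            _ => self.data[addr as usize],
        }
    }
}

/// Stand-in keyboard that replays a fixed list of keys (no thread, no terminal).
struct ScriptedKeyboard {
    keys: VecDeque<u8>,
}

impl ScriptedKeyboard {
    fn new(keys: &[u8]) -> Self {
        ScriptedKeyboard { keys: keys.iter().copied().collect() }
    }
}

impl KeyboardIO for ScriptedKeyboard {
    fn get_key(&mut self) -> Option<u8> {
        self.keys.pop_front()
    }

    fn check_key(&mut self) -> bool {
        !self.keys.is_empty()
    }
}
```

Keeping the keyboard behind a trait also makes it possible to swap the real terminal for a scripted one like this when testing.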
+
+I personally find that this solution is elegant:
+it allows for the VM to keep working while waiting on user input,
+and it also doesn't take a bunch of boilerplate and working with obscure options like file descriptors.
+The input thread just uses normal input functions, and passes it over to the main thread to be read later.
+
+## debugging
+
+Implementing a virtual machine can often introduce hard-to-find bugs.
+Indeed, there's no such thing as a syntax error or a type error when you're dealing with assembly.
+When something goes wrong, you'll have absolutely no indication of where the issue stems from:
+you'll just see weird behaviour.
+With LC-3, though, you can be reasonably sure that the programs you're running (like 2048, Rogue),
+can be trusted to be bug-free, given that they've existed for years.
+Therefore, any bug most certainly stems from you, the virtual machine author.
+
+To find the source of these bugs in your virtual machine implementation,
+I recommend that you read over the code implementing all the instructions,
+and compare it to the ISA specification.
+I find that this is in general great advice for programming anything that involves logic.
+It may not seem like reading will do much,
+but you will be able to catch many, many, dumb mistakes with this method.
+
+Let's talk about my own experience debugging LC-3.
+Rogue worked perfectly,
+but when running the 2048 program, the game started to an empty grid.
+(Usually, 2048 has tiles in the grid.)
+Pressing keys would not do anything either.
+To fix this, I tried applying my earlier advice about re-reading your code.
+I read 70% of the instructions implementation file,
+then decided that it was probably not worth the effort to continue.
+(We will see later this was a bad decision.)
+
+I then did run-time debugging of what was happening in the VM.
+First, I wrote some debug print statements (these are still available with the `--debug` flag of the VM.)
+These had the following format:
+
+```
+PC: 0x3312, op: ADD, params: 0x261
+R0: 0x0
+R1: 0x29a
+R2: 0x0
+R3: 0x0
+R4: 0x0
+R5: 0x3017
+R6: 0x3ffc
+R7: 0x32db
+COND: 0x2 (Z)
+```
+
+All the registers' contents are displayed, as well as information about the current instruction.
+Every cycle, this information is printed to stderr, which allows the debug stream to be piped into a log file separate from regular output.
+The log is useful, but is too fast to be read while the VM is running, and doesn't show any information about memory.
+Most importantly though, it doesn't have one of the best creature comforts that you'd expect from a debugger: breakpoints.
+
+At this point, I figured it was probably best to use a real debugger.
+For those who have used C or C++ on Linux before, you probably have experience with using GDB to debug.
+GDB is a venerable debugger with a command-line interface.
+The user experience is quite unfriendly, but it's efficient, and fast.
+It turns out that GDB also works in Rust, with some slight modifications.
+This debugger, `rust-gdb`, comes packaged with the Rust compiler.
+
+We can't just directly use `rust-gdb` to debug our virtual machine software though.
+The debugger doesn't understand LC-3 assembly;
+we can't just tell it to, for example, break on a given line in the LC-3 code.
+First, I set up a breakpoint in the fetch-execute loop.
+This means that entering `c` (continue) in GDB will step through a single LC-3 instruction.
+With the assembly source code in a separate window,
+it is possible to step through the execution of the program
+and examine the instructions and how they affect the registers.
+
+However, stepping through instructions individually gets tedious eventually.
+Monitoring the PC register, I wrote down all the addresses of a few instructions as comments in the assembly source.
+It's also possible to get addresses by counting how many instructions there are in the source code.
+I then used GDB's conditional breakpoints to break in the VM only when PC reaches that address.
+In essence, this is a breakpoint within the LC-3 code.
+To make this process faster, I made a GDB macro to automate it:
+
+```
+define vmb
+    # set a breakpoint at VM addr $arg0
+    break lc3::vm::instruction::execute_instruction if vm.registers.pc == $arg0 + 1
+    set $vmb_break = $bpnum
+end
+```
+
+This creates a new command that can be used to set breakpoints within LC-3.
+
+Using this, I narrowed down the source of the bug in 2048 to the `RAND_MOD` subroutine,
+which is supposed to provide a random number within a given range.
+This is used to determine where a new tile is placed in the grid.
+When tested, it was giving a garbage number entirely outside the range argument.
+Then, I further narrowed the issue down to the division/modulo subroutine, `MOD_DIV`,
+giving the wrong answer.
+Stepping through this function,
+I finally found a single instruction that was behaving oddly: `NOT`.
+
+As it turns out, I had made a very subtle typo in the implementation of this instruction:
+
+```
+     let res = !vm.registers.get_reg(sr);
+-    vm.registers.set_reg_with_cond(sr, res);
++    vm.registers.set_reg_with_cond(dr, res);
+```
+
+Instead of performing the operation on the source register (`sr`) and storing the result in the destination register (`dr`),
+I had stored the result back in the source.
+In Rogue, this had not caused any issues, since that program only uses this instruction "in-place" (`NOT R0, R0`).
+However, this behaviour is obviously incorrect when the source and destination are different (`NOT R2, R1`), like in 2048.
+This spread corrupted data everywhere, and was hard to diagnose.
+
+In the end, had I followed my earlier advice about re-reading the code and not given up midway,
+I would've noticed this much quicker.
+Indeed, this will serve as a lesson for me to properly check over code I write in the future,
+and make sure that it is not sloppy.
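
A small test that uses different source and destination registers would have flagged this immediately. The sketch below is a self-contained stand-in, not the VM's actual code: the `Registers` struct, the flag constants (Z = `0x2` is inferred from the debug output earlier; P and N are assumed to follow the PSR bit order) and `op_not` are assumptions modelled on the names in the snippet above.

```rust
// Condition flags; values are assumptions (see lead-in).
const FL_POS: u16 = 1 << 0;
const FL_ZRO: u16 = 1 << 1;
const FL_NEG: u16 = 1 << 2;

/// Hypothetical register file: R0-R7 plus the condition flags.
struct Registers {
    r: [u16; 8],
    cond: u16,
}

impl Registers {
    fn new() -> Self {
        Registers { r: [0; 8], cond: FL_ZRO }
    }

    fn get_reg(&self, idx: u16) -> u16 {
        self.r[idx as usize]
    }

    /// Store a value and update the flags from its sign.
    fn set_reg_with_cond(&mut self, idx: u16, val: u16) {
        self.r[idx as usize] = val;
        self.cond = if val == 0 {
            FL_ZRO
        } else if val >> 15 == 1 {
            FL_NEG // bit 15 set means negative in 2's complement
        } else {
            FL_POS
        };
    }
}

/// NOT: | 1 0 0 1 | DR | SR | 1 1 1 1 1 1 |
fn op_not(regs: &mut Registers, instr: u16) {
    let dr = (instr >> 9) & 0x7;
    let sr = (instr >> 6) & 0x7;
    let res = !regs.get_reg(sr);
    // Writing to `sr` here instead of `dr` reproduces the bug.
    regs.set_reg_with_cond(dr, res);
}
```

With `NOT R2, R1` encoded as `0x947f` and R1 set to `0x00ff`, a test can assert that R2 becomes `0xff00` while R1 stays untouched. The buggy version fails both checks, whereas an in-place `NOT R0, R0` would pass either way.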
+
+## conclusion
+
+While LC-3 is a relatively simple architecture,
+implementing it as a virtual machine is still quite educational.
+I learned about the core functionality of computers,
+as well as how assembly works.
+Through LC-3, I also learned the basics of Rust
+and got more experience finding subtle bugs too.
+If you're looking to sharpen your skills in a language,
+consider the idea of writing a simple virtual machine.
+Or, maybe, a disassembler or assembler for LC-3.
+
+Either way, the fact that just this small, contrived system can be so interesting is surprising.
+This is just a microcosm of computing,
+and there is so, so much more to discover.
+
+Again, if you're interested in further reading about LC-3,
+read the [original blog post](https://www.jmeiners.com/lc3-vm/) that inspired this one.
+
+Also, thank you for bearing with me this far.
+This post is longer than any that I've written up to now.
+I hope you enjoyed my journey with LC-3 as much as I did.