---
layout: post
title: 'Verilog: How does a CPU actually work?'
lang: en
categories: tech
date: 2026-03-22 16:55 +0100
description: How to write a CPU in Verilog
---

{% include verilog.md %}

A.k.a. "What's a fetch-decode-execute cycle?"

So, yeah, I studied electrical engineering, and I learned about the usual CPU
execution cycle and the von Neumann vs. Harvard architectures, but I never
really thought much about any of it. It was always very abstract.

Now, with my Verilog experiments, I could dig deeper into this.
I implemented both the [Nandgame CPU](https://nandgame.com/) and
[Ben Eater's 8-bit breadboard CPU](https://eater.net/8bit)[^2].

[^2]: To a degree? I haven't touched the project in months…

The Nandgame one was easy, as fetch/decode/execute happens basically within
one cycle (Harvard[^1] architecture).

[^1]: Or at least, Harvard-ish architecture. Don't ask me for specifics, I didn't study computer science.

However, with the Ben Eater CPU (von Neumann architecture), things get a bit
more involved. In his video series, Ben used an EEPROM to manage the execution.
But I thought I could do better! With a Finite State Machine!

The states usually go like

```
PC_to_MAR   <-----\    Get contents of program counter,
     |            |    move to memory address register.
     |            |
     v            |
MEM_to_INS_PC_inc |    memory contents to instruction register,
     |            |    increment program counter,
     |            |    determine next state based on instruction.
     v            |
.. instruction .. |    usually executes the actual instruction.
.. dependent ..   |    might take multiple cycles.
     |            |
     +------------/
     |
     | (eventually)
     v
   HALT <+             Nothing is done anymore.
    +----+
```

[It took me some time](https://woof.tech/@uvok/115922235276385631) to figure
out how to do it exactly, especially since I still had to figure out how
clocked vs. combinational components work.
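
To make the state diagram a bit more concrete, here is a minimal SystemVerilog sketch of such a state machine. It is not the code from the repo, just an illustration under a few assumptions: the module and port names, the single catch-all `EXECUTE` state (standing in for the instruction-dependent states), the `exec_done` signal and the `OP_HLT` encoding are all made up for this example. It also assumes the opcode of the instruction being fetched is already valid during `MEM_TO_INS_PC_INC`.

```systemverilog
// Sketch only: names other than the diagram's states are hypothetical.
module control_fsm (
    input  logic       clk,
    input  logic       rst,        // synchronous, active-high reset (assumption)
    input  logic [3:0] opcode,     // opcode of the instruction being fetched (assumption)
    input  logic       exec_done,  // high in the last cycle of an instruction (assumption)
    output logic       halted
);
    typedef enum logic [1:0] {
        PC_TO_MAR,          // program counter -> memory address register
        MEM_TO_INS_PC_INC,  // memory -> instruction register, increment PC
        EXECUTE,            // stand-in for the instruction-dependent states
        HALT                // nothing is done anymore
    } state_t;

    localparam logic [3:0] OP_HLT = 4'b1111;  // assumed encoding of "halt"

    state_t state, next_state;

    // Clocked part: the state register only changes on the rising clock edge.
    always_ff @(posedge clk) begin
        if (rst) state <= PC_TO_MAR;
        else     state <= next_state;
    end

    // Combinational part: the next state follows the inputs without a clock.
    always_comb begin
        next_state = state;  // default assignment, avoids inferring a latch
        case (state)
            PC_TO_MAR:         next_state = MEM_TO_INS_PC_INC;
            MEM_TO_INS_PC_INC: next_state = (opcode == OP_HLT) ? HALT : EXECUTE;
            EXECUTE:           next_state = exec_done ? PC_TO_MAR : EXECUTE;
            HALT:              next_state = HALT;  // stays here until reset
            default:           next_state = PC_TO_MAR;
        endcase
    end

    assign halted = (state == HALT);
endmodule
```

Splitting the logic into one `always_ff` block and one `always_comb` block mirrors the clocked vs. combinational distinction: the first block is the only place where something is stored, the second one merely computes what should be stored next.
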
One of the tricks "to make it more efficient" was to
[negate the PC clock](https://git.uvok.de/fpga-exper/tree/eater_cpu/eater_computer.sv?h=main&id=4cc62801974319a0ea2a1ed59fcf61aa9afed5bd#n145).
This way, I could increment the program counter in basically the same clock
cycle in which the instruction is decoded (step 2 of the state machine),
just on the falling edge instead of the rising one.

What's nice is that I can simulate this and record the waveforms, to get an
even better understanding of what exactly happens:

{% linked_image img="https://pics.uvokchee.de/_data/i/upload/2026/03/22/20260322153652-e6e29632-me.png" alt="Screenshot of a waveform viewer, showing various CPU flags and states" url="https://pics.uvokchee.de/upload/2026/03/22/20260322153652-e6e29632.png" %}

Unfortunately, I couldn't get both the "full state names" and the whole
program into the picture. My screen width is limited.

I put the whole thing [on my git repo](https://git.uvok.de/fpga-exper/tree/eater_cpu?h=main),
though, so feel free to check it out.

## Footnotes