2 files changed, 84 insertions, 1 deletions
diff --git a/_includes/verilog.md b/_includes/verilog.md
index d89bbb0..fbc1659 100644
--- a/_includes/verilog.md
+++ b/_includes/verilog.md
@@ -6,7 +6,7 @@
 
 0. [Presenting my FPGA dev board]({% post_url 2025-12-26-fpga-dev-board %})
 1. [Getting Started with Verilog]({% post_url 2026-01-06-getting-started-with-verilog %})
-2. TODO
+2. [How does a CPU actually work?]({% post_url 2026-03-22-verilog-how-does-a-cpu-actually-work %})
 
 ---
 {: .spaced.bottom }
diff --git a/_posts/2026-03-22-verilog-how-does-a-cpu-actually-work.md b/_posts/2026-03-22-verilog-how-does-a-cpu-actually-work.md
new file mode 100644
index 0000000..63d8951
--- /dev/null
+++ b/_posts/2026-03-22-verilog-how-does-a-cpu-actually-work.md
@@ -0,0 +1,83 @@
+---
+layout: post
+title: 'Verilog: How does a CPU actually work?'
+lang: en
+categories: tech
+date: 2026-03-22 16:55 +0100
+description: How to write a CPU in Verilog
+---
+
+{% include verilog.md %}
+
+A.k.a. "What's a fetch-decode-execute cycle?"
+
+So, yeah, I studied electrical engineering, and I learned about
+the usual CPU execution cycle and about the von Neumann vs. Harvard
+CPU architectures, but I never really thought much about any of it.
+It was always very abstract.
+
+Now, with my Verilog experiments, I could dig deeper into this.
+I implemented both the
+[Nandgame CPU](https://nandgame.com/) and
+[Ben Eater's 8-bit breadboard CPU](https://eater.net/8bit)[^2].
+
+[^2]: To a degree? I haven't touched the project
+      in months…
+
+The Nandgame one was easy, as fetch/decode/execute would happen basically
+within one cycle (Harvard[^1] architecture).
+
+[^1]: Or at least, Harvard-ish architecture. 
+      Don't ask me for specifics, I didn't study computer science.
+
+However, with the Ben Eater CPU (von Neumann architecture), things
+get a bit more involved. In his video series, Ben used an EEPROM
+to manage the execution. But I thought I could do better! With a Finite
+State Machine! The states usually go like this:
+
+```
+
+    PC_to_MAR <-----\  Get contents of program counter,
+       |            |  move to memory address register.
+       |            |
+       v            |
+MEM_to_INS_PC_inc   |  memory contents to instruction register,
+       |            |  increment program counter,
+       |            |  determine next state based on instruction.
+       v            |
+.. instruction ..   |  usually executes the actual instruction.
+..  dependent  ..   |  might take multiple cycles.
+       |            |
+       +------------/
+       |
+       | (eventually)
+       v
+      HALT <+          Nothing is done anymore.
+       +----+
+```
+
+[It took me some time](https://woof.tech/@uvok/115922235276385631) to figure out
+how to do it exactly, especially since I still had to figure out how clocked
+vs. combinational components work. One of the tricks "to make it more efficient"
+was to
+[invert the PC clock](https://git.uvok.de/fpga-exper/tree/eater_cpu/eater_computer.sv?h=main&id=4cc62801974319a0ea2a1ed59fcf61aa9afed5bd#n145).
+This way, I could increment the program counter basically in the same clock
+cycle in which the instruction was decoded (step 2 of the state machine), only
+on the falling edge.
+
+What's nice is that I can simulate this and record the waveforms to get an even
+better understanding of what exactly happens:
+
+{% linked_image
+  img="https://pics.uvokchee.de/_data/i/upload/2026/03/22/20260322153652-e6e29632-me.png"
+  alt="Screenshot of a waveform viewer, showing various CPU flags and states"
+  url="https://pics.uvokchee.de/upload/2026/03/22/20260322153652-e6e29632.png"
+%}
+
+Unfortunately, I couldn't get both the "full state names" and the whole
+program into the picture. My screen width is limited. I put the whole
+thing [in my git repo](https://git.uvok.de/fpga-exper/tree/eater_cpu?h=main),
+though, so feel free to check it out.
+
+## Footnotes
+