---
layout: post
title: 'Verilog: How does a CPU actually work?'
lang: en
categories: tech
date: 2026-03-22 16:55 +0100
description: How to write a CPU in Verilog
---

{% include verilog.md %}

A.k.a. "What's a fetch-decode-execute cycle?"

So, yeah, I studied electrical engineering, and I learned about
the usual CPU execution cycle and about the von Neumann vs. Harvard
architectures. But I never really thought much about it;
it was always very abstract.

Now, with my Verilog experiments, I could dig deeper into this.
I implemented both the 
[Nandgame CPU](https://nandgame.com/) and
[Ben Eater's 8-bit breadboard CPU](https://eater.net/8bit)[^2].

[^2]: To a degree, at least. I haven't touched the project
      in months…

The Nandgame one was easy, as fetch/decode/execute basically happens
within one cycle (Harvard[^1] architecture).

[^1]: Or at least, Harvard-ish architecture. 
      Don't ask me for specifics, I didn't study computer science.

However, with the Ben Eater CPU (von Neumann architecture), things
get a bit more involved. In his video series, Ben used an EEPROM
as a lookup table for the control signals. But I thought I could do better!
With a finite state machine! The states roughly go like this:

```

    PC_to_MAR <-----\  Get contents of program counter,
       |            |  move to memory address register.
       |            |
       v            |
MEM_to_INS_PC_inc   |  memory contents to instruction register,
       |            |  increment program counter,
       |            |  determine next state based on instruction.
       v            |
.. instruction ..   |  usually executes the actual instruction.
..  dependent  ..   |  might take multiple cycles.
       |            |
       +------------/
       |
       | (eventually)
       v
      HALT <+          Nothing is done anymore.
       +----+
```
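In SystemVerilog, a state machine like the one in the diagram could look roughly like this. It is a minimal sketch, not the actual code from the repository: the first two state names and `HALT` follow the diagram, while the single `EXECUTE` state, the port names, the `exec_done` input and the `OP_HLT` encoding are assumptions made up for illustration. It also assumes the opcode is read from the bus while the instruction is being fetched, since the instruction register only holds it one clock edge later.

```systemverilog
// Control FSM sketch. All port names and OP_HLT are illustrative assumptions.
module control_fsm (
    input  logic       clk,
    input  logic       rst,
    input  logic [3:0] fetched_opcode, // opcode bits on the bus during fetch
    input  logic       exec_done,      // hypothetical: last execute step reached
    output logic       pc_out,         // drive program counter onto the bus
    output logic       mar_load,       // latch bus into memory address register
    output logic       mem_out,        // drive memory contents onto the bus
    output logic       ins_load,       // latch bus into instruction register
    output logic       pc_inc,         // increment program counter
    output logic       halted
);
    localparam logic [3:0] OP_HLT = 4'b1111; // assumed encoding

    typedef enum logic [1:0] {
        PC_TO_MAR,
        MEM_TO_INS_PC_INC,
        EXECUTE,
        HALT
    } state_t;

    state_t state, next_state;

    // Clocked part: nothing but the state register.
    always_ff @(posedge clk) begin
        if (rst) state <= PC_TO_MAR;
        else     state <= next_state;
    end

    // Combinational part: next state and control signals.
    always_comb begin
        // Defaults, so every output is assigned on every path.
        next_state = state;
        pc_out     = 1'b0;
        mar_load   = 1'b0;
        mem_out    = 1'b0;
        ins_load   = 1'b0;
        pc_inc     = 1'b0;
        halted     = 1'b0;

        case (state)
            PC_TO_MAR: begin
                pc_out     = 1'b1;
                mar_load   = 1'b1;
                next_state = MEM_TO_INS_PC_INC;
            end
            MEM_TO_INS_PC_INC: begin
                mem_out    = 1'b1;
                ins_load   = 1'b1;
                pc_inc     = 1'b1;
                // Decide where to go based on the instruction being fetched.
                next_state = (fetched_opcode == OP_HLT) ? HALT : EXECUTE;
            end
            EXECUTE: begin
                // Instruction-specific control signals would go here;
                // a real design likely needs several such states.
                if (exec_done) next_state = PC_TO_MAR;
            end
            HALT: begin
                halted = 1'b1; // no outgoing transition: stays here
            end
            default: next_state = PC_TO_MAR;
        endcase
    end
endmodule
```

Splitting the design into one clocked block for the state register and one combinational block for everything else keeps it obvious which signals change on a clock edge and which follow the current state immediately.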

[It took me some time](https://woof.tech/@uvok/115922235276385631) to figure out
how exactly to do it, especially since I still had to figure out how clocked
vs. combinational components work. One of the tricks "to make it more efficient"
was to
[invert the PC clock](https://git.uvok.de/fpga-exper/tree/eater_cpu/eater_computer.sv?h=main&id=4cc62801974319a0ea2a1ed59fcf61aa9afed5bd#n145).
This way, I could increment the program counter basically in the same clock
cycle in which the instruction is decoded (step 2 of the state machine), only
on the falling edge.
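The idea can be sketched roughly like this. It is a simplified illustration and not the code from the linked file; the module name, the ports and the `WIDTH` parameter are invented. The state machine changes state on the rising edge, so `pc_inc` has half a clock period to settle before the counter, running on the inverted clock, samples it.

```systemverilog
// Program counter clocked on the falling edge of clk (illustrative sketch).
module program_counter #(
    parameter int WIDTH = 4
) (
    input  logic             clk,
    input  logic             rst,
    input  logic             pc_inc, // set by the control logic after a rising edge
    output logic [WIDTH-1:0] pc
);
    // Inverted clock: a rising edge of clk_n is a falling edge of clk.
    logic clk_n;
    assign clk_n = ~clk;

    always_ff @(posedge clk_n) begin
        if (rst)         pc <= '0;
        else if (pc_inc) pc <= pc + 1'b1;
    end
endmodule
```

Writing `always_ff @(negedge clk)` directly describes the same behaviour without the extra `clk_n` signal.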

What's nice: I can simulate this and record the waveforms to get an even better
understanding of what exactly happens:

{% linked_image
  img="https://pics.uvokchee.de/_data/i/upload/2026/03/22/20260322153652-e6e29632-me.png"
  alt="Screenshot of a waveform viewer, showing various CPU flags and states"
  url="https://pics.uvokchee.de/upload/2026/03/22/20260322153652-e6e29632.png"
%}

Unfortunately, I couldn't get both the full state names and the whole
program into the picture. My screen width is limited. I put the whole
thing [in my git repo](https://git.uvok.de/fpga-exper/tree/eater_cpu?h=main),
though, so feel free to check it out.
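For anyone who wants to record waveforms like the ones in the screenshot, here is a minimal sketch of a testbench that writes a VCD file. The `counter` module is only a stand-in so the example is self-contained, and all names, including the file name, are made up; the testbenches in the repository may look quite different.

```systemverilog
`timescale 1ns/1ps

// Tiny stand-in design, only here so the testbench has something to record.
module counter (
    input  logic       clk,
    input  logic       rst,
    output logic [3:0] count
);
    always_ff @(posedge clk) begin
        if (rst) count <= '0;
        else     count <= count + 1'b1;
    end
endmodule

module counter_tb;
    logic       clk = 1'b0;
    logic       rst = 1'b1;
    logic [3:0] count;

    counter dut (.clk(clk), .rst(rst), .count(count));

    // Free-running clock with a 10 ns period.
    always #5 clk = ~clk;

    initial begin
        $dumpfile("counter_tb.vcd"); // waveform file for the viewer
        $dumpvars(0, counter_tb);    // record all signals below the testbench

        // Release reset on a falling edge to avoid racing the rising edge.
        repeat (2) @(negedge clk);
        rst = 1'b0;

        repeat (10) @(posedge clk);  // let the design run for a few cycles
        $finish;
    end
endmodule
```

`$dumpfile` and `$dumpvars` are standard Verilog system tasks. With Icarus Verilog, for example, `iverilog -g2012 -o counter_tb counter_tb.sv` followed by `vvp counter_tb` should produce the VCD file, which a viewer such as GTKWave can open. Other simulators set up tracing differently.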

## Footnotes