---
layout: post
title: 'Verilog: How does a CPU actually work?'
lang: en
categories: tech
date: 2026-03-22 16:55 +0100
description: How to write a CPU in Verilog
---

{% include verilog.md %}

A.k.a. "What's a fetch-decode-execute cycle?"

So, yeah, I studied electrical engineering, and I learned about
the usual CPU execution cycle and the von Neumann vs. Harvard
CPU architectures, but I never really thought much about any of it.
It was always very abstract.

Now, with my Verilog experiments, I could dig deeper into this.
I implemented both the
[Nandgame CPU](https://nandgame.com/) and
[Ben Eater's 8-bit breadboard CPU](https://eater.net/8bit)[^2].

[^2]: To a degree, at least. I haven't touched the project
      in months…

The Nandgame one was easy, as fetch, decode and execute all happen within
basically one cycle (Harvard[^1] architecture: instructions and data live in
separate memories, so both can be accessed at the same time).

[^1]: Or at least, Harvard-ish architecture. 
      Don't ask me for specifics, I didn't study computer science.

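However, with the Ben Eater CPU (von Neumann architecture), things
get a bit more involved: program and data share one memory and one bus, so
fetching and executing an instruction take separate steps. In his video
series, Ben used an EEPROM as a lookup table for the control signals to
manage the execution. But I thought I could do better! With a finite
state machine! The states usually go like this: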

```

    PC_to_MAR <-----\  Get contents of program counter,
       |            |  move to memory address register.
       |            |
       v            |
MEM_to_INS_PC_inc   |  memory contents to instruction register,
       |            |  increment program counter,
       |            |  determine next state based on instruction.
       v            |
.. instruction ..   |  usually executes the actual instruction.
..  dependent  ..   |  might take multiple cycles.
       |            |
       +------------/
       |
       | (eventually)
       v
      HALT <+          Nothing is done anymore.
       +----+
```
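
To make the diagram a bit more concrete, here is a minimal behavioural sketch of such a state machine in Python (not the actual SystemVerilog from the repo). Only `PC_to_MAR`, `MEM_to_INS_PC_inc` and `HALT` come from the diagram above. The instruction-dependent states, the opcode values and the tiny four-instruction subset are assumptions made up for illustration.

```python
from enum import Enum, auto


class State(Enum):
    PC_TO_MAR = auto()          # from the diagram
    MEM_TO_INS_PC_INC = auto()  # from the diagram
    OPERAND_TO_MAR = auto()     # hypothetical: operand nibble -> MAR
    MEM_TO_A = auto()           # hypothetical: finishes LDA
    MEM_ADD_TO_A = auto()       # hypothetical: finishes ADD
    A_TO_OUT = auto()           # hypothetical: executes OUT
    HALT = auto()               # from the diagram


# Assumed encoding: high nibble = opcode, low nibble = operand address.
LDA, ADD, OUT, HLT = 0x1, 0x2, 0xE, 0xF

# First instruction-dependent state per opcode; unknown opcodes halt here.
FIRST_EXEC_STATE = {
    LDA: State.OPERAND_TO_MAR,
    ADD: State.OPERAND_TO_MAR,
    OUT: State.A_TO_OUT,
    HLT: State.HALT,
}


class Cpu:
    def __init__(self, ram):
        self.ram = list(ram)  # one shared memory for code and data
        self.pc = self.mar = self.ins = self.a = self.out = 0
        self.state = State.PC_TO_MAR

    def tick(self):
        """Advance the state machine by one clock cycle."""
        s = self.state
        if s is State.PC_TO_MAR:
            self.mar = self.pc
            self.state = State.MEM_TO_INS_PC_INC
        elif s is State.MEM_TO_INS_PC_INC:
            self.ins = self.ram[self.mar]
            self.pc = (self.pc + 1) & 0xF
            # The next state depends on the opcode that was just fetched.
            self.state = FIRST_EXEC_STATE.get(self.ins >> 4, State.HALT)
        elif s is State.OPERAND_TO_MAR:
            self.mar = self.ins & 0xF
            is_lda = (self.ins >> 4) == LDA
            self.state = State.MEM_TO_A if is_lda else State.MEM_ADD_TO_A
        elif s is State.MEM_TO_A:
            self.a = self.ram[self.mar]
            self.state = State.PC_TO_MAR
        elif s is State.MEM_ADD_TO_A:
            self.a = (self.a + self.ram[self.mar]) & 0xFF
            self.state = State.PC_TO_MAR
        elif s is State.A_TO_OUT:
            self.out = self.a
            self.state = State.PC_TO_MAR
        # State.HALT: nothing is done anymore.


def run(cpu, max_cycles=100):
    """Tick until HALT (or the cycle limit); return the cycles used."""
    cycles = 0
    while cpu.state is not State.HALT and cycles < max_cycles:
        cpu.tick()
        cycles += 1
    return cycles


# LDA 14; ADD 15; OUT; HLT, with the operands stored at addresses 14 and 15.
program = [0x1E, 0x2F, 0xE0, 0xF0] + [0] * 10 + [28, 14]
cpu = Cpu(program)
cycles = run(cpu)
print(cpu.out, cycles)  # prints "42 13"
```

The two fetch states run for every instruction, and only the tail differs. In this sketch `LDA` and `ADD` take four cycles each, `OUT` takes three, and `HLT` takes two before the machine parks in `HALT`.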

[It took me some time](https://woof.tech/@uvok/115922235276385631) to figure out
how to do it exactly, especially since I still had to figure out how clocked
vs. combinational components work. One of the tricks "to make it more efficient"
was to
[invert the PC clock](https://git.uvok.de/fpga-exper/tree/eater_cpu/eater_computer.sv?h=main&id=4cc62801974319a0ea2a1ed59fcf61aa9afed5bd#n145).
This way, I could increment the program counter in basically the same clock
cycle in which the instruction was decoded (step 2 of the state machine), just
on the falling edge.
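
The timing of that falling-edge increment can be sketched with a small two-phase model in Python. It assumes that the state register updates on the rising edge, that the "increment PC" control signal is purely combinational and derived from the current state, and that the program counter samples that signal on the falling edge. The single `EXECUTE` state and the function name `simulate` are made up for illustration, and the real design may differ in the details.

```python
def simulate(cycles):
    """Return (cycle, edge, state, pc) samples, two per clock cycle."""
    states = ["PC_to_MAR", "MEM_to_INS_PC_inc", "EXECUTE"]
    state_idx = 0
    pc = 0
    trace = []
    for cycle in range(cycles):
        # Rising edge: the state register advances (cycle 0 is the reset state).
        if cycle > 0:
            state_idx = (state_idx + 1) % len(states)
        state = states[state_idx]
        trace.append((cycle, "rise", state, pc))

        # Combinational control signal, valid shortly after the rising edge.
        pc_inc = state == "MEM_to_INS_PC_inc"

        # Falling edge: the PC runs on the inverted clock, so it already
        # increments in the middle of the cycle that asserts pc_inc.
        if pc_inc:
            pc = (pc + 1) & 0xF
        trace.append((cycle, "fall", state, pc))
    return trace


for sample in simulate(3):
    print(sample)
```

In the printed trace, the program counter is still 0 on the rising edge of the `MEM_to_INS_PC_inc` cycle and already 1 on the falling edge of that same cycle, so the following state sees the incremented value from its very start.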

The nice thing is that I can simulate this and record the waveforms, to get
an even better understanding of what exactly happens:

{% linked_image
  img="https://pics.uvokchee.de/_data/i/upload/2026/03/22/20260322153652-e6e29632-me.png"
  alt="Screenshot of a waveform viewer, showing various CPU flags and states"
  url="https://pics.uvokchee.de/upload/2026/03/22/20260322153652-e6e29632.png"
%}

Unfortunately, I couldn't get both the full state names and the whole
program into the picture. My screen width is limited. I put the whole
thing [in my git repo](https://git.uvok.de/fpga-exper/tree/eater_cpu?h=main),
though, so feel free to check it out.

## Footnotes