---
layout: post
title: 'Verilog: How does a CPU actually work?'
lang: en
categories: tech
date: 2026-03-22 16:55 +0100
description: How to write a CPU in Verilog
---
{% include verilog.md %}

A.k.a. "What's a fetch-decode-execute cycle?"

So, yeah, I studied electrical engineering, and I learned about
the usual CPU execution cycle and the von Neumann vs. Harvard
CPU architectures, but I never really thought much about any of it.
It was always very abstract.

Now, with my Verilog experiments, I could dig deeper into this.
I implemented both the
[Nandgame CPU](https://nandgame.com/) and
[Ben Eater's 8-bit breadboard CPU](https://eater.net/8bit)[^2].

[^2]: To a degree, anyway. I haven't touched the project
in months…

The Nandgame one was easy, as fetch, decode and execute basically all happen
within one cycle (Harvard[^1] architecture).

[^1]: Or at least, Harvard-ish architecture.
Don't ask me for specifics, I didn't study computer science.

However, with the Ben Eater CPU (von Neumann architecture), things
get a bit more involved. In his video series, Ben used an EEPROM
to manage the execution. But I thought I could do better! With a finite
state machine! The states usually go like this:

```
PC_to_MAR  <--------\   Get contents of program counter,
    |               |   move to memory address register.
    |               |
    v               |
MEM_to_INS_PC_inc   |   Memory contents to instruction register,
    |               |   increment program counter,
    |               |   determine next state based on instruction.
    v               |
.. instruction ..   |   Usually executes the actual instruction.
.. dependent   ..   |   Might take multiple cycles.
    |               |
    +---------------/
    |
    | (eventually)
    v
  HALT <-+              Nothing is done anymore.
    +----+
```
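
To make the diagram a bit more concrete, here is a minimal behavioural sketch of such a state machine in Python. It is not the SystemVerilog from my repo. The opcodes, the instruction-dependent state names (`LDA_ADDR`, `ADD_SUM`, and so on) and the `run` helper are assumptions made up for illustration; only the first two states and `HALT` come from the diagram above.

```python
from enum import Enum, auto


class State(Enum):
    PC_TO_MAR = auto()          # program counter -> memory address register
    MEM_TO_INS_PC_INC = auto()  # memory -> instruction register, PC += 1
    LDA_ADDR = auto()           # instruction-dependent states (hypothetical)
    LDA_LOAD = auto()
    ADD_ADDR = auto()
    ADD_SUM = auto()
    OUT = auto()
    HALT = auto()


# Assumed encoding: opcode in the upper nibble, operand in the lower nibble.
LDA, ADD, OUT, HLT = 0x1, 0x2, 0xE, 0xF

# First instruction-dependent state for each opcode.
DISPATCH = {LDA: State.LDA_ADDR, ADD: State.ADD_ADDR,
            OUT: State.OUT, HLT: State.HALT}


def run(memory, max_cycles=100):
    """Advance the state machine one state per clock cycle until HALT."""
    mem = list(memory) + [0] * (16 - len(memory))  # pad to 16 bytes of RAM
    pc = mar = ins = a = 0
    outputs = []
    state = State.PC_TO_MAR
    cycles = 0
    while state is not State.HALT and cycles < max_cycles:
        cycles += 1
        if state is State.PC_TO_MAR:
            mar = pc
            state = State.MEM_TO_INS_PC_INC
        elif state is State.MEM_TO_INS_PC_INC:
            ins = mem[mar]
            pc = (pc + 1) & 0xF
            # Unknown opcodes halt; an arbitrary choice for this sketch.
            state = DISPATCH.get(ins >> 4, State.HALT)
        elif state is State.LDA_ADDR:
            mar = ins & 0xF
            state = State.LDA_LOAD
        elif state is State.ADD_ADDR:
            mar = ins & 0xF
            state = State.ADD_SUM
        elif state is State.LDA_LOAD:
            a = mem[mar]
            state = State.PC_TO_MAR
        elif state is State.ADD_SUM:
            a = (a + mem[mar]) & 0xFF
            state = State.PC_TO_MAR
        elif state is State.OUT:
            outputs.append(a)
            state = State.PC_TO_MAR
    return outputs, cycles, state


# LDA 4; ADD 5; OUT; HLT; followed by the data bytes 28 and 14.
program = [0x14, 0x25, 0xE0, 0xF0, 28, 14]
outputs, cycles, final_state = run(program)
print(outputs, cycles)  # [42] 13
```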

[It took me some time](https://woof.tech/@uvok/115922235276385631) to figure out
how to do it exactly, especially since I still had to figure out how clocked
vs. combinational components work. One of the tricks "to make it more efficient"
was to
[negate the PC clock](https://git.uvok.de/fpga-exper/tree/eater_cpu/eater_computer.sv?h=main&id=4cc62801974319a0ea2a1ed59fcf61aa9afed5bd#n145).
This way, I could increment the program counter in basically the same clock
cycle in which the instruction is decoded (step 2 of the state machine), just
on the falling edge.
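
As a rough illustration of the idea (again a toy Python model, not the linked SystemVerilog), the sketch below updates the state and the memory address register on the rising edge, while the program counter only reacts to the falling edge. The `simulate` function, the trace format and the two-state loop without instruction-dependent states are assumptions for this example.

```python
def simulate(cycles):
    """Toy two-phase clock: state/MAR change on rising, PC on falling edge."""
    state, pc, mar = "PC_to_MAR", 0, 0
    trace = []
    for _ in range(cycles):
        # Rising edge: normally clocked registers sample their inputs.
        if state == "PC_to_MAR":
            mar = pc                      # MAR latches the current PC
            state = "MEM_to_INS_PC_inc"
        else:
            state = "PC_to_MAR"           # no instruction-dependent states here
        trace.append(("rise", state, pc, mar))

        # Falling edge: the PC is driven by the inverted clock, so its
        # active edge sits in the middle of the cycle.
        if state == "MEM_to_INS_PC_inc":
            pc = (pc + 1) & 0xF           # 4-bit counter (assumed width)
        trace.append(("fall", state, pc, mar))
    return trace


for edge, state, pc, mar in simulate(3):
    print(f"{edge:4} {state:18} pc={pc} mar={mar}")
```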

What's nice is that I can simulate this and record the waveforms to get an even
better understanding of what exactly happens:

{% linked_image
img="https://pics.uvokchee.de/_data/i/upload/2026/03/22/20260322153652-e6e29632-me.png"
alt="Screenshot of a waveform viewer, showing various CPU flags and states"
url="https://pics.uvokchee.de/upload/2026/03/22/20260322153652-e6e29632.png"
%}

Unfortunately, I couldn't get both the "full state names" and the whole
program into the picture; my screen width is limited. I put the whole
thing [in my git repo](https://git.uvok.de/fpga-exper/tree/eater_cpu?h=main),
though, so feel free to check it out.

## Footnotes