author uvok cheetah 2025-01-09 14:09:23 +0100
committer uvok cheetah 2025-01-09 14:09:23 +0100
commit 57439ab15f1f3c8ab482f03045cba381d2224563
tree bf81c5e3bc7e77ce81cd156f1dbda15933a20ae1 /_drafts
parent ee9419d84036b5c77bc1530ac7bfd6488a09a003
Bird CPU usage, cnt
Diffstat (limited to '_drafts')
-rw-r--r-- _drafts/bird-cpu-usage.md 46
1 files changed, 38 insertions, 8 deletions
diff --git a/_drafts/bird-cpu-usage.md b/_drafts/bird-cpu-usage.md
index 136a475..36bad4e 100644
--- a/_drafts/bird-cpu-usage.md
+++ b/_drafts/bird-cpu-usage.md
@@ -14,9 +14,8 @@ alt="RRD tool graphic showing a high CPU usage" %}
I found it strange, but had no time to inspect further.
-Recently, I tried to investigate what happens on the server there.
-htop showed high CPU usage for both bird and dnsmasq (always
-together) in these times.
+Recently, I tried to investigate what happens on the server there. htop showed
+high CPU usage for both bird and dnsmasq (always together) in these times.
Fuming a bit, I went with a brute-force approach:
@@ -97,14 +96,45 @@ debug symbols](https://wiki.debian.org/HowToGetABacktrace) - it doesn't for
dnsmasq (yet) in bookworm. Luckily, the nice people at labs.nic.cz provide a
dbgsym package in their Bird(2) Debian repository.
-Now, stracing dnsmasq (when "idle") reveals some recvmsg of type `RTM_NEWROUTE`.
-I have *no idea* why dnsmasq would need that. But I already *assume* the high
-CPU usage occurs when Bird exports lots of routes to the kernel.
+Now, stracing dnsmasq (when "idle") reveals some recvmsg of type
+`RTM_NEWROUTE`. I have *no idea* why dnsmasq would need that. But I already
+*assume* the high CPU usage occurs when Bird exports lots of routes to the
+kernel.
Also, in journalctl, I see lots of the infamous `Kernel dropped some netlink
messages, will resync on next scan.` messages at times - the message apparently
nobody has a solution to, and even though there are mailing list posts telling
to sysctl `net.core.rmem_default`, it doesn't seem to yield a solution.
-[1] At least when I want to see the binaries function names.
- Kernel symbols seem to show up fine.
+[1] At least when I want to see the binaries' function names. Kernel symbols
+seem to show up fine.
+
+So… step-by-step? I have both DN42 and clearnet bird running on this VPS, in
+parallel. So maybe start by disabling one of these? Or, I have a better idea,
+keep it enabled, and disable all protocols! (`birdc disable \"*\"`).
+
+That helped, until midnight. When suddenly the CPU went up again. WTF? Let's
+have a look at `birdc s p`. All protocols up. Huh!?!?
+
+Let's investigate further:
+
+* Oooo, log rotation happens at midnight
+* Fuck, I specified the same log files for both bird instances
+
+Well, log rotation. Which I added manually. This does a `birdc configure`
+afterward. Which means the protocols go up again, because I disabled them on
+the command line, not in the config.
+
+Ungh. Okay, this is getting really ugly. `systemctl disable --now bird-clear`.
+Now let's run this for a few days...
+
+That seems to have helped. I now decided to edit the clearnet config and disable
+all "external" protocols (to IXPs), keeping Babel, RPKI, and IBGP enabled.
+Immediately after starting bird-clearnet, the CPU usage went up again. To be
+expected, for the initial sync. But it kept being high. So, I disabled the RPKI
+protocols as well and... suddenly, the CPU usage is down???
+
+I actually did a `perf record` of bird when the CPU usage was high, and I saw
+that `rt_event` was running 33% of the time. I don't know what to make of that.
+
+
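
The draft mentions `Kernel dropped some netlink messages` in the journal and `RTM_NEWROUTE` traffic reaching dnsmasq. One way to check which processes are actually losing netlink messages is the `Drops` column of `/proc/net/netlink`. The following is only a minimal sketch and not part of the commit: the helper names `sockets_with_drops` and `read_netlink_table` are made up for illustration, and it assumes the kernel exposes a whitespace-separated table with `Pid`, `Rmem` and `Drops` columns in its header line.

```python
from pathlib import Path


def sockets_with_drops(table: str) -> list[dict[str, str]]:
    """Return the rows of a /proc/net/netlink dump whose Drops counter is non-zero.

    Columns are looked up by the names in the header line, so the function
    does not depend on their position (assumption: a "Drops" column exists).
    """
    lines = [ln.split() for ln in table.splitlines() if ln.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    result = []
    for row in rows:
        entry = dict(zip(header, row))
        # A missing Drops column is treated as "no drops".
        if int(entry.get("Drops", "0")) > 0:
            result.append(entry)
    return result


def read_netlink_table(path: str = "/proc/net/netlink") -> str:
    """Read the kernel's netlink socket table (Linux only)."""
    return Path(path).read_text()
```

Calling `sockets_with_drops(read_netlink_table())` on the VPS while the CPU usage is high, and comparing the `Pid` values with the PIDs of bird and dnsmasq, would show whether the drops line up with the route exports. It does not explain the `rt_event` time seen in `perf record`.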