author | uvok cheetah | 2025-01-09 14:09:23 +0100 |
---|---|---|
committer | uvok cheetah | 2025-01-09 14:09:23 +0100 |
commit | 57439ab15f1f3c8ab482f03045cba381d2224563 (patch) | |
tree | bf81c5e3bc7e77ce81cd156f1dbda15933a20ae1 | |
parent | ee9419d84036b5c77bc1530ac7bfd6488a09a003 (diff) |
Bird CPU usage, cnt
-rw-r--r-- | _drafts/bird-cpu-usage.md | 46 |
1 files changed, 38 insertions, 8 deletions
diff --git a/_drafts/bird-cpu-usage.md b/_drafts/bird-cpu-usage.md
index 136a475..36bad4e 100644
--- a/_drafts/bird-cpu-usage.md
+++ b/_drafts/bird-cpu-usage.md
@@ -14,9 +14,8 @@ alt="RRD tool graphic showing a high CPU usage" %}
 
 I found it strange, but had no time to inspect further.
 
-Recently, I tried to investigate what happens on the server there.
-htop showed high CPU usage for both bird and dnsmasq (always
-together) in these times.
+Recently, I tried to investigate what happens on the server there. htop showed
+high CPU usage for both bird and dnsmasq (always together) in these times.
 
 Fuming a bit, I went with a brute-force approach:
 
@@ -97,14 +96,45 @@ debug symbols](https://wiki.debian.org/HowToGetABacktrace) - it doesn't
 for dnsmasq (yet) in bookworm. Luckily, the nice people at labs.nic.cz provide a
 dbgsym package in their Bird(2) Debian repository.
 
-Now, stracing dnsmasq (when "idle") reveals some recvmsg of type `RTM_NEWROUTE`.
-I have *no idea* why dnsmasq would need that. But I already *assume* the high
-CPU usage occurs when Bird exports lots of routes to the kernel.
+Now, stracing dnsmasq (when "idle") reveals some recvmsg of type
+`RTM_NEWROUTE`. I have *no idea* why dnsmasq would need that. But I already
+*assume* the high CPU usage occurs when Bird exports lots of routes to the
+kernel.
 
 Also, in journalctl, I see lots of the infamous `Kernel dropped some netlink
 messages, will resync on next scan.` messages at times - the message apparently
 nobody has a solution to, and even though there are mailing list posts telling
 to sysctl `net.core.rmem_default`, I doesn't seem to yield a solution.
 
-[1] At least when I want to see the binaries function names.
- Kernel symbols seem to show up fine.
+[1] At least when I want to see the binaries function names. Kernel symbols
+seem to show up fine.
+
+So… step-by-step? I have both DN42 and clearnet bird running on this VPS, in
+parallel. So maybe start by disabling one of these? Or, I have a better idea,
+keep it enabled, and disable all protocols! (`birdc disable \"*\"`).
+
+That helped, until midnight. When suddenly the CPU went up again. WTF? Let's
+have a look at `birdc s p`. All protocols up. Huh!?!?
+
+Let's investigate further:
+
+* Oooo, log rotation happens at midnight
+* Fuck, I specified the same log files for both bird's
+
+Well, log rotation. Which I added manually. This does a `birdc configure`
+afterward. Which means the protocols go up again, because I disabled them on
+the command line, not in the config.
+
+Ungh. Okay, this is getting really ugly. `systemctl disable --now bird-clear`.
+Now let's run this for a few days...
+
+That seem to have helped. I not decided to edit the clearnet config and disable
+all "external" protocols (to IXPs), keeping Babel, RPKI, and IBGP enabled.
+Immediately after starting bird-clearnet, the CPU usage went up again. To be
+expected, for the initial sync. But it kept being high. So, I disabled the RPKI
+protocols as well and... suddenly, the CPU usage is down???
+
+I actually did a `perf record` of bird when the CPU usage was high, and I saw
+that `rt_event` was running 33% of the time. I don't know what to make of that.
+
+