-rw-r--r-- | _config.yml | 3
-rw-r--r-- | _data/news.yml | 2
-rw-r--r-- | _drafts/bird-cpu-usage.md | 110
-rw-r--r-- | _includes/header.html | 1
-rw-r--r-- | _layouts/postlist.html | 3
-rw-r--r-- | dn42.md | 2
-rw-r--r-- | projects.md | 31
7 files changed, 149 insertions, 3 deletions
diff --git a/_config.yml b/_config.yml
index 98e68aa..cc4ac74 100644
--- a/_config.yml
+++ b/_config.yml
@@ -72,3 +72,6 @@ autopages:
   enabled: false
 
 cssversion: "2024010301"
+
+feed:
+  posts_limit: 20
diff --git a/_data/news.yml b/_data/news.yml
index 4684514..0e64621 100644
--- a/_data/news.yml
+++ b/_data/news.yml
@@ -2,7 +2,7 @@
   content: >-
     <div>
       <div style="font-weight: bold">You don't seem to be using an ad blocker.</div>
-      <div>Please consider installing one.</div>
+      <div>Please consider installing one. I would recommend uBlock Origin.</div>
     </div>
 - id: news1
   content: >-
diff --git a/_drafts/bird-cpu-usage.md b/_drafts/bird-cpu-usage.md
new file mode 100644
index 0000000..136a475
--- /dev/null
+++ b/_drafts/bird-cpu-usage.md
@@ -0,0 +1,110 @@
+---
+layout: post
+title: Bird CPU usage
+date: 2025-01-07 19:06 +0100
+categories: tech
+lang: en
+---
+
+Several times already, I noticed this in my Munin monitoring:
+
+{% image
+img="https://pics.uvokchee.de/upload/2025/01/07/20250107180512-c1453895.png"
+alt="RRD tool graphic showing a high CPU usage" %}
+
+I found it strange, but had no time to inspect further.
+
+Recently, I tried to investigate what happens on the server there.
+htop showed high CPU usage for both bird and dnsmasq (always
+together) in these times.
+
+Fuming a bit, I went with a brute-force approach:
+
+```
+#!/bin/bash
+
+# Configuration
+THRESHOLD=1.0
+# 3 hours
+DURATION=$((3*60*60))
+MAIL_TO="lolnope"
+SUBJECT="High Load Average Alert"
+BODY="The load average has been above ${THRESHOLD} for more than 3 hours."
+REBOOT_CMD="/sbin/reboot"
+
+# Function to check the load average
+check_load() {
+    # 15 min loadavg
+    loadavg=$(awk '{print $3}' /proc/loadavg)
+    echo "$(date): Current Load Average: $loadavg"
+
+    if (( $(echo "$loadavg > $THRESHOLD" | bc -l) )); then
+        echo "$(date): Load average is above threshold."
+        return 0
+    else
+        echo "$(date): Load average is below threshold."
+        return 1
+    fi
+}
+
+# Monitor the load average
+start_time=$(date +%s)
+while true; do
+    if check_load; then
+        current_time=$(date +%s)
+        elapsed_time=$((current_time - start_time))
+
+        if [ "$elapsed_time" -gt "$DURATION" ]; then
+            echo "$(date): Load average has been above threshold for more than 3 hours."
+
+            # Send alert email
+            (echo "$BODY"; ps -e -o %cpu,%mem,cmd --sort pcpu | tail) | mail -s "$SUBJECT" "$MAIL_TO"
+
+            # Reboot the server
+#            systemctl stop bird
+#            systemctl start bird
+            $REBOOT_CMD
+            break
+        fi
+    else
+        start_time=$(date +%s)
+    fi
+    sleep 300 # Check every 5 minutes
+done
+
+```
+
+Specifically, the output of ps
+
+```
+22.7 2.7 /usr/sbin/bird -f -u bird -g bird
+33.3 0.1 ps -e -o %cpu,%mem,cmd --sort pcpu
+37.4 0.0 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new,.bak --local-service --trust-anchor=.,20326,8,2,e06d44b80b8f1d39a95c0b0d7c65d08458e880409bbc683457104237c7f8ec8d
+```
+
+confirmed the suspicion - although the "percentage" is a bit weird. From the
+manpage:
+
+> Currently, it is the CPU time used divided by the time the process has been
+> running (cputime/realtime ratio), expressed as a percentage.
+
+(So if the process runs "long enough" and only starts misbehaving after a year,
+it won't show up?).
+
+I asked an LLM what to do, in addition to strace, and it suggested perf.
+Unfortunately, this requires debug symbols [1]. [And while Debian does provide
+debug symbols](https://wiki.debian.org/HowToGetABacktrace) - it doesn't for
+dnsmasq (yet) in bookworm. Luckily, the nice people at labs.nic.cz provide a
+dbgsym package in their Bird(2) Debian repository.
+
+Now, stracing dnsmasq (when "idle") reveals some recvmsg of type `RTM_NEWROUTE`.
+I have *no idea* why dnsmasq would need that. But I already *assume* the high
+CPU usage occurs when Bird exports lots of routes to the kernel.
+
+Also, in journalctl, I see lots of the infamous `Kernel dropped some netlink
+messages, will resync on next scan.` messages at times - the message apparently
+nobody has a solution to, and even though there are mailing list posts telling
+to sysctl `net.core.rmem_default`, it doesn't seem to yield a solution.
+
+[1] At least when I want to see the binary's function names.
+    Kernel symbols seem to show up fine.
diff --git a/_includes/header.html b/_includes/header.html
index 492dcd5..86e812e 100644
--- a/_includes/header.html
+++ b/_includes/header.html
@@ -13,7 +13,6 @@
       </label>
       <div class="trigger" lang="en">
         {% include_cached navlinks.html %}
-        <a class="page-link" href="https://uvokchee.de/wiki/">Wiki</a>
         <a class="rss-subscribe page-link" href="{{ "/feed.xml" | relative_url }}">RSS feed</a>
       </div>
     </nav>
diff --git a/_layouts/postlist.html b/_layouts/postlist.html
index 67a1d48..52d07df 100644
--- a/_layouts/postlist.html
+++ b/_layouts/postlist.html
@@ -39,6 +39,9 @@ layout: default
           {%- elsif site.show_excerpts -%}
             {{ post.excerpt }}
             {%- assign showsep = true -%}
+          {%- elsif post.description -%}
+            {{ post.description }}
+            {%- assign showsep = true -%}
           {%- endif -%}
         </div>
       </li>
diff --git a/dn42.md b/dn42.md
--- a/dn42.md
+++ b/dn42.md
@@ -1,7 +1,7 @@
 ---
 layout: page
 title: DN42
-in_navbar: true
+in_navbar: false
 order: 50
 lang: "en"
 ---
diff --git a/projects.md b/projects.md
new file mode 100644
index 0000000..b3d47f7
--- /dev/null
+++ b/projects.md
@@ -0,0 +1,31 @@
+---
+layout: page
+title: Projects
+in_navbar: true
+order: 50
+lang: "en"
+---
+
+Some of my projects and sites:
+
+- This blog you're reading right now
+- [DN42]({% link dn42.md %})
+- [Wiki](https://uvokchee.de/wiki/)
+- [Funkwhale](https://fw.uvok.de/) (currently defunct)
+- [Personal Matrix and XMPP Server]({% link contact.html %})
+- Running an authoritative DNS server with [PowerDNS](https://www.powerdns.com/powerdns-community)
+- Running various VPS (all with Debian, of course)
+- Running [OpenWRT](https://openwrt.org/) in my home network (e.g. for tagged VLAN) [2]
+- Running [Proxmox](https://www.proxmox.com/en/) (hosting various LXC containers) in my home
+- Running a [RIPE ATLAS](https://atlas.ripe.net/docs/) probe - [software](https://github.com/RIPE-NCC/ripe-atlas-software-probe)
+- [OpenPGP WKD](https://wiki.gnupg.org/WKD) via DNS
+  (for shits and giggles - I don't really write mail, and PGP has its usability problems)
+- [Git server](https://git.uvok.de/)
+  using [gitolite](https://gitolite.com/gitolite/index.html) [1]
+  and [cgit](https://git.zx2c4.com/cgit/)
+
+[1] Can really recommend this if you *don't* want a full-blown Git hosting with
+    "UI" / CI etc. - just the bare-bones git repository hosting.
+    (Which saves you setting up the bare git repos manually, though). \
+[2] I tried OPNsense in the past, too. But at some point I ran into problems I couldn't fix.
+    Also, it was virtualized inside Proxmox. Not the optimal solution.