From 4b12ad7fefe65616c1c6ad982e77d9d5648756eb Mon Sep 17 00:00:00 2001 From: Wesley Moore Date: Sat, 25 Feb 2023 11:43:00 +1000 Subject: [PATCH] Add Docker debug post --- .../posts/2023/systemd-gdb-coredumps.md | 221 ++++++++++++++++++ 1 file changed, 221 insertions(+) create mode 100644 v2/content/posts/2023/systemd-gdb-coredumps.md diff --git a/v2/content/posts/2023/systemd-gdb-coredumps.md b/v2/content/posts/2023/systemd-gdb-coredumps.md new file mode 100644 index 0000000..781ec36 --- /dev/null +++ b/v2/content/posts/2023/systemd-gdb-coredumps.md @@ -0,0 +1,221 @@ ++++ +title = "Debugging a Docker Core Dump" +date = 2023-02-25T10:39:49+10:00 + +#[extra] +#updated = 2023-01-11T21:11:28+10:00 ++++ + +On my main machine I use an excellent cross-platform tool called [Docuum] that +automatically cleans up unused docker images. This allows me to use Docker +without the need to periodically wonder why I'm out of disk space, run `docker +system prune` and recover half my disk. + +I installed Docuum via [the AUR package][aurpkg] (although tweaked to build the +latest Docuum release) and ran it via the bundled systemd service definition. +This worked great for a while but some time back it started failing. Every time +Docuum would try to check for things to clean up I'd see the following in the +system journal: + + + +``` +Feb 25 10:03:12 ryzen docuum[77751]: [2023-02-25 10:03:12 +10:00 INFO] Performing an initial vacuum on startup… +Feb 25 10:03:12 ryzen kernel: audit: type=1326 audit(1677283392.831:2): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=77763 comm="docker" exe="/usr/bin/docke> +Feb 25 10:03:12 ryzen systemd[1]: Created slice Slice /system/systemd-coredump. +Feb 25 10:03:12 ryzen systemd[1]: Started Process Core Dump (PID 77768/UID 0). +Feb 25 10:03:13 ryzen systemd-coredump[77769]: [🡕] Process 77763 (docker) of user 0 dumped core. + + Stack trace of thread 77763: + #0 0x00005568dbcb5c4e n/a (docker + 0x243c4e) + #1 0x00005568dbd35a3b n/a (docker + 0x2c3a3b) + #2 0x00005568dbd3482f n/a (docker + 0x2c282f) + #3 0x00005568dbd6c2ee n/a (docker + 0x2fa2ee) + #4 0x00005568dbcfafa8 n/a (docker + 0x288fa8) + #5 0x00005568dbcfaef1 n/a (docker + 0x288ef1) + #6 0x00005568dbced953 n/a (docker + 0x27b953) + #7 0x00005568dbd1eb41 n/a (docker + 0x2acb41) + ELF object binary architecture: AMD x86-64 +Feb 25 10:03:13 ryzen docuum[77751]: [2023-02-25 10:03:13 +10:00 ERROR] Unable to list images. +Feb 25 10:03:13 ryzen docuum[77751]: [2023-02-25 10:03:13 +10:00 INFO] Retrying in 5 seconds… +Feb 25 10:03:13 ryzen systemd[1]: systemd-coredump@0-77768-0.service: Deactivated successfully. +``` + +This would repeat every 5 seconds. I ignored this for a while but finally decided +to investigate it today. To find the failing command I ran +`coredumpctl list` then identified one of the docker crashes and ran +`coredumpctl info` with its PID: + +``` +$ coredumpctl info 78255 + PID: 78255 (docker) + UID: 0 (root) + GID: 0 (root) + Signal: 31 (SYS) + Timestamp: Sat 2023-02-25 10:03:23 AEST (44min ago) + Command Line: docker image ls --all --no-trunc --format $'{{.ID}}\\t{{.Repository}}\\t{{.Tag}}\\t{{.CreatedAt}}' + Executable: /usr/bin/docker + Control Group: /system.slice/docuum.service + Unit: docuum.service + Slice: system.slice + Boot ID: 0ac9f0dd246548949c3a90a0e7494665 + Machine ID: affcb0b7a7d1464385d65464d9be450e + Hostname: ryzen + Storage: /var/lib/systemd/coredump/core.docker.0.0ac9f0dd246548949c3a90a0e7494665.78255.1677283403000000.zst (inaccessible) + Message: Process 78255 (docker) of user 0 dumped core. +``` + +Strangely I could run the command myself (`Command Line` line) just fine. I +figured I needed to see where in docker it was crashing. I learned how to +access systemd coredumps with gdb and ran: `sudo coredumpctl gdb 78255`. `sudo` +is needed because the core dump belongs to root due the crashing `docker` +process belonging to root. This didn't yield much extra info as debug symbols +were not present for the binary. It did identify why it crashed though: + +> Program terminated with signal SIGSYS, Bad system call. + +Knowing that docker was implemented in Go and that they make system calls +manually on Linux I wondered if this was some sort of Go bug—although given +the popularity of Docker this did seem unlikely. + +To get more info I needed debug symbols. Arch Linux makes these available via +[debuginfod](https://wiki.archlinux.org/title/Debuginfod) and gdb can +automatically download debuginfo files if `DEBUGINFOD_URLS` is set. I reran +`gdb`, telling `sudo` to pass the `DEBUGINFOD_URLS` environment variable +through (I had already set `DEBUGINFOD_URLS=https://debuginfod.archlinux.org/` +in my Zsh config some time ago): + +``` +sudo --preserve-env=DEBUGINFOD_URLS coredumpctl gdb 78255 +``` + +Now there was a proper backtrace: + +``` +#0 runtime/internal/syscall.Syscall6 () at runtime/internal/syscall/asm_linux_amd64.s:36 +#1 0x00005588e7b93c33 in syscall.RawSyscall6 (num=160, a1=7, a2=94046491589710, a3=7, a4=824634289408, a5=0, a6=0, r1=, r2=, + errno=) at runtime/internal/syscall/syscall_linux.go:38 +#2 0x00005588e7c13a3b in syscall.RawSyscall (trap=160, a1=7, a2=94046491589710, a3=7, r1=, r2=, err=) + at syscall/syscall_linux.go:62 +#3 0x00005588e7c1282f in syscall.Setrlimit (resource=, rlim=, err=...) at syscall/zsyscall_linux_amd64.go:1326 +#4 0x00005588e7c4a2ee in os.init.1 () at os/rlimit.go:30 +#5 0x00005588e7bd8fa8 in runtime.doInit (t=0x5588e91a7be0 ) at runtime/proc.go:6506 +#6 0x00005588e7bd8ef1 in runtime.doInit (t=0x5588e91a9900 ) at runtime/proc.go:6483 +#7 0x00005588e7bcb953 in runtime.main () at runtime/proc.go:233 +#8 0x00005588e7bfcb41 in runtime.goexit () at runtime/asm_amd64.s:1598 +#9 0x0000000000000000 in ?? () +``` + +So the issue seems to be a call to `setrlimit`. Looking at [the code][gocode] +and searching the Go issue tracker didn't turn up anyone else having this +issue, which pointed at an issue on my system. + +I'm honestly not sure what led me to the next step but I decided to take a look +at the Docuum service definition. I was surprised to see that it was more +complicated than most definitions I'm used to seeing: + +```ini +[Unit] +Description=LRU eviction of Docker images +Documentation=https://github.com/stepchowfun/docuum +DefaultDependencies=false +After=docker.service docker.socket +Requires=docker.service docker.socket + +[Service] +Type=simple +Environment=DOCUUM_THRESHOLD=10GB +EnvironmentFile=-/etc/default/docuum +ExecStart=/usr/bin/docuum --threshold $DOCUUM_THRESHOLD +ProtectSystem=full +PrivateTmp=true +PrivateDevices=true +PrivateNetwork=true +CapabilityBoundingSet= +KeyringMode=private +RestrictNamespaces=~cgroup ipc net mnt pid user uts +RestrictAddressFamilies=AF_UNIX +ReadWritePaths=/var/run/docker.sock +DeviceAllow= +IPAddressDeny=any +NoNewPrivileges=true +PrivateTmp=true +PrivateDevices=true +PrivateMounts=true +PrivateUsers=true +ProtectControlGroups=true +ProtectSystem=strict +ProtectHome=tmpfs +ProtectKernelModules=true +ProtectKernelTunables=true +RestrictSUIDSGID=true +SystemCallArchitectures=native +SystemCallFilter=@system-service +SystemCallFilter=~@privileged @resources +RestrictRealtime=true +LockPersonality=true +MemoryDenyWriteExecute=true +UMask=0077 +ProtectHostname=true + +[Install] +WantedBy=multi-user.target +``` + +Suspiciously it seemed to be doing some sandboxing and filtering system calls +(`SystemCallFilter`). A bit more research pointed me to `systemd-analyze +syscall-filter`, which lists which system calls belong to the predefined system +call sets (`@privileged`, `@resources`, etc.). + +`setrlimit` was listed under `@resources`: + +``` +@resources + # Alter resource settings + ioprio_set + mbind + migrate_pages + move_pages + nice + sched_setaffinity + sched_setattr + sched_setparam + sched_setscheduler + set_mempolicy + setpriority + setrlimit +``` + +The [systemd docs for `SystemCallFilter`][SystemCallFilter] also mentioned: + +> If the first character of the list is "~", the effect is inverted: only the +> listed system calls will result in immediate process termination +> (deny-listing) + +So we finally had our culprit: the service definition was denying system calls in +the `@resources` set and at some point `docker` had started making `setrlimit` calls, +which were resulting in termination. + +The fix was simple enough: I removed `@resources` from the deny list, rebuilt +the package, and then re-enabled the `docuum` service (I'd previously disabled +it do to the constant crashes). I was pleased to see it start successfully and +begin vacuuming up a few months of Docker image detritus. + +### Conclusion + +This small debugging session taught me a number of things: I learned how to +find and list core dumps managed by systemd, how to open them in GDB with +symbols present, and that systemd has powerful, fine-grained system call +sandboxing. + +I was ultimately able to resolve the issue and get Docuum working again. +I have published my patched version of the AUR package to my personal AUR +repo in case it's useful to anyone else: + +. + +[aurpkg]: https://aur.archlinux.org/packages/docuum +[Docuum]: https://github.com/stepchowfun/docuum +[gocode]: https://github.com/golang/go/blob/203e59ad41bd288e1d92b6f617c2f55e70d3c8e3/src/syscall/zsyscall_linux_amd64.go#L1335 +[service]: https://aur.archlinux.org/cgit/aur.git/tree/docuum.service?h=docuum&id=650d2c24fe9df712e8a98dde37f3ee47d3af4e47 +[SystemCallFilter]: https://www.freedesktop.org/software/systemd/man/systemd.exec.html#System%20Call%20Filtering