
+++
title = "Debugging a Docker Core Dump"
date = 2023-02-25T10:39:49+10:00

#[extra]
#updated = 2023-01-11T21:11:28+10:00
+++

On my main machine I use an excellent cross-platform tool called [Docuum](https://github.com/stepchowfun/docuum) that automatically cleans up unused Docker images. It lets me use Docker without the need to periodically wonder why I'm out of disk space, run `docker system prune`, and recover half my disk.

I installed Docuum via the AUR package (tweaked to build the latest Docuum release) and ran it via the bundled systemd service definition. This worked great for a while, but some time back it started failing. Every time Docuum tried to check for things to clean up, I'd see the following in the system journal:

```
Feb 25 10:03:12 ryzen docuum[77751]: [2023-02-25 10:03:12 +10:00 INFO] Performing an initial vacuum on startup…
Feb 25 10:03:12 ryzen kernel: audit: type=1326 audit(1677283392.831:2): auid=4294967295 uid=0 gid=0 ses=4294967295 pid=77763 comm="docker" exe="/usr/bin/docke>
Feb 25 10:03:12 ryzen systemd[1]: Created slice Slice /system/systemd-coredump.
Feb 25 10:03:12 ryzen systemd[1]: Started Process Core Dump (PID 77768/UID 0).
Feb 25 10:03:13 ryzen systemd-coredump[77769]: [🡕] Process 77763 (docker) of user 0 dumped core.

                                               Stack trace of thread 77763:
                                               #0  0x00005568dbcb5c4e n/a (docker + 0x243c4e)
                                               #1  0x00005568dbd35a3b n/a (docker + 0x2c3a3b)
                                               #2  0x00005568dbd3482f n/a (docker + 0x2c282f)
                                               #3  0x00005568dbd6c2ee n/a (docker + 0x2fa2ee)
                                               #4  0x00005568dbcfafa8 n/a (docker + 0x288fa8)
                                               #5  0x00005568dbcfaef1 n/a (docker + 0x288ef1)
                                               #6  0x00005568dbced953 n/a (docker + 0x27b953)
                                               #7  0x00005568dbd1eb41 n/a (docker + 0x2acb41)
                                               ELF object binary architecture: AMD x86-64
Feb 25 10:03:13 ryzen docuum[77751]: [2023-02-25 10:03:13 +10:00 ERROR] Unable to list images.
Feb 25 10:03:13 ryzen docuum[77751]: [2023-02-25 10:03:13 +10:00 INFO] Retrying in 5 seconds…
Feb 25 10:03:13 ryzen systemd[1]: systemd-coredump@0-77768-0.service: Deactivated successfully.
```
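These messages are easy to watch live. The journal can be tailed for the whole system or for just the unit; standard `journalctl` usage, shown here for reference:

```sh
# Tail the whole system journal (the kernel audit and coredump lines show up here)
sudo journalctl -f

# Or follow just docuum's own messages
journalctl -u docuum.service -f
```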

This would repeat every 5 seconds. I ignored it for a while but finally decided to investigate it today. To find the failing command I ran `coredumpctl list`, identified one of the docker crashes, and ran `coredumpctl info` with its PID:

```
$ coredumpctl info 78255
           PID: 78255 (docker)
           UID: 0 (root)
           GID: 0 (root)
        Signal: 31 (SYS)
     Timestamp: Sat 2023-02-25 10:03:23 AEST (44min ago)
  Command Line: docker image ls --all --no-trunc --format $'{{.ID}}\\t{{.Repository}}\\t{{.Tag}}\\t{{.CreatedAt}}'
    Executable: /usr/bin/docker
 Control Group: /system.slice/docuum.service
          Unit: docuum.service
         Slice: system.slice
       Boot ID: 0ac9f0dd246548949c3a90a0e7494665
    Machine ID: affcb0b7a7d1464385d65464d9be450e
      Hostname: ryzen
       Storage: /var/lib/systemd/coredump/core.docker.0.0ac9f0dd246548949c3a90a0e7494665.78255.1677283403000000.zst (inaccessible)
       Message: Process 78255 (docker) of user 0 dumped core.
```

Strangely, I could run the command myself (the Command Line field above) just fine. I figured I needed to see where in docker it was crashing. I learned how to access systemd core dumps with gdb and ran `sudo coredumpctl gdb 78255`. sudo is needed because the core dump belongs to root, since the crashing docker process was running as root. This didn't yield much extra info as debug symbols were not present for the binary, but it did identify why it crashed:

```
Program terminated with signal SIGSYS, Bad system call.
```

Knowing that docker is implemented in Go, and that Go makes system calls directly rather than through libc on Linux, I wondered if this was some sort of Go bug, although given the popularity of Docker that seemed unlikely.

To get more info I needed debug symbols. Arch Linux makes these available via debuginfod, and gdb can automatically download debuginfo files if `DEBUGINFOD_URLS` is set. I reran gdb, telling sudo to pass the `DEBUGINFOD_URLS` environment variable through (I had already set `DEBUGINFOD_URLS=https://debuginfod.archlinux.org/` in my Zsh config some time ago):

```
sudo --preserve-env=DEBUGINFOD_URLS coredumpctl gdb 78255
```
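For reference, the Zsh side of that is just an exported variable; something like this in `~/.zshrc`:

```sh
# Let gdb (and other debuginfod-aware tools) fetch debug info on demand from Arch's server
export DEBUGINFOD_URLS="https://debuginfod.archlinux.org/"
```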

Now there was a proper backtrace:

```
#0  runtime/internal/syscall.Syscall6 () at runtime/internal/syscall/asm_linux_amd64.s:36
#1  0x00005588e7b93c33 in syscall.RawSyscall6 (num=160, a1=7, a2=94046491589710, a3=7, a4=824634289408, a5=0, a6=0, r1=<optimized out>, r2=<optimized out>,
    errno=<optimized out>) at runtime/internal/syscall/syscall_linux.go:38
#2  0x00005588e7c13a3b in syscall.RawSyscall (trap=160, a1=7, a2=94046491589710, a3=7, r1=<optimized out>, r2=<optimized out>, err=<optimized out>)
    at syscall/syscall_linux.go:62
#3  0x00005588e7c1282f in syscall.Setrlimit (resource=<optimized out>, rlim=<optimized out>, err=...) at syscall/zsyscall_linux_amd64.go:1326
#4  0x00005588e7c4a2ee in os.init.1 () at os/rlimit.go:30
#5  0x00005588e7bd8fa8 in runtime.doInit (t=0x5588e91a7be0 <os.[inittask]>) at runtime/proc.go:6506
#6  0x00005588e7bd8ef1 in runtime.doInit (t=0x5588e91a9900 <main.[inittask]>) at runtime/proc.go:6483
#7  0x00005588e7bcb953 in runtime.main () at runtime/proc.go:233
#8  0x00005588e7bfcb41 in runtime.goexit () at runtime/asm_amd64.s:1598
#9  0x0000000000000000 in ?? ()
```
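From here the usual gdb commands work on the dump. For example, to poke at the `syscall.Setrlimit` frame (a quick sketch; how much is visible depends on what was optimized out):

```
(gdb) frame 3    # select the syscall.Setrlimit frame from the backtrace above
(gdb) info args  # show its arguments (several are optimized out)
(gdb) list       # show the surrounding source, if debuginfod can supply it
```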

So the issue seemed to be a call to `setrlimit`. Looking at the Go code and searching the Go issue tracker didn't turn up anyone else hitting this, which pointed at something specific to my system.

I'm honestly not sure what led me to the next step but I decided to take a look at the Docuum service definition. I was surprised to see that it was more complicated than most definitions I'm used to seeing:

```
[Unit]
Description=LRU eviction of Docker images
Documentation=https://github.com/stepchowfun/docuum
DefaultDependencies=false
After=docker.service docker.socket
Requires=docker.service docker.socket

[Service]
Type=simple
Environment=DOCUUM_THRESHOLD=10GB
EnvironmentFile=-/etc/default/docuum
ExecStart=/usr/bin/docuum --threshold $DOCUUM_THRESHOLD
ProtectSystem=full
PrivateTmp=true
PrivateDevices=true
PrivateNetwork=true
CapabilityBoundingSet=
KeyringMode=private
RestrictNamespaces=~cgroup ipc net mnt pid user uts
RestrictAddressFamilies=AF_UNIX
ReadWritePaths=/var/run/docker.sock
DeviceAllow=
IPAddressDeny=any
NoNewPrivileges=true
PrivateTmp=true
PrivateDevices=true
PrivateMounts=true
PrivateUsers=true
ProtectControlGroups=true
ProtectSystem=strict
ProtectHome=tmpfs
ProtectKernelModules=true
ProtectKernelTunables=true
RestrictSUIDSGID=true
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources
RestrictRealtime=true
LockPersonality=true
MemoryDenyWriteExecute=true
UMask=0077
ProtectHostname=true

[Install]
WantedBy=multi-user.target
```

Suspiciously, it was doing a fair bit of sandboxing and filtering system calls (`SystemCallFilter`). A bit more research pointed me to `systemd-analyze syscall-filter`, which lists the system calls that belong to the predefined system call sets (`@privileged`, `@resources`, and so on).
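Passing a set name narrows the output to just that set, which is handy when checking a single system call:

```sh
# List only the system calls in the @resources set
systemd-analyze syscall-filter @resources
```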

`setrlimit` was listed under `@resources`:

```
@resources
    # Alter resource settings
    ioprio_set
    mbind
    migrate_pages
    move_pages
    nice
    sched_setaffinity
    sched_setattr
    sched_setparam
    sched_setscheduler
    set_mempolicy
    setpriority
    setrlimit
```

The systemd documentation for `SystemCallFilter` also mentions:

> If the first character of the list is "~", the effect is inverted: only the listed system calls will result in immediate process termination (deny-listing)

So we finally had our culprit: the service definition was deny-listing the system calls in the `@resources` set, and at some point docker had started calling `setrlimit`, which resulted in immediate termination.
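As a sanity check (something I didn't do at the time, but which should reproduce the failure), the same docker command can be run under an equivalent filter with `systemd-run`:

```sh
# Run the same command under the same deny-list; it should be killed with SIGSYS
sudo systemd-run --wait --pipe \
  -p SystemCallFilter=@system-service \
  -p 'SystemCallFilter=~@privileged @resources' \
  docker image ls --all --no-trunc
```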

The fix was simple enough: I removed `@resources` from the deny list, rebuilt the package, and then re-enabled the docuum service (I'd previously disabled it due to the constant crashes). I was pleased to see it start successfully and begin vacuuming up a few months of Docker image detritus.
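For reference, the change to the unit file amounts to dropping `@resources` from the second `SystemCallFilter` line, roughly:

```diff
 SystemCallFilter=@system-service
-SystemCallFilter=~@privileged @resources
+SystemCallFilter=~@privileged
```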

## Conclusion

This small debugging session taught me a number of things: how to find and list core dumps managed by systemd, how to open them in GDB with symbols present, and that systemd has powerful, fine-grained system call sandboxing.

I was ultimately able to resolve the issue and get Docuum working again. I have published my patched version of the AUR package to my personal AUR repo in case it's useful to anyone else: <https://github.com/wezm/aur/tree/master/docuum>