Arch Linux: Improve boot time performance

I run Debian on all my servers. It’s a great stable OS and I love it. Proxmox, which I run on my homelab server, is also based on Debian.

However, on my desktop I run Arch Linux. It’s a great distro to tinker with. It ships a lot of up-to-date packages, and it also has the AUR (Arch User Repository), so for almost any app you can find, there is probably an easy way to install it.

Slllooooowwww…

Lately I noticed that boot times on my system were getting longer. Which is strange, because I run some pretty decent hardware.

As it turns out, cold booting this box takes 1min 7.538s, according to my logs.
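That figure comes straight from systemd-analyze, which prints a one-line summary when run without arguments. As a sketch, parsing a captured copy of that line (the slow boot quoted later in this post) pulls out the userspace portion, which is the part systemd units can actually influence:

```shell
# `systemd-analyze` with no verb is shorthand for `systemd-analyze time`
# and prints a single summary line. Here we parse a sample of that line
# captured from the slow boot, rather than live output:
sample='Startup finished in 14.729s (firmware) + 6.386s (loader) + 12.761s (kernel) + 33.661s (userspace) = 1min 7.538s'

# Extract just the userspace figure:
echo "$sample" | grep -oE '[0-9.]+s \(userspace\)'
# → 33.661s (userspace)
```

Firmware, loader, and kernel times are mostly down to hardware and kernel configuration; userspace is where service tweaks pay off.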

Luckily, the Arch Wiki offers a nice guide on how to troubleshoot boot performance.

There’s systemd-analyze blame, which shows how long each service took to start. I’ve copied the top 10 here, which also happen to be all of the >1 second start-up times.

 systemd-analyze blame
20.771s docker.service
 3.514s dev-sdb3.device
 2.459s systemd-journal-flush.service
 1.880s upower.service
 1.806s ldconfig.service
 1.687s systemd-tmpfiles-setup.service
 1.587s containerd.service
 1.287s systemd-modules-load.service
 1.032s systemd-fsck@dev-disk-by\x2duuid-96EB\x2d4C82.service
 1.028s cups.service

Docker is a clear offender here, and dev-sdb3 also seems quite slow.

Another command recommended in the wiki is systemd-analyze critical-chain, which shows the chain of units your boot has to wait on before reaching its target. Again, docker is clearly a big offender here.

 systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @33.660s
└─multi-user.target @33.660s
  └─docker.service @12.888s +20.771s
    └─containerd.service @11.264s +1.587s
      └─network.target @11.236s
        └─wpa_supplicant.service @27.465s +268ms
          └─basic.target @10.366s
            └─dbus-broker.service @9.822s +541ms
              └─dbus.socket @9.793s
                └─sysinit.target @9.759s
                  └─systemd-update-done.service @9.722s +36ms
                    └─systemd-journal-catalog-update.service @9.375s +326ms
                      └─systemd-tmpfiles-setup.service @7.657s +1.687s
                        └─local-fs.target @7.587s
                          └─boot.mount @7.458s +128ms
                            └─systemd-fsck@dev-disk-by\x2duuid-96EB\x2d4C82.service @6.398s +1.032s
                              └─dev-disk-by\x2duuid-96EB\x2d4C82.device @6.397s

But wait, there’s more. systemd-analyze plot > plot.svg will generate an SVG image showing you the entire boot process in time. It’s big, but there are some clear red markers that indicate issues.

At the bottom right you’ll find graphical.target, where we want to end up as quickly as possible. And it’s clear docker is in the way.


Fixed it!

So, with docker as a clear offender in slowing down the boot process, let’s fix that.

There are two systemd units: docker.service and docker.socket.

  • docker.service is there to start docker and make sure it is up and running.
  • docker.socket listens on /run/docker.sock (or /var/run/docker.sock through a symlink) and will start docker.service when needed.
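For reference, docker.socket is an ordinary systemd socket unit. Roughly what the packaged unit contains (abridged; run `systemctl cat docker.socket` on your own machine, as details vary by package version):

```ini
# Abridged from /usr/lib/systemd/system/docker.socket;
# your packaged version may differ slightly.
[Unit]
Description=Docker Socket for the API

[Socket]
# systemd owns this socket; the first connection to it
# pulls in docker.service.
ListenStream=/run/docker.sock
SocketMode=0660
SocketUser=root
SocketGroup=docker

[Install]
WantedBy=sockets.target
```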

I think you know where this is going. By default, docker.socket is disabled and docker.service is enabled. That makes sense, especially for servers: when the machine boots, you want docker up and running right away.

For my desktop, not so much. I use docker, but not always and I prefer to login and check my email while docker is booting in the background anyway.

The trick, then, is to stop docker.service from starting automatically and to make sure docker.socket is enabled. That takes docker out of the critical chain at boot and starts docker on demand, once I’m logged in and ready to use it.

$ sudo systemctl disable docker.service
$ sudo systemctl enable docker.socket
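This socket-activation trick isn’t specific to Docker: any daemon that can inherit a listening socket from systemd can be deferred the same way. A minimal hypothetical pair of units (names and paths invented purely for illustration) looks like this:

```ini
# /etc/systemd/system/myapp.socket -- hypothetical example
[Socket]
ListenStream=/run/myapp.sock

[Install]
# Enabled sockets are set up early at boot, very cheaply.
WantedBy=sockets.target

# /etc/systemd/system/myapp.service -- hypothetical example
[Service]
# Started only when the first client connects to /run/myapp.sock.
ExecStart=/usr/local/bin/myapp
```

With the socket enabled and the service left disabled, boot only pays for the tiny socket setup; the daemon itself starts on first use.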

So, what does that look like in systemd-analyze?

 systemd-analyze critical-chain
The time when unit became active or started is printed after the "@" character.
The time the unit took to start is printed after the "+" character.

graphical.target @3.893s
└─multi-user.target @3.893s
  └─cups.service @3.672s +220ms
    └─nss-user-lookup.target @3.763s

 systemd-analyze blame
2.152s systemd-modules-load.service
1.295s dev-sdb3.device
 622ms boot.mount
 385ms NetworkManager.service
 310ms systemd-udev-trigger.service
 280ms udisks2.service
 258ms systemd-remount-fs.service
 220ms cups.service
 203ms user@1000.service
 189ms systemd-tmpfiles-setup.service
 


 systemctl status docker.socket
 docker.socket - Docker Socket for the API
     Loaded: loaded (/usr/lib/systemd/system/docker.socket; enabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 10:38:47 CET; 5min ago
   Triggers:  docker.service
     Listen: /run/docker.sock (Stream)
      Tasks: 0 (limit: 38400)
     Memory: 0B (peak: 516.0K)
        CPU: 1ms
     CGroup: /system.slice/docker.socket

and

 systemctl status docker.service
 docker.service - Docker Application Container Engine
     Loaded: loaded (/usr/lib/systemd/system/docker.service; disabled; preset: disabled)
     Active: active (running) since Thu 2024-02-08 10:39:33 CET; 5min ago
TriggeredBy:  docker.socket
       Docs: https://docs.docker.com
   Main PID: 2522 (dockerd)
      Tasks: 42
     Memory: 222.1M (peak: 235.7M)
        CPU: 797ms
     CGroup: /system.slice/docker.service
             └─2522 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock

Was it worth it?

Before:

Startup finished in 14.729s (firmware) + 6.386s (loader) + 12.761s (kernel) + 33.661s (userspace) = 1min 7.538s graphical.target reached after 33.660s in userspace.

After:

Startup finished in 13.735s (firmware) + 4.074s (loader) + 6.744s (kernel) + 3.893s (userspace) = 28.448s graphical.target reached after 3.893s in userspace.

Total boot time went down from 1m8s to 28s. I cannot explain the difference in kernel boot time, but the userspace savings are significant.
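The per-stage numbers add up, give or take a millisecond of rounding in systemd’s own totals. Summing firmware + loader + kernel + userspace for each run:

```shell
# Stage timings quoted above, summed with awk (plain POSIX arithmetic).
# The results differ from systemd's printed totals by ~1ms of rounding.
awk 'BEGIN {
  before = 14.729 + 6.386 + 12.761 + 33.661   # slow boot
  after  = 13.735 + 4.074 +  6.744 +  3.893   # fast boot
  printf "before=%.3fs after=%.3fs saved=%.3fs\n", before, after, before - after
}'
# → before=67.537s after=28.446s saved=39.091s
```

Roughly 39 of the 67 seconds were recovered, almost 30 of them from userspace alone.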

From here I could probably optimize more by compiling a customized kernel or using a different bootloader. Suspend to RAM would be even faster, but that feels like cheating against a hard boot.

Hopefully this gives you some pointers on how to troubleshoot slow boot times on your own machine.