Modular Application Updates: Libvirt and QEMU
Dmitrii Shcherbakov
dmitrii.shcherbakov at canonical.com
Mon Mar 27 19:13:43 UTC 2017
Hi everybody,
TL;DR: Putting libvirt and QEMU into the same snap removes the ability
to update them independently and to have new QEMU binaries take effect
only after a VM shutdown. The update process is not graceful: all
processes are terminated (SIGTERM) or killed (SIGKILL) by snapd if
termination was not successful. For VMs this will result in file system
corruption, because guest caches are not flushed. Using 2 separate
snaps does not solve the problem.
----
I am using libvirt and QEMU as the example here, but the problem is
quite generic if you think about it. Both QEMU and libvirt are complex
and rely on many Linux kernel mechanisms, which makes them a good
example of something complicated to snap.
Libvirt/qemu context:
1 libvirt has many drivers for domain creation, one of them being QEMU;
2 libvirt communicates with QEMU via a unix socket. QEMU creates that
socket upon startup and talks to anybody over the QEMU Machine Protocol
(QMP) - you can kill libvirt and use that text protocol yourself via
the nc utility, nothing prevents you from doing that (see the sketch
after this list);
3 QEMU instances are daemonized, so a given qemu process is not a child
of libvirtd - pid 1 is its parent - yet another reason for it to stay
alive if libvirtd is dead;
4 It is not mandatory to have a running libvirtd process for QEMU operation;
5 Libvirt may use cgroups to constrain qemu processes
(https://libvirt.org/cgroups.html#systemdLayout). A single pid can
only belong to one cgroup in a given cgroupv1 hierarchy;
6 QEMU binary and shared object updates done by a package manager (via
rename/mv, which unlinks the old files rather than overwriting them -
see below) do not require QEMU processes to be killed;
7 If a QEMU process is terminated via SIGTERM or SIGKILL, the guest
kernel's page cache and buffer cache are not flushed to disk, which is
highly likely to cause file system corruption in the guest.
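To make point 2 concrete, here is a minimal sketch (in Python) of
talking QMP to a running QEMU instance directly, with libvirt out of
the picture. The socket path is an assumption - it depends on how QEMU
was started (e.g. -qmp unix:/tmp/qmp.sock,server,nowait);
libvirt-managed guests use a per-domain monitor socket instead:

    import json, socket

    QMP_SOCKET = '/tmp/qmp.sock'  # hypothetical path, see the -qmp option above

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(QMP_SOCKET)
    f = s.makefile('rw')

    def cmd(name):
        # QMP is line-oriented JSON: one command in, one response out
        f.write(json.dumps({'execute': name}) + '\n')
        f.flush()
        return f.readline()

    print(f.readline())             # greeting banner QEMU sends on connect
    print(cmd('qmp_capabilities'))  # leave capabilities negotiation mode
    print(cmd('query-status'))      # {"return": {"status": "running", ...}}
    # cmd('system_powerdown')       # graceful ACPI shutdown request -
    #                               # contrast this with SIGTERM/SIGKILL
    s.close()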
Here is how a systemd unit of a combined libvirt & QEMU snap looks:
  snap.libvirt.libvirt-bin.service - Service for snap application libvirt.libvirt-bin
     Loaded: loaded (/etc/systemd/system/snap.libvirt.libvirt-bin.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2017-03-26 03:29:06 MSK; 1 day 16h ago
   Main PID: 17128 (rundaemon)
      Tasks: 23 (limit: 4915)
     Memory: 56.1M
        CPU: 16min 1.435s
     CGroup: /system.slice/snap.libvirt.libvirt-bin.service
             ├─17128 /bin/sh /snap/libvirt/x1/bin/rundaemon sbin/libvirtd /snap/libvirt/current/bin /snap/libvirt/current/usr/bin
             ├─17155 /snap/libvirt/x1/sbin/libvirtd
             └─17357 /snap/libvirt/current/bin/qemu-system-x86_64 -name guest=ubuntu-xenial,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/snap/libvirt/current/lib/libvirt/qemu/domain-1-ubuntu-xenial/master-key
-----------
In the snapd code, this is how updates are implemented with regard to
process lifetime:
https://paste.ubuntu.com/24262077/
The idea with any 'classic' package management system (for debs, rpms,
etc.) is as follows:
1 Updates move new files over the old ones. That is, old shared objects
and binaries are unlinked, not overwritten in place - if there is still
a process that has a file open (or mmapped, which requires the file to
be open), the old inode and the related data stay on the file system
until the reference count drops to zero (a sketch of this follows the
list);
2 Running programs keep using the old binaries and shared objects they
already have open until restart (new 'dlopen's or 'open's before
restart will, of course, use the new files);
3 The old and the new files reside on the same file system (a package
may have files on multiple file systems, but for each individual old
file/new file pair the file system remains the same).
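A minimal sketch of that unlink-while-open behavior (the file names are
made up for illustration):

    import os

    # Simulate a running process holding a library open while a package
    # manager replaces it on disk.
    with open('libfoo.so', 'w') as f:
        f.write('old contents')
    fd = os.open('libfoo.so', os.O_RDONLY)   # "running process" holds old inode

    with open('libfoo.so.new', 'w') as f:
        f.write('new contents')
    os.rename('libfoo.so.new', 'libfoo.so')  # atomic replace; old inode unlinked

    print(os.read(fd, 64))         # b'old contents' - still readable via fd
    with open('libfoo.so') as f:
        print(f.read())            # 'new contents' - new opens see new inode
    os.close(fd)                   # refcount drops to zero; old data is freed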
With snaps this is completely different:
1 A new squashfs and an old squashfs are obviously different file
systems - hence old and new inodes live on different file systems (see
the sketch after this list);
2 All processes are killed during an update unconditionally and the
new file system is used to run new processes;
3 Some libraries are taken from the core snap's file system, which
remains the same (though it may change, since the core snap may have
been updated earlier while a particular snap was still using an old
version of it).
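Point 1 can be verified directly - the revision paths below (x1/x2) are
illustrative:

    import os

    # Each snap revision is a separate loop-mounted squashfs, so the old
    # and the new copy of a file carry different device IDs - unlike a
    # dpkg upgrade, where both inodes live on the same file system.
    old = os.stat('/snap/libvirt/x1/sbin/libvirtd')
    new = os.stat('/snap/libvirt/x2/sbin/libvirtd')
    print(old.st_dev == new.st_dev)  # False: two different file systems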
-----------
It is hardly possible to separate QEMU and libvirt into different snaps
in such a way that QEMU processes escape the cgroup which systemd uses
to kill all unit-related processes.
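You can see which unit a QEMU process is tied to by reading its cgroup
file; the pid below is the qemu-system-x86_64 pid from the systemctl
output above and is, of course, machine-specific:

    pid = 17357  # example pid of a daemonized qemu-system-x86_64 process
    with open('/proc/%d/cgroup' % pid) as f:
        print(f.read())
    # A line like
    #   1:name=systemd:/system.slice/snap.libvirt.libvirt-bin.service
    # shows that the pid is still in the snap unit's cgroup, so systemd
    # will SIGTERM/SIGKILL it on a snap refresh even though libvirtd is
    # not its parent.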
Even if I hacked my way around this via some sort of executor process
on the QEMU snap side which libvirt would invoke, it still wouldn't be
right: all qemu processes would be in the QEMU snap's cgroup and would
be killed on QEMU snap updates (better than being killed on combined
snap updates, but still not good enough).
The bottom line is that packaging these two applications as snaps
results in a serious change of application behavior. Other
applications are potentially affected (lxd comes to mind).
Any feedback/ideas with regard to the above? It doesn't seem right
that a packaging system change forces a certain application behavior
(in this case, VM downtime and fs corruption).
Best Regards,
Dmitrii Shcherbakov
Field Software Engineer
IRC (freenode): Dmitrii-Sh