qcow2 snapshotting: does it influence future IO performance?

Alvin info at alvin.be
Thu Mar 31 21:03:20 UTC 2011


On Thursday 31 March 2011 15:15:35 jurgen.depicker at let.be wrote:
> Hi Alvin.
>
> I read on your site that you stopped work on a project because LVM
> snapshots potentially decreased IO performance, and I also saw that you
> worked on a Perl project to make copies of VMs, so I address my
> question directly to you.
>
> Your work reminded me of something I read some weeks ago on qcow2 on
> http://people.gnome.org/~markmc/qcow-image-format.html and which raised my
> brows at the time:
> "Snapshots - "real snapshots" - are represented in the original image
> itself. Each snapshot is a read-only record of the image a past instant.
> The original image remains writable and as modifications are made to it, a
> copy of the original data is made for any snapshots referring to it. "
>
> I tried to find more info about this, but haven't found any yet.  So
> here's my question: does anybody know whether having made snapshots of
> a VM during several stages of its life (clean install, important
> service installed, ...) affects the IO performance of writes made
> after snapshotting, as one could suspect from my quote above?

There is a difference between qcow2 snapshots and LVM snapshots. You 
probably want qcow2 snapshots in a typical testing scenario where you 
start from a good working virtual machine, snapshot it, break it, and 
then return to the snapshotted state.
I haven't done that myself yet. The last time I tried, the snapshotting 
itself hung libvirt, but that was a long time ago and things might be 
better now. I have no idea about the resulting performance. Both are 
copy-on-write, so they might suffer from the same problems. I don't 
know if that's true, but it would be interesting to do some research here.
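
For what it's worth, that cycle would look something like the sketch 
below. This is a minimal illustration only: the image path and snapshot 
name are invented, and the VM has to be shut down while qemu-img 
touches the image.

import subprocess

IMAGE = "/var/lib/libvirt/images/testvm.qcow2"  # hypothetical path

def qemu_img(*args):
    # Run qemu-img and fail loudly on a non-zero exit code.
    subprocess.run(["qemu-img", *args], check=True)

# Record the known-good state as an internal snapshot.
qemu_img("snapshot", "-c", "known-good", IMAGE)

# ... boot the VM, break it, shut it down again ...

# List the snapshots stored inside the image, then roll back.
qemu_img("snapshot", "-l", IMAGE)
qemu_img("snapshot", "-a", "known-good", IMAGE)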

The performance drop caused by LVM snapshots is the reason why I 
abandoned the project. I originally took the idea from our ZFS setup at 
work. We're using /a lot/ of ZFS snapshots. It's a success story. If I 
ever took that away, the users would be at my door with torches and 
pitchforks. It's the single reason we still deal with Oracle support. 
So, I started using LVM snapshots at regular intervals. The script I 
wrote was for easier management and auto-mounting of the snapshots. It 
ended in disaster and halted production on the Ubuntu servers. The 
reason is that ZFS snapshots and LVM snapshots are too different. The 
ZFS FAQ states that you need about 1GB of RAM for every 10,000 
filesystems. That includes snapshots, so that gives you an idea of the 
possibilities (and the different approach to filesystems). LVM 
snapshots work at the block level: as far as I understand it, the first 
write to any chunk on the origin volume first copies the old data into 
each active snapshot, so write performance drops with every snapshot 
you keep around. Even a single LVM snapshot can bring an Ubuntu server 
to its knees and can lead to a kernel panic given the right (in a 
manner of speaking) circumstances.
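
To make it concrete, what my script was automating boils down to 
something like this (a rough sketch, not the actual script; the volume 
group, LV name and snapshot size are invented):

import subprocess
from datetime import datetime

VG, LV = "vg0", "srv"  # hypothetical volume group and origin LV
SNAP = f"{LV}-snap-{datetime.now():%Y%m%d-%H%M}"

def run(*cmd):
    subprocess.run(cmd, check=True)

# Create a copy-on-write snapshot of the origin. The 2G is only the
# space reserved for origin blocks that change while the snapshot
# exists; if it fills up, the snapshot becomes invalid.
run("lvcreate", "--snapshot", "--size", "2G",
    "--name", SNAP, f"/dev/{VG}/{LV}")

# Auto-mount it read-only so old file versions can be pulled back.
run("mkdir", "-p", f"/snapshots/{SNAP}")
run("mount", "-o", "ro", f"/dev/{VG}/{SNAP}", f"/snapshots/{SNAP}")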

It seemed a good idea at the time, but it will set fire to your daemons, 
hang your kernel and send your users screaming.

I did file a little bug report[1]. Originally I thought the problem was 
in qemu-img, but qemu-img is just a good I/O stress test. You can also 
use rsync or simply cp to hang the kernel.

What I currently do in the above testing scenario is copy the (offline) 
virtual machine, but I might try qcow2 snapshots one day. For now that 
is not possible, because my test VMs run on logical volumes on RAID0. 
That speeds up the virtual machines themselves, but the copying process 
is heavy and slow.
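
In case it helps, the copy itself amounts to no more than this (paths 
invented; the VM must be shut down first):

import subprocess

SRC = "/dev/vg0/testvm"       # hypothetical LV backing the test VM
DST = "/backup/testvm.qcow2"  # hypothetical destination file

# qemu-img convert reads every block of the logical volume and writes
# it out as a qcow2 file, skipping zeroed regions. That full read is
# exactly the heavy, slow part described above.
subprocess.run(["qemu-img", "convert", "-O", "qcow2", SRC, DST],
               check=True)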

Maybe btrfs will be the answer in the near future?

[1] https://bugs.launchpad.net/bugs/712392



