Recovery recipe for QCOW2 files damaged during snapshot creation

QCOW2 is a very popular format for virtual machine volumes, widely used by service providers, corporations and individuals. It is convenient because it implements thin provisioning and supports internal snapshots. Volumes are just regular files that can be kept on local filesystems or on remote NFS storage. These features have made it the de facto standard for the KVM hypervisor, which is itself very widely used.

QCOW2 is a fairly reliable format, but from time to time QCOW2 disks become corrupted. In our experience, this often happens when a user snapshots a virtual machine disk that is under high I/O load, or when the whole underlying storage is under high I/O load. Sometimes the result is a broken volume that cannot be attached to a VM, and the VM cannot start with it.

So, it simply happens. Maybe it will be fixed in future Qemu versions and this article will become obsolete, but many different versions of Qemu are deployed around the world, and the problem may affect the VMs they manage. In this article, we demonstrate how to recover from it.

Preparation

Let’s suppose you have run into this situation: when the VM tries to boot, you see the message “Could not read snapshots: File too large” for your QCOW2 volume. To solve it you need qemu-img version 1.7.2. We were lucky to find the workaround quickly, but whoever discovered it probably spent a lot of time getting there.

You have to get Qemu 1.7.2 and build it. If you would rather not build it, just get the statically linked executable from here. You can also read the whole thread related to the problem.

Don’t replace your system qemu-img with qemu-img 1.7.2. You only need the old version once, for the recovery.
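
Before starting, it can help to confirm the symptom from the command line instead of rebooting the VM. The sketch below runs the system qemu-img `info` and `check` subcommands against a volume and reports whether each one succeeded. We assume that a damaged volume fails here with the same kind of message as at boot, but that is not guaranteed. The function name `inspect_image` and the `QEMU_IMG` override variable are hypothetical, used only for illustration.

```shell
#!/bin/sh
# Sketch: ask qemu-img what it thinks about a volume.
# inspect_image and QEMU_IMG are hypothetical names, not part of qemu.

inspect_image() {
    image=$1
    qemu_img=${QEMU_IMG:-qemu-img}   # which qemu-img binary to use

    # `info` has to open the image, so a damaged volume is likely to fail here.
    if "$qemu_img" info "$image"; then
        echo "info: OK"
    else
        echo "info: FAILED"
    fi

    # `check` runs a consistency check on the image metadata.
    "$qemu_img" check "$image"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "check: OK"
    else
        echo "check: FAILED (exit status $status)"
    fi
}
```

For example, `inspect_image broken-image.qcow2` prints the qemu-img output followed by one summary line per subcommand.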

Recovery

To recover the volume, use the qemu-img convert command. In our case, you convert from QCOW2 to RAW with qemu-img 1.7.2 and then back from RAW to QCOW2 with the system qemu-img:

$ wget -O ./qemu-img-172-static https://bitworks.software/assets/bin/qemu-img-172-static
$ chmod +x ./qemu-img-172-static
$ ./qemu-img-172-static convert -f qcow2 -O raw broken-image.qcow2 converted-image.raw
$ qemu-img convert -f raw -O qcow2 converted-image.raw image.qcow2

Don’t forget to back up your broken image before running the conversions.
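
The backup and the two conversions can be combined into one small shell function. This is only a sketch under a few assumptions: the function name `recover_qcow2` and the variables `OLD_QEMU_IMG` and `SYS_QEMU_IMG` are hypothetical, the backup is a plain file copy next to the original, and the static binary sits in the current directory and is executable. Keep in mind that the intermediate RAW file may need as much free space as the virtual size of the disk.

```shell
#!/bin/sh
# Sketch: back up the broken volume, then convert QCOW2 -> RAW -> QCOW2.
# recover_qcow2, OLD_QEMU_IMG and SYS_QEMU_IMG are hypothetical names.

recover_qcow2() {
    broken=$1                        # damaged QCOW2 volume
    fixed=$2                         # recovered QCOW2 volume to create
    old_qemu_img=${OLD_QEMU_IMG:-./qemu-img-172-static}
    sys_qemu_img=${SYS_QEMU_IMG:-qemu-img}
    backup="$broken.bak"             # untouched copy of the broken volume
    raw="$fixed.raw"                 # intermediate RAW file

    # Refuse to overwrite an earlier backup or existing output files.
    if [ -e "$backup" ] || [ -e "$fixed" ] || [ -e "$raw" ]; then
        echo "refusing to overwrite existing files" >&2
        return 1
    fi

    cp -- "$broken" "$backup" || return 1

    # Step 1: the old qemu-img reads the broken volume and writes RAW.
    "$old_qemu_img" convert -f qcow2 -O raw "$broken" "$raw" || return 1

    # Step 2: the system qemu-img turns the RAW file back into QCOW2.
    "$sys_qemu_img" convert -f raw -O qcow2 "$raw" "$fixed" || return 1

    echo "recovered image: $fixed (backup: $backup, intermediate: $raw)"
}
```

A call such as `recover_qcow2 broken-image.qcow2 image.qcow2` leaves the original file in place, so you can delete the backup and the intermediate RAW file yourself once the VM boots from the recovered volume.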

Conclusion

If you don’t need VM or volume snapshots, don’t use QCOW2; RAW volumes, either in files or on LVM2, are a better choice. Don’t expect the method described above to recover QCOW2 volumes in every case, so configure storage backups and in-VM backups to be ready for disaster recovery without extra stress.

If you liked this post and found it useful, please share it with friends.