Thoughts on Xen filesystem configuration with regard to backups

I’ve been experiencing some issues backing up my Xen user domains. As I investigated them, I once again found the lack of documentation frustrating, especially on a matter as fundamentally important as backups. There is simply no documentation that I could find on how to set up enterprise-level backups for Xen. So here are some thoughts on what I see as the state of things, and where they could go from here to get better.

There are two main ways to configure Xen storage. One is to use the “phy” driver, often combined with the use of LVM to create partitions. The second is to use the “tap:aio” driver, which is often combined with image files. I have preferred the former so far. But in either case the important aspect is that the guest’s filesystem is not mounted by the host domain.

Using LVM, the ideal backup strategy would be to take a snapshot of a running domain’s volume, mount it read-only, and back it up. LVM has a VFS-lock patch which enables it to create a snapshot of a consistent filesystem. However, this only works for mounted filesystems. And one of the cardinal rules when dealing with Xen, or any other virtualization software (save OS-level virtualization), or in general, is to never mount a filesystem twice.

So LVM cannot create a consistent snapshot of a filesystem that is not mounted, which is exactly the situation in the Xen Dom0, where the DomU’s volumes are never mounted. The only solution is to get the Xen DomU’s filesystem(s) into a consistent state prior to taking the snapshot. I believe there is only one way to do this, and that is to:

  1. Shut down the user domain
  2. Take a snapshot
  3. Start up the user domain
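
The three steps above can be sketched as a small Python helper. This is only a sketch under assumptions: it builds the command lines and does not run them, and the domain name, config path, volume names and snapshot size in the example are hypothetical placeholders. I am assuming the stock xm and LVM commands here: xm shutdown -w (wait for the shutdown to finish), lvcreate --snapshot, and xm create.

```python
def cold_snapshot_plan(domain, config, origin_lv, snap_name, snap_size="1G"):
    """Return the commands for shutdown -> snapshot -> startup, in order.

    Nothing is executed here; the caller decides how (and whether) to run
    each command. All argument values are supplied by the caller.
    """
    return [
        # 1. Shut the guest down cleanly. -w waits until the shutdown has
        #    completed, so the guest filesystems are unmounted first.
        ["xm", "shutdown", "-w", domain],
        # 2. Snapshot the now-quiescent logical volume.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin_lv],
        # 3. Bring the guest back up from its config file.
        ["xm", "create", config],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in cold_snapshot_plan("web1", "/etc/xen/web1.cfg",
                                  "/dev/vg/web1", "web1-snap"):
        print(" ".join(cmd))
```

The backup itself then runs against the snapshot while the domain is already back in service, so the downtime is limited to the restart.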

Of course, that looks dreadful to most system administrators because it means interrupting service. Not to mention there is a real performance hit after restarting, as all the various caches will have been flushed.

People talk of other schemes, so here are some of them and why they won’t work.

  1. xm save - By using xm save the state of the machine is allegedly perfectly preserved. This will actually be useful if one is either taking an image of the LVM volume, or using file images in the first place. But it will not work for a backup strategy requiring an LVM snapshot, as xm save does not guarantee (to the best of my knowledge) that the filesystem will be consistent. I’d also point out that this operation is fairly slow in my experience: not restart slow, but not quick.

  2. xm pause - Running xm pause and then taking a snapshot is like hitting the power switch on the user domain. The filesystem is by no means guaranteed to be consistent. It is fast, though; if it were reliable it would be great.

  3. xm sysrq s - This will send the SysRq sync request to a running domU. Unfortunately, there is no way of knowing how long that sync will take, so there is no way to pause the domain at the exact moment necessary. I think this method, which is often combined with xm pause, is wishful thinking.

  4. xfs_freeze - For those running XFS, xfs_freeze seems like a good idea, until one realizes that it only works on a mounted filesystem and that the VFS-lock patch should handle an XFS filesystem in any case.

Now one method that might work acceptably, but still gives me pause, is the following:

  1. take a snapshot, possibly pausing and unpausing the domain to prevent writing. I have no idea if LVM snapshots can handle data writes in the midst of a snapshot of an unmounted volume.

  2. repair the snapshot:

    • ext3: e2fsck -f -y /dev/vg/snapshot
    • xfs: xfs_repair /dev/vg/snapshot
    • jfs: jfs_fsck -f /dev/vg/snapshot

  3. mount the snapshot (read-only presumably) and then proceed with the backup.
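
As a rough illustration of the snapshot, repair and mount sequence above, here is a Python sketch that assembles the commands for a given filesystem type. It does not execute anything. The repair commands are the ones listed above; the function name, the domain and volume names, the mount point and the snapshot size are hypothetical, and the optional xm pause / xm unpause around the snapshot is my assumption about how one would "prevent writing". None of this makes the snapshot any more consistent than described above.

```python
# Repair command per guest filesystem, as listed in step 2 above.
REPAIR_COMMANDS = {
    "ext3": ["e2fsck", "-f", "-y"],
    "xfs": ["xfs_repair"],
    "jfs": ["jfs_fsck", "-f"],
}


def live_snapshot_plan(domain, origin_lv, snap_name, fstype, mountpoint,
                       snap_size="1G", pause=True):
    """Return the commands for snapshot -> repair -> read-only mount."""
    if fstype not in REPAIR_COMMANDS:
        raise ValueError("no repair command known for %r" % fstype)

    # An LVM snapshot lives in the same volume group as its origin,
    # so its device node sits next to the origin's.
    snap_dev = origin_lv.rsplit("/", 1)[0] + "/" + snap_name

    plan = []
    if pause:
        # Stop the guest from writing while the snapshot is created.
        plan.append(["xm", "pause", domain])
    plan.append(["lvcreate", "--snapshot", "--size", snap_size,
                 "--name", snap_name, origin_lv])
    if pause:
        plan.append(["xm", "unpause", domain])
    # Repair the snapshot copy, never the volume the guest is using.
    plan.append(REPAIR_COMMANDS[fstype] + [snap_dev])
    # Mount read-only so the backup cannot modify the snapshot.
    plan.append(["mount", "-o", "ro", snap_dev, mountpoint])
    return plan


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in live_snapshot_plan("web1", "/dev/vg/web1", "web1-snap",
                                  "ext3", "/mnt/backup"):
        print(" ".join(cmd))
```

Keeping the per-filesystem repair commands in one table makes it obvious which tool would touch the snapshot, and refusing unknown filesystem types avoids mounting something that was never checked.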

This is a somewhat risky option but might be better for some situations compared to restarting.

There is talk of using a network filesystem or a cluster filesystem to alleviate some of these issues but I’m not convinced that any of the suggestions I’ve read actually propose a better solution. It has also been mentioned that one can run backup software from within the DomU. However, from a management perspective it seems easier to just provide backup to all the domains than to configure each one individually.

What is really needed is a way to signal the user domain to flush its disk(s) and then pause. I’m not a kernel expert, so I’m not sure of the best way for this to be done, but I can imagine that it could utilize the VFS-lock mechanism or perhaps just a sync. Another idea would be to actually trigger a kernel suspend (this is not what happens with xm suspend). Although slower than a pause, this could enable the backup of a running machine via an LVM snapshot.

Perhaps this will come someday and until then I’ll keep searching for a better solution than what I’ve documented here.

Here are some threads I browsed in researching these issues:


Thoughts on Xen fs backup

I may be missing something here, but ...
I back up my domUs with TSM nightly by installing the TSM client as
on any other box I back up. There, the machine is backed up now.
I don't care about the dom0 (except for /etc/xen/mydomU*) as it holds
nothing precious to me and my domUs will run on any Xen box where I
restore the TSM backup. I believe that being able to run the virtual
machine on any Xen box is the point.

Creative Commons License Except where otherwise noted, content on this site is licensed under a Creative Commons by-nc-sa 3.0 License