Challenges to using Xen in a production environment

I’ve been using Xen for a little while now, and while there are many aspects I like, there are a few that are troubling. I’ve been wanting to write some tutorials on using Xen, but before that can happen I need to feel comfortable recommending it as a virtualization solution. At the moment I’m not sure I can, partly because of the issues I’ve encountered but mostly because those issues do not seem to be acknowledged or discussed in any meaningful way. That is perhaps the most troubling thing of all.

Entropy

This was not the first problem I encountered, but so far it is the most serious. The user domains in Xen seem unable to generate enough entropy to perform operations that require cryptography. Given the way Linux gathers entropy (largely from the timing of hardware events such as interrupts, keyboard and mouse input, and disk activity, which a paravirtualized guest largely lacks), it has occurred to me that this issue might be prevalent in other virtualization solutions as well, though not in OS-level virtualization, which would use the host kernel for entropy.

To demonstrate this issue, one need only run the following command on both the host domain (dom0) and the user domain (domU): cat /proc/sys/kernel/random/entropy_avail. On one of my test servers the dom0 entropy is about 3500 bits and the domU’s is less than 300. Though the user domains do generate entropy, they do so at a slow rate, far too slow to support the cryptographic needs of many services, including SSH and SSL web servers.
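The same check can be scripted so it is easy to run on dom0 and on each domU and compare the results. This is a minimal sketch in Python; the 1000-bit warning threshold is an arbitrary assumption for illustration, not a figure from Xen or the kernel documentation.

```python
from pathlib import Path

# Standard Linux location of the kernel's current entropy estimate, in bits.
ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")

# Hypothetical threshold chosen for illustration only.
LOW_ENTROPY_BITS = 1000


def read_entropy_avail(path: Path = ENTROPY_AVAIL) -> int:
    """Return the kernel's entropy estimate (bits) as an integer."""
    return int(path.read_text().strip())


def is_starved(bits: int, threshold: int = LOW_ENTROPY_BITS) -> bool:
    """True when the estimate is below the (assumed) threshold."""
    return bits < threshold


if __name__ == "__main__":
    bits = read_entropy_avail()
    status = "LOW" if is_starved(bits) else "ok"
    print(f"entropy_avail={bits} bits ({status})")
```

Running it in a loop while, say, opening SSH connections to the domU should show the guest’s figure dropping and recovering slowly.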

I was able to find many emails discussing this problem.

The only solution offered is to use rngd (from the rng-tools package) to collect data from the dom0 and feed it to the domU. Of course, exactly how to do this is not detailed. It also strikes me as cumbersome and potentially a security risk in itself (though not as much as the suggestion that each domU be given an SSH key to connect to the dom0 to collect entropy).

I’m frankly a little surprised that such an important issue has been overlooked thus far. So many important applications depend on cryptography that fixing this would seem a no-brainer. In a virtualized setup, the host is going to have an abundance of entropy, because it presumably won’t be running entropy-hungry services like secure web and email, while the guests will lack entropy because they are virtualized. The solution is simple: automate the process of supplying entropy from the host to the guest. Some suggestions on how to do this are outlined in this email: http://lkml.org/lkml/2006/5/12/103.
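For the guest side of that process, the detail that matters is how bytes received from dom0 get credited to the domU’s pool. A plain write to /dev/random mixes data in but does not raise entropy_avail; rngd instead uses the RNDADDENTROPY ioctl, which does both. The sketch below assumes Python on Linux, root privileges inside the domU, and that the bytes already arrived from dom0 over some channel you trust; that transport is deliberately left out and is an assumption, not something the emails specify.

```python
import fcntl
import os
import struct

# RNDADDENTROPY = _IOW('R', 0x03, int[2]) from <linux/random.h>.
# 0x40085203 is the encoding on x86 and most architectures; a few
# (e.g. PowerPC, MIPS, SPARC) encode _IOW differently, so verify on yours.
RNDADDENTROPY = 0x40085203


def pack_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Build struct rand_pool_info: int entropy_count, int buf_size, buf."""
    if not 0 <= entropy_bits <= len(data) * 8:
        raise ValueError("entropy credit must be 0..8 bits per byte")
    return struct.pack("ii", entropy_bits, len(data)) + data


def credit_entropy(data: bytes, entropy_bits: int,
                   device: str = "/dev/random") -> None:
    """Mix data into this kernel's pool and credit it (requires root)."""
    buf = pack_rand_pool_info(data, entropy_bits)
    fd = os.open(device, os.O_WRONLY)
    try:
        # Same ioctl rngd issues: mixes the bytes and raises entropy_avail.
        fcntl.ioctl(fd, RNDADDENTROPY, buf)
    finally:
        os.close(fd)
```

Crediting fewer bits than were supplied (for example half) is the conservative choice, since the guest has no way to verify how random the dom0 data really is.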

Of course, in the short term I may have to come up with another solution (assuming I decide to continue using Xen). If I do, I will publish it.

Disks, Filesystems, and Backup

There is far more discussion about the means of storing domU filesystems, but I have yet to see anything truly comprehensive on the matter. My particular concern is creating a sensible backup strategy, though I also want to make sure that performance and security are addressed. This is probably the first major decision to be made when starting to use Xen, and there is no agreement on the best way to proceed, or even a good list of considerations for the various options.

The two most common storage mechanisms are LVM and image files. There are other methods as well, but in my experience their poor documentation is an indicator of their lack of maturity. I was excited to read about the qcow image format, for example, only to find it unready for serious use. Using LVM is a nice solution because each user domain is given a logical volume that it sees as a partition, so there’s no need to worry about the performance impact of nested filesystems. It’s also reasonably easy to manage, including creating and resizing the filesystem. Presumably backup could be done by creating an LVM snapshot and then backing up the snapshot with any number of tools.
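The snapshot approach might look like the following sketch, which only builds and optionally runs the command sequence. The volume group and volume names, the 1G snapshot size, and the choice of rsync as the copy tool are all hypothetical; it also assumes the logical volume holds a filesystem directly rather than a partition table. A snapshot of a running domU is only crash-consistent, and a real script would remove the snapshot even when a step fails.

```python
import subprocess


def snapshot_backup_commands(vg: str, lv: str, dest: str,
                             snap_size: str = "1G") -> list[list[str]]:
    """Commands to snapshot /dev/<vg>/<lv>, copy it off, and clean up."""
    snap = f"{lv}-backup-snap"
    snap_dev = f"/dev/{vg}/{snap}"
    mnt = f"/mnt/{snap}"
    return [
        # Copy-on-write snapshot; snap_size only has to hold the writes
        # the running domU makes while the backup is in progress.
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap, f"/dev/{vg}/{lv}"],
        ["mkdir", "-p", mnt],
        ["mount", "-o", "ro", snap_dev, mnt],
        # rsync is one arbitrary choice of backup tool.
        ["rsync", "-a", "--delete", f"{mnt}/", dest],
        ["umount", mnt],
        ["lvremove", "-f", snap_dev],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands, or execute them in order, stopping on failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # vg0, domu1 and /backup/domu1/ are hypothetical names.
    run(snapshot_backup_commands("vg0", "domu1", "/backup/domu1/"))
```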

However, when it comes to backing up a user domain, one would want to be able to do a full restore from backup to a fresh partition and run it without any additional steps. While this is certainly possible with LVM, it does require testing. One reason to use image files is that everything is contained in one file, which simplifies the restore process somewhat: you can just restore the image file and know that it will run. Of course, backing up that one file may be a challenge as well. I’d be interested in testing whether rdiff-backup does a good job with large files.
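Whether a delta tool helps depends on how much of a large image actually changes between backups. rdiff-backup uses rsync-style rolling checksums (via librsync); the fixed-size block comparison below is a simpler stand-in, sketched in Python under the assumption that the image is compared block by block at the same offsets. The 1 MiB block size is an arbitrary choice.

```python
import hashlib
from typing import BinaryIO, Iterator

CHUNK_SIZE = 1 << 20  # 1 MiB blocks; an arbitrary choice for illustration


def chunk_digests(f: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield a SHA-256 digest for each fixed-size block of the file."""
    while True:
        block = f.read(chunk_size)
        if not block:
            break
        yield hashlib.sha256(block).hexdigest()


def changed_chunks(old: list[str], new: list[str]) -> list[int]:
    """Indices of blocks that differ, plus any blocks appended at the end."""
    changed = [i for i, (a, b) in enumerate(zip(old, new)) if a != b]
    changed.extend(range(len(old), len(new)))
    return changed
```

Storing the digest list from each backup and comparing it with the next run shows how many blocks a multi-gigabyte image really rewrites, which is a rough indicator of how well an incremental tool could do on it.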

Another issue I’ve been investigating with regard to Xen storage is the possibility of fully encrypting user domains. One problem is the act of encryption itself and which tools to use. Should the user domains, whether file or LVM based, use block device encryption (perhaps dm-crypt) or filesystem encryption (perhaps eCryptfs)? Should the encryption be outside the user domain or within it? I would tend toward outside, but a case could be made for inside as well, and there is always the chance that the environment could be mixed. And how do these choices affect the backup and restore strategy? What particularly needs to be considered is that encrypted data must stay encrypted in backups (unless the backups are themselves encrypted as a whole, which is what I’ve often practiced).
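To make the "outside" option concrete, here is a sketch of dm-crypt applied in dom0 to an LVM-backed domU. It only assembles the plan; the names vg0 and domu1, the ext3 filesystem, and the xvda1 device name are hypothetical. The guest sees only the plaintext /dev/mapper device, while the underlying logical volume holds ciphertext, so snapshotting and backing up that raw volume keeps the backup encrypted. The trade-off is that such a backup is a block image rather than a file-level copy.

```python
from dataclasses import dataclass


@dataclass
class OutsideEncryptionPlan:
    setup: list[list[str]]   # commands to run once in dom0
    xen_disk_line: str       # line for the domU's Xen config file
    backup_source: str       # device to snapshot for ciphertext backups


def plan_outside_encryption(vg: str, lv: str) -> OutsideEncryptionPlan:
    raw = f"/dev/{vg}/{lv}"
    mapped = f"{lv}_crypt"
    return OutsideEncryptionPlan(
        setup=[
            # Destroys any existing data on the LV; prompts for a passphrase.
            ["cryptsetup", "luksFormat", raw],
            # Creates /dev/mapper/<lv>_crypt, the plaintext view.
            ["cryptsetup", "luksOpen", raw, mapped],
            # Hypothetical filesystem choice for the guest.
            ["mkfs.ext3", f"/dev/mapper/{mapped}"],
        ],
        # The domU is handed only the decrypted mapping.
        xen_disk_line=f"disk = ['phy:/dev/mapper/{mapped},xvda1,w']",
        # The raw LV never contains plaintext, so back this up, not the mapping.
        backup_source=raw,
    )
```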

These are all issues that are extremely important to a server administrator, but I can’t find any good recommendations. With new technology that is often the case, but Xen has been around for a few years now, and it would be nice to see a better set of guides for various server setups. I had been excited about Xen, and I have found much that I’ve been satisfied with. But whether these issues, or others I come across, will be deal-breakers for using Xen in a production environment remains to be seen.

Except where otherwise noted, content on this site is licensed under a Creative Commons by-nc-sa 3.0 License.