jurgen.depicker at let.be
Thu Mar 10 13:02:39 UTC 2011

Hi all.

Presently, 7 VMs (4 Windows, 3 Ubuntu) are using the same NFS pool.  The 
machine serving that NFS pool (VLET1) shows a high load as soon as there 
is some continuous disk activity:

top - 13:25:30 up 23:16,  5 users,  load average: 4.51, 4.41, 3.98
Tasks: 191 total,   1 running, 189 sleeping,   0 stopped,   1 zombie
Cpu0  :  3.0%us,  2.6%sy,  0.0%ni, 44.6%id, 49.8%wa,  0.0%hi,  0.0%si, 
Cpu1  :  4.2%us,  2.9%sy,  0.0%ni, 54.2%id, 38.1%wa,  0.0%hi,  0.6%si, 
Cpu2  :  3.6%us,  2.3%sy,  0.0%ni, 55.4%id, 38.7%wa,  0.0%hi,  0.0%si, 
Cpu3  :  2.9%us,  4.2%sy,  0.0%ni, 56.4%id, 36.5%wa,  0.0%hi,  0.0%si, 

It's a quad-core Xeon, running software RAID on SATA II disks:
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] 
[raid4] [raid10]
md0 : active raid1 sdb1[1] sda1[0]
      966796224 blocks [2/2] [UU]

When the VM clients' disk activity is low, the load on VLET1 is very low:
top - 13:53:30 up 23:44,  5 users,  load average: 0.11, 0.81, 1.83
Tasks: 191 total,   1 running, 189 sleeping,   0 stopped,   1 zombie
Cpu0  :  1.0%us,  2.0%sy,  0.0%ni, 92.1%id,  4.6%wa,  0.0%hi,  0.3%si, 
Cpu1  :  2.2%us,  5.3%sy,  0.0%ni, 90.7%id,  1.9%wa,  0.0%hi,  0.0%si, 
Cpu2  :  4.2%us,  3.6%sy,  0.0%ni, 87.7%id,  4.5%wa,  0.0%hi,  0.0%si, 
Cpu3  :  3.2%us,  3.2%sy,  0.0%ni, 93.3%id,  0.3%wa,  0.0%hi,  0.0%si, 
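
To put a number on the difference between the two snapshots, here is a 
minimal sketch in Python that averages the %wa (I/O wait) column over the 
four cores.  The helper name mean_iowait is purely illustrative (it is not 
part of any tool), and the Cpu lines are copied from the two top outputs 
above:

```python
import re

# Cpu lines copied from the two `top` snapshots in this post.
BUSY = """\
Cpu0  :  3.0%us,  2.6%sy,  0.0%ni, 44.6%id, 49.8%wa,  0.0%hi,  0.0%si,
Cpu1  :  4.2%us,  2.9%sy,  0.0%ni, 54.2%id, 38.1%wa,  0.0%hi,  0.6%si,
Cpu2  :  3.6%us,  2.3%sy,  0.0%ni, 55.4%id, 38.7%wa,  0.0%hi,  0.0%si,
Cpu3  :  2.9%us,  4.2%sy,  0.0%ni, 56.4%id, 36.5%wa,  0.0%hi,  0.0%si,
"""

IDLE = """\
Cpu0  :  1.0%us,  2.0%sy,  0.0%ni, 92.1%id,  4.6%wa,  0.0%hi,  0.3%si,
Cpu1  :  2.2%us,  5.3%sy,  0.0%ni, 90.7%id,  1.9%wa,  0.0%hi,  0.0%si,
Cpu2  :  4.2%us,  3.6%sy,  0.0%ni, 87.7%id,  4.5%wa,  0.0%hi,  0.0%si,
Cpu3  :  3.2%us,  3.2%sy,  0.0%ni, 93.3%id,  0.3%wa,  0.0%hi,  0.0%si,
"""


def mean_iowait(top_output):
    """Return the average %wa value over all Cpu lines in a top snapshot."""
    values = [float(v) for v in re.findall(r"([\d.]+)%wa", top_output)]
    if not values:
        raise ValueError("no %wa fields found in input")
    return sum(values) / len(values)


if __name__ == "__main__":
    print(f"busy: {mean_iowait(BUSY):.1f}% iowait")  # busy: 40.8% iowait
    print(f"idle: {mean_iowait(IDLE):.1f}% iowait")  # idle: 2.8% iowait
```

So under continuous disk activity the cores spend roughly 40% of their 
time waiting on I/O, against under 3% when the clients are quiet.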

What can I do to reduce the load?  At the moment, a VM sometimes refuses 
to come up (Windows blue screen) unless I launch it on the same host 
(VLET1) that hosts the NFS share.  After booting the VM, I can live 
migrate it to another VM host, e.g. VLET2.
I did spot an error in /var/log/messages when the Windows VM failed to 
boot on VLET2:
Mar 10 11:06:46 VLET2 libvirtd: 11:06:46.137: warning : 
qemudParsePCIDeviceStrs:1422 : Unexpected exit status '1', qemu probably 
failed
but I'm not sure this is related.

This is a test config; our company's mail server will soon be running in 
this cluster of KVM hosts (VLET1 and VLET2 will be joined by VLET3 next 
week).  Since the Domino mail server is very disk intensive, I'm a bit 
worried now.  I would rather not run it on local disks, since this makes 
live migration impossible.  I'll need to decide where to put the storage 
pool, and using which protocol (NFS or iSCSI).

Which brings me to my last question: would it be better to use iSCSI 
instead of NFS?  I started with iSCSI, but couldn't get a pool defined 
through virt-manager (it always showed as 100% full, even when I created 
a completely new iSCSI target/LUN).  According to e.g. 
http://communities.vmware.com/thread/150775 it doesn't seem to make much 
difference performance-wise whether I use iSCSI or NFS.  Anyhow, I don't 
grasp how KVM running on different hosts, all connected to the same iSCSI 
LUN, can work.  Having the same LUN mounted on different hosts corrupts 
data...  I have obviously misunderstood something about iSCSI...

Regards, Jürgen
