Major issues with Xen and Hardy

Eujon Sellers eujon at introspeed.com
Mon Dec 8 22:23:41 UTC 2008


I hope this is the right place for this. I've been having some major
issues running Xen on Ubuntu server 8.04 recently. I previously posted
this on the forums and got no responses; here is what that post
said:

- Running Ubuntu 8.04 server as the dom0
- Currently using the 2.6.24-19-xen kernel from the Ubuntu archives
- domU in question is also Ubuntu 8.04 built using "xen-create-image"

About a week ago, the website running on the problem domU stopped
responding and I couldn't ssh into the domU. I logged into my dom0,
attached to the domU console, and found this error scrolling over and
over:
[259458.110263] __find_get_block_slow() failed. block=1360, b_blocknr=32148
[259458.110265] b_state=0x00000029, b_size=4096
[259458.110269] device blocksize: 4096

I tried to figure out the cause (a reboot was the only fix), but the
only thing I could come up with is that it happened around 3am. As an
FYI, at that point I was running the kernel from this howto:
http://www.howtoforge.com/ubuntu-8.04-server-install-xen-from-ubuntu-repositories
I never figured out the cause and assumed it was a one-off. Three days
later, though, I ran into a similar situation (still with that other
kernel). The domU wouldn't respond, so I logged into the dom0 and
attached to the domU console again. This time I found this error
message:
[231747.850967]  =======================
[231759.665823] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:1782]
[231759.665826]
[231759.665828] Pid: 1782, comm: apache2 Tainted: G      D (2.6.24-16-xen #2)
[231759.665831] EIP: 0061:[<c0327dc7>] EFLAGS: 00000286 CPU: 0
[231759.665835] EIP is at _spin_lock+0x7/0x10
[231759.665837] EAX: cf802c18 EBX: cf802bd4 ECX: c1671ec0 EDX: 00000000
[231759.665840] ESI: c1671ec0 EDI: 00000000 EBP: c0f15ddc ESP: c0f15c4c
[231759.665843]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
[231759.665846] CR0: 8005003b CR2: b7388e80 CR3: 01741000 CR4: 00000660
[231759.665849] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[231759.665852] DR6: ffff0ff0 DR7: 00000400
[231759.665854]  [<c01a730d>] try_to_free_buffers+0x2d/0x90
[231759.665859]  [<c0167c95>] shrink_page_list+0x4b5/0x5f0
[231759.665864]  [<c01777e3>] page_check_address+0x1d3/0x3d0
[231759.665869]  [<c0166e20>] isolate_lru_pages+0x50/0x1c0
[231759.665874]  [<c0167eeb>] shrink_inactive_list+0x11b/0x3b0
[231759.665879]  [<c0168224>] shrink_zone+0xa4/0x100
[231759.665883]  [<c0168d62>] try_to_free_pages+0x152/0x250
[231759.665888]  [<c016305b>] __alloc_pages+0x12b/0x390
[231759.665893]  [<c0171739>] handle_mm_fault+0x7f9/0x1330
[231759.665899]  [<c0107deb>] local_clock+0x3b/0x80
[231759.665903]  [<c010824b>] sched_clock+0x1b/0x60
[231759.665908]  [<c03299ac>] do_page_fault+0x3bc/0xee0
[231759.665912]  [<c011c6d0>] update_curr+0x70/0x110
[231759.665916]  [<c03263d4>] schedule+0x244/0x600
[231759.665921]  [<c0175824>] do_mmap_pgoff+0x314/0x330
[231759.665925]  [<c0109ef5>] sys_mmap2+0x65/0xd0
[231759.665930]  [<c03295f0>] do_page_fault+0x0/0xee0
[231759.665934]  [<c0328285>] error_code+0x35/0x40
[231759.665940]  =======================

Again, looking through the logs I couldn't find anything that set
this off, only that it happened between midnight and 4am. At that
point I suspected the kernel and updated to the 2.6.24-19-xen kernel
from the Ubuntu repos. It worked for three days, until this morning
when I noticed the site was not responding AGAIN. I logged into the
console from the dom0 and found this error:
[266281.773639] BUG: soft lockup - CPU#0 stuck for 11s! [webalizer:11884]

As of today, other domUs are starting to see similar problems. I lost
one domU last night to the "CPU#0 stuck for 11s" error, and a
different domU locked up this morning with the same issue. In all of
these cases there are no entries in any of the logfiles showing a
problem that sets these errors off. In addition, the dom0 never seems
to have a problem; only the domUs are affected.
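
Since the only patterns so far are the timing and the process named in
each lockup, a small script can tally them from saved console output.
This is a minimal sketch, not something from the original
troubleshooting: the function name parse_lockups and the regular
expression are illustrative assumptions, and the sample lines are the
two soft lockup messages quoted above. The bracketed number is the
kernel's printk timestamp, in seconds since the domU booted.

```python
import re
from collections import Counter

# Matches console lines such as:
# [266281.773639] BUG: soft lockup - CPU#0 stuck for 11s! [webalizer:11884]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def parse_lockups(lines):
    """Return one dict per soft lockup line found in `lines` (hypothetical helper)."""
    events = []
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match is None:
            continue  # not a soft lockup line (e.g. register dump or stack frame)
        events.append({
            # printk timestamp converted from seconds to days of domU uptime
            "uptime_days": float(match.group("uptime")) / 86400.0,
            "cpu": int(match.group("cpu")),
            "stuck_secs": int(match.group("secs")),
            "comm": match.group("comm"),
            "pid": int(match.group("pid")),
        })
    return events


# The two soft lockup lines from this post, used as sample input.
sample = [
    "[231759.665823] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:1782]",
    "[266281.773639] BUG: soft lockup - CPU#0 stuck for 11s! [webalizer:11884]",
]

events = parse_lockups(sample)
for event in events:
    print("%-10s pid=%-6d cpu=%d uptime=%.2f days" % (
        event["comm"], event["pid"], event["cpu"], event["uptime_days"]))

# Count lockups per process name to see whether one program dominates.
print(Counter(event["comm"] for event in events))
```

In practice the input would be whatever console output was saved from
each domU, one line per list entry; comparing the uptime values against
each domU's boot time gives the wall-clock time of each lockup.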

I've searched Launchpad and found multiple bugs that reference these
errors, but they are usually still open or don't offer a solution.
Am I better off building Xen and a Xen-enabled kernel from source?
One bug report mentioned using the Debian
linux-image-2.6.26-1-xen-686 kernel package, but in my testing that
doesn't work because the domUs won't boot with it (an error about
"XENBUS: Waiting for devices to initialise" holds everything up). I'd
really like to figure this out and keep Ubuntu, but the more I look at
this, the more it looks like I'll be heading back to CentOS.

If you've made it this far, thanks for reading everything. I appreciate
any help/suggestions people can offer...




More information about the ubuntu-server mailing list