Major issues with Xen and Hardy

Mon Dec 8 23:01:29 UTC 2008

Thomas, I have actually been using the official 2.6.24-19-xen Ubuntu
kernel for about the last week. Both the unofficial and official
kernels are having issues, though.

Thanks

On Mon, Dec 8, 2008 at 3:54 PM, Thomas E. Maleshafske
<tmaleshafske at maleshafske.com> wrote:
> Have you tried upgrading the kernel to an official Ubuntu version?  The
> bug that is mentioned in that howto doesn't exist anymore.  I have been
> using Xen for a while now with the kernel out of the repos, without the
> patch, and haven't had an issue.  Try that; it may fix your problems.
> -
> V/R
> Thomas E. Maleshafske
> tmaleshafske at maleshafske.com
> Helping People Take Control Over Their Computer!
> http://www.skishosting.com
> Instant message me at any one of the following
> jabber=gravity1187 at binaryfreedom.info
> aim=gravity1187
> yahoo=gravity_1187
> IRC=gravity1187 on freenode.net
>
>
> On Mon, 2008-12-08 at 15:23 -0700, Eujon Sellers wrote:
>> I hope this is the right area for this. I've been having some major
>> issues running Xen on Ubuntu Server 8.04 recently. I previously posted
>> this on the forums without getting any responses; here is what that
>> post consisted of:
>>
>> - Running Ubuntu Server 8.04 as the dom0
>> - Currently using the 2.6.24-19-xen kernel from the Ubuntu archives
>> - domU in question is also Ubuntu 8.04, built using "xen-create-image"
>>
>> About a week ago, the website that is run on this problem domU
>> wouldn't respond and I couldn't ssh into the domU. I logged into my
>> dom0 and attached myself to the domU console and found this error
>> scrolling over and over:
>> [259458.110263] __find_get_block_slow() failed. block=1360, b_blocknr=32148
>> [259458.110265] b_state=0x00000029, b_size=4096
>> [259458.110269] device blocksize: 4096
>>
>> I tried to figure out what the cause was (a reboot was the only fix),
>> but the only thing I could come up with is that it happened around
>> 3am. As an FYI, this happened while I was running the kernel from
>> this howto: http://www.howtoforge.com/ubuntu-8.04-server-install-xen-from-ubuntu-repositories.
>> I never figured out what the cause was and assumed it was a one-off
>> thing. Three days later, though, I ran into a similar situation (still
>> with that howto kernel). The domU wouldn't respond, so I logged into
>> the dom0 and attached myself to the domU console again. This time I
>> found this error message:
>> [231747.850967]  =======================
>> [231759.665823] BUG: soft lockup - CPU#0 stuck for 11s! [apache2:1782]
>> [231759.665826]
>> [231759.665828] Pid: 1782, comm: apache2 Tainted: G      D (2.6.24-16-xen #2)
>> [231759.665831] EIP: 0061:[<c0327dc7>] EFLAGS: 00000286 CPU: 0
>> [231759.665835] EIP is at _spin_lock+0x7/0x10
>> [231759.665837] EAX: cf802c18 EBX: cf802bd4 ECX: c1671ec0 EDX: 00000000
>> [231759.665840] ESI: c1671ec0 EDI: 00000000 EBP: c0f15ddc ESP: c0f15c4c
>> [231759.665843]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0069
>> [231759.665846] CR0: 8005003b CR2: b7388e80 CR3: 01741000 CR4: 00000660
>> [231759.665849] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
>> [231759.665852] DR6: ffff0ff0 DR7: 00000400
>> [231759.665854]  [<c01a730d>] try_to_free_buffers+0x2d/0x90
>> [231759.665859]  [<c0167c95>] shrink_page_list+0x4b5/0x5f0
>> [231759.665864]  [<c01777e3>] page_check_address+0x1d3/0x3d0
>> [231759.665869]  [<c0166e20>] isolate_lru_pages+0x50/0x1c0
>> [231759.665874]  [<c0167eeb>] shrink_inactive_list+0x11b/0x3b0
>> [231759.665879]  [<c0168224>] shrink_zone+0xa4/0x100
>> [231759.665883]  [<c0168d62>] try_to_free_pages+0x152/0x250
>> [231759.665888]  [<c016305b>] __alloc_pages+0x12b/0x390
>> [231759.665893]  [<c0171739>] handle_mm_fault+0x7f9/0x1330
>> [231759.665899]  [<c0107deb>] local_clock+0x3b/0x80
>> [231759.665903]  [<c010824b>] sched_clock+0x1b/0x60
>> [231759.665908]  [<c03299ac>] do_page_fault+0x3bc/0xee0
>> [231759.665912]  [<c011c6d0>] update_curr+0x70/0x110
>> [231759.665916]  [<c03263d4>] schedule+0x244/0x600
>> [231759.665921]  [<c0175824>] do_mmap_pgoff+0x314/0x330
>> [231759.665925]  [<c0109ef5>] sys_mmap2+0x65/0xd0
>> [231759.665930]  [<c03295f0>] do_page_fault+0x0/0xee0
>> [231759.665934]  [<c0328285>] error_code+0x35/0x40
>> [231759.665940]  =======================
>>
>> Again, looking through the logs I couldn't find anything that would set
>> this off, just that it happened between midnight and 4am. At that point
>> I suspected the kernel, so I updated to the 2.6.24-19-xen kernel from
>> the Ubuntu repos. It worked for three days, until this morning I
>> noticed the site was not responding AGAIN. I logged into the
>> console from the dom0 and found this error:
>> [266281.773639] BUG: soft lockup - CPU#0 stuck for 11s! [webalizer:11884]
>>
>> As of today, other domUs are starting to see similar problems. I lost
>> one domU last night to the "CPU#0 stuck for 11s" error, and a
>> different domU locked up this morning with the same issue. In all of
>> these cases there are no entries in any of the logfiles showing a
>> problem that sets these errors off. In addition, the dom0 never seems
>> to have a problem; it's only the domUs that are having issues.
>>
>> I've searched Launchpad and found multiple bugs that reference these
>> various errors, but they are usually still open or don't offer a
>> solution. Am I better off trying to build Xen and a Xen-enabled kernel
>> from source? One bug report mentioned using the Debian
>> linux-image-2.6.26-1-xen-686 kernel package, but in my testing that
>> doesn't work because the domUs won't boot with it (an error about
>> "XENBUS: Waiting for devices to initialise" holds everything up). I'd
>> really like to figure this out and keep Ubuntu, but the more I look at
>> this the more it looks like I'll be heading back to CentOS.
>>
>> If you've made it this far, thanks for reading everything. I appreciate
>> any help/suggestions people can offer...
>>
>
>