[ubuntu-x] (Update) Re: -nvidia upgrade issues
jldugger at ubuntu.com
Sat Nov 7 19:33:50 GMT 2009
When thinking about concurrency, it's important to assume everything
that can go wrong will. If you can't point out how this scenario is
_prevented_, then there's not much point trying to duplicate it; we
know it's _possible_. I've been neglecting upstart, but surely there's
a way to force X startup to depend on a DKMS build? You could perhaps
define a small task that executes rapidly in the common case (no build
required), but blocks X startup until the build is complete?
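A minimal sketch of that gating task, in Python for illustration only. It assumes DKMS installs the built module somewhere under /lib/modules/<kernel release>/, and it treats the file's presence as "build complete", which a real job would need to check more carefully (a partially written file would fool it). The function names, the timeout, and the polling interval are all hypothetical; nothing here is an existing upstart or DKMS interface.

```python
import os
import time


def module_path(kernel_release, module="nvidia.ko", root="/lib/modules"):
    """Return the path of `module` under root/<kernel_release>, or None."""
    base = os.path.join(root, kernel_release)
    # os.walk yields nothing if `base` does not exist, so we fall through to None.
    for dirpath, _dirnames, filenames in os.walk(base):
        if module in filenames:
            return os.path.join(dirpath, module)
    return None


def wait_for_module(kernel_release, module="nvidia.ko", root="/lib/modules",
                    timeout=300.0, poll=1.0):
    """Block until `module` exists for `kernel_release`.

    Returns True as soon as the module is found (immediately in the common
    case where no build is required), or False once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        if module_path(kernel_release, module, root) is not None:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A job that runs before X could call wait_for_module(os.uname().release) and only let X start with nvidia if it returns True. On a timeout it could fall back to a failsafe session instead of letting X fail.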
On Fri, Nov 6, 2009 at 6:21 PM, Bryce Harrington <bryce at canonical.com> wrote:
> The two worst bugs are fixed, and the other two are at least understood
> now but I could use a bit more advice. It seems there is a weird race
> condition with DKMS/upstart/nvidia which has cropped up due to the
> faster boot. It looks tricky to get sorted, so feedback from people
> with experience in DKMS/upstart matters would be helpful.
> From what I understand, when doing an upgrade it installs both nvidia
> and a new kernel (2.6.31). At that point nvidia.ko is built against the
> *old* kernel (2.6.28). Fine, an nvidia.ko was successfully built so
> installation completes without error. xorg.conf is updated and the
> system is ready to run nvidia. Or so it thinks.
> Now the user reboots.
> During boot, dkms notes that it needs to build a new nvidia.ko for
> 2.6.31 and dutifully gets to work. Meanwhile, since X is being started
> early on in the boot cycle, it in fact starts up before dkms has
> finished building the new nvidia.ko. X starts loading nvidia but since
> there is not yet an nvidia.ko for the current kernel it exits with an
> error.
> I'm going to see if I can reproduce this synthetically, but meanwhile
> does this theory make sense? If so, is there a dkms/upstart trick we
> could do to work around the issue in Karmic? And for Lucid what would
> the "right" solution be?
> Further notes on the other nvidia issues below...
> On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
>> I've been looking into some problems people have been reporting
>> upgrading to Karmic with -nvidia installed.
>> One thing I've noticed is aside from whatever issue is occurring with
>> nvidia, there are bugs elsewhere which are compounding the problems and
>> leading to some poor user experiences. A common scenario occurs if for
>> whatever reason the -nvidia kernel module fails to build in DKMS:
>> 438398 - If DKMS fails to build the kernel module, the package upgrade
>> does not kick out. It shows package upgrade as successful. So this
>> leads directly to...
> In reviewing instances of nvidia failures, this particular scenario
> appears to pop up less frequently in practice than I had initially
> assumed, and mostly due to unusual corner cases like not having patch
> installed, upgrading to Karmic directly from Hardy, etc. It seems most
> of these specific issues got fixed during development, just that the
> bug reports didn't get closed. The important point though is that these
> failures ended up worse than they should have been, due to the following:
>> 451305 - Jockey misses that the driver failed to build, and so is not
>> letting users know about the potential problem. It goes ahead and
>> updates xorg.conf as if the driver was there. X tries to obey the
>> configuration settings, but of course they won't work, so it exits on
>> startup with an error message. *Normally* bulletproof-X would kick in
>> at this point, display the error to the user, and give them some tools
>> to diagnose and/or debug the situation. Unfortunately...
> Elsewhere in this thread several fixes/workarounds to this issue were
> identified, which should greatly lessen the severity of these kinds of
> error situations.
>> 474806 - The new gdm no longer supports the FailsafeXServer option, so
>> the diagnostic session no longer can be triggered to come up. Instead,
>> gdm tries several times, then gives up, but then...
> This is fixed; we now no longer rely on gdm for doing the failsafe but
> instead catch it with a simple upstart job and kick into failsafe-x mode.
> Thanks Steve!
>> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
>> X of course continues to fail, gdm tries a few times and continues to
>> fail, repeat ad infinitum, and the user is just left looking at a
>> flashing screen. Ick.
> Now that we have an upstart job handling this case, the blinking
> situation will no longer happen. This fix is SRU'd and uploaded to
> ubuntu-proposed, and will go live before long.
> Since this particular situation crops up right now mostly with nvidia,
> people installing via the release livecd should be okay - that boots
> with open source drivers, and when they choose to install nvidia it will
> download that and (I assume) also update xorg to the version that
> contains this fix. So by the time they reboot they'll have the fix.
> Steve, can you confirm?
>> The above appears to be a pretty common scenario that we're getting a
>> rash of bug reports about. It's hard to be certain because many of the
>> bug reports are only including information about the failed boot, not on
>> the failed build. So I'm not sure if it is just one reason why the
>> build fails, or several. However if we can solve the above bugs it
>> should give much better visibility into things.
>> Btw, workaround for anyone experiencing this issue is to purge your
>> nvidia (and fglrx) packages, remove /etc/X11/xorg.conf, and reinstall
>> nvidia (or fglrx). It appears that in most of the bug reports this gets
>> the system functioning again. Doing a full reinstall of Ubuntu rather
>> than an upgrade also appears to work around the issues.
> It looks like simply doing a dpkg-reconfigure on the nvidia package is
> sufficient to work around the issue, no need for reinstalling it
> (although that'll work too).