No subject


Tue Nov 3 10:50:01 GMT 2009


and a new kernel (2.6.31).  At that point nvidia.ko is built against the
*old* kernel (2.6.28).  Fine, an nvidia.ko was successfully built so
installation completes without error.  xorg.conf is updated and the
system is ready to run nvidia.  Or so it thinks.

Now the user reboots.

During boot, dkms notes that it needs to build a new nvidia.ko for
2.6.31 and dutifully gets to work.  Meanwhile, since X is started early
in the boot cycle, it comes up before dkms has finished building the
new nvidia.ko.  X tries to load nvidia, but since there is not yet an
nvidia.ko for the running kernel, it exits with an error.

I'm going to see if I can reproduce this synthetically, but meanwhile
does this theory make sense?  If so, is there a dkms/upstart trick we
could do to work around the issue in Karmic?  And for Lucid what would
the "right" solution be?
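
Whatever form a workaround takes, it would need some way to tell whether
an nvidia.ko exists for the running kernel before X is allowed to start.
Here is a minimal sketch of that check, not a tested fix.  The function
name module_available is made up for illustration, and it assumes
modules live as uncompressed <name>.ko files somewhere under
/lib/modules/<kernel release>/ (which is where dkms normally installs
them).

```python
import os
import platform


def module_available(module, release=None, modules_root="/lib/modules"):
    """Return True if <module>.ko exists under modules_root/<release>.

    release defaults to the running kernel's release string, the same
    value `uname -r` reports.
    """
    if release is None:
        release = platform.release()
    target = module + ".ko"
    tree = os.path.join(modules_root, release)
    # os.walk yields nothing for a missing directory, so a kernel with
    # no module tree at all simply reports False.
    for _dirpath, _dirnames, filenames in os.walk(tree):
        if target in filenames:
            return True
    return False
```

Calling module_available("nvidia") early in boot would return False in
the scenario above until dkms finishes, at which point something could
either wait or fall back to a non-nvidia configuration.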


Further notes on the other nvidia issues below...

On Wed, Nov 04, 2009 at 02:26:56PM -0800, Bryce Harrington wrote:
> I've been looking into some problems people have been reporting
> upgrading to Karmic with -nvidia installed.
> 
> One thing I've noticed is aside from whatever issue is occurring with
> nvidia, there are bugs elsewhere which are compounding the problems and
> leading to some poor user experiences.  A common scenario occurs if for
> whatever reason the -nvidia kernel module fails to build in DKMS:
> 
> 438398 - If DKMS fails to build the kernel module, the package upgrade
> does not kick out.  It shows package upgrade as successful.  So this
> leads directly to...

In reviewing instances of nvidia failures, this particular scenario
appears to pop up less frequently in practice than I had initially
assumed, and mostly in unusual corner cases such as not having patch
installed, upgrading to Karmic directly from Hardy, etc.  It seems most
of these specific issues were fixed during development and the bug
reports simply never got closed.  The important point though is that
these failures ended up worse than they should have been, due to the
following bugs...

> 451305 - Jockey misses that the driver failed to build, and so is not
> letting users know about the potential problem.  It goes ahead and
> updates xorg.conf as if the driver was there.  X tries to obey the
> configuration settings, but of course they won't work, so it exits on
> startup with an error message.  *Normally* bulletproof-X would kick in
> at this point, display the error to the user, and give them some tools
> to diagnose and/or debug the situation.  Unfortunately...

Elsewhere in this thread several fixes/workarounds to this issue were
identified, which should greatly lessen the severity of these kinds of
error situations.
 
> 474806 - The new gdm no longer supports the FailsafeXServer option, so
> the diagnostic session no longer can be triggered to come up.  Instead,
> gdm tries several times, then gives up, but then...

This is fixed; we now no longer rely on gdm for doing the failsafe but
instead catch it with a simple upstart job and kick into failsafe-x mode.
Thanks Steve!

> 441638 - The gdm upstart job notices gdm has failed and so restarts it.
> X of course continues to fail, gdm tries a few times and continues to
> fail, repeat ad infinitum, and the user is just left looking at a
> flashing screen.  Ick.

Now that we have an upstart job handling this case, the blinking
situation will no longer happen.  This fix is SRU'd and uploaded to
ubuntu-proposed, and will go live before long.

Since this particular situation crops up right now mostly with nvidia,
people installing via the release livecd should be okay - that boots
with open source drivers, and when they choose to install nvidia it will
download that and (I assume) also update xorg to the version that
contains this fix.  So by the time they reboot they'll have the fix.
Steve, can you confirm?

> The above appears to be a pretty common scenario that we're getting a
> rash of bug reports about.  It's hard to be certain because many of the
> bug reports are only including information about the failed boot, not on
> the failed build.  So I'm not sure if it is just one reason why the
> build fails, or several.  However if we can solve the above bugs it
> should give much better visibility into things.
> 
> 
> Btw, workaround for anyone experiencing this issue is to purge your
> nvidia (and fglrx) packages, remove /etc/X11/xorg.conf, and reinstall
> nvidia (or fglrx).  It appears that in most of the bug reports this gets
> the system functioning again.  Doing a full reinstall of Ubuntu rather
> than an upgrade also appears to work around the issues.

It looks like simply running dpkg-reconfigure on the nvidia package is
sufficient to work around the issue; there is no need to reinstall it
(although that'll work too).

Bryce



More information about the ubuntu-devel mailing list