[ubuntu-x] (Update) Re: -nvidia upgrade issues

Sat Nov 7 12:18:10 GMT 2009

On Saturday 07 Nov 2009 01:21:11 Bryce Harrington wrote:
> The two worst bugs are fixed, and the other two are at least understood
> now, but I could use a bit more advice.  There seems to be a weird race
> condition between DKMS, upstart and nvidia, which has cropped up due to
> the faster boot and looks tricky to sort out, so feedback from people
> with DKMS/upstart experience would be helpful.
> 
> From what I understand, when doing an upgrade it installs both nvidia
> and a new kernel (2.6.31).  At that point nvidia.ko is built against the
> *old* kernel (2.6.28).  Fine: an nvidia.ko was successfully built, so
> installation completes without error.  xorg.conf is updated and the
> system is ready to run nvidia.  Or so it thinks.
> 
> Now the user reboots.
> 
> During boot, dpkg notes that it needs to build a new nvidia.ko for
> 2.6.31 and dutifully gets to work.  Meanwhile, since X is started
> early in the boot cycle, it actually starts up before dkms has
> finished building the new nvidia.ko.  X tries to load nvidia, but since
> there is not yet an nvidia.ko for the current kernel, it exits with an
> error.
> 
> I'm going to see if I can reproduce this synthetically, but meanwhile
> does this theory make sense?  If so, is there a dkms/upstart trick we
> could do to work around the issue in Karmic?  And for Lucid what would
> the "right" solution be?
> 

As far as I know, if the new kernel is installed after the nvidia package then 
/etc/init.d/dkms_autoinstaller should kick in (and build the module for the 
new kernel) and things should go well.

If, however, nvidia is installed (or updated) after installing the new kernel, 
the kernel module will be built only for the kernel in use.

For this reason I think it makes sense to make sure that, when we install 
nvidia (or any other driver which relies on DKMS), the module is built both for 
the current kernel (e.g. 2.6.28) and for the most recent installed one (e.g. 
2.6.31). If the current kernel and the most recent one are the same, nothing 
changes and only one module is built.
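
To make the proposed rule concrete, here is a rough Python sketch of the 
kernel selection step only (the real change would live in the DKMS shell 
template). The function names are hypothetical, and it assumes release strings 
in the usual uname -r form (e.g. 2.6.31-14-generic) with a single kernel 
flavour installed:

```python
import re


def kernel_sort_key(release):
    """Turn a release string such as '2.6.31-14-generic' into a tuple of
    integers, e.g. (2, 6, 31, 14), so releases compare numerically."""
    return tuple(int(n) for n in re.findall(r"\d+", release))


def kernels_to_build(running, installed):
    """Hypothetical helper: return the kernels a DKMS module should be built
    for, i.e. the running kernel plus the newest installed kernel.
    Assumes a single kernel flavour, so ties in the sort key don't occur."""
    newest = max(list(installed) + [running], key=kernel_sort_key)
    targets = [running]
    if newest != running:
        # The newest kernel differs from the running one: build for both.
        targets.append(newest)
    return targets


# Upgrade case from the thread: running 2.6.28, 2.6.31 just installed.
print(kernels_to_build("2.6.28-16-generic",
                       ["2.6.28-16-generic", "2.6.31-14-generic"]))
# Already running the newest kernel: only one module is built.
print(kernels_to_build("2.6.31-14-generic",
                       ["2.6.28-16-generic", "2.6.31-14-generic"]))
```

Comparing the numeric parts rather than raw strings matters here: as plain 
strings, 2.6.31-9-generic would sort after 2.6.31-14-generic.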

In my opinion, a clean way to implement this would be to do it in one place, 
i.e. in the really handy template which DKMS provides in 
/usr/lib/dkms/common.postinst, which all DKMS-based packages can source from 
their postinst (as nvidia already does). This way, as opposed to rewriting the 
same code in the postinst script of each DKMS package, we can make sure that 
all DKMS packages comply with the new behaviour and avoid duplicating effort in 
the process.

You can see how nvidia uses the DKMS script here:
http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/karmic/nvidia-graphics-drivers-180/karmic/annotate/head:/debian/nvidia-185-kernel-source.postinst

@Mario
Would you accept a patch to the DKMS script implementing this new 
behaviour?

Better ideas are always welcome.

> > 438398 - If DKMS fails to build the kernel module, the package upgrade
> > does not kick out.  It shows package upgrade as successful.  So this
> > leads directly to...
> ...
> 
> > 451305 - Jockey misses that the driver failed to build, and so is not
> > letting users know about the potential problem.  It goes ahead and
> > updates xorg.conf as if the driver was there.  X tries to obey the
> > configuration settings, but of course they won't work, so it exits on
> > startup with an error message.  *Normally* bulletproof-X would kick in
> > at this point, display the error to the user, and give them some tools
> > to diagnose and/or debug the situation.  Unfortunately...
> 

I think I can fix this by making Jockey check that the target kernel module 
exists at 
/var/lib/dkms/$DRIVER_NAME/$KERNEL_NAME-$ARCH/module/$KERNEL_MODULE.ko

e.g. /var/lib/dkms/nvidia/kernel-2.6.31-14-generic-pae-i686/module/nvidia.ko

This can be done either in each handler (e.g. nvidia, fglrx, etc.) or in a 
more generic way (maybe in the KernelModuleHandler class?) so that each 
handler can benefit from this check.
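
A minimal Python sketch of the generic variant, written as a mixin that any 
handler could inherit. The class and function names are hypothetical and do 
not reflect Jockey's actual KernelModuleHandler API; the path layout (including 
the "kernel-" prefix) is taken from the example above, and platform.release() / 
platform.machine() are assumed to match the release and architecture strings 
DKMS uses:

```python
import os
import platform

DKMS_TREE = "/var/lib/dkms"


def dkms_module_path(driver, kernel_release, arch, module, dkms_tree=DKMS_TREE):
    """Build the expected path of a DKMS-built module. Layout assumed from
    /var/lib/dkms/<driver>/kernel-<release>-<arch>/module/<module>.ko"""
    return os.path.join(dkms_tree, driver,
                        "kernel-%s-%s" % (kernel_release, arch),
                        "module", module + ".ko")


class DKMSCheckMixin:
    """Hypothetical generic check; not Jockey's real KernelModuleHandler."""
    dkms_driver = None   # DKMS package name, e.g. "nvidia"
    module_name = None   # kernel module name, e.g. "nvidia"
    dkms_tree = DKMS_TREE

    def module_built_for(self, kernel_release=None, arch=None):
        """True if the .ko exists for the given kernel; defaults to the
        running kernel and machine architecture."""
        kernel_release = kernel_release or platform.release()
        arch = arch or platform.machine()
        path = dkms_module_path(self.dkms_driver, kernel_release, arch,
                                self.module_name, self.dkms_tree)
        return os.path.exists(path)


class NvidiaHandlerSketch(DKMSCheckMixin):
    """Hypothetical handler: only declares its names, inherits the check."""
    dkms_driver = "nvidia"
    module_name = "nvidia"


print(dkms_module_path("nvidia", "2.6.31-14-generic-pae", "i686", "nvidia"))
```

In the upgrade case the check would presumably need to be run against the 
newest installed kernel rather than the running one, which is why the kernel 
release can be passed in explicitly. Putting it in a shared base means fglrx 
and other handlers only have to declare their driver and module names.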

@Martin
Any ideas on this?

Regards,

-- 
Alberto Milone
Sustaining Engineer (system)
Foundations Team
Canonical OEM Services