Debugging tools/approach for GPU hangs?

Matt Zimmerman mdz at canonical.com
Fri Sep 4 01:02:45 BST 2009


With more of the graphics stack moving into the kernel, we are starting to
see more bugs of this type:

http://launchpad.net/bugs/359392
http://launchpad.net/bugs/388357
http://launchpad.net/bugs/424055

In each of these, the GPU is hung but the system is otherwise still
responsive.  This is annoyingly difficult to debug: the primary technique is
to ssh into the system from a nearby machine, because the local console is
useless.
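Once logged in over ssh, the first job is usually to grab whatever state is
still readable before anything else disturbs it.  A minimal Python sketch of
that step follows.  The file list is an assumption for illustration:
Xorg.0.log is the usual X server log, kern.log depends on syslog writing it,
and i915_error_state only exists on Intel hardware, on kernels that provide
it, with debugfs mounted.  The function name is hypothetical.

```python
import io
import os
import tarfile
import time

# Hypothetical selection of files worth grabbing after a GPU hang.
# These paths are assumptions, not an authoritative list: kern.log
# depends on syslog configuration, and i915_error_state only exists
# on Intel GPUs with debugfs mounted (reading it usually needs root).
CANDIDATE_PATHS = [
    "/var/log/Xorg.0.log",
    "/var/log/kern.log",
    "/sys/kernel/debug/dri/0/i915_error_state",
]


def collect_debug_files(paths, out_dir):
    """Pack whichever of `paths` are readable into a timestamped tarball.

    Returns (archive_path, files_added). Missing or unreadable paths
    are skipped, since which ones exist varies by driver and kernel.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(out_dir, f"gpu-hang-{stamp}.tar.gz")
    added = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            try:
                # Read the content up front: sysfs/debugfs files often
                # report a size of 0 or 4096 via stat(), so tar.add()
                # would not copy them faithfully.
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue  # missing file or permission denied
            info = tarfile.TarInfo(name=path.lstrip("/"))
            info.size = len(data)
            info.mtime = int(time.time())
            tar.addfile(info, io.BytesIO(data))
            added.append(path)
    return archive, added
```

Run as root on the hung machine, e.g.
`collect_debug_files(CANDIDATE_PATHS, "/tmp")`, then copy the archive off
with scp and attach it to the bug.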

I think it would be a worthwhile investment to work on improved tools and
methods for debugging this scenario, including:

 * Detecting (programmatically) when this situation occurs and capturing
   an apport problem report, as described in
   http://mdzlog.alcor.net/2009/06/17/collecting-debug-information-when-your-gpu-hangs/

   Bryce (and Jesse Barnes at Intel) mentioned that the kernel is now
   supposed to log an error message when this happens, but I've never seen
   evidence of that happening.

 * Providing some means for the user to get the system into a debuggable
   state, i.e. one where they can see something on the screen.  Maybe it's
   possible to re-POST the video device to see if it gets back to a sane
   state?

 * Documenting all of the above so that reasonably technical users can
   easily follow it.
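The detection step in the first item could start as simply as scanning the
kernel log for hang messages, assuming the kernel does log one.  Below is a
minimal Python sketch.  The regular expressions are assumptions that loosely
approximate messages the i915 driver has printed on some kernel versions;
they are not an authoritative list, and the function names are hypothetical.

```python
import re

# Assumed patterns: rough approximations of messages the Intel i915
# driver has printed on GPU hangs in some kernel versions. They are
# illustrative only; exact wording differs between drivers and
# kernel releases, so check what your kernel actually emits.
HANG_PATTERNS = [
    re.compile(r"GPU\s+(hung|hang)", re.IGNORECASE),
    re.compile(r"hangcheck", re.IGNORECASE),
]


def find_gpu_hang_lines(log_text, patterns=HANG_PATTERNS):
    """Return the kernel log lines that match any of the hang patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in patterns)
    ]


def new_hang_lines(log_text, seen):
    """Return matching lines not already in `seen`, and record them.

    Intended for polling the kernel log periodically, so that each
    hang triggers at most one report rather than one per poll.
    """
    fresh = [line for line in find_gpu_hang_lines(log_text) if line not in seen]
    seen.update(fresh)
    return fresh
```

The log text could come from
`subprocess.run(["dmesg"], capture_output=True, text=True).stdout`.  What to
do on a hit (for example, triggering an apport report) is left out, since it
depends on how the apport side is wired up.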

-- 
 - mdz



More information about the ubuntu-devel mailing list