Debugging tools/approach for GPU hangs?

Matt Zimmerman mdz at
Fri Sep 4 01:02:45 BST 2009

With more of the graphics stack moving into the kernel, we are starting to
see more bugs of this type:

In these bugs the GPU is hung, but the system is otherwise still
responsive.  This is annoyingly difficult to debug: the primary technique
is to ssh into the system from a nearby one (because the console is useless).

I think it would be a worthwhile investment to work on improved tools and
methods for debugging this scenario, including:

 * Detecting (programmatically) when this situation occurs and capturing
   an apport problem report, as described in

   Bryce (and Jesse Barnes at Intel) mentioned that the kernel is now
   supposed to log an error message when this happens, but I've never seen
   evidence of that happening.

 * Providing some means for the user to get the system into a debuggable
   state, i.e. where they can see something on the screen.  Maybe it's
   possible to re-POST the video device to see if it gets back to a sane
   state.

 * Documenting all of the above so that it can be easily executed by
   reasonably technical users
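
As a starting point for the detection item above, here is a minimal sketch
in Python of scanning kernel log lines for a hang message.  The exact wording
of the kernel's message is not given in this thread, so HANG_PATTERNS is an
assumption to be adjusted per driver, and find_gpu_hang_lines is a
hypothetical helper name, not part of apport or any existing tool.

```python
import re
from typing import Iterable, List, Pattern, Sequence

# ASSUMPTION: the real wording of the kernel's hang message is not stated
# in this thread.  This placeholder matches "GPU hung" or "GPU hang" in any
# letter case; replace it with the text your driver actually logs.
HANG_PATTERNS: Sequence[Pattern[str]] = (
    re.compile(r"gpu\s+(hung|hang)", re.IGNORECASE),
)


def find_gpu_hang_lines(
    log_lines: Iterable[str],
    patterns: Sequence[Pattern[str]] = HANG_PATTERNS,
) -> List[str]:
    """Return the log lines that match any of the given hang patterns."""
    hits = []
    for line in log_lines:
        if any(p.search(line) for p in patterns):
            # Strip the trailing newline so hits are easy to print or attach.
            hits.append(line.rstrip("\n"))
    return hits
```

The function takes any iterable of lines, so it could be fed the output of
dmesg or an open log file; a non-empty result would be the cue to collect a
problem report.  How that report is actually filed is left open here.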

 - mdz

