Debugging tools/approach for GPU hangs?
Matt Zimmerman
mdz at canonical.com
Fri Sep 4 01:02:45 BST 2009
With more of the graphics stack moving into the kernel, we are starting to
see more bugs of this type:
http://launchpad.net/bugs/359392
http://launchpad.net/bugs/388357
http://launchpad.net/bugs/424055
In each of these, the GPU is hung but the system is otherwise still
responsive. This is annoyingly difficult to debug: the primary technique is
to ssh into the system from a nearby one, because the console is useless.
I think it would be a worthwhile investment to work on improved tools and
methods for debugging this scenario, including:
* Detecting (programmatically) when this situation occurs and capturing
  an apport problem report, as described in
  http://mdzlog.alcor.net/2009/06/17/collecting-debug-information-when-your-gpu-hangs/
  Bryce (and Jesse Barnes at Intel) mentioned that the kernel is now
  supposed to log an error message when this happens, but I've never seen
  evidence of that happening.
* Providing some means for the user to get the system into a debuggable
  state, i.e. where they can see something on the screen. Maybe it's
  possible to re-POST the video device to see if it gets back to a sane
  state?
* Documenting all of the above so that it can be easily executed by
  reasonably technical users.
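
To make the detection idea concrete, here is a rough sketch of what a log
scanner might look like, assuming the kernel really does log a recognisable
message when the GPU hangs. The patterns below are guesses for illustration
only, not confirmed driver output, and the log path is an assumption as well.
The names find_gpu_hang_lines and scan_log_file are hypothetical helpers, not
part of apport or any existing tool.

```python
import re

# ASSUMPTION: these patterns are illustrative guesses at what a driver
# might log on a hang; replace them with the real message once confirmed.
HANG_PATTERNS = [
    re.compile(r"GPU hung", re.IGNORECASE),
    re.compile(r"hangcheck timer elapsed", re.IGNORECASE),
]


def find_gpu_hang_lines(log_lines, patterns=HANG_PATTERNS):
    """Return the log lines that match any of the given hang patterns."""
    return [
        line.rstrip("\n")
        for line in log_lines
        if any(p.search(line) for p in patterns)
    ]


def scan_log_file(path="/var/log/kern.log"):
    """Scan a kernel log file (ASSUMPTION: path and read access) for hangs."""
    with open(path, errors="replace") as f:
        return find_gpu_hang_lines(f)


if __name__ == "__main__":
    # Print any suspicious lines; a real tool would hand these to a
    # problem report instead of printing them.
    for hit in scan_log_file():
        print(hit)
```

Something like this could be run periodically or over ssh, and a match
could be the trigger for collecting a problem report. It only helps if the
kernel message actually shows up, which is the part I have not been able to
confirm.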
--
- mdz
More information about the ubuntu-devel mailing list