Debugging tools/approach for GPU hangs?

Bryce Harrington bryce at canonical.com
Fri Sep 4 10:25:45 BST 2009


On Thu, Sep 03, 2009 at 05:02:45PM -0700, Matt Zimmerman wrote:
> With more of the graphics stack moving into the kernel, we are starting to
> see more bugs of this type:
> 
> http://launchpad.net/bugs/359392
> http://launchpad.net/bugs/388357
> http://launchpad.net/bugs/424055
>
> Where the GPU is hung, but the system is otherwise still responsive.  This
> is annoyingly difficult to debug, with the primary technique being to ssh
> into the system from a nearby one (because the console is useless).

Actually there have been GPU hang bugs for a long time.  It's just that
there wasn't a way to debug them until recently.

> I think it would be a worthwhile investment to work on improved tools and
> methods for debugging this scenario, including:
> 
>  * Detecting (programmatically) when this situation occurs and capturing
>    an apport problem report, as described in
>    http://mdzlog.alcor.net/2009/06/17/collecting-debug-information-when-your-gpu-hangs/
> 
>    Bryce (and Jesse Barnes at Intel) mentioned that the kernel is now
>    supposed to log an error message when this happens, but I've never seen
>    evidence of that happening.

I'm cc'ing jbarnes here.  Last I heard this was implemented upstream but
hadn't yet filtered down.
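If the kernel does log an error when the GPU hangs, the detection half of the first point could be as simple as scanning the kernel log for that message. Below is a minimal sketch under that assumption. The `GPU hung` pattern and the helper names `find_gpu_hang_lines` / `gpu_hang_detected` are hypothetical placeholders, not the exact text any driver is known to emit, and handing a positive result off to apport is not shown.

```python
import re
import sys
from typing import Iterable, List

# HYPOTHETICAL pattern: the real wording of the kernel's hang message is an
# assumption here; adjust it to whatever your kernel actually logs.
HANG_PATTERN = re.compile(r"GPU hung", re.IGNORECASE)


def find_gpu_hang_lines(log_lines: Iterable[str]) -> List[str]:
    """Return the log lines that match the (assumed) GPU hang pattern."""
    return [line.rstrip("\n") for line in log_lines if HANG_PATTERN.search(line)]


def gpu_hang_detected(log_lines: Iterable[str]) -> bool:
    """True if at least one line matches the (assumed) GPU hang pattern."""
    return bool(find_gpu_hang_lines(log_lines))


if __name__ == "__main__":
    # Usage: python detect_hang.py /var/log/kern.log
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    with open(path, errors="replace") as f:
        hits = find_gpu_hang_lines(f)
    for hit in hits:
        print(hit)
    # Non-zero exit status signals "hang seen" to a calling script.
    sys.exit(1 if hits else 0)
```

Run over the log from an ssh session (or from a hook), a non-zero exit status would be the cue to go collect a problem report.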

>  * Providing some means for the user to get the system into a debuggable
>    state, i.e. where they can see something on the screen.  Maybe it's
>    possible to re-POST the video device to see if it gets back to a sane
>    state?
> 
>  * Documenting all of the above so that it can be easily executed by
>    reasonably technical users

Bryce



More information about the ubuntu-devel mailing list