[Pull Request][Lucid] Fix Xorg hangs with some i915 chipsets

Stefan Bader stefan.bader at canonical.com
Wed Jun 22 14:50:34 UTC 2011


On 17.06.2011 18:38, Seth Forshee wrote:
> This is a rather large series of non-trivial backports to fix a Xorg
> hang caused by the i915 driver. Patches 5 and 6 are the key patches that
> implement an LRU-based eviction algorithm; patches 1-4 are
> prerequisites, patch 7 fixes some of the clean-up, and patch 8 adds a
> periodic flush to retire active buffers.
> 
> This is a pretty severe bug for those affected by it, but it's also a
> huge amount of change for a stable kernel, so let me know if it's beyond
> hope for an SRU.
> 
> SRU Justification:
> 
>   Impact: When the GPU aperture becomes sufficiently fragmented two or
>   more mmapped buffers in active use can repeatedly push each other out
>   of the aperture. This results in a sort of livelock situation, with
>   Xorg appearing to be hung with high CPU utilization.
> 
>   Fix: Backport of a series of patches that convert the i915 eviction
>   algorithm from a best-fit approach to one based on evicting the least
>   recently used objects and a patch that adds a periodic flush requests
>   to retire active buffers when no client is active.
> 
>   Testcase: Without these patches this scenario can be triggered
>   ocassionally when visiting certain web pages that utilize Flash or
>   viewing certain images in Firefox, as described on bugs #599017 and
>   #330460. Testing the same pages/images with these patches does not
>   induce the hang.
> 
> 
> The following changes since commit 9cc333d02cb4620561e91a63ed86a2159ec32638:
> 
>   UBUNTU: Ubuntu-2.6.32-33.67 (2011-06-16 11:26:04 -0500)
> 
> are available in the git repository at:
>   git://kernel.ubuntu.com/sforshee/ubuntu-lucid.git lp599017
> 
> Chris Wilson (5):
>       drm/i915: Move the eviction logic to its own file.
>       drm/i915: Implement fair lru eviction across both rings.  (v2)
>       drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2)
>       drm/i915/evict: Ensure we completely cleanup on failure
>       drm/i915: Periodically flush the active lists and requests
> 
> Daniel Vetter (3):
>       drm_mm: extract check_free_mm_node
>       drm: implement helper functions for scanning lru list
>       drm/i915: prepare for fair lru eviction
> 
>  drivers/gpu/drm/drm_mm.c              |  236 +++++++++++++++++++++++++-----
>  drivers/gpu/drm/i915/Makefile         |    1 +
>  drivers/gpu/drm/i915/i915_drv.h       |   13 ++
>  drivers/gpu/drm/i915/i915_gem.c       |  228 ++++-------------------------
>  drivers/gpu/drm/i915/i915_gem_evict.c |  253 +++++++++++++++++++++++++++++++++
>  include/drm/drm_mm.h                  |   15 ++-
>  6 files changed, 510 insertions(+), 236 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_gem_evict.c
> 

I also tend to say the bug sounds serious enough to make an exception to the
rule of limiting the change.

I would probably add those to the 2.6.32+drm33 tree. Though there are two
details which I think should be resolved before:
1. In the bug report (#599017) the last request for testing points to a place
which only shows 6 patches. Does the kernel provided actually contain all of them?
2. It would be good to get some feedback about the test kernel in the bug.




More information about the kernel-team mailing list