SRU request for LP#210637
stefan.bader at canonical.com
Thu May 8 17:21:18 UTC 2008
Colin Ian King wrote:
> SRU Justification
> Impact: Regression for sata_nv on Hardy - system reboots or causes XFS
> filesystem shutdown due to errors.
> Fix description: Patch to check for NV_ADMA_STAT_CMD_COMPLETE to handle
> a completed command rather than checking for NV_ADMA_STAT_DONE.
> Testcase: Turn off drive caching using hdparm -W 0 /dev/sda and exercise
> the XFS filesystem on the drive (a HDS7250SASUN500G).
> commit a1fe782414b7122d4c0501d3a0988b7302fa586f
> Author: Robert Hancock <hancockr at shaw.ca>
> Date: Tue Jan 29 19:53:19 2008 -0600
> sata_nv: fix for completion handling
> This patch is based on an original patch from Kuan Luo of NVIDIA,
> posted under subject "fixed a bug of adma in rhel4u5 with
> His description follows. I've reworked it a bit to avoid some
> repeated checks but it should be functionally identical.
> "The patch is to solve the error message "ata1: CPB flags CMD err,
> flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
> I tested this hd in 2.6.24-rc7 which needed to remove the mask in
> blacklist to run the ncq and the same error also showed up.
> I traced the bug and found that the interrupt finished a command
> example, tag=0) when the driver got that adma status is
> NV_ADMA_STAT_DONE and cpb->resp_flags is NV_CPB_RESP_DONE.
> However, For this hd, the drive maybe didn't clear bit 0 at this
> It meaned the hardware had not completely finished the command.
> If at the same time the driver freed the command(tag 0) and sended
> another command (tag 0), the error happened.
> The notifier register is 32-bit register containing notifier value.
> Value is bit vector containing one bit per tag number (0-31) in
> corresponding bit positions (bit 0 is for tag 0, etc). When bit is
> then ADMA indicates that command with corresponding tag number
> So i added the check notifier code. Sometimes i saw that the
> reg set some bits , but the adma status set
> ,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE
> Signed-off-by: Robert Hancock <hancockr at shaw.ca>
> Signed-off-by: Jeff Garzik <jeff at garzik.org>
> diff attached.
Even with the knowledge that this is a reversed patch, I still am a bit
reluctant to ack since the changes are not easy to grasp. If nobody else
has a better idea, I would suggest you do it as I have done for other
stuff that is more complicated and create a PPA kernel with that change
for regression testing.
More information about the kernel-team