SRU request for LP#210637

Stefan Bader stefan.bader at canonical.com
Thu May 8 17:21:18 UTC 2008


Colin Ian King wrote:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/210637
> 
> 
> SRU Justification
> -----------------
> 
> Impact: Regression for sata_nv on Hardy - system reboots or causes XFS
> filesystem shutdown due to errors.
> 
> Fix description: Patch to check for NV_ADMA_STAT_CMD_COMPLETE to handle
> a completed command rather than checking for NV_ADMA_STAT_DONE.
> 
> Testcase: Turn off drive caching using hdparm -W 0 /dev/sda and exercise
> the XFS filesystem on the drive (a HDS7250SASUN500G).
> 
> commit a1fe782414b7122d4c0501d3a0988b7302fa586f
> Author: Robert Hancock <hancockr at shaw.ca>
> Date:   Tue Jan 29 19:53:19 2008 -0600
> 
>     sata_nv: fix for completion handling
>     
>     This patch is based on an original patch from Kuan Luo of NVIDIA,
>     posted under subject "fixed a bug of adma in rhel4u5 with
> HDS7250SASUN500G".
>     His description follows. I've reworked it a bit to avoid some
> unnecessary
>     repeated checks but it should be functionally identical.
>     
>     "The patch is to solve the error message "ata1: CPB flags CMD err,
>     flags=0x11" when testing HDS7250SASUN500G in rhel4u5.
>     I tested this hd in 2.6.24-rc7 which needed to remove the mask in
>     blacklist to run the ncq and the same error also showed up.
>     
>     I traced the  bug and found that the interrupt finished a command
> (for
>     example, tag=0) when the driver got that adma status is
>     NV_ADMA_STAT_DONE  and  cpb->resp_flags is NV_CPB_RESP_DONE.
>     However, For this hd, the drive maybe didn't clear bit 0 at this
> moment.
>     It meaned the hardware  had not completely finished the command.
>     If at the same time  the driver freed the command(tag 0) and sended
>     another command (tag 0), the error happened.
>     
>     The notifier register is 32-bit register containing notifier value.
>     Value is bit vector containing one bit per tag number (0-31) in
>     corresponding bit positions (bit 0 is for tag 0, etc). When bit is
> set
>     then ADMA indicates that command with corresponding tag number
> completed
>     execution.
>     
>     So i added the check notifier code. Sometimes i saw that the
> notifier
>     reg set some bits  , but the adma status set
> NV_ADMA_STAT_CMD_COMPLETE
>     ,not NV_ADMA_STAT_DONE. So i added the NV_ADMA_STAT_CMD_COMPLETE
> check
>     code."
>     
>     Signed-off-by: Robert Hancock <hancockr at shaw.ca>
>     Signed-off-by: Jeff Garzik <jeff at garzik.org>
> 
> 
> 
> diff attached.
> 
Even with the knowledge that this is a reversed patch, I still am a bit
reluctant to ack since the changes are not easy to grasp. If nobody else
has a better idea, I would suggest you do it as I have done for other
stuff that is more complicated and create a PPA kernel with that change
for regression testing.

Stefan




More information about the kernel-team mailing list