SATA errors, Ubuntu Hardy and kernel 2.6.24
Paul Boddie
paul at boddie.org.uk
Mon Dec 7 21:22:36 UTC 2009
Hello,
I'm trying to track down an apparent change in behaviour between the kernels
supplied in the most recent packages for Ubuntu Hardy, specifically
linux-image-generic 2.6.24.25.27 and linux-image-generic 2.6.24.26.28,
concerning SATA error reporting.
I recently observed the following kind of error in my syslog:
Dec 5 19:03:24 jeremy kernel: [ 274.853653] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Dec 5 19:03:24 jeremy kernel: [ 274.853660] ata3.00: BMDMA stat 0x24
Dec 5 19:03:24 jeremy kernel: [ 274.853666] ata3.00: cmd c8/00:18:97:c1:5c/00:00:00:00:00/e1 tag 0 dma 12288 in
Dec 5 19:03:24 jeremy kernel: [ 274.853667] res 51/40:00:97:c1:5c/00:00:00:00:00/e1 Emask 0x9 (media error)
Dec 5 19:03:24 jeremy kernel: [ 274.853670] ata3.00: status: { DRDY ERR }
Dec 5 19:03:24 jeremy kernel: [ 274.853672] ata3.00: error: { UNC }
Dec 5 19:03:24 jeremy kernel: [ 274.868536] ata3.00: configured for UDMA/133
Dec 5 19:03:24 jeremy kernel: [ 274.868550] sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Dec 5 19:03:24 jeremy kernel: [ 274.868554] sd 2:0:0:0: [sda] Sense Key : Medium Error [current] [descriptor]
Dec 5 19:03:24 jeremy kernel: [ 274.868558] Descriptor sense data with sense descriptors (in hex):
Dec 5 19:03:24 jeremy kernel: [ 274.868560] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Dec 5 19:03:24 jeremy kernel: [ 274.868568] 01 5c c1 97
Dec 5 19:03:24 jeremy kernel: [ 274.868571] sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocate failed
Dec 5 19:03:24 jeremy kernel: [ 274.868577] end_request: I/O error, dev sda, sector 22856087
Dec 5 19:03:24 jeremy kernel: [ 274.868593] ata3: EH complete
Dec 5 19:03:24 jeremy kernel: [ 274.878168] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
Dec 5 19:03:24 jeremy kernel: [ 274.889984] sd 2:0:0:0: [sda] Write Protect is off
Dec 5 19:03:24 jeremy kernel: [ 274.889989] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
Dec 5 19:03:24 jeremy kernel: [ 274.902842] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
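For what it's worth, the sector number in the log can be cross-checked in two places: the failed taskfile ("cmd c8/...") and the descriptor-format sense data. The sketch below is a minimal Python decoder for both. It assumes that libata prints the taskfile as command/feature:nsect:lbal:lbam:lbah/hob-fields/device, and that the sense data follows the SCSI descriptor format with an information descriptor (type 0x00). The helper names are hypothetical; this is an illustration rather than a libata or sg3_utils API.

```python
from typing import Optional, Tuple

# Values copied from the syslog excerpt above.
SENSE_HEX = "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 01 5c c1 97"
TASKFILE = "c8/00:18:97:c1:5c/00:00:00:00:00/e1"


def lba_from_descriptor_sense(hex_bytes: str) -> Optional[int]:
    """Return the INFORMATION field (the failing LBA) from descriptor-format sense data."""
    data = bytes.fromhex(hex_bytes)
    # Response codes 0x72/0x73 mean descriptor-format sense data (current/deferred).
    if (data[0] & 0x7F) not in (0x72, 0x73):
        raise ValueError("not descriptor-format sense data")
    # Byte 7 is the additional sense length; descriptors start at byte 8.
    end = min(len(data), 8 + data[7])
    pos = 8
    while pos + 2 <= end:
        desc_type, desc_len = data[pos], data[pos + 1]
        # Type 0x00 is the information descriptor: VALID bit in its byte 2,
        # 8-byte big-endian INFORMATION field in its bytes 4..11.
        if desc_type == 0x00 and desc_len == 0x0A and data[pos + 2] & 0x80:
            return int.from_bytes(data[pos + 4:pos + 12], "big")
        pos += 2 + desc_len
    return None


def decode_taskfile(tf: str) -> Tuple[int, int, int]:
    """Decode the assumed 'cmd' layout: command/feature:nsect:lbal:lbam:lbah/hob.../device."""
    command, low, _hob, device = tf.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in low.split(":"))
    dev = int(device, 16)
    # For 28-bit commands such as READ DMA (0xC8), LBA bits 24-27 live in the
    # low nibble of the device register; the HOB fields are unused here.
    lba = ((dev & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return int(command, 16), nsect, lba


if __name__ == "__main__":
    print("sense LBA:", lba_from_descriptor_sense(SENSE_HEX))
    cmd, sectors, lba = decode_taskfile(TASKFILE)
    print(f"command 0x{cmd:02x}, {sectors} sectors ({sectors * 512} bytes) from LBA {lba}")
```

Both routes give LBA 22856087. This matches the "end_request: I/O error, dev sda, sector 22856087" line, and 24 sectors of 512 bytes matches "dma 12288 in". So the logged messages are at least internally consistent about which sector the drive refused to read.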
These errors started to appear on 3rd December, mostly at or around boot or
restart time, becoming unavoidable on 5th and 6th December. As a
consequence, I have made backups and have been running short smartctl tests
to see what might be occurring; some of these do report the "LBA of first
error" and I have consulted the smartmontools "how to" about such errors.
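As I understand it, the smartmontools bad block how-to maps a failing LBA L to a filesystem block b = (L - S) * 512 / B. Here S is the starting sector of the partition holding L, and B is the filesystem block size in bytes. The sketch below just does that arithmetic. The partition start (63) and block size (4096) are hypothetical placeholders and are not taken from this machine. The real values would have to come from the partition table and the filesystem.

```python
# Hypothetical values: the partition start sector and filesystem block size
# below are assumptions for illustration, not taken from this system.
FAILING_LBA = 22856087      # from "end_request: I/O error, dev sda, sector 22856087"
PARTITION_START = 63        # hypothetical; read the real value from the partition table
FS_BLOCK_SIZE = 4096        # hypothetical; e.g. a 4 KiB ext3 block size
SECTOR_SIZE = 512           # the disk reports 512-byte hardware sectors


def lba_to_fs_block(lba: int, part_start: int, block_size: int,
                    sector_size: int = SECTOR_SIZE) -> int:
    """Map an absolute disk LBA to a block number within the partition's filesystem."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of this partition")
    # Integer division, matching the (int) truncation in the how-to's formula.
    return (lba - part_start) * sector_size // block_size


if __name__ == "__main__":
    print(lba_to_fs_block(FAILING_LBA, PARTITION_START, FS_BLOCK_SIZE))
```

With these placeholder values the result is block 2857003. That number would only be meaningful once S and B are replaced by the actual ones for /dev/sda.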
Judging by the statistics reported by smartd and the available information
about SMART [*], there may well be a problem, specifically with the following
attributes, all of which have non-zero "raw values":
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   186   051    Pre-fail  Always       -       774
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       26
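The distinction between normalised values and raw counts in the table above can be made explicit with a short Python sketch. It embeds a copy of the rows, lists the attributes with non-zero raw values, and separately checks whether any normalised VALUE has dropped to its THRESH. The parser is a hypothetical helper. It assumes each raw value is a plain integer in the tenth column, which holds for these rows but not for every attribute smartctl can print. It also treats a threshold of 0 as never "failing", which is my understanding of how smartctl handles such attributes.

```python
# A copy of the attribute rows quoted above, as printed by smartctl.
SMART_TABLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   186   051    Pre-fail  Always       -       774
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       5
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       26
"""


def parse_attributes(table: str) -> list:
    """Split each attribute row into the fields used below (header row skipped)."""
    rows = []
    for line in table.strip().splitlines()[1:]:
        f = line.split()
        rows.append({
            "id": int(f[0]),
            "name": f[1],
            "value": int(f[3]),
            "worst": int(f[4]),
            "thresh": int(f[5]),
            "raw": int(f[9]),  # assumes a plain integer raw value
        })
    return rows


attrs = parse_attributes(SMART_TABLE)

# Attributes whose raw counters are non-zero.
nonzero = [(a["id"], a["name"], a["raw"]) for a in attrs if a["raw"] != 0]

# Normalised values at or below a non-zero threshold (SMART's own "failed" test).
failing = [a["name"] for a in attrs if a["thresh"] > 0 and a["value"] <= a["thresh"]]

if __name__ == "__main__":
    for attr_id, name, raw in nonzero:
        print(f"{attr_id:3d} {name:<24} raw={raw}")
    print("normalised failures:", failing or "none")
```

All four attributes have non-zero raw values, yet none of the normalised values is near its threshold. Whatever warning there is comes from the raw counts, for example 3 pending and 5 offline-uncorrectable sectors, rather than from SMART's own pass/fail logic.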
According to "popular wisdom", this means that my disk is failing, yet
since updating to the latter kernel package mentioned above, I don't get
the above syslog messages any more. What I'm trying to work out is whether
the kernel was reporting a real problem or whether this was some bug that
was fixed in the most recent kernel. I tried to look for changes in the
package diffs, but only managed to find what looked like a bunch of
exported symbol definitions.
Can anyone point me to any release notes that might explain these changes,
and perhaps also what might have caused the errors? (I've also
perused a number of bug trackers, including Launchpad, but found nothing
conclusive apart from people either claiming that some boot options "fix"
their problem or that their hard disk actually failed.)
Paul
P.S. The hard disk concerned is a 500GB Western Digital Caviar Green,
bought in January 2009, operating with an "Intel Corporation 82801EB (ICH5)
SATA Controller (rev 02)".
[*] http://en.wikipedia.org/wiki/Self-Monitoring%2C_Analysis%2C_and_Reporting_Technology