RAID drop-out ERC/TLER was: DANGER!!! Problems with 10.04 installer (RAID devices *will* get corrupted)
dhoworth at mrc-lmb.cam.ac.uk
Tue Apr 27 15:46:02 UTC 2010
CLIFFORD ILKAY wrote:
>> On 04/23/2010 12:17 PM, CLIFFORD ILKAY wrote:
>>> The issue is documented here
>>> <http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery> and
>>> elsewhere. Western Digital isn't the only manufacturer with this issue
>>> (and a solution).
> However, TLER *might* have something to do with "we'll separate the
> drives into consumer and pro lines and charge more for the pro lines
> because we can". Most consumers don't care about this issue and are
> unaffected by it. Those who do care about it grumble, pay more, and move on.
Thanks very much for that link. You pointed me to a real issue that's
very relevant since I'm just building a new machine. So I've been doing
some reading and here's a summary in case it helps someone.
For anybody who hasn't followed the link, the issue is that RAID arrays
can sometimes suffer drive drop-outs because the drive's own error
recovery efforts take longer than the RAID controller allows. The RAID
controller then fails the drive.
There is a feature in the ATA-8 specification, in the SMART Command
Transport (SCT), that allows the drive to be set up to abort recovery
attempts sooner and report the error, so the RAID controller can have a
go. This is called Error Recovery Control (ERC).
Most manufacturers implement this capability. WD did too, under the name
TLER, but apparently they've now removed the feature from their
'consumer' drives, which makes those drives problematic in RAID arrays.
So much for the I in RAID :(
The version of smartctl in SVN is able to issue the ERC commands but it
won't be formally released until V5.40. See
<http://www.csc.liv.ac.uk/~greg/projects/erc/> for details.
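For anyone who wants to script this, here is a rough Python sketch of
how the ERC limits could be set through smartctl. It assumes a smartctl
build that has the "-l scterc,READ,WRITE" option (5.40 or the SVN
version), where the limits are given in tenths of a second. The function
names, the /dev/sda placeholder and the 7 second default are my own
illustrative choices, not something taken from the page above.

```python
import subprocess


def build_scterc_command(device, read_seconds, write_seconds):
    """Build the smartctl argument list that sets the ERC read/write limits.

    smartctl takes the limits in tenths of a second, so 7.0 s becomes 70.
    A value of 0 disables the corresponding limit.
    """
    if read_seconds < 0 or write_seconds < 0:
        raise ValueError("ERC limits must not be negative")
    read_ds = int(round(read_seconds * 10))
    write_ds = int(round(write_seconds * 10))
    return ["smartctl", "-l", "scterc,%d,%d" % (read_ds, write_ds), device]


def set_erc(device, read_seconds=7.0, write_seconds=7.0):
    """Run smartctl to apply the limits.

    Needs root and a drive that actually supports SCT ERC; the 7 second
    default is an assumed, illustrative value.
    """
    cmd = build_scterc_command(device, read_seconds, write_seconds)
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; /dev/sda is a placeholder device name.
    print(" ".join(build_scterc_command("/dev/sda", 7.0, 7.0)))
```

As far as I can tell the setting does not usually survive a power cycle,
so something like set_erc() would have to be run again at each boot,
before the array is assembled.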
By pure luck, I think I'm OK. I've bought some WD RE4 drives, which
apparently have ERC enabled, and some Seagate 7200.11 drives, which can
have ERC enabled via smartctl.