[Bug 59695] Re: High frequency of load/unload cycles on some hard disks may shorten lifetime
Yves-Eric Martin
59695 at bugs.launchpad.net
Sun Mar 18 01:46:07 UTC 2012
@JoonasSaarinen
It is normal only if it does not happen too frequently; otherwise, it can
kill a drive. And that is not just theoretical: it can happen much more
quickly than you may think. I had a drive with this issue, but the clicks
were so quiet that I did not notice. The result: the drive died in a
catastrophic failure after only 5 months of operation (with a
Load_Cycle_Count of over 800,000!).
I don't think destroying itself in 5 months qualifies as "normal drive
operations."
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to acpi-support in Ubuntu.
https://bugs.launchpad.net/bugs/59695
Title:
High frequency of load/unload cycles on some hard disks may shorten
lifetime
Status in acpi-support:
Invalid
Status in The Dell Project:
Fix Released
Status in “acpi-support” package in Ubuntu:
Fix Released
Status in “linux-meta” package in Ubuntu:
Invalid
Status in “pm-utils” package in Ubuntu:
Fix Released
Status in “acpi-support” source package in Hardy:
Fix Released
Status in “linux-meta” source package in Hardy:
Invalid
Status in “pm-utils” source package in Hardy:
Fix Released
Status in “acpi-support” source package in Intrepid:
Fix Released
Status in “linux-meta” source package in Intrepid:
Invalid
Status in “pm-utils” source package in Intrepid:
Fix Released
Status in “acpi-support” source package in Jaunty:
Fix Released
Status in “linux-meta” source package in Jaunty:
Invalid
Status in “pm-utils” source package in Jaunty:
Fix Released
Status in “acpi-support” package in Baltix:
Fix Released
Status in “acpi-support” package in Debian:
Fix Released
Status in “pm-utils” package in Fedora:
Invalid
Status in “laptop-mode-tools” package in Mandriva:
Unknown
Status in Suse Linux:
Fix Released
Bug description:
The kernel wiki gathers info about drives with overly aggressive power-saving defaults. A script called "storage-fixup" is also available.
https://ata.wiki.kernel.org/index.php/Known_issues#Drives_which_perform_frequent_head_unloads_under_Linux
This is not a support forum. Please do not use it as such (even though it has been used as such already).
You can scan through the bug comments for links to the Ubuntu forums,
where many, many different questions have been asked, answered, and
re-answered. The temporary workaround is just below.
See https://wiki.ubuntu.com/PowerManagement for an overview of what
is involved and for a remedy.
SRU justification: the current behavior may lead to premature disk
failure in laptops due to excessive, unnecessary head parking. The fix
disables disk load/unload cycling by default when on AC power by
correcting an error in the hdparm logic of acpi-support.
For jaunty, this issue is addressed in acpi-support 0.115.
TEST CASE:
1. With acpi-support 0.109 (hardy) or 0.114 (intrepid) installed and laptop-mode *not* enabled in either /etc/default/laptop-mode or /etc/default/acpi-support, monitor the load cycle count of your hard drive by running 'sudo smartctl -a /dev/sda|grep Load_Cycle_Count' over an interval of several minutes, and observe that it is incrementing. (If it does not increment, your hard drive's manufacturer defaults are sane and you are not affected by this problem.)
2. Install acpi-support from hardy-proposed or intrepid-proposed.
3. While connected to AC power, monitor 'sudo smartctl -a /dev/sda|grep Load_Cycle_Count' again to confirm that the number is no longer incrementing.
4. (Assuming that the system is a laptop:) Disconnect the system from AC power and confirm that the number is incrementing again.
5. Enable laptop mode by setting ENABLE_LAPTOP_MODE=true in /etc/default/laptop-mode and running 'sudo /etc/init.d/laptop-mode restart'.
6. Reconnect the system to AC power and confirm that the Load_Cycle_Count stops incrementing.
7. Suspend and resume the system and confirm that the Load_Cycle_Count is still not incrementing.
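The monitoring in the test case can be scripted. Here is a minimal sketch that assumes smartmontools is installed and that the drive reports Load_Cycle_Count in smartctl's standard attribute table, where RAW_VALUE is the 10th column. The function names and the 300-second default interval are illustrative and are not part of acpi-support.

```shell
# Hypothetical helpers for sampling the SMART Load_Cycle_Count attribute.

# Read `smartctl -A` output on stdin and print the raw Load_Cycle_Count
# (RAW_VALUE is the 10th column of smartctl's attribute table).
load_cycle_count() {
    awk '$2 == "Load_Cycle_Count" { print $10; exit }'
}

# Sample the count twice, INTERVAL seconds apart, and report the change.
# Needs root and smartmontools. Returns 0 if the count did not change,
# 1 if it is still incrementing, 2 if the attribute could not be read.
watch_load_cycles() {
    device=${1:-/dev/sda}
    interval=${2:-300}
    before=$(smartctl -A "$device" | load_cycle_count)
    sleep "$interval"
    after=$(smartctl -A "$device" | load_cycle_count)
    if [ -z "$before" ] || [ -z "$after" ]; then
        echo "$device: Load_Cycle_Count not reported" >&2
        return 2
    fi
    delta=$((after - before))
    echo "$device: $delta load cycles in $interval s"
    [ "$delta" -eq 0 ] || return 1
}
```

For example, source these functions in a root shell and run 'watch_load_cycles /dev/sda 300' once on AC power and once on battery. A non-zero exit status means the count was still incrementing or could not be read.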
REGRESSION POTENTIAL:
As this patch causes "hdparm -B 128" and "hdparm -B 254" to be invoked
automatically on systems where it was not being run before, there is
some risk that this change will have a measurable impact on the disk
throughput, power consumption, and temperature of some hard drives.
Nevertheless, it is believed that these APM power settings are the
sensible default settings for the vast majority of hard drives and
that the current behavior poses a significant risk to the longevity of
hard drives used in a wide range of laptop models, so this update
should only be blocked if it results in confirmed hardware damage that
can be expected to apply to a similar range of configurations.
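The SRU text only names the two values. Together with the test case, it suggests a simple policy: APM level 254 on AC power, where the count should stop incrementing, and 128 on battery. Per the hdparm man page, levels 128-254 do not permit spin-down, although step 4 shows that a drive may still unload its heads at 128. The following is a rough sketch of that policy, not the actual acpi-support code. The sysfs path is an assumption, because the AC adapter may be named AC, ACAD, ADP1, etc. depending on the machine.

```shell
# Hypothetical sketch: choose an APM level from the AC adapter state.
# $1 is the content of the adapter's sysfs "online" file (1 = AC, 0 = battery).
apm_level_for_power() {
    if [ "$1" = "1" ]; then
        echo 254    # on AC: favour performance, no aggressive parking expected
    else
        echo 128    # on battery: more power saving, heads may still unload
    fi
}

# Apply the chosen level with hdparm (needs root).
# The default sysfs path is an assumption; adapter names vary between machines.
apply_apm_level() {
    device=${1:-/dev/sda}
    online_file=${2:-/sys/class/power_supply/AC/online}
    # If the file cannot be read (e.g. a desktop with no adapter entry), assume AC.
    online=$(cat "$online_file" 2>/dev/null || echo 1)
    hdparm -B "$(apm_level_for_power "$online")" "$device"
}
```

Keeping the level choice in a separate function means it can be checked without touching a real disk. Only apply_apm_level calls hdparm.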
The following is a summary of the issue:
It is confirmed that some systems are seeing an unusually high number of load/unload cycles on their hard disks, as evidenced by smartctl.
It was originally surmised that this was related to laptop-mode being
enabled, but it particularly affects systems where laptop-mode is
disabled. In fact, aggressive APM is not a bad idea while a system is
running on battery, as such a system is much more likely to suffer a
physical impact.
This is due to disk APM settings that let the heads park or the disk
spin down after an idle period that is shorter than the interval
between the OS's regular disk accesses.
As a result, the heads stay parked for only a very short time and are
loaded again almost immediately. This makes impact protection largely
ineffective and wears out the drive.
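A deliberately simplified model makes the arithmetic concrete. It assumes, hypothetically, that the OS touches the disk exactly every PERIOD seconds and that the drive unloads its heads after IDLE_TIMEOUT seconds without access. If the timeout is shorter than the period, every access has to reload the heads, so the drive performs one load/unload cycle per access. Otherwise the heads never park. Real access patterns are irregular, so treat the result as an estimate only.

```shell
# Hypothetical model: periodic disk access every PERIOD seconds, heads
# unloaded after IDLE_TIMEOUT seconds of inactivity (both whole seconds).
# Usage: cycles_per_hour PERIOD IDLE_TIMEOUT
cycles_per_hour() {
    period=$1
    idle_timeout=$2
    if [ "$idle_timeout" -lt "$period" ]; then
        # Heads park before every access, so each access costs one cycle.
        echo $((3600 / period))
    else
        # The next access always arrives before the timer expires.
        echo 0
    fi
}
```

For example, a hypothetical 8-second idle timer combined with a disk write every 60 seconds gives 'cycles_per_hour 60 8', i.e. 60 cycles per hour. That is four times the ~15 per hour limit given in the criteria for a fix.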
This happens when the disk assumes aggressive APM settings (as many
laptop disks do) and the OS does not set the APM settings according
to its current disk access pattern.
This problem has been confirmed in Ubuntu as well as in other
distributions, Mac OS X, and Windows.
Symptoms of this bug are:
* Frequent HD clicks -- more than one every 3 minutes while idle, louder than typical access sounds, and often more than twice per minute. On some disks, however, the click is very quiet.
* A rapidly increasing Load_Cycle_Count, shown as the final number in the output of "sudo smartctl -a /dev/hda | grep Load_Cycle_Count" (replace /dev/hda with your own hard disk device)
The problem occurs only when *all four* of the following factors are present:
* The hardware is set (by default or otherwise) to aggressive power management, causing the heads to park. (This is the default behaviour of many drives and often the only type of power management available to the user.)
* The disk is accessed often, causing the heads to unpark. (This is the default behaviour of many distributions.)
* Drives are rated for a limited number of these cycles (600,000 is the most common rating, although some drives are rated higher or lower).
* The OS does not set the disk APM variables according to its current disk access pattern.
Reasonable Limits / Criteria for a fix:
* There should be fewer than ~15 load cycles per hour, except during heavy usage while on battery.
* This provides a life expectancy of over four years, which is reasonable for a hard disk.
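The four-year figure follows from the 600,000-cycle rating mentioned above if the drive is powered on around the clock at the limit rate. That works out to 15 * 24 * 365 = 131,400 cycles per year, and 600,000 / 131,400 is about 4.6 years. Here is a small helper for other ratings and duty cycles (the function name is hypothetical):

```shell
# Hypothetical helper: years until a drive reaches its rated load/unload
# cycles at a constant rate, powered on HOURS_PER_DAY hours every day.
# Usage: years_to_rated_cycles RATED_CYCLES CYCLES_PER_HOUR [HOURS_PER_DAY]
years_to_rated_cycles() {
    rated=$1
    per_hour=$2
    hours_per_day=${3:-24}
    awk -v r="$rated" -v p="$per_hour" -v h="$hours_per_day" 'BEGIN {
        if (p * h == 0) { print "never"; exit }   # no cycling at all
        printf "%.1f\n", r / (p * h * 365)
    }'
}
```

With 8 hours of use per day, the same limit rate gives 13.7 years. By contrast, a 600,000-cycle drive running at the >70 cycles per hour reported below for the Thinkpad Z60m would use up its rating in about a year of continuous operation.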
Temporary Workaround:
* Follow the above link.
Some hardware with this issue:
WD1200VE -- http://www.wdc.com/en/library/portable/2879-001121.pdf -- This aggressive parking is a feature of this disk, but the feature assumes significant amounts of (truly) idle time during which the disk is not touched. Note the "Load/unload cycles" rating of 600,000.
Example Load_Cycle_Counts:
* Thinkpad Z60m/Hitachi HTS541080G9SA00 with well over 7000 load cycles in only 100 hours. That's >70 per hour.
* Gateway MT6451/Western Digital WD1200VE with 164762 load cycles in 3747 hours (156 days) of uptime. That's ~44 per hour -- except that the system was patched during the first third of its life, which puts it at ~63/hour since Gutsy was installed and left unpatched (unlike Feisty, which I had patched).
* Dell Inspiron 8600/Hitachi HTS721010G9AT00 with 200 to 280 load cycles per hour
Please see for yourself how often your drive is load cycling:
smartctl -d ata -a /dev/sda
(This command is for an SATA drive; you'll need to install the smartmontools package first.)
You can get the average per hour by the following division:
Load_Cycle_Count / Power_On_Hours
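The same division can be done directly on smartctl's output. Here is a minimal sketch that assumes the attribute names Load_Cycle_Count and Power_On_Hours and that RAW_VALUE is the 10th column. Some drives report power-on time under other attribute names or units, and this sketch does not handle them.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print
# Load_Cycle_Count / Power_On_Hours, both taken from the RAW_VALUE column.
# Exits with status 1 if Power_On_Hours is missing or zero.
avg_cycles_per_hour() {
    awk '
        $2 == "Load_Cycle_Count" { cycles = $10 + 0 }   # "+ 0" forces a number
        $2 == "Power_On_Hours"   { hours  = $10 + 0 }
        END {
            if (hours > 0) { printf "%.1f\n", cycles / hours } else { exit 1 }
        }'
}
```

For example: 'sudo smartctl -d ata -A /dev/sda | avg_cycles_per_hour'. For the Gateway MT6451 figures above (164762 cycles in 3747 hours), it prints 44.0.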
Old workaround for 7.10 (does not work in 8.04): https://bugs.launchpad.net/ubuntu/+source/acpi-support/+bug/59695/comments/14
A more extensive description of the workaround: http://ubuntuforums.org/showthread.php?t=591503
You may need to use an APM value of '254', or a bit lower, as opposed to '255'. If the HD temperature gets high, you may want to set it all the way "down" to 200 or so. About 1 click every 2.5-3 minutes is fine.
Note: Some disks ignore APM changes made with hdparm, so the workaround does not work for them. In such cases, it is a good idea to disable APM in the BIOS if possible.
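To check whether a drive accepted a new setting, run hdparm with -B but no value, which reports the current APM level. The parser below is a sketch that assumes hdparm prints a line such as ' APM_level = 254', or 'not supported' / 'off' for drives without active APM. The exact format may differ between hdparm versions.

```shell
# Hypothetical helper: read `hdparm -B DEVICE` output on stdin and print the
# reported APM level. Assumes a line of the form " APM_level = <value>",
# where <value> is a number, "off", or "not supported".
current_apm_level() {
    awk -F= '/APM_level/ { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2; exit }'
}
```

For example, run 'sudo hdparm -B 254 /dev/sda' and then 'sudo hdparm -B /dev/sda | current_apm_level'. If the reported level differs from the one just set, or Load_Cycle_Count keeps climbing, disabling APM in the BIOS is worth trying.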
See also http://paul.luon.net/journal/hacking/BrokenHDDs.html for a
rather dramatic account of the effects the current default values may
have.