[Bug 297058] Re: Consistent repeating [ata1: link is slow to respond, please be patient ]

Rob Jacobson Rob104 at jacoserv.com
Tue Apr 6 19:07:37 UTC 2010


Please forgive this intrusion, but on reading through this entire
thread, I could not help but notice how little good information there
seems to be to help users interpret exceptions correctly.  I thought
that perhaps I could offer a little guidance and a few suggestions that
will hopefully improve your own diagnostic efforts, raise the quality of
issue reports, and thereby possibly improve the general stability of the
kernel and device code.  I am not an expert, but I have read a lot of
syslogs and tried to help a number of users.

An exception is just a report of something that appears unusual; it
could be nothing, or it could be a symptom of something wrong.  An
exception handler has kicked in, and will try to report as much as it
can, and may also attempt a few actions to resolve the issue, if that
appears warranted.  The report is a sequence of lines that starts with a
line containing "exception", continues with lines giving additional
information about the issue, and ends with a line containing
"EH complete" (the Error Handler is finished).  If there were error
flags reported to it, then a verbose version of those flags will be
listed.  If there were SATA link errors (SErr is non-zero), then they
will also be expanded in the following lines.  A great resource for
these is http://ata.wiki.kernel.org/index.php/Libata_error_messages.
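If you want to pull these reports out of a long syslog so they can be
pasted whole into a bug report, something like the following rough
sketch may help.  It assumes each report opens with an "ataN" line
containing "exception" and closes with an "ataN" line containing
"EH complete", as described above.  The function name and the regular
expressions are my own illustration, not anything provided by the
kernel, and reports from several ports that interleave in the log would
end up mixed together.

```python
import re
import sys
from typing import Iterable, List

# Assumed line shapes: "ataN: ..." or "ataN.MM: ..." somewhere in the line.
ATA_LINE = re.compile(r"\bata\d+(?:\.\d+)?:")
START = re.compile(r"\bata\d+(?:\.\d+)?: exception\b")
END = re.compile(r"\bata\d+(?:\.\d+)?: EH complete\b")


def extract_eh_reports(lines: Iterable[str]) -> List[List[str]]:
    """Group syslog lines into error-handler reports.

    A report runs from an "exception" line to the next "EH complete"
    line.  Non-ATA lines that fall in between are skipped.
    """
    reports: List[List[str]] = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        if START.search(line):
            if current is not None:
                # Previous report never reached "EH complete"; keep it anyway.
                reports.append(current)
            current = [line]
        elif current is not None and ATA_LINE.search(line):
            current.append(line)
            if END.search(line):
                reports.append(current)
                current = None
    if current is not None:
        reports.append(current)
    return reports


if __name__ == "__main__":
    # Example: python eh_reports.py < /var/log/syslog
    for report in extract_eh_reports(sys.stdin):
        print("\n".join(report))
        print("-" * 40)
```

Posting the complete block, from "exception" to "EH complete", gives
whoever reads the bug far more to work with than a single line.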

So any exception is analogous to hearing an unusual noise from your car.
Something may be wrong, but you need more data, and possibly an
experienced mechanic to interpret whatever symptoms you have detected.
Some of the messages are exactly what they sound like.  For example,
"link is slow to respond" and "timeout" and "frozen" just mean that a
response did not occur within the normal time frame.  They aren't bugs,
just symptoms, an indication that something may be wrong.  Analogy: your
car unexpectedly feels sluggish, not responding as quickly as usual.
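To get a feel for how often these symptoms occur, and on which port, a
small tally can help.  This is only a sketch under my own assumptions:
it looks for the three phrases quoted above on any line that mentions an
"ataN" port, and the function name is purely illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# The symptom phrases quoted above; these are symptoms, not diagnoses.
SYMPTOMS = ("link is slow to respond", "timeout", "frozen")
# Assumed port prefix: "ataN:" or "ataN.MM:"; only "ataN" is captured.
PORT = re.compile(r"\b(ata\d+)(?:\.\d+)?:")


def tally_symptoms(lines: Iterable[str]) -> Counter:
    """Count (port, symptom phrase) pairs across syslog lines."""
    counts: Counter = Counter()
    for line in lines:
        match = PORT.search(line)
        if not match:
            continue
        for phrase in SYMPTOMS:
            if phrase in line:
                counts[(match.group(1), phrase)] += 1
    return counts
```

A port that shows a steady stream of these is worth a closer look; one
that shows a single "frozen" in months probably is not.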

Unfortunately, many of the reports above do not have any errors
reported, only symptoms of 'sluggishness' or a loss of communications.
Something may very well be wrong, but it is not obvious from these
reports, and there are a *lot* of very different possible causes.  It
could be the device itself (bad media, buggy firmware, too hot, etc.),
the cabling or connections (bad cable, bad or loose connectors, loose
backplane, faulty power splitter, etc.), the controller chipset,
over-heated chipsets, power issues in the device, general power issues,
a mis-configured device, incompatible hardware, a buggy 'driver' module,
or even bad memory.

A last tip: the single most common issue (in my experience), and the
easiest to fix, is faulty cables.  If you see the word ICRC and/or
BadCRC within the error handler exception report, then replacing the
cable with a good quality cable will (I believe) fix over 80% of these
exceptions (perhaps over 95%).  I doubt there is a more common reason
for wrongly RMA'ing drives than drive exceptions that were actually
caused by bad cables.  The next easy fix is to check for loose
connections in both the data and power cables, in any splitters used,
and in any backplanes used.
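Checking a report for those two words is easy to automate.  The sketch
below is a minimal illustration with a function name of my own choosing;
it only tells you that ICRC or BadCRC appears somewhere in the lines you
give it, which is a hint to try a new cable first, not proof that the
cable is at fault.

```python
import re
from typing import Iterable

# The two flags named above, matched as whole words.
CRC_MARKERS = re.compile(r"\b(?:ICRC|BadCRC)\b")


def suggests_cable_problem(report_lines: Iterable[str]) -> bool:
    """Return True if any line of the report mentions ICRC or BadCRC."""
    return any(CRC_MARKERS.search(line) for line in report_lines)
```

If this returns True for a report, swap the data cable and reseat the
connectors before suspecting the drive itself.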

I think I only saw one ICRC above, so most of the problems reported
above are probably more complex, but unfortunately the reports don't
include enough info to really help.  A very occasional report of
"frozen" is, sadly, not unusual.  However, I have noticed that in
general these kinds of reports have been diminishing with the more
recent kernels.  Developers on both sides (firmware and kernel) are
constantly tweaking the communications between devices and kernel.
Occasionally a tweak results in new issues, but these are then fixed in
subsequent releases (firmware and/or kernel).

-- 
Consistent repeating [ata1: link is slow to respond, please be patient ]
https://bugs.launchpad.net/bugs/297058
You received this bug notification because you are a member of Kernel
Bugs, which is subscribed to linux in ubuntu.



