[Bug 593635] [NEW] HDD freezes caused by ata exception that results in soft resetting of link

Deepak Sarda deepak at antrix.net
Mon Jun 14 11:55:21 UTC 2010


Public bug reported:

Under even moderately heavy disk writes, I see exceptions like the following in my kern.log:
-----------------------------------------------
Jun 13 13:33:03 cellar kernel: [66188.434868] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6
Jun 13 13:33:03 cellar kernel: [66188.434874] ata4.01: BMDMA stat 0x46
Jun 13 13:33:03 cellar kernel: [66188.434879] ata4.01: failed command: WRITE DMA EXT
Jun 13 13:33:03 cellar kernel: [66188.434886] ata4.01: cmd 35/00:00:00:94:b2/00:04:13:00:00/f0 tag 0 dma 524288 out
Jun 13 13:33:03 cellar kernel: [66188.434888]          res 51/84:01:ff:95:b2/84:02:13:00:00/f0 Emask 0x30 (host bus error)
Jun 13 13:33:03 cellar kernel: [66188.434892] ata4.01: status: { DRDY ERR }
Jun 13 13:33:03 cellar kernel: [66188.434895] ata4.01: error: { ICRC ABRT }
Jun 13 13:33:03 cellar kernel: [66188.434907] ata4: soft resetting link
Jun 13 13:33:03 cellar kernel: [66188.622000] ata4.01: configured for UDMA/100
Jun 13 13:33:03 cellar kernel: [66188.622013] ata4: EH complete
----------------------------------------------
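
The `cmd` and `res` fields above are libata's dump of the ATA taskfile registers. Below is a minimal Python sketch for decoding them. It assumes the layout `command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device`, and that in a `res` dump the first byte is the Status register and the second is the Error register. The helper names and the one-entry opcode table are illustrative, not a libata API. Decoded this way, the failed command is opcode 0x35 with a 48-bit sector count of 0x0400, which is 1024 sectors of 512 bytes. That gives 524288 bytes and matches `dma 524288`. Status 0x51 and error 0x84 produce the same `{ DRDY ERR }` and `{ ICRC ABRT }` sets the kernel printed.

```python
"""Decode the taskfile dumps printed by libata's error handler.

Assumed layout (inferred from the log lines above, not a formal interface):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
In a "res" dump the first byte is the Status register, the second the Error register.
"""

FIELDS_LOW = ("feature", "nsect", "lbal", "lbam", "lbah")
FIELDS_HIGH = ("hob_feature", "hob_nsect", "hob_lbal", "hob_lbam", "hob_lbah")

# Only the opcode that appears in this report; not a complete table.
ATA_COMMANDS = {0x35: "WRITE DMA EXT"}

# Status and Error register bits, listed in the order libata prints them.
ATA_BUSY = 0x80
STATUS_BITS = {0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def parse_taskfile(dump: str) -> dict:
    """Split a dump such as '35/00:00:00:94:b2/00:04:13:00:00/f0' into named bytes."""
    command, low, high, device = dump.split("/")
    regs = {"command": int(command, 16), "device": int(device, 16)}
    regs.update(zip(FIELDS_LOW, (int(b, 16) for b in low.split(":"))))
    regs.update(zip(FIELDS_HIGH, (int(b, 16) for b in high.split(":"))))
    return regs


def lba48(regs: dict) -> int:
    """Assemble the 48-bit LBA from the low-order and high-order (HOB) bytes."""
    order = ("lbal", "lbam", "lbah", "hob_lbal", "hob_lbam", "hob_lbah")
    return sum(regs[name] << (8 * i) for i, name in enumerate(order))


def sector_count(regs: dict) -> int:
    """16-bit sector count of an LBA48 command; a value of 0 means 65536."""
    count = (regs["hob_nsect"] << 8) | regs["nsect"]
    return count or 65536


def decode_status(status: int) -> list:
    if status & ATA_BUSY:  # when BSY is set libata prints only "Busy"
        return ["Busy"]
    return [name for bit, name in STATUS_BITS.items() if status & bit]


def decode_error(error: int) -> list:
    return [name for bit, name in ERROR_BITS.items() if error & bit]


if __name__ == "__main__":
    cmd = parse_taskfile("35/00:00:00:94:b2/00:04:13:00:00/f0")
    res = parse_taskfile("51/84:01:ff:95:b2/84:02:13:00:00/f0")
    print(ATA_COMMANDS.get(cmd["command"], hex(cmd["command"])))  # WRITE DMA EXT
    print(sector_count(cmd), "sectors =", sector_count(cmd) * 512, "bytes")  # 1024 sectors = 524288 bytes
    print("LBA", lba48(cmd))
    print("status", decode_status(res["command"]), "error", decode_error(res["feature"]))
```

The same functions apply to the 2.6.35 dumps further down. The second `res` there (status 0x40, error 0x00) decodes to `DRDY` with no error bits, which is why that report has no `error:` line.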

This is with the latest stable Lucid kernel (2.6.32-22-generic
#36-Ubuntu).

I've also tried a mainline kernel (2.6.35-020635rc1) and still get the
same errors, except that there is an additional stack trace:

-----------------------------------------------

Jun 14 18:55:40 cellar kernel: [  152.874172] irq 19: nobody cared (try booting with the "irqpoll" option)
Jun 14 18:55:40 cellar kernel: [  152.874182] Pid: 0, comm: swapper Tainted: P            2.6.35-020635rc1-generic #020635rc1
Jun 14 18:55:40 cellar kernel: [  152.874185] Call Trace:
Jun 14 18:55:40 cellar kernel: [  152.874198]  [<c01a58cc>] __report_bad_irq+0x2c/0x90
Jun 14 18:55:40 cellar kernel: [  152.874204]  [<c016fee3>] ? sched_clock_tick+0x73/0xa0
Jun 14 18:55:40 cellar kernel: [  152.874209]  [<c01a5a44>] note_interrupt+0xe4/0x120
Jun 14 18:55:40 cellar kernel: [  152.874214]  [<c0179da0>] ? tick_nohz_update_jiffies+0x60/0x70
Jun 14 18:55:40 cellar kernel: [  152.874219]  [<c01a6364>] handle_fasteoi_irq+0x84/0xe0
Jun 14 18:55:40 cellar kernel: [  152.874224]  [<c0104abf>] handle_irq+0x1f/0x30
Jun 14 18:55:40 cellar kernel: [  152.874230]  [<c05afefb>] do_IRQ+0x4b/0xc0
Jun 14 18:55:40 cellar kernel: [  152.874234]  [<c01032f0>] common_interrupt+0x30/0x40
Jun 14 18:55:40 cellar kernel: [  152.874239]  [<c010a3a7>] ? mwait_idle+0x57/0xa0
Jun 14 18:55:40 cellar kernel: [  152.874243]  [<c010189c>] cpu_idle+0x8c/0xc0
Jun 14 18:55:40 cellar kernel: [  152.874249]  [<c05a4337>] start_secondary+0xf7/0x130
Jun 14 18:55:40 cellar kernel: [  152.874252] handlers:
Jun 14 18:55:40 cellar kernel: [  152.874254] [<c0431060>] (ata_bmdma_interrupt+0x0/0x190)
Jun 14 18:55:40 cellar kernel: [  152.874261] [<c044fb10>] (usb_hcd_irq+0x0/0x90)
Jun 14 18:55:40 cellar kernel: [  152.874268] Disabling IRQ #19
Jun 14 18:56:09 cellar kernel: [  181.856015] ata4: lost interrupt (Status 0x51)
Jun 14 18:56:09 cellar kernel: [  181.856034] ata4.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Jun 14 18:56:09 cellar kernel: [  181.856039] ata4.01: BMDMA stat 0x46, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0, BMDMA stat 0x0
Jun 14 18:56:09 cellar kernel: [  181.856045] ata4.01: failed command: WRITE DMA EXT
Jun 14 18:56:09 cellar kernel: [  181.856053] ata4.01: cmd 35/00:00:00:84:08/00:04:3b:00:00/f0 tag 0 dma 524288 out
Jun 14 18:56:09 cellar kernel: [  181.856054]          res 40/00:00:00:4f:c2/00:00:00:00:00/50 Emask 0x24 (host bus error)
Jun 14 18:56:09 cellar kernel: [  181.856058] ata4.01: status: { DRDY }
Jun 14 18:56:09 cellar kernel: [  181.856072] ata4: soft resetting link
Jun 14 18:56:09 cellar kernel: [  182.160065] ata4.01: configured for UDMA/133
Jun 14 18:56:09 cellar kernel: [  182.160072] ata4.01: device reported invalid CHS sector 0
Jun 14 18:56:09 cellar kernel: [  182.160080] ata4: EH complete
--------------------------------------------------------------------
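
The `handlers:` list shows that both `ata_bmdma_interrupt` and `usb_hcd_irq` were registered on IRQ 19. Below is a rough Python sketch for listing shared interrupt lines on a running system from `/proc/interrupts`. It assumes the usual layout: a `CPUn` header row, one counter column per CPU, then the controller type and a comma-separated list of device names. The function name is hypothetical and the parsing is a heuristic, not a stable kernel interface.

```python
def shared_irqs(proc_interrupts: str) -> dict:
    """Return {irq: description} for numbered IRQ rows listing several devices.

    Heuristic: a shared line shows its device names as a comma-separated
    list at the end of the row (e.g. "ata_piix, uhci_hcd:usb5").
    """
    lines = proc_interrupts.splitlines()
    if not lines:
        return {}
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # skip NMI, LOC, ERR and other non-numbered rows
        fields = rest.split(None, ncpus)  # per-CPU counters, then everything else
        if len(fields) <= ncpus:
            continue
        description = fields[ncpus].strip()
        if ", " in description:
            result[int(irq)] = description
    return result


if __name__ == "__main__":
    try:
        with open("/proc/interrupts") as f:
            for irq, description in sorted(shared_irqs(f.read()).items()):
                print(f"IRQ {irq}: {description}")
    except OSError as exc:
        print("could not read /proc/interrupts:", exc)
```

Run as a script, it prints each shared IRQ with its controller type and device list. It only shows which drivers share a line. It does not by itself say whether the sharing is involved in the errors.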

I've tried booting with "libata.force=noncq" on both kernels (Lucid
stable and 2.6.35 mainline), but it makes no difference.
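
A boot option only takes effect if it actually reaches the kernel command line, which can be checked in `/proc/cmdline`. Below is a minimal Python sketch. It assumes plain space-separated `key=value` tokens without quoting, and the helper name is hypothetical.

```python
def libata_force_values(cmdline: str) -> list:
    """Return the comma-separated entries of every libata.force= token."""
    values = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == "libata.force":
            values.extend(v for v in value.split(",") if v)
    return values


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print(libata_force_values(f.read()) or "libata.force not set")
    except OSError as exc:
        print("could not read /proc/cmdline:", exc)
```

After a boot with `libata.force=noncq`, this should print `['noncq']`. An empty result means the option never reached the kernel.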

I didn't see these errors in Jaunty. I think they started sometime in
Karmic. I upgraded to Lucid in the hope that the newer release would fix
it, but it made no difference.

I think I've ruled out HDD failure: I get these errors on two old (3+
years) Seagate 7200.10 disks as well as on a brand-new Seagate 7200.12
disk.

There are similar bug reports in Launchpad, but one difference I
noticed is that I consistently see the message "failed command: WRITE
DMA EXT", while the other reports fail during a read or some other
command.

I can very reliably reproduce the errors by running an rdiff-backup
'restore' operation from an external USB HDD.

== Steps to reproduce ==
1. Boot into GNOME and log in
2. Run 'tail -f /var/log/kern.log' in one terminal window
3. Run 'rdiff-backup --force -r now /media/freeagent/share /share/' in another terminal

Within a few seconds, I can see the errors show up in the kernel logs.
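
Step 2 can be narrowed down to only the libata error-handler messages. Below is a minimal Python sketch that follows a log file the way `tail -f` does and keeps only lines tagged `ataN` or `ataN.NN`. It assumes the `[timestamp] ataN.NN: message` format seen in the excerpts above. The script and function names are hypothetical, and it does not handle log rotation.

```python
import re
import sys
import time

# "[66188.434879] ata4.01: failed command: ..." -> timestamp, port, message
ATA_LINE = re.compile(r"\[\s*(\d+\.\d+)\]\s+(ata\d+(?:\.\d+)?):\s+(.*)")


def ata_events(lines):
    """Yield (timestamp, port, message) for lines that carry an ataN[.NN] tag."""
    for line in lines:
        m = ATA_LINE.search(line)
        if m:
            yield float(m.group(1)), m.group(2), m.group(3).rstrip()


def follow(path, poll=0.5):
    """Yield complete lines appended to path after it is opened, like tail -f."""
    with open(path, errors="replace") as f:
        f.seek(0, 2)  # start at the current end of the file
        pending = ""
        while True:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll)
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit whole lines
                yield pending
                pending = ""


if __name__ == "__main__" and len(sys.argv) > 1:
    interesting = ("exception", "failed command", "soft resetting", "lost interrupt")
    for ts, port, msg in ata_events(follow(sys.argv[1])):
        if msg.startswith(interesting):
            print(f"{ts:14.6f} {port}: {msg}", flush=True)
```

For example, run `python3 watch_ata.py /var/log/kern.log` in one terminal while the rdiff-backup restore from step 3 runs in another. The file name `watch_ata.py` is arbitrary.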

Running a fast torrent download triggers the errors too.

Since I can reproduce the problem so easily, I'll be very willing to try
any special kernel builds to help solve this one.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-22-generic 2.6.32-22.36
Regression: Yes
Reproducible: Yes
ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic i686
NonfreeKernelModules: nvidia
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.21.
Architecture: i386
AudioDevicesInUse:
 USER        PID ACCESS COMMAND
 /dev/snd/controlC0:  antrix     1387 F.... pulseaudio
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xf9ffc000 irq 16'
   Mixer name	: 'Realtek ALC662 rev1'
   Components	: 'HDA:10ec0662,15650000,00100101'
   Controls      : 36
   Simple ctrls  : 19
Date: Mon Jun 14 19:23:00 2010
HibernationDevice: RESUME=UUID=c6dab799-13a8-443e-b2a3-4b93f3bbb42e
IwConfig:
 lo        no wireless extensions.
 
 eth0      no wireless extensions.
MachineType: BIOSTAR Group G31-M7 TE
ProcCmdLine: BOOT_IMAGE=/vmlinuz-2.6.32-22-generic root=UUID=466535ad-0b59-4fd0-b18b-ba486150f91a ro quiet splash
ProcEnviron:
 PATH=(custom, user)
 LANG=en_SG.utf8
 SHELL=/bin/bash
RelatedPackageVersions: linux-firmware 1.34
RfKill:
 
SourcePackage: linux
dmi.bios.date: 04/10/2009
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 080014
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: G31-M7 TE
dmi.board.vendor: BIOSTAR Group
dmi.chassis.asset.tag: None
dmi.chassis.type: 3
dmi.chassis.vendor: BIOSTAR Group
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr080014:bd04/10/2009:svnBIOSTARGroup:pnG31-M7TE:pvr:rvnBIOSTARGroup:rnG31-M7TE:rvr:cvnBIOSTARGroup:ct3:cvr:
dmi.product.name: G31-M7 TE
dmi.sys.vendor: BIOSTAR Group

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: New


** Tags: apport-bug filesystem i386 lucid needs-upstream-testing regression-release

-- 
HDD freezes caused by ata exception that results in soft resetting of link
https://bugs.launchpad.net/bugs/593635
You received this bug notification because you are a member of Kernel
Bugs, which is subscribed to linux in ubuntu.



