00:1f.0 ISA bridge: Intel Corporation 82801IR (ICH9R) LPC Interface Controller (rev 02)
Subsystem: Giga-byte Technology Device 5001
Flags: bus master, medium devsel, latency 0
Capabilities: <access denied>
00:1f.2 SATA controller: Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA AHCI Controller (rev 02) (prog-if 01 [AHCI 1.0])
Subsystem: Giga-byte Technology Device b005
Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 45
I/O ports at e600 [size=8]
I/O ports at e700 [size=4]
I/O ports at e800 [size=8]
I/O ports at e900 [size=4]
I/O ports at ea00 [size=32]
Memory at fa106000 (32-bit, non-prefetchable) [size=2K]
Capabilities: <access denied>
Kernel driver in use: ahci
00:1f.3 SMBus: Intel Corporation 82801I (ICH9 Family) SMBus Controller (rev 02)
Subsystem: Giga-byte Technology Device 5001
Flags: medium devsel, IRQ 18
Memory at fa107000 (64-bit, non-prefetchable) [size=256]
I/O ports at 0500 [size=32]
Kernel driver in use: i801_smbus
01:00.0 VGA compatible controller: ATI Technologies Inc RV620 LE [Radeon HD 3450] (prog-if 00 [VGA controller])
Subsystem: Hightech Information System Ltd. Device 2252
Flags: bus master, fast devsel, latency 0, IRQ 48
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f5000000 (64-bit, non-prefetchable) [size=64K]
I/O ports at a000 [size=256]
[virtual] Expansion ROM at f4000000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: fglrx_pci
01:00.1 Audio device: ATI Technologies Inc RV620 Audio device [Radeon HD 34xx Series]
Subsystem: Hightech Information System Ltd. Device aa28
Flags: bus master, fast devsel, latency 0, IRQ 47
Memory at f5010000 (64-bit, non-prefetchable) [size=16K]
Capabilities: <access denied>
Kernel driver in use: HDA Intel
03:00.0 SATA controller: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02) (prog-if 01 [AHCI 1.0])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 19
Memory at fa000000 (32-bit, non-prefetchable) [size=8K]
Capabilities: <access denied>
Kernel driver in use: ahci
03:00.1 IDE interface: JMicron Technology Corp. JMB362/JMB363 Serial ATA Controller (rev 02) (prog-if 85 [Master SecO PriO])
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 16
I/O ports at b000 [size=8]
I/O ports at b100 [size=4]
I/O ports at b200 [size=8]
I/O ports at b300 [size=4]
I/O ports at b400 [size=16]
Capabilities: <access denied>
Kernel driver in use: pata_jmicron
04:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 01)
Subsystem: Giga-byte Technology GA-EP45-DS5 Motherboard
Flags: bus master, fast devsel, latency 0, IRQ 44
I/O ports at c000 [size=256]
Memory at f7000000 (64-bit, non-prefetchable) [size=4K]
[virtual] Expansion ROM at c0600000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: r8169
05:01.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8169 Gigabit Ethernet (rev 10)
Subsystem: Realtek Semiconductor Co., Ltd. RTL8169/8110 Family PCI Gigabit Ethernet NIC
Flags: bus master, 66MHz, medium devsel, latency 64, IRQ 19
I/O ports at d000 [size=256]
Memory at f9000000 (32-bit, non-prefetchable) [size=256]
[virtual] Expansion ROM at c0800000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: r8169
05:02.0 SCSI storage controller: Adaptec AHA-2940U2/U2W
Subsystem: Adaptec AHA-2940U2B SCSI Controller
Flags: bus master, medium devsel, latency 32, IRQ 18
BIST result: 00
I/O ports at d100 [disabled] [size=256]
Memory at f9001000 (64-bit, non-prefetchable) [size=4K]
[virtual] Expansion ROM at c0820000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: aic7xxx
lsscsi output:
[0:0:5:0] tape IBM ULTRIUM-TD2 53Y2 /dev/st0
[1:0:0:0] disk ATA MAXTOR STM332082 3.AA /dev/sda
[3:0:0:0] disk ATA ST2000DL003-9VT1 CC32 /dev/sdb
[4:0:0:0] disk ATA ST2000DL003-9VT1 CC32 /dev/sdc
[5:0:0:0] disk ATA ST2000DL003-9VT1 CC32 /dev/sdd
[6:0:0:0] disk ATA ST2000DL003-9VT1 CC32 /dev/sde
[7:0:0:0] disk ATA ST2000DL003-9VT1 CC32 /dev/sdf
Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
Vendor: Award Software International, Inc.
Version: F4
Release Date: 06/11/2007
Address: 0xE0000
Runtime Size: 128 kB
ROM Size: 1024 kB
Characteristics:
PCI is supported
PNP is supported
APM is supported
BIOS is upgradeable
BIOS shadowing is allowed
Boot from CD is supported
Selectable boot is supported
EDD is supported
5.25"/360 KB floppy services are supported (int 13h)
5.25"/1.2 MB floppy services are supported (int 13h)
3.5"/720 KB floppy services are supported (int 13h)
3.5"/2.88 MB floppy services are supported (int 13h)
Print screen service is supported (int 5h)
8042 keyboard services are supported (int 9h)
Serial services are supported (int 14h)
Printer services are supported (int 17h)
CGA/mono video services are supported (int 10h)
ACPI is supported
USB legacy is supported
LS-120 boot is supported
ATAPI Zip drive boot is supported
BIOS boot specification is supported
Targeted content distribution is supported
Handle 0x0001, DMI type 1, 27 bytes
System Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: P35-DS3R
Version:
Serial Number:
UUID: 00000000-0000-0000-0000-001A4D4C68B9
Wake-up Type: Power Switch
SKU Number:
Family:
Handle 0x0002, DMI type 2, 8 bytes
Base Board Information
Manufacturer: Gigabyte Technology Co., Ltd.
Product Name: P35-DS3R
Version: x.x
Serial Number:
Handle 0x0003, DMI type 3, 17 bytes
Chassis Information
Manufacturer: Gigabyte Technology Co., Ltd.
Type: Desktop
Lock: Not Present
Version:
Serial Number:
Asset Tag:
Boot-up State: Unknown
Power Supply State: Unknown
Thermal State: Unknown
Security Status: Unknown
OEM Information: 0x00000000
Handle 0x0004, DMI type 4, 35 bytes
Processor Information
Socket Designation: Socket 775
Type: Central Processor
Family: Other
Manufacturer: Intel
ID: FD 06 00 00 FF FB EB BF
Version: Intel(R) Pentium(R) Dual C
Voltage: 1.2 V
External Clock: 200 MHz
Max Speed: 4000 MHz
Current Speed: 1600 MHz
Status: Populated, Enabled
Upgrade: Socket 478
L1 Cache Handle: 0x000A
L2 Cache Handle: 0x000B
L3 Cache Handle: Not Provided
Serial Number:
Asset Tag:
Part Number:
Handle 0x0005, DMI type 5, 24 bytes
Memory Controller Information
Error Detecting Method: 8-bit Parity
Error Correcting Capabilities:
None
Supported Interleave: One-way Interleave
Current Interleave: One-way Interleave
Maximum Memory Module Size: 1024 MB
Maximum Total Memory Size: 4096 MB
Supported Speeds:
Other
Supported Memory Types:
Other
Memory Module Voltage: 5.0 V
Associated Memory Slots: 4
0x0006
0x0007
0x0008
0x0009
Enabled Error Correcting Capabilities:
None
Handle 0x0006, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: A0
Bank Connections: 1
Current Speed: Unknown
Type: Other
Installed Size: 1024 MB (Double-bank Connection)
Enabled Size: 1024 MB (Double-bank Connection)
Error Status: OK
Handle 0x0007, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: A1
Bank Connections: 2
Current Speed: Unknown
Type: Other
Installed Size: 1024 MB (Double-bank Connection)
Enabled Size: 1024 MB (Double-bank Connection)
Error Status: OK
Handle 0x0008, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: A2
Bank Connections: 3
Current Speed: Unknown
Type: Unknown
Installed Size: Not Installed
Enabled Size: Not Installed
Error Status: OK
Handle 0x0009, DMI type 6, 12 bytes
Memory Module Information
Socket Designation: A3
Bank Connections: 4
Current Speed: Unknown
Type: Unknown
Installed Size: Not Installed
Enabled Size: Not Installed
Error Status: OK
Handle 0x000A, DMI type 7, 19 bytes
Cache Information
Socket Designation: Internal Cache
Configuration: Enabled, Not Socketed, Level 1
Operational Mode: Write Back
Location: Internal
Installed Size: 64 KB
Maximum Size: 64 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown
Handle 0x000B, DMI type 7, 19 bytes
Cache Information
Socket Designation: External Cache
Configuration: Enabled, Not Socketed, Level 2
Operational Mode: Write Back
Location: Internal
Installed Size: 1024 KB
Maximum Size: 2048 KB
Supported SRAM Types:
Synchronous
Installed SRAM Type: Synchronous
Speed: Unknown
Error Correction Type: Unknown
System Type: Unknown
Associativity: Unknown
Handle 0x000C, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: PRIMARY IDE
Internal Connector Type: On Board IDE
External Reference Designator:
External Connector Type: None
Port Type: Other
Handle 0x000D, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: SECONDARY IDE
Internal Connector Type: On Board IDE
External Reference Designator:
External Connector Type: None
Port Type: Other
Handle 0x000E, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: FDD
Internal Connector Type: On Board Floppy
External Reference Designator:
External Connector Type: None
Port Type: 8251 FIFO Compatible
Handle 0x000F, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: COM1
Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
External Reference Designator:
External Connector Type: DB-9 male
Port Type: Serial Port 16450 Compatible
Handle 0x0010, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: COM2
Internal Connector Type: 9 Pin Dual Inline (pin 10 cut)
External Reference Designator:
External Connector Type: DB-9 male
Port Type: Serial Port 16450 Compatible
Handle 0x0011, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: LPT1
Internal Connector Type: DB-25 female
External Reference Designator:
External Connector Type: DB-25 female
Port Type: Parallel Port ECP/EPP
Handle 0x0012, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: Keyboard
Internal Connector Type: Other
External Reference Designator:
External Connector Type: PS/2
Port Type: Keyboard Port
Handle 0x0013, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: PS/2 Mouse
Internal Connector Type: PS/2
External Reference Designator: Detected
External Connector Type: PS/2
Port Type: Mouse Port
Handle 0x0014, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB
Internal Connector Type: None
External Reference Designator:
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0015, DMI type 8, 9 bytes
Port Connector Information
Internal Reference Designator: USB
Internal Connector Type: None
External Reference Designator:
External Connector Type: Access Bus (USB)
Port Type: USB
Handle 0x0016, DMI type 9, 13 bytes
System Slot Information
Designation: PCI
Type: 32-bit PCI
Current Usage: Available
Length: Long
ID: 0
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
SMBus signal is supported
Handle 0x0017, DMI type 9, 13 bytes
System Slot Information
Designation: PCI
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 1
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
SMBus signal is supported
Handle 0x0018, DMI type 9, 13 bytes
System Slot Information
Designation: PCI
Type: 32-bit PCI
Current Usage: In Use
Length: Long
ID: 2
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
SMBus signal is supported
Handle 0x0019, DMI type 9, 13 bytes
System Slot Information
Designation: PCI
Type: 32-bit PCI
Current Usage: Available
Length: Long
ID: 0
Characteristics:
5.0 V is provided
3.3 V is provided
PME signal is supported
SMBus signal is supported
Handle 0x001A, DMI type 13, 22 bytes
BIOS Language Information
Installable Languages: 3
n|US|iso8859-1
n|US|iso8859-1
r|CA|iso8859-1
Currently Installed Language: n|US|iso8859-1
Handle 0x001B, DMI type 16, 15 bytes
Physical Memory Array
Location: System Board Or Motherboard
Use: System Memory
Error Correction Type: None
Maximum Capacity: 4 GB
Error Information Handle: Not Provided
Number Of Devices: 4
Handle 0x001C, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: None
Locator: A0
Bank Locator: Bank0/1
Type: Unknown
Type Detail: None
Speed: 800 MHz (1.2 ns)
Manufacturer:
Serial Number:
Asset Tag:
Part Number:
Handle 0x001D, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 1024 MB
Form Factor: DIMM
Set: None
Locator: A1
Bank Locator: Bank2/3
Type: Unknown
Type Detail: None
Speed: 800 MHz (1.2 ns)
Manufacturer:
Serial Number:
Asset Tag:
Part Number:
Handle 0x001E, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: DIMM
Set: None
Locator: A2
Bank Locator: Bank4/5
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer:
Serial Number:
Asset Tag:
Part Number:
Handle 0x001F, DMI type 17, 27 bytes
Memory Device
Array Handle: 0x001B
Error Information Handle: Not Provided
Total Width: Unknown
Data Width: Unknown
Size: No Module Installed
Form Factor: DIMM
Set: None
Locator: A3
Bank Locator: Bank6/7
Type: Unknown
Type Detail: None
Speed: Unknown
Manufacturer:
Serial Number:
Asset Tag:
Part Number:
Handle 0x0025, DMI type 32, 11 bytes
System Boot Information
Status: No errors detected
Handle 0x0026, DMI type 127, 4 bytes
End Of Table
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/0EE883A1-2B65-4822-8037-B43AC5D15482@claunia.com
08-28-2011, 08:07 AM
Juhani Karlsson
Bug#625922: SATA devices get reset without real hardware failure
I can confirm the same problem.
cat /var/log/messages.0 |grep ata
Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:11:45 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:11:45 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:11:45 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete
Aug 28 00:31:24 lrdlnx kernel: ata2: hard resetting link
Aug 28 00:31:24 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 00:31:24 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 00:31:24 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 00:31:24 lrdlnx kernel: ata2: EH complete
Aug 28 01:02:13 lrdlnx clamd[4832]: SelfCheck: Database status OK.
Aug 28 02:39:01 lrdlnx freshclam[4935]: Database updated (1029731 signatures) from db.local.clamav.net (IP: 85.254.217.235)
Aug 28 02:50:15 lrdlnx kernel: ata2: hard resetting link
Aug 28 02:50:15 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 02:50:15 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 02:50:15 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 02:50:15 lrdlnx kernel: ata2: EH complete
Aug 28 03:02:07 lrdlnx clamd[4832]: SelfCheck: Database modification detected. Forcing reload.
Aug 28 03:02:08 lrdlnx clamd[4832]: Reading databases from /var/lib/clamav
Aug 28 03:02:18 lrdlnx clamd[4832]: Database correctly reloaded (1028330 signatures)
Aug 28 03:08:55 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:55 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:56 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:56 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:56 lrdlnx kernel: ata2: EH complete
Aug 28 03:08:58 lrdlnx kernel: ata2: hard resetting link
Aug 28 03:08:58 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 28 03:08:58 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 28 03:08:58 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 28 03:08:58 lrdlnx kernel: ata2: EH complete
After 5 PM there are no errors in /var/log/messages. Sometimes the error
shows up in the log once every few minutes, sometimes only after hours
or even days. The system runs 24/7.
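To put numbers on how irregular the resets are, the log excerpt above can be reduced to a list of reset times per port. The following is only a minimal sketch: the function names are made up for illustration, the year is an assumption (syslog timestamps do not carry one), and the month parsing assumes an English/C locale.

```python
import re
from datetime import datetime

# Matches syslog lines such as:
#   Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link
RESET_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: (ata\d+): "
    r"hard resetting (?:link|port)"
)

def reset_times(lines, year=2011):
    """Return {port: [datetime, ...]} for every hard-reset line.

    `year` is an assumption, because syslog lines omit it.
    """
    resets = {}
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            resets.setdefault(m.group(2), []).append(stamp)
    return resets

def gaps_in_minutes(times):
    """Minutes (rounded) between consecutive resets on one port."""
    return [round((b - a).total_seconds() / 60) for a, b in zip(times, times[1:])]

# Three reset lines copied from the log above, plus one unrelated line.
sample = [
    "Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link",
    "Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete",
    "Aug 28 00:31:24 lrdlnx kernel: ata2: hard resetting link",
    "Aug 28 02:50:15 lrdlnx kernel: ata2: hard resetting link",
]

resets = reset_times(sample)
print(gaps_in_minutes(resets["ata2"]))  # [20, 139]
```

Feeding the whole of /var/log/messages.0 through `reset_times` would show whether the gaps cluster around some period or are spread at random.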
Around the time I started noticing the errors I had just replaced smaller
drives with 2TB Western Digital Caviar Green WD20EARS drives,
which use "IntelliPower" (variable spin rate, 5400-7200 rpm).
Just to be sure, I have already replaced the SATA cables with new ones.
The SATA controller is Nvidia:
root@lrdlnx:~# lspci |grep -i sata
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
My RAID setup:
root@lrdlnx:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda5[2] sdb5[1]
1857650986 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
70011200 blocks [2/2] [UU]
md3 : active raid1 sdd1[1] sdc1[0]
730957376 blocks [2/2] [UU]
md0 : active raid1 sdb1[1] sda1[0]
136448 blocks [2/2] [UU]
unused devices: <none>
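A status dump like the one above can also be checked automatically, which helps when waiting for a reset to turn into a real failure. This is a rough sketch, not part of mdadm: `degraded_arrays` is a hypothetical helper, and it only looks at the `[n/m] [UU]` status field of each array.

```python
import re

# Matches the status part of an array, e.g. "70011200 blocks [2/2] [UU]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")
NAME_RE = re.compile(r"^(md\d+) :")

def degraded_arrays(mdstat_text):
    """Return names of arrays that report a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        name = NAME_RE.match(line)
        if name:
            current = name.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            total, working, member_map = status.groups()
            # "_" marks a missing member; the counts differ when degraded.
            if "_" in member_map or total != working:
                degraded.append(current)
            current = None
    return degraded

healthy = """Personalities : [raid1]
md2 : active raid1 sda5[2] sdb5[1]
1857650986 blocks super 1.2 [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
70011200 blocks [2/2] [UU]
unused devices: <none>
"""

print(degraded_arrays(healthy))  # []
```

On a live system the text would come from reading /proc/mdstat; an array showing `[2/1] [_U]` would be returned by name.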
I have run the tests a few times with no errors. The only thing wrong is
that I see these errors in the log; everything else is working perfectly:
root@lrdlnx:~# badblocks -vv /dev/sda
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test):
done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdb
Checking blocks 0 to 1953514583
Checking for bad blocks (read-only test):
done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdc
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test):
done
Pass completed, 0 bad blocks found.
root@lrdlnx:~# badblocks -vv /dev/sdd
Checking blocks 0 to 732574583
Checking for bad blocks (read-only test):
done
Pass completed, 0 bad blocks found.
root@lrdlnx:~#
root@lrdlnx:~# smartctl -t short /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:21:57 2011
Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:22:02 2011
Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdc
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:22:05 2011
Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -t short /dev/sdd
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Fri Aug 19 08:22:08 2011
Use smartctl -X to abort test.
root@lrdlnx:~# smartctl -l selftest /dev/sda
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        1109             -
# 2  Short offline     Completed without error  00%        1104             -
# 3  Short offline     Completed without error  00%        1080             -
# 4  Short offline     Completed without error  00%        1057             -
# 5  Short offline     Completed without error  00%        1033             -
# 6  Short offline     Completed without error  00%        1009             -
# 7  Short offline     Completed without error  00%        985              -
# 8  Short offline     Completed without error  00%        961              -
# 9  Short offline     Completed without error  00%        937              -
#10  Short offline     Completed without error  00%        913              -
#11  Short offline     Completed without error  00%        889              -
#12  Short offline     Completed without error  00%        865              -
#13  Short offline     Completed without error  00%        841              -
#14  Short offline     Completed without error  00%        817              -
#15  Short offline     Completed without error  00%        793              -
#16  Short offline     Completed without error  00%        770              -
#17  Short offline     Completed without error  00%        748              -
#18  Short offline     Completed without error  00%        724              -
#19  Short offline     Completed without error  00%        700              -
#20  Short offline     Completed without error  00%        676              -
#21  Short offline     Completed without error  00%        652              -
root@lrdlnx:~# smartctl -l selftest /dev/sdb
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        1116             -
# 2  Short offline     Completed without error  00%        1111             -
# 3  Short offline     Completed without error  00%        1087             -
# 4  Short offline     Completed without error  00%        1063             -
# 5  Short offline     Completed without error  00%        1039             -
# 6  Short offline     Completed without error  00%        1015             -
# 7  Short offline     Completed without error  00%        991              -
# 8  Short offline     Completed without error  00%        967              -
# 9  Short offline     Completed without error  00%        943              -
#10  Short offline     Completed without error  00%        919              -
#11  Short offline     Completed without error  00%        895              -
#12  Short offline     Completed without error  00%        871              -
#13  Short offline     Completed without error  00%        847              -
#14  Short offline     Completed without error  00%        823              -
#15  Short offline     Completed without error  00%        800              -
#16  Short offline     Completed without error  00%        776              -
#17  Short offline     Completed without error  00%        754              -
#18  Short offline     Completed without error  00%        730              -
#19  Short offline     Completed without error  00%        706              -
#20  Short offline     Completed without error  00%        682              -
#21  Short offline     Completed without error  00%        658              -
root@lrdlnx:~# smartctl -l selftest /dev/sdc
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        16121            -
# 2  Short offline     Completed without error  00%        16116            -
# 3  Short offline     Completed without error  00%        16092            -
# 4  Short offline     Completed without error  00%        16068            -
# 5  Short offline     Completed without error  00%        16044            -
# 6  Short offline     Completed without error  00%        16020            -
# 7  Short offline     Completed without error  00%        15996            -
# 8  Short offline     Completed without error  00%        15972            -
# 9  Short offline     Completed without error  00%        15948            -
#10  Short offline     Completed without error  00%        15924            -
#11  Short offline     Completed without error  00%        15900            -
#12  Short offline     Completed without error  00%        15876            -
#13  Short offline     Completed without error  00%        15852            -
#14  Short offline     Completed without error  00%        15828            -
#15  Short offline     Completed without error  00%        15804            -
#16  Short offline     Completed without error  00%        15780            -
#17  Short offline     Completed without error  00%        15758            -
#18  Short offline     Completed without error  00%        15734            -
#19  Short offline     Completed without error  00%        15710            -
#20  Short offline     Completed without error  00%        15686            -
#21  Short offline     Completed without error  00%        15662            -
root@lrdlnx:~# smartctl -l selftest /dev/sdd
smartctl 5.40 2010-07-12 r3124 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description  Status                   Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline     Completed without error  00%        16122            -
# 2  Short offline     Completed without error  00%        16117            -
# 3  Short offline     Completed without error  00%        16093            -
# 4  Short offline     Completed without error  00%        16069            -
# 5  Short offline     Completed without error  00%        16045            -
# 6  Short offline     Completed without error  00%        16021            -
# 7  Short offline     Completed without error  00%        15997            -
# 8  Short offline     Completed without error  00%        15973            -
# 9  Short offline     Completed without error  00%        15949            -
#10  Short offline     Completed without error  00%        15925            -
#11  Short offline     Completed without error  00%        15901            -
#12  Short offline     Completed without error  00%        15877            -
#13  Short offline     Completed without error  00%        15853            -
#14  Short offline     Completed without error  00%        15829            -
#15  Short offline     Completed without error  00%        15805            -
#16  Short offline     Completed without error  00%        15781            -
#17  Short offline     Completed without error  00%        15759            -
#18  Short offline     Completed without error  00%        15735            -
#19  Short offline     Completed without error  00%        15711            -
#20  Short offline     Completed without error  00%        15687            -
#21  Short offline     Completed without error  00%        15663            -
These errors just make me worried, because the last time I had a real HDD
failure I saw similar port reset errors,
but also actual errors on the drive such as I/O errors and read failures:
Apr 16 21:44:19 lrd-selleri kernel: res 40/00:00:00:00:e0/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Apr 16 21:44:19 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:44:19 lrd-selleri kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Apr 16 21:44:19 lrd-selleri kernel: ata1.00: configured for UDMA/133
Apr 16 21:44:19 lrd-selleri kernel: ata1: EH complete
Apr 16 21:44:19 lrd-selleri kernel: sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB)
Apr 16 21:44:19 lrd-selleri kernel: sd 0:0:0:0: [sda] Write Protect is off
Apr 16 21:44:19 lrd-selleri kernel: sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Apr 16 21:50:32 lrd-selleri kernel: res 40/00:00:00:00:e0/00:00:00:00:00/00 Emask 0x14 (ATA bus error)
Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:32 lrd-selleri kernel: ata1: port is slow to respond, please be patient (Status 0x80)
Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:32 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 16 21:50:32 lrd-selleri kernel: ata1: failed to recover some devices, retrying in 5 secs
Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:32 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 16 21:50:33 lrd-selleri kernel: ata1.00: limiting speed to UDMA/133:PIO3
Apr 16 21:50:33 lrd-selleri kernel: ata1: failed to recover some devices, retrying in 5 secs
Apr 16 21:50:33 lrd-selleri kernel: ata1: hard resetting port
Apr 16 21:50:33 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)
Apr 16 21:50:33 lrd-selleri kernel: ata1.00: disabled
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Sense Key : Aborted Command [current] [descriptor]
Apr 16 21:50:33 lrd-selleri kernel: Descriptor sense data with sense descriptors (in hex):
Apr 16 21:50:33 lrd-selleri kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Apr 16 21:50:33 lrd-selleri kernel: 00 00 00 00
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Add. Sense: No additional sense information
Apr 16 21:50:33 lrd-selleri kernel: end_request: I/O error, dev sda, sector 272308480
Apr 16 21:50:33 lrd-selleri kernel: md: super_written gets error=-5, uptodate=0
Apr 16 21:50:33 lrd-selleri kernel: ^IOperation continuing on 1 devices
Apr 16 21:50:33 lrd-selleri kernel: ata1: EH complete
Apr 16 21:50:33 lrd-selleri kernel: ata1.00: detaching (SCSI 0:0:0:0)
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Synchronizing SCSI cache
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Stopping disk
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] START_STOP FAILED
Apr 16 21:50:33 lrd-selleri kernel: sd 0:0:0:0: [sda] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb2
Apr 16 21:50:33 lrd-selleri kernel: disk 1, wo:1, o:0, dev:sda2
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb2
Apr 16 21:50:33 lrd-selleri kernel: ^IOperation continuing on 1 devices
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb1
Apr 16 21:50:33 lrd-selleri kernel: disk 1, wo:1, o:0, dev:sda1
Apr 16 21:50:33 lrd-selleri kernel: RAID1 conf printout:
Apr 16 21:50:33 lrd-selleri kernel: --- wd:1 rd:2
Apr 16 21:50:33 lrd-selleri kernel: disk 0, wo:0, o:1, dev:sdb1
Apr 16 21:50:33 lrd-selleri kernel: to dead device
Apr 16 21:50:33 lrd-selleri kernel: ^IOperation continuing on 1 devices
Apr 16 21:50:34 lrd-selleri kernel: to dead device
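The difference between the two logs in this message (resets that recover versus resets followed by a dead link) can be checked mechanically. The sketch below is only an illustration: the choice of which messages count as "serious" is an assumption based on the two excerpts here, not a rule taken from libata, and `classify_ports` is a made-up name. Lines such as "end_request: I/O error, dev sda" carry no ataN port and are not mapped.

```python
import re

# Captures the port ("ata1" from "ata1:" or "ata1.00:") and the message text.
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Assumption: these messages, seen only in the failing-drive log above,
# indicate that error handling did not recover the device.
SERIOUS = ("SATA link down", "failed to recover", "disabled")

def classify_ports(lines):
    """Map each ataN port to 'reset-only' or 'failing'."""
    verdict = {}
    for line in lines:
        m = PORT_RE.search(line)
        if not m:
            continue
        port, msg = m.groups()
        if any(marker in msg for marker in SERIOUS):
            verdict[port] = "failing"
        elif "hard resetting" in msg:
            # Keep an earlier 'failing' verdict if there is one.
            verdict.setdefault(port, "reset-only")
    return verdict

sample = [
    "Aug 28 00:11:45 lrdlnx kernel: ata2: hard resetting link",
    "Aug 28 00:11:45 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)",
    "Aug 28 00:11:45 lrdlnx kernel: ata2: EH complete",
    "Apr 16 21:50:32 lrd-selleri kernel: ata1: hard resetting port",
    "Apr 16 21:50:32 lrd-selleri kernel: ata1: SATA link down (SStatus 0 SControl 300)",
    "Apr 16 21:50:33 lrd-selleri kernel: ata1.00: disabled",
]

print(classify_ports(sample))  # {'ata2': 'reset-only', 'ata1': 'failing'}
```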
--
-------------------------
Juhani Karlsson
juhani dot karlsson at iki dot fi
http://lrdlnx.iki.fi
-------------------------
X-Virus-Scanned: Debian amavisd-new (with ClamAV) at lrdlnx.iki.fi
--
Archive: http://lists.debian.org/4E59F74F.1020500@lrdlnx.iki.fi
08-28-2011, 09:24 AM
Juhani Karlsson
Bug#625922: SATA devices get reset without real hardware failure
I use custom kernels built from these packages:
root@lrdlnx:~# dpkg -l |grep linux-source
ii linux-source-2.6.32 2.6.32-35 Linux kernel source for version 2.6.32 with Debian patches
ii linux-source-2.6.38 2.6.38-5~bpo60+1 Linux kernel source for version 2.6.38 with Debian patches
No external patches or anything, just the official Debian sources.
I get the same error with 2.6.32 and 2.6.38. I first noticed the errors
on Aug 5, and I started using 2.6.38 on Aug 16.
I have also swapped the configuration between my two desktop computers,
so these drives have been attached to a different motherboard. It also
has an Nvidia chipset, not exactly the same one, but the same error
occurs with the other mainboard too.
Aug 5 04:38:46 lrdlnx kernel: ata2: hard resetting link
Aug 5 04:38:46 lrdlnx kernel: ata2: nv: skipping hardreset on occupied port
Aug 5 04:38:46 lrdlnx kernel: ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Aug 5 04:38:46 lrdlnx kernel: ata2.00: configured for UDMA/133
Aug 5 04:38:46 lrdlnx kernel: ata2: EH complete
--
-------------------------
Juhani Karlsson
juhani dot karlsson at iki dot fi
http://lrdlnx.iki.fi
-------------------------
--
Archive: http://lists.debian.org/4E5A0961.2030608@lrdlnx.iki.fi
10-17-2011, 10:37 PM
"Javier Ortega Conde (Malkavian)"
Bug#625922: SATA devices get reset without real hardware failure
This bug (in general, not just the one on this page) has been in GNU/Linux for
a long time, with various disks, mainboards, SATA controllers, distros and
kernels (maybe since changes after 2.6.24).
In https://bugzilla.redhat.com/show_bug.cgi?id=684599 David Zeuthen says
"it's most probably caused by this commit
http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e
"
Possible workarounds I have read about for this bug:
-1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some
devices may not support 16-byte ATA commands) (
https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
-2: (Same as 1) Add options libata atapi_passthru16=0 to
/etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" to
/etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )
-3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the
kernel in /etc.grub.conf seems to have solved the problem for me" (
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ). Raman
Gupta and Andreas M. Kirchwitz say in other forums that adding 'acpi=off'
doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-4: (Similar to 3) Completely disable ACPI in mainboard BIOS. (
http://lists.debian.org/debian-user/2010/01/msg00023.html )
-5: Gaetan Cambier says "add the option line to grub to disable ncq :
'libata.force=noncq' for me, with this, i have no froze". (
https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it
doesn't work for them. PsYcHoK9 says it works for him, but John Doe replies that
it does not work for him ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).
-6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off ?
For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did
not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-7: A. Mani says "For the SB600 controller, the right thing to do is to
restrict all drives to 1.5Gbps by jumpers or with a boot option." Raman Gupta
replies "I also tried this -- but with this setting all drives attached to my
Marvell controller could not even be started by the kernel -- permanent
"failed to IDENTIFY" errors." (
https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
-8: DjznBR (djzn-br) says he tried several things WITHOUT success and finally
found one that works. Doesn't work: TURNED HDPARM OFF, CHANGED CABLE,
EXPERIMENTED AHCI & RAID MODES, DISABLED NCQ, COMPILED KERNEL WITH
CONFIG_SATA_PMP DISABLED, TRYING NOW LIBATA.FORCE=1.5GBPS, changed the cables
to different routes... SATA1 -> SATA2 SATA2 -> SATA3 ---- Works (but still
gives "softreset failed (device not ready)" messages in dmesg, and afterwards
recovers without data loss): Added option for kernel in grub configuration
"libata.noacpi=1". He also says "libata.force=norst ... prevents soft and hard
link resettings. If you have that switch on, when this bug comes up, there is
a system lock down (because obviously the kernel prevented the soft & hard
resetting." ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 )
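When comparing reports, it helps to know which of the boot options listed above a given machine is actually running with. The following is a minimal Python sketch under that assumption: it scans a kernel command line (on Linux the running one is readable from /proc/cmdline) for the parameter names mentioned in the list. The function names and the PARAMS tuple are invented for this example.

```python
# Parameter names taken from the workaround list; helper names are hypothetical.
PARAMS = (
    "libata.atapi_passthru16",
    "libata.force",
    "libata.noacpi",
    "acpi",
    "noapic",
    "pcie_aspm",
)


def active_workarounds(cmdline):
    """Map each listed parameter found in `cmdline` to its value.

    Flags without "=" (such as noapic) map to True.
    """
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        if name in PARAMS:
            found[name] = value if sep else True
    return found


def running_workarounds(path="/proc/cmdline"):
    """Same check against the command line of the running kernel."""
    with open(path) as handle:
        return active_workarounds(handle.read())
```

For example, `active_workarounds("root=/dev/sda1 ro libata.force=noncq noapic")` reports only the `libata.force` and `noapic` entries and ignores the rest. The sketch only inspects the boot command line; options set through /etc/modprobe.d (as in workaround 2) would not show up here.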
I see the same problem on my old Pentium II MMX PC/server running Debian 6.0.3
(stable) with kernel 2.6.32-5-686 and libata version 3.00, on an
"IBM-DTLA-305010" 10 GB IDE disk (configured by Debian as sda) on an old
mainboard. No RAID is used, and there are only soft resets, no hard resets, so
I don't lose data. I could send logs, but I don't think they would give any
more info.
I see the same problem on my desktop PC every 2 or 3 months in Debian testing
with kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 2.6.39-amd64,
2.6.38-2-amd64, 2.6.38-amd64 and maybe older ones, and libata 3.00, on two
Seagate 7200.11 "ST3500320AS" 500 GB SATA2 disks (with the latest firmware)
that are part of a RAID10. Fortunately the other two Western Digital
"WDC WD1002FAEX-00Z3A0" 1 TB SATA3 disks don't fail, but I have to reboot and
re-add the disk to rebuild the RAID. I could send logs, but I don't think they
would give any more info.
Possibly these are the same bug: #539059, #603061, #524876
Same bug in other distros and kernels:
-Archlinux with udev-165 and udev-166:
https://bbs.archlinux.org/viewtopic.php?pid=895404
-Fedora with kernel 2.6.38-0.rc8.git0.1.fc15.x86_64 and udev-166 in a DVD
reader: https://bugzilla.redhat.com/show_bug.cgi?id=684599
-Fedora 13 with kernel 2.6.33.8-149.fc13.i686.PAE or Fedora 13 64bit on a Mac
Mini
-Fedora 14 with kernels 2.6.31.6-166.fc12@x86_64, 2.6.32.11-99.fc12.x86_64,
2.6.35.9-64.fc14.x86_64, 2.6.35.10-72.fc14.i686 and 2.6.35.10-74.fc14.x86_64
and 2.6.35.11-83.fc14.x86_64 and 2.6.35.14-95.fc14.x86_64:
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Fedora 15 (updated from Fedora 14):
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Centos5.5-x64 with kernel 2.6.18-194-x64:
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-RHEL5 with vanilla kernel 2.6.37.3:
https://bugzilla.redhat.com/show_bug.cgi?id=549981
-Ubuntu since 8.10 64bit with kernels 2.6.27-7, 2.6.28-15-generic,
2.6.31-14-generic, 2.6.31-15-generic (on a Macbook2), 2.6.38-7-generic
(kernel-ppa): https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892
-Ubuntu 10.04: https://bugzilla.redhat.com/show_bug.cgi?id=549981
--
Bye: Javier Ortega Conde (Malkavian)
________________________________________________________________________
The Malkavian's webpage: Many things http://malkavian.dyndns.org
Member of LinUxers Group from Bizkaia (GLUB) http://glub.biz
Member of GoBi Go Club, Eghost, Itsas, Aske, Guardianes del Túmulo...
________________________________________________________________________
Microsoft is to operating systems and security what McDonald's to gourmet food
and healthy nutrition. (Javier Ortega Conde (Malkavian))
--
Archive: http://lists.debian.org/201110180037.21064.malkavian666@gmail.com
10-18-2011, 01:56 AM
Ben Hutchings
Bug#625922: SATA devices get reset without real hardware failure
On Tue, 2011-10-18 at 00:37 +0200, Javier Ortega Conde (Malkavian)
wrote:
> This bug (in general, not just the one on this page) has been in GNU/Linux for
> a long time, with various disks, mainboards, SATA controllers, distros and
> kernels (maybe since changes after 2.6.24).
Just because you see the same error messages, that does not mean you are
seeing the same bug.
> In https://bugzilla.redhat.com/show_bug.cgi?id=684599 David Zeuthen says
> "it's most probably caused by this commit
> http://git.kernel.org/?p=linux/hotplug/udev.git;a=commitdiff;h=560de575148b7efda3b34a7f7073abd483c5f08e
> "
So that's a bug in some drives, though we need to work around it.
> Possible workarounds I have read about for this bug:
> -1: Add "libata.atapi_passthru16=0" to the kernel boot options (because some
> devices may not support 16-byte ATA commands) (
> https://bugzilla.redhat.com/show_bug.cgi?id=684599 )
> -2: (Same as 1) Add options libata atapi_passthru16=0 to
> /etc/modprobe.d/modprobe.conf and add FILES="/etc/modprobe.d/modprobe.conf" to
> /etc/mkinitcpio.conf ( https://bbs.archlinux.org/viewtopic.php?pid=895404 )
OK.
> -3: Somebody called Fujisan said in 2009 "adding 'acpi=off noapic' to the
> kernel in /etc.grub.conf seems to have solved the problem for me" (
> https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=462425 ). Raman
> Gupta and Andreas M. Kirchwitz say in other forums that adding 'acpi=off'
> doesn't work ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
> -4: (Similar to 3) Completely disable ACPI in mainboard BIOS. (
> http://lists.debian.org/debian-user/2010/01/msg00023.html )
These are workarounds for bugs in IRQ routing on some motherboards.
They are also outdated advice. 10 years ago when both ACPI and the APIC
architecture were quite new, there were a lot of bugs in both BIOS and
kernel support for them. It was therefore sensible to try disabling it
when a new system seemed unstable. Today, this is not the case.
> -5: Gaetan Cambier says "add the option line to grub to disable ncq :
> 'libata.force=noncq' for me, with this, i have no froze". (
> https://bugzilla.redhat.com/show_bug.cgi?id=549981 ). Others reply that it
> doesn't work for them. PsYcHoK9 says it works for him, but John Doe replies that
> it does not work for him ( https://bugs.launchpad.net/ubuntu/+source/linux/+bug/285892 ).
Not even the same symptoms.
> -6: Reartes Guillermo says "booting with the kernel parameter: pcie_aspm=off ?
> For me it worked (nvidia)". Raman Gupta replies that "I tried this and it did
> not fix the problem." ( https://bugzilla.redhat.com/show_bug.cgi?id=549981 )
This is a workaround for a controller or chipset bug.
[...]
> I see the same problem on my old Pentium II MMX PC/server running Debian 6.0.3
> (stable) with kernel 2.6.32-5-686 and libata version 3.00, on an
> "IBM-DTLA-305010" 10 GB IDE disk (configured by Debian as sda) on an old
> mainboard. No RAID is used, and there are only soft resets, no hard resets, so
> I don't lose data. I could send logs, but I don't think they would give any
> more info.
>
> I see the same problem on my desktop PC every 2 or 3 months in Debian testing
> with kernels 3.0.0-1-amd64, 3.0.0-rc2-amd64, 2.6.39-2-amd64, 2.6.39-amd64,
> 2.6.38-2-amd64, 2.6.38-amd64 and maybe older ones, and libata 3.00, on two
> Seagate 7200.11 "ST3500320AS" 500 GB SATA2 disks (with the latest firmware)
> that are part of a RAID10. Fortunately the other two Western Digital
> "WDC WD1002FAEX-00Z3A0" 1 TB SATA3 disks don't fail, but I have to reboot and
> re-add the disk to rebuild the RAID. I could send logs, but I don't think they
> would give any more info.
[...]
Use reportbug to open a *separate* bug report for *each* of these
systems. Do send the logs. Please do not try to find connections with
other bug reports.
Ben.
--
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot
10-19-2011, 01:07 PM
Ben Hutchings
Bug#625922: SATA devices get reset without real hardware failure
On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
> Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> > This bug (in general, not just the one on this page) has been in GNU/Linux for
> > a long time, with various disks, mainboards, SATA controllers, distros and
> > kernels (maybe since changes after 2.6.24).
>
> I'm using kernel 2.6.37.6 and there this bug is still present.
Not a Debian kernel version, so please don't bother this list with it.
> Has it been fixed in any recent kernel versions?
> IMO it deserves the highest priority to fix this ASAP.
[...]
It's not a single bug.
Ben.
--
Ben Hutchings
73.46% of all statistics are made up.
10-19-2011, 03:11 PM
"U.Mutlu"
Bug#625922: SATA devices get reset without real hardware failure
Ben Hutchings wrote, On 2011-10-19 15:07:
> On Wed, 2011-10-19 at 13:31 +0200, U.Mutlu wrote:
> > Javier Ortega Conde (Malkavian) wrote, On 2011-10-18 00:37:
> > > This bug (in general, not just the one on this page) has been in GNU/Linux for
> > > a long time, with various disks, mainboards, SATA controllers, distros and
> > > kernels (maybe since changes after 2.6.24).
> >
> > I'm using kernel 2.6.37.6 and there this bug is still present.
> Not a Debian kernel version, so please don't bother this list with it.
I haven't mentioned anything about Debian; I'm using the kernel from kernel.org:
http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.37.6.tar.bz2
> > Has it been fixed in any recent kernel versions?
> > IMO it deserves the highest priority to fix this ASAP.
> [...]
> It's not a single bug.
It's a disastrous situation: an OS with a buggy HD kernel driver, and no fix on the way...
--
Archive: http://lists.debian.org/j7mpa9$8ns$1@dough.gmane.org
11-26-2011, 06:49 AM
Jonathan Nieder
Bug#625922: SATA devices get reset without real hardware failure
Hi,
Natalia Portillo wrote:
> While running stock Debian's sid linux 2.6.38-8-amd64 kernel I'm
> getting random fails on SATA devices.
>
> I have a RAID5 system with 5 disks and 3 of them showed the same
> exact failure, one each 48 hours.
>
> On reboot, the devices work perfectly, and badblocks runs through
> them without a single failure.
>
> Kernel exact failure is:
>
> [255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
> [255352.928071] ata4.00: failed command: FLUSH CACHE EXT
[...]
> Devices are in different SATA ports (first failed ata2, then ata5,
> then ata4) and are all Seagate ST2000DL003-9VT166.
>
> Same exact hardware has been running on Linux 2.6.32-gentoo for
> weeks without a single failure.
Thanks for reporting it, and sorry for the slow response.
Some questions:
- what kernel are you using now?
- can you still reproduce this?
- can you reproduce it with a squeeze kernel, too?
- do you know what exact version the working 2.6.32-gentoo kernel
was?
- please attach a log of the initialization of the kernel, either by
saving full "dmesg" output right after booting or by gathering it
from /var/log/dmesg*
- any workarounds or other weird symptoms?
If you can reproduce this reliably with a 3.1.y kernel, we should
take this upstream (looks like that's linux-ide@vger.kernel.org
plus linux-kernel@vger.kernel.org; please cc me or this bug log if
writing there so we can track it).
Hope that helps,
Jonathan
--
Archive: http://lists.debian.org/20111126074919.GA22656@elie.hsd1.il.comcast.net
11-26-2011, 01:12 PM
Natalia Portillo
Bug#625922: SATA devices get reset without real hardware failure
On 26/11/2011, at 07:49, Jonathan Nieder wrote:
> Hi,
>
> Natalia Portillo wrote:
>
>> While running stock Debian's sid linux 2.6.38-8-amd64 kernel I'm
>> getting random fails on SATA devices.
>>
>> I have a RAID5 system with 5 disks and 3 of them showed the same
>> exact failure, one each 48 hours.
>>
>> On reboot, the devices work perfectly, and badblocks runs through
>> them without a single failure.
>>
>> Kernel exact failure is:
>>
>> [255352.928063] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
>> [255352.928071] ata4.00: failed command: FLUSH CACHE EXT
> [...]
>> Devices are in different SATA ports (first failed ata2, then ata5,
>> then ata4) and are all Seagate ST2000DL003-9VT166.
>>
>> Same exact hardware has been running on Linux 2.6.32-gentoo for
>> weeks without a single failure.
>
> Thanks for reporting it, and sorry for the slow response.
>
> Some questions:
>
> - what kernel are you using now?
claunia@hades:~$ uname -a
Linux hades 3.0.0-1-amd64 #1 SMP Sat Aug 27 16:21:11 UTC 2011 x86_64 GNU/Linux
wheezy
> - can you still reproduce this?
I have been running this kernel for only two weeks, and there is a bug, but a different one.
> - can you reproduce it with a squeeze kernel, too?
Yes, with all squeeze kernels, up until two weeks ago.
> - do you know what exact version the working 2.6.32-gentoo kernel
> was?
r6 I think
> - please attach a log of the initialization of the kernel, either by
> saving full "dmesg" output right after booting or by gathering it
> from /var/log/dmesg*
I will have to dig through the rotated logs, stay tuned.
> - any workarounds or other weird symptoms?
Curiously, no workarounds, but other weird symptoms in the same and other kernels.
On both the squeeze and wheezy kernels the following happens almost once a day (always during heavy network transfers):
It repeats many times (the stack trace is always different, always belonging to the process that's doing the transfer, like bacula-sd or netatalk, or the XFS or MDRAID processes).
On the squeeze kernel, when this happens nothing works: a newly launched process does not start, and a killed process stays alive. A hard reboot is the only way out.
On wheezy the system continues working.
Curiously, I received an Efika MX Smartbook machine yesterday that exhibits another bug, but a really similar one.
With kernel Linux 2.6.31.14.26-efikamx the internal SSD suffers a lost interrupt and resets when there is high CPU usage. Sorry, I have to dig up the logs for that too.
>
> If you can reproduce this reliably with a 3.1.y kernel, we should
> take this upstream (looks like that's linux-ide@vger.kernel.org
> plus linux-kernel@vger.kernel.org; please cc me or this bug log if
> writing there so we can track it).
>
> Hope that helps,
> Jonathan
--
Archive: http://lists.debian.org/81165706-7715-4647-A055-64029A59A79C@claunia.com
12-15-2011, 12:06 AM
Alessio Treglia
Bug#625922: SATA devices get reset without real hardware failure
found 625922 3.1.1-1
thanks
Hi, still reproducible here with linux-image-3.1.0-1-amd64 3.1.1-1:
Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837215] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837222] ata2.00: failed command: FLUSH CACHE EXT
Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837230] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837231] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837241] ata2.00: status: { DRDY }
Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837254] ata2: hard resetting link
Dec 14 11:56:58 alessio-laptop kernel: [ 6844.199464] ata2: link is slow to respond, please be patient (ready=0)
Dec 14 11:57:03 alessio-laptop kernel: [ 6848.846062] ata2: COMRESET failed (errno=-16)
Dec 14 11:57:03 alessio-laptop kernel: [ 6848.846075] ata2: hard resetting link
Dec 14 11:57:08 alessio-laptop kernel: [ 6854.208316] ata2: link is slow to respond, please be patient (ready=0)
Dec 14 11:57:13 alessio-laptop kernel: [ 6858.854933] ata2: COMRESET failed (errno=-16)
Dec 14 11:57:13 alessio-laptop kernel: [ 6858.854943] ata2: hard resetting link
Dec 14 11:57:14 alessio-laptop acpid: client 1649[0:0] has disconnected
Dec 14 11:57:18 alessio-laptop kernel: [ 6864.213249] ata2: link is slow to respond, please be patient (ready=0)
Dec 14 11:57:29 alessio-laptop kernel: [ 6875.073958] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 14 11:57:29 alessio-laptop kernel: [ 6875.117586] ata2.00: configured for UDMA/100
Dec 14 11:57:29 alessio-laptop kernel: [ 6875.117598] ata2.00: retrying FLUSH 0xea Emask 0x4
Dec 14 11:57:29 alessio-laptop kernel: [ 6875.129847] ata2.00: device reported invalid CHS sector 0
Dec 14 11:57:29 alessio-laptop kernel: [ 6875.129864] ata2: EH complete
Dec 14 11:57:30 alessio-laptop acpid: client connected from 1649[0:0]
Dec 14 11:57:30 alessio-laptop acpid: 1 client rule loaded
Dec 14 11:57:36 alessio-laptop acpid: client 1649[0:0] has disconnected
Dec 14 11:57:57 alessio-laptop acpid: client connected from 1649[0:0]
Dec 14 11:57:57 alessio-laptop acpid: 1 client rule loaded
Dec 14 11:58:06 alessio-laptop acpid: client 1649[0:0] has disconnected
Dec 14 11:59:27 alessio-laptop acpid: client connected from 1649[0:0]
Dec 14 11:59:27 alessio-laptop acpid: 1 client rule loaded
Dec 14 12:00:01 alessio-laptop /USR/SBIN/CRON[12428]: (root) CMD (/usr/lib/prey/prey.sh >/var/log/prey.log)
Dec 14 12:00:29 alessio-laptop acpid: client 1649[0:0] has disconnected
Dec 14 12:00:32 alessio-laptop acpid: client connected from 1649[0:0]
Dec 14 12:00:32 alessio-laptop acpid: 1 client rule loaded
Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848152] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848162] ata2.00: failed command: FLUSH CACHE EXT
Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848174] ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848177] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848184] ata2.00: status: { DRDY }
Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848198] ata2: hard resetting link
Dec 14 12:01:07 alessio-laptop kernel: [ 7093.206409] ata2: link is slow to respond, please be patient (ready=0)
Dec 14 12:01:12 alessio-laptop kernel: [ 7097.853046] ata2: COMRESET failed (errno=-16)
Dec 14 12:01:12 alessio-laptop kernel: [ 7097.853056] ata2: hard resetting link
Dec 14 12:01:12 alessio-laptop kernel: [ 7098.172921] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Dec 14 12:01:12 alessio-laptop kernel: [ 7098.236041] ata2.00: configured for UDMA/100
Dec 14 12:01:12 alessio-laptop kernel: [ 7098.236050] ata2.00: retrying FLUSH 0xea Emask 0x4
Dec 14 12:01:12 alessio-laptop kernel: [ 7098.248896] ata2.00: device reported invalid CHS sector 0
Dec 14 12:01:12 alessio-laptop kernel: [ 7098.248908] ata2: EH complete
Dec 14 12:01:57 alessio-laptop acpid: client 1649[0:0] has disconnected
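To compare logs like this one across reports, it can be useful to measure how long each error-handling episode stalls the port. Here is a minimal Python sketch, assuming each log entry sits on a single line and carries the bracketed kernel timestamp: it pairs each "exception" line with the next "EH complete" line for the same port. The regular expression, the function name and the SAMPLE list are invented for this illustration; the sample lines are copied from the log above.

```python
import re

# Matches "[ 6838.837215] ata2.00: message" and "[ 6875.129864] ata2: message".
LINE = re.compile(r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)")


def eh_episodes(lines):
    """Return (port, start_seconds, duration_seconds) for each recovery episode."""
    started = {}   # port -> timestamp of the first exception line still open
    episodes = []
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # non-libata lines (acpid, cron, "res ..." continuation)
        ts = float(match.group("ts"))
        port = match.group("port")
        msg = match.group("msg").strip()
        if msg.startswith("exception") and port not in started:
            started[port] = ts
        elif msg == "EH complete" and port in started:
            start = started.pop(port)
            episodes.append((port, start, ts - start))
    return episodes


SAMPLE = [
    "Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837215] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Dec 14 11:56:53 alessio-laptop kernel: [ 6838.837254] ata2: hard resetting link",
    "Dec 14 11:57:29 alessio-laptop kernel: [ 6875.129864] ata2: EH complete",
    "Dec 14 12:01:02 alessio-laptop kernel: [ 7087.848152] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen",
    "Dec 14 12:01:12 alessio-laptop kernel: [ 7098.248908] ata2: EH complete",
]

for port, start, duration in eh_episodes(SAMPLE):
    print(f"{port}: stalled for {duration:.1f} s starting at {start:.3f}")
```

On the two episodes in this log the stalls come out at roughly 36 and 10 seconds, which matches the wall-clock gap between the first "frozen" line and "EH complete" in each case. Logs with entries wrapped across several lines would need to be rejoined first.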