Bug#582275: linux-image-2.6.32-bpo.4-686: ext3 filesystem corruption with md RAID1 on Seagate disks
Package: linux-2.6
Version: 2.6.32-11~bpo50+1
Severity: critical
Justification: causes serious data loss
I keep getting ext3 filesystem corruptions on one of my md RAID1 arrays. Shortly after booting, I get messages like the following one:
EXT3-fs error (device md1): htree_dirblock_to_tree: bad entry in directory #17269110: rec_len is smaller than minimal - offset=0, inode=0, rec_len=0, name_len=0
This forces an automatic fsck at the next reboot, which fails. A manual fsck.ext3 -y /dev/md1 takes a long time but manages to get a clean FS again. After the reboot, it takes just a few minutes until the first of these messages appears again.
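As a reading aid for the error message quoted above, here is a minimal Python sketch that splits such a log line into its fields and checks the reported rec_len against the smallest record length a directory entry can have. The helper names (parse_bad_entry, dir_rec_len, looks_zeroed) are made up for this illustration and are not kernel identifiers. The rounding formula is an assumption based on how the EXT3_DIR_REC_LEN macro is commonly described (8-byte entry header plus the name, rounded up to a multiple of 4), so treat it as a sketch rather than the kernel's exact check.

```python
import re

# Matches the "bad entry in directory" message format quoted in this report.
ENTRY_RE = re.compile(
    r"EXT3-fs error \(device (?P<device>\w+)\): (?P<function>\w+): "
    r"bad entry in directory #(?P<dir_inode>\d+): (?P<reason>.+?) - "
    r"offset=(?P<offset>\d+), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)

NUMERIC_FIELDS = ("dir_inode", "offset", "inode", "rec_len", "name_len")


def dir_rec_len(name_len):
    """Assumed minimal record length: 8-byte header + name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def parse_bad_entry(line):
    """Return the message fields as a dict, or None if the line does not match."""
    match = ENTRY_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in NUMERIC_FIELDS:
        fields[key] = int(fields[key])
    return fields


def looks_zeroed(fields):
    """True if inode, rec_len and name_len are all 0, as in the message above."""
    return all(fields[key] == 0 for key in ("inode", "rec_len", "name_len"))


line = (
    "EXT3-fs error (device md1): htree_dirblock_to_tree: bad entry in "
    "directory #17269110: rec_len is smaller than minimal - "
    "offset=0, inode=0, rec_len=0, name_len=0"
)
entry = parse_bad_entry(line)
too_small = entry["rec_len"] < dir_rec_len(1)
```

With the message from this report, rec_len=0 is below the assumed minimum for even a one-character name, and all three entry fields are zero at offset 0. That pattern would be consistent with the start of the directory block having been read back as zeros, although the message alone cannot show where in the stack that happened.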
The two disks used in the RAID1 md device are both Seagate ST31000528AS drives that show no errors in the long and short SMART self-tests or in SeaTools. Memtest shows no memory problems. Two other RAID1 systems connected to the same Intel Ibex Peak 6 port SATA AHCI Controller (rev 06) show no such problems. A RAID5 with 4 Seagate ST3750640AS on a Promise PDC40718 (SATA 300 TX4) also works without problems in the same system.
I saw that sata_sil.c has a blacklist that includes mainly Seagate drives, but I do not know whether this is related to my problem, since my system uses an Intel SATA controller.
-- Package-specific info:
** Version:
Linux version 2.6.32-bpo.4-686 (Debian 2.6.32-11~bpo50+1) (norbert@tretkowski.de) (gcc version 4.3.2 (Debian 4.3.2-1.1) ) #1 SMP Mon Apr 12 16:20:13 UTC 2010
*** Protocol statistics:
Ip:
5509409 total packets received
0 forwarded
0 incoming packets discarded
5509339 incoming packets delivered
5360199 requests sent out
Icmp:
1306 ICMP messages received
103 input ICMP message failed.
ICMP input histogram:
destination unreachable: 891
echo requests: 1
echo replies: 414
1311 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 896
echo request: 414
echo replies: 1
IcmpMsg:
InType0: 414
InType3: 891
InType8: 1
OutType0: 1
OutType3: 896
OutType8: 414
Tcp:
25672 active connections openings
14620 passive connection openings
5929 failed connection attempts
704 connection resets received
5 connections established
5467995 segments received
5326935 segments send out
2131 segments retransmited
0 bad segments received.
978 resets sent
Udp:
39322 packets received
585 packets to unknown port received.
0 packet receive errors
29818 packets sent
UdpLite:
TcpExt:
14295 TCP sockets finished time wait in fast timer
4 time wait sockets recycled by time stamp
2 packets rejects in established connections because of timestamp
10081 delayed acks sent
Quick ack mode was activated 17 times
8268 packets directly queued to recvmsg prequeue.
420298 bytes directly in process context from backlog
9836328 bytes directly received in process context from prequeue
3774010 packet headers predicted
877 packets header predicted and directly queued to user
113577 acknowledgments not containing data payload received
2152771 predicted acknowledgments
1255 times recovered from packet loss by selective acknowledgements
Detected reordering 2 times using FACK
3 congestion windows recovered without slow start by DSACK
16 congestion windows recovered without slow start after partial ack
947 TCP data loss events
TCPLostRetransmit: 16
11 timeouts after SACK recovery
1451 fast retransmits
21 forward retransmits
17 retransmits in slow start
419 other TCP timeouts
1 SACK retransmits failed
14 DSACKs sent for old packets
39 DSACKs received
550 connections reset due to unexpected data
15 connections reset due to early user close
1 connections aborted due to timeout
TCPDSACKIgnoredOld: 34
TCPDSACKIgnoredNoUndo: 1
TCPSackShiftFallback: 13522
IpExt:
InMcastPkts: 5617
OutMcastPkts: 2606
InBcastPkts: 10819
OutBcastPkts: 3114
InOctets: 904181062
OutOctets: 173964484
InMcastOctets: 2015874
OutMcastOctets: 937975
InBcastOctets: 1299201
OutBcastOctets: 640717
** PCI devices:
03:00.0 Multimedia video controller [0400]: Micronas Semiconductor Holding AG Device [18c3:0720] (rev 01)
Subsystem: Micronas Semiconductor Holding AG Device [18c3:abc4]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 12
Region 0: Memory at fbef0000 (32-bit, non-prefetchable) [size=64K]
Region 1: Memory at fbee0000 (64-bit, non-prefetchable) [size=64K]
Capabilities: <access denied>
04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller [10ec:8168] (rev 03)
Subsystem: Giga-byte Technology Device [1458:e000]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 33
Region 0: I/O ports at ee00 [size=256]
Region 2: Memory at fbbff000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at fbbf8000 (64-bit, prefetchable) [size=16K]
[virtual] Expansion ROM at fbb00000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: r8169
Kernel modules: r8169
05:04.0 Mass storage controller [0180]: Promise Technology, Inc. PDC40718 (SATA 300 TX4) [105a:3d17] (rev 02)
Subsystem: Promise Technology, Inc. PDC40718 (SATA 300 TX4) [105a:3d17]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 72 (1000ns min, 4500ns max), Cache Line Size: 4 bytes
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at cf00 [size=128]
Region 2: I/O ports at c800 [size=256]
Region 3: Memory at fbdff000 (32-bit, non-prefetchable) [size=4K]
Region 4: Memory at fbdc0000 (32-bit, non-prefetchable) [size=128K]
[virtual] Expansion ROM at f0600000 [disabled] [size=32K]
Capabilities: <access denied>
Kernel driver in use: sata_promise
Kernel modules: sata_promise
05:06.0 IDE interface [0101]: Integrated Technology Express, Inc. Device [1283:8213] (prog-if 85 [Master SecO PriO])
Subsystem: Giga-byte Technology Device [1458:b000]
Control: I/O+ Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0 (2000ns min, 2000ns max)
Interrupt: pin A routed to IRQ 17
Region 0: I/O ports at ce00 [size=8]
Region 1: I/O ports at cd00 [size=4]
Region 2: I/O ports at cc00 [size=8]
Region 3: I/O ports at cb00 [size=4]
Region 4: I/O ports at ca00 [size=16]
Capabilities: <access denied>
Kernel driver in use: ITE8213_IDE
Kernel modules: it8213
** USB devices:
Bus 008 Device 002: ID 058f:6361 Alcor Micro Corp. Multimedia Card Reader
Bus 008 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 007 Device 004: ID 0bc7:0006 X10 Wireless Technology, Inc. Wireless Transceiver (ACPI-compliant)
Bus 007 Device 003: ID 0461:4d17 Primax Electronics, Ltd Optical Mouse
Bus 007 Device 002: ID 04b4:6560 Cypress Semiconductor Corp. CY7C65640 USB-2.0 "TetraHub"
Bus 007 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Kernel: Linux 2.6.32-bpo.4-686 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-image-2.6.32-bpo.4-686 depends on:
ii  debconf [debconf-2.0]   1.5.24             Debian configuration management sy
ii  initramfs-tools [linux  0.92o              tools for generating an initramfs
ii  linux-base              2.6.32-11~bpo50+1  Linux image base package
ii  module-init-tools       3.4-1              tools for managing Linux kernel mo
Versions of packages linux-image-2.6.32-bpo.4-686 recommends:
ii  firmware-linux-free     2.6.32-9~bpo50+1   Binary firmware for various driver
ii  libc6-i686              2.7-18lenny2       GNU C Library: Shared libraries [i
Versions of packages linux-image-2.6.32-bpo.4-686 suggests:
ii  grub                    0.97-47lenny2      GRand Unified Bootloader (Legacy v
pn  linux-doc-2.6.32        <none>             (no description available)
Versions of packages linux-image-2.6.32-bpo.4-686 is related to:
pn  firmware-bnx2           <none>             (no description available)
pn  firmware-bnx2x          <none>             (no description available)
pn  firmware-ipw2x00        <none>             (no description available)
pn  firmware-ivtv           <none>             (no description available)
pn  firmware-iwlwifi        <none>             (no description available)
ii  firmware-linux          0.23~bpo50+1       Binary firmware for various driver
ii  firmware-linux-nonfree  0.23~bpo50+1       Binary firmware for various driver
pn  firmware-qlogic         <none>             (no description available)
pn  firmware-ralink         <none>             (no description available)
pn  xen-hypervisor          <none>             (no description available)
05-21-2010, 11:08 PM
Ben Hutchings
On Wed, 2010-05-19 at 18:50 +0200, Reiner Buehl wrote:
> Package: linux-2.6
> Version: 2.6.32-11~bpo50+1
> Severity: critical
> Justification: causes serious data loss
>
> I keep getting ext3 filesystem corruptions on one of my md RAID1
> arrays. Shortly after booting, I get messages like the following one:
[...]
I see that you've also sent a bug report to some kernel mailing lists,
and the problem has now disappeared. Do you still want to keep this bug
report open?
Ben.
--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
05-23-2010, 05:06 PM
Ben Hutchings
On Sun, 2010-05-23 at 18:38 +0200, Reiner Buehl wrote:
> Hi Ben,
>
> as you might have seen from my last mails on the linux-fsdevel list, the
> problem has not disappeared. If it does not cause too much trouble, I
> would like to keep the report open at least until Ted Tso has had a
> chance to look at the fsck output. Is this possible?
Yes, that's OK. Please add the bug address <582275@bugs.debian.org> to
the cc list in further discussions.
Ben.