Package: linux-image-2.6.24-etchnhalf.1-amd64
Version: 2.6.24-6~etchnhalf.8etch3
Severity: normal
Please see the XFS crash trace included below. The summary is:
XFS internal error xfs_trans_cancel at line 1163 of file
fs/xfs/xfs_trans.c
I've not logged this to an existing ticket since it seems to be a class
of bug, or rather a symptom, so I don't feel qualified to make that call.
This happens on a fairly regular basis with a 250GB XFS filesystem which
is one of several logical volumes on a 1.36TB LVM volume group. The
underlying device is an Areca hardware RAID card.
When the fs will go offline is not predictable, but it always happens
during reasonably heavy fs activity. This particular crash happened
while I was doing 'cp -a' on a directory containing about 27GB in 812,714 files and 12,833 directories.
Please let me know if I can provide any further information.
I can do limited testing since the server is a production system. I
could possibly upgrade to a newer kernel from say backports.org if
that's a good idea.
Thanks.
Ronny
-- Package-specific info:
** Version:
Linux version 2.6.24-etchnhalf.1-amd64 (Debian 2.6.24-6~etchnhalf.8etch3) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Sat Aug 15 20:38:41 UTC 2009
** Command line:
auto BOOT_IMAGE=Linux ro root=900
** Not tainted
** Kernel log:
Ending XFS recovery on filesystem: dm-10 (logdev: internal)
Filesystem "dm-6": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-6
Ending clean XFS mount for filesystem: dm-6
Filesystem "dm-16": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-16
Ending clean XFS mount for filesystem: dm-16
Filesystem "dm-7": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-7
Ending clean XFS mount for filesystem: dm-7
Filesystem "dm-8": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-8
Ending clean XFS mount for filesystem: dm-8
Filesystem "dm-17": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-17
Ending clean XFS mount for filesystem: dm-17
Filesystem "dm-4": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-4
Ending clean XFS mount for filesystem: dm-4
Filesystem "dm-3": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-3
Ending clean XFS mount for filesystem: dm-3
Filesystem "dm-15": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-15
Ending clean XFS mount for filesystem: dm-15
Filesystem "dm-14": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-14
Ending clean XFS mount for filesystem: dm-14
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
RPC: Registered udp transport module.
RPC: Registered tcp transport module.
process `syslogd' is using obsolete setsockopt SO_BSDCOMPAT
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
lp0: using parport0 (interrupt-driven).
ppdev: user-space parallel port driver
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
eth0: no IPv6 routers present
Filesystem "dm-20": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-20
Starting XFS recovery on filesystem: dm-20 (logdev: internal)
XFS resetting qflags for filesystem dm-20
Ending XFS recovery on filesystem: dm-20 (logdev: internal)
Filesystem "dm-23": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-23
Starting XFS recovery on filesystem: dm-23 (logdev: internal)
Ending XFS recovery on filesystem: dm-23 (logdev: internal)
Filesystem "dm-26": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-26
Starting XFS recovery on filesystem: dm-26 (logdev: internal)
Ending XFS recovery on filesystem: dm-26 (logdev: internal)
Filesystem "dm-29": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-29
XFS resetting qflags for filesystem dm-29
Ending clean XFS mount for filesystem: dm-29
Filesystem "dm-32": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-32
Starting XFS recovery on filesystem: dm-32 (logdev: internal)
Ending XFS recovery on filesystem: dm-32 (logdev: internal)
Filesystem "dm-35": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-35
Starting XFS recovery on filesystem: dm-35 (logdev: internal)
Ending XFS recovery on filesystem: dm-35 (logdev: internal)
Filesystem "dm-38": Disabling barriers, not supported by the underlying device
XFS mounting filesystem dm-38
XFS resetting qflags for filesystem dm-38
Ending clean XFS mount for filesystem: dm-38
md: data-check of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
md: using 128k window, over a total of 489856 blocks.
md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
md: delaying data-check of md2 until md1 has finished (they share one or more physical units)
md: delaying data-check of md1 until md0 has finished (they share one or more physical units)
md: md0: data-check done.
md: data-check of RAID array md1
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
md: using 128k window, over a total of 1951808 blocks.
md: delaying data-check of md2 until md1 has finished (they share one or more physical units)
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sdb1
disk 1, wo:0, o:1, dev:sdc1
md: md1: data-check done.
md: data-check of RAID array md2
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
md: using 128k window, over a total of 114776320 blocks.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sdb2
disk 1, wo:0, o:1, dev:sdc2
md: md2: data-check done.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:sdb3
disk 1, wo:0, o:1, dev:sdc3
06:0e.0 RAID bus controller [0104]: Areca Technology Corp. ARC-1220 8-Port PCI-Express to SATA RAID Controller [17d3:1220]
Subsystem: Areca Technology Corp. ARC-1220 8-Port PCI-Express to SATA RAID Controller [17d3:1220]
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping+ SERR- FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 32 (32000ns min), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at dd300000 (32-bit, non-prefetchable) [size=4K]
Region 2: Memory at df400000 (32-bit, prefetchable) [size=4M]
[virtual] Expansion ROM at dd310000 [disabled] [size=64K]
Capabilities: <access denied>
08:01.0 Ethernet controller [0200]: Intel Corporation 82541GI/PI Gigabit Ethernet Controller [8086:1076] (rev 05)
Subsystem: Super Micro Computer Inc Unknown device [15d9:1076]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 24
Region 0: Memory at dd400000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at 3000 [size=64]
Capabilities: <access denied>
08:02.0 Ethernet controller [0200]: Intel Corporation 82541GI/PI Gigabit Ethernet Controller [8086:1076] (rev 05)
Subsystem: Super Micro Computer Inc Unknown device [15d9:1076]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 64 (63750ns min), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 25
Region 0: Memory at dd420000 (64-bit, non-prefetchable) [size=128K]
Region 4: I/O ports at 3040 [size=64]
Capabilities: <access denied>
09:01.0 VGA compatible controller [0300]: ATI Technologies Inc Rage XL [1002:4752] (rev 27) (prog-if 00 [VGA])
Subsystem: ATI Technologies Inc Rage XL [1002:8008]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 66 (2000ns min), Cache Line Size: 32 bytes
Interrupt: pin A routed to IRQ 5
Region 0: Memory at de000000 (32-bit, non-prefetchable) [size=16M]
Region 1: I/O ports at 4000 [size=256]
Region 2: Memory at dd500000 (32-bit, non-prefetchable) [size=4K]
[virtual] Expansion ROM at dd520000 [disabled] [size=128K]
Capabilities: <access denied>
*** /home/ronny/2009-10-31_vimes_fs_crash_oops.txt
Oct 31 17:41:25 vimes kernel: Filesystem "dm-10": XFS internal error xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller 0xffffffff8818e897
Oct 31 17:41:25 vimes kernel: Pid: 13670, comm: cp Not tainted 2.6.24-etchnhalf.1-amd64 #1
Oct 31 17:41:25 vimes kernel:
Oct 31 17:41:25 vimes kernel: Call Trace:
Oct 31 17:41:25 vimes kernel: [<ffffffff8818e897>] :xfs:xfs_create+0x442/0x4d2
Oct 31 17:41:25 vimes kernel: [<ffffffff8818770d>] :xfs:xfs_trans_cancel+0x5b/0xf3
Oct 31 17:41:25 vimes kernel: [<ffffffff8818e897>] :xfs:xfs_create+0x442/0x4d2
Oct 31 17:41:25 vimes kernel: [<ffffffff88197fab>] :xfs:xfs_vn_mknod+0x14f/0x249
Oct 31 17:41:25 vimes kernel: [<ffffffff881976ae>] :xfs:xfs_vn_permission+0x15/0x19
Oct 31 17:41:25 vimes kernel: [<ffffffff802a0a24>] vfs_create+0xcf/0x140
Oct 31 17:41:25 vimes kernel: [<ffffffff802a2c51>] open_namei+0x19d/0x644
Oct 31 17:41:25 vimes kernel: [<ffffffff80297147>] do_filp_open+0x1c/0x3d
Oct 31 17:41:25 vimes kernel: [<ffffffff80296e34>] get_unused_fd_flags+0x72/0x11e
Oct 31 17:41:25 vimes kernel: [<ffffffff802971ae>] do_sys_open+0x46/0xc3
Oct 31 17:41:25 vimes kernel: [<ffffffff8020be2e>] system_call+0x7e/0x83
Oct 31 17:41:25 vimes kernel:
Oct 31 17:41:25 vimes kernel: xfs_force_shutdown(dm-10,0x8) called from line 1164 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff88187726
Oct 31 17:41:25 vimes kernel: Filesystem "dm-10": Corruption of in-memory data detected. Shutting down filesystem: dm-10
Oct 31 17:41:25 vimes kernel: Please umount the filesystem, and rectify the problem(s)
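Since the shutdown recurs, it may help to pull every "XFS internal error" line out of the saved kernel logs and compare timestamps and devices across incidents. The following is only a minimal sketch, assuming the syslog line format shown in the trace above; the function name find_xfs_errors and the regular expression are illustrative and not part of any existing tool.

```python
import re

# Assumed line format (taken from the trace above):
#   Oct 31 17:41:25 vimes kernel: Filesystem "dm-10": XFS internal error
#   xfs_trans_cancel at line 1163 of file fs/xfs/xfs_trans.c. Caller 0x...
# (a single line in the real log; wrapped here for readability)
XFS_ERROR = re.compile(
    r'^(?P<when>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+kernel:\s+'
    r'Filesystem "(?P<dev>[^"]+)":\s+XFS internal error\s+'
    r'(?P<func>\w+) at line (?P<line>\d+) of file (?P<file>\S+?)\.\s'
)


def find_xfs_errors(lines):
    """Return one dict per 'XFS internal error' line found in `lines`.

    `lines` is any iterable of log lines, e.g. an open file object.
    """
    events = []
    for text in lines:
        match = XFS_ERROR.match(text)
        if match:
            event = match.groupdict()
            event["line"] = int(event["line"])  # source line as an integer
            events.append(event)
    return events
```

Passing an open log file to find_xfs_errors() returns a list of dicts with the keys when, dev, func, line and file, which can then be sorted or grouped by device to see whether the same filesystem and call site are involved each time.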
-- System Information:
Debian Release: 4.0
APT prefers oldstable
APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.24-etchnhalf.1-amd64
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Versions of packages linux-image-2.6.24-etchnhalf.1-amd64 depends on:
ii  debconf [debconf-2.0]         1.5.11etch2  Debian configuration management sy
ii  initramfs-tools [linux-initr  0.85i        tools for generating an initramfs
ii  module-init-tools             3.3-pre4-2   tools for managing Linux kernel mo
linux-image-2.6.24-etchnhalf.1-amd64 recommends no packages.
On Sun, 2009-11-01 at 18:18 +0000, Ronny Adsetts wrote:
[...]
> Please let me know if I can provide any further information.
>
> I can do limited testing since the server is a production system. I
> could possibly upgrade to a newer kernel from say backports.org if
> that's a good idea.
Bugs in Debian 4.0 'etch' are now unlikely to be fixed, except for
security vulnerabilities. Please try installing the security update
version for the stable release (linux-image-2.6.26-2-amd64, version
2.6.26-19lenny1).
Ben.
--
Ben Hutchings
The generation of random numbers is too important to be left to chance.
- Robert Coveyou
Ben Hutchings said at 01/11/2009 18:58:
> On Sun, 2009-11-01 at 18:18 +0000, Ronny Adsetts wrote:
> [...]
>> Please let me know if I can provide any further information.
>>
>> I can do limited testing since the server is a production system. I
>> could possibly upgrade to a newer kernel from say backports.org if
>> that's a good idea.
>
> Bugs in Debian 4.0 'etch' are now unlikely to be fixed, except for
> security vulnerabilities. Please try installing the security update
> version for the stable release (linux-image-2.6.26-2-amd64, version
> 2.6.26-19lenny1).
Thanks for the fast response Ben.
I'd overlooked that this system was still running Etch.
Should I simply be able to install the Lenny kernel without any problems? Alternatively there is a linux-image-2.6.26-bpo.2-amd64 (2.6.26-20~bpo40+1) package on etch-backports...
Thanks.
Ronny
--
Ronny Adsetts
Technical Director
Amazing Internet Ltd, London
t: +44 20 8607 9535
f: +44 20 8607 9536
w: www.amazinginternet.com
Registered office: UK House, 82 Heath Road, Twickenham TW1 4BW
Registered in England. Company No. 4042957
On Thu, 2009-11-05 at 19:26 +0000, Ronny Adsetts wrote:
> Ben Hutchings said at 01/11/2009 18:58:
> > On Sun, 2009-11-01 at 18:18 +0000, Ronny Adsetts wrote:
> > [...]
> >> Please let me know if I can provide any further information.
> >>
> >> I can do limited testing since the server is a production system. I
> >> could possibly upgrade to a newer kernel from say backports.org if
> >> that's a good idea.
> >
> > Bugs in Debian 4.0 'etch' are now unlikely to be fixed, except for
> > security vulnerabilities. Please try installing the security update
> > version for the stable release (linux-image-2.6.26-2-amd64, version
> > 2.6.26-19lenny1).
>
> Thanks for the fast response Ben.
>
> I'd overlooked that this system was still running Etch.
>
> Should I simply be able to install the Lenny kernel without any
> problems?
Yes, that should work.
Ben.