Package: linux-source-2.6.26
Version: 2.6.26-21
Severity: normal
Hi,
I just experienced XFS corruption on one of my machines during a
remote run of rdiff-backup. No indications of I/O errors or hardware
problems. The XFS volume runs on top of an encrypted loop-aes device,
on top of LVM, on top of software RAID 1. I've been using this setup
for years without any problems.
After this, XFS refused to mount the volume again, and xfs_repair would
only run with -L, which zeroes the log. The filesystem is back up now
and I'm trying to assess the damage. I still have the original
corrupted filesystem as an LVM snapshot.
Here are the syslog messages and the xfs_repair output.
| Feb 11 23:46:07 marvin kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1650 of file fs/xfs/xfs_alloc.c. Caller 0xc01e6284
| Feb 11 23:46:07 marvin kernel: Pid: 1510, comm: rdiff-backup Not tainted 2.6.26 #1
| Feb 11 23:46:07 marvin kernel: [<c01e479f>] xfs_free_ag_extent+0x5ef/0x730
| Feb 11 23:46:07 marvin kernel: [<c01e6284>] xfs_free_extent+0xb4/0xe0
| Feb 11 23:46:07 marvin kernel: [<c01e6284>] xfs_free_extent+0xb4/0xe0
| Feb 11 23:46:07 marvin kernel: [<c01f6bc3>] xfs_bmap_finish+0x123/0x170
| Feb 11 23:46:07 marvin kernel: [<c0219cea>] xfs_itruncate_finish+0x1ea/0x460
| Feb 11 23:46:07 marvin kernel: [<c0235ad5>] xfs_inactive+0x3c5/0x4e0
| Feb 11 23:46:07 marvin kernel: [<c0183fb7>] inotify_inode_is_dead+0x17/0x80
| Feb 11 23:46:07 marvin kernel: [<c0241766>] xfs_fs_clear_inode+0x36/0x70
| Feb 11 23:46:07 marvin kernel: [<c016d455>] clear_inode+0x65/0x140
| Feb 11 23:46:07 marvin kernel: [<c016d9be>] generic_delete_inode+0xde/0xf0
| Feb 11 23:46:07 marvin kernel: [<c016cb44>] iput+0x44/0x50
| Feb 11 23:46:07 marvin kernel: [<c0163a29>] do_unlinkat+0xf9/0x180
| Feb 11 23:46:07 marvin kernel: [<c0170303>] mntput_no_expire+0x13/0xa0
| Feb 11 23:46:07 marvin kernel: [<c0157a27>] filp_close+0x47/0x80
| Feb 11 23:46:07 marvin kernel: [<c0102f4e>] syscall_call+0x7/0xb
| Feb 11 23:46:07 marvin kernel: =======================
| Feb 11 23:46:07 marvin kernel: xfs_force_shutdown(loop0,0x8) called from line 4261 of file fs/xfs/xfs_bmap.c. Return address = 0xc01f6c00
| Feb 11 23:46:07 marvin kernel: Filesystem "loop0": Corruption of in-memory data detected. Shutting down filesystem: loop0
| Feb 11 23:46:07 marvin kernel: Please umount the filesystem, and rectify the problem(s)
| Feb 11 23:46:08 marvin kernel: Filesystem "loop0": xfs_log_force: error 5 returned.
| Feb 11 23:46:20 marvin kernel: Filesystem "loop0": xfs_log_force: error 5 returned.
| Feb 11 23:46:50 marvin kernel: Filesystem "loop0": xfs_log_force: error 5 returned.
[last line repeats every 30 seconds until volume is unmounted]
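For readers who want to work with the trace above, here is a minimal Python sketch that pulls the stack frames out of the syslog text and translates the numeric code in the xfs_log_force lines into its symbolic errno name (5 is EIO). The function names parse_frames and log_force_errors and the SAMPLE constant are made up for this illustration; SAMPLE just repeats three lines from the log above. It is not part of any XFS tooling.

```python
import errno
import re

# Matches frame lines such as "[<c01e479f>] xfs_free_ag_extent+0x5ef/0x730".
FRAME_RE = re.compile(r"\[<([0-9a-f]+)>\]\s+([\w.]+)\+0x([0-9a-f]+)/0x([0-9a-f]+)")
# Matches the repeated "xfs_log_force: error 5 returned." lines.
LOG_FORCE_RE = re.compile(r"xfs_log_force: error (\d+) returned")


def parse_frames(log_text):
    """Return (address, symbol, offset, size) for every stack frame line."""
    return [
        (int(addr, 16), sym, int(off, 16), int(size, 16))
        for addr, sym, off, size in FRAME_RE.findall(log_text)
    ]


def log_force_errors(log_text):
    """Return symbolic errno names for the codes in xfs_log_force lines."""
    return [
        errno.errorcode.get(int(code), code)
        for code in LOG_FORCE_RE.findall(log_text)
    ]


# Three lines copied from the syslog excerpt above (illustrative subset).
SAMPLE = """\
| Feb 11 23:46:07 marvin kernel: [<c01e479f>] xfs_free_ag_extent+0x5ef/0x730
| Feb 11 23:46:07 marvin kernel: [<c01e6284>] xfs_free_extent+0xb4/0xe0
| Feb 11 23:46:08 marvin kernel: Filesystem "loop0": xfs_log_force: error 5 returned.
"""

if __name__ == "__main__":
    for addr, sym, off, size in parse_frames(SAMPLE):
        print(f"{addr:#010x} {sym} (+{off:#x} of {size:#x})")
    print(log_force_errors(SAMPLE))
```

The EIO results are what one would expect once the filesystem has been shut down: every later log force fails with an I/O error until the volume is unmounted.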
Kernel: Linux 2.6.26
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Versions of packages linux-source-2.6.26 depends on:
ii binutils 2.18.1~cvs20080103-7 The GNU assembler, linker and bina
ii bzip2 1.0.5-1 high-quality block-sorting file co
Versions of packages linux-source-2.6.26 recommends:
ii gcc 4:4.3.2-2 The GNU C compiler
ii libc6-dev [libc-dev] 2.7-18lenny2 GNU C Library: Development Librari
ii make 3.81-5 The GNU version of the "make" util
Versions of packages linux-source-2.6.26 suggests:
ii kernel-package 11.015 A utility for building Linux kerne
ii libncurses5-dev [ncurses- 5.7+20081213-1 developer's libraries and docs for
pn libqt3-mt-dev <none> (no description available)
-- no debconf information
02-14-2010, 06:39 PM
Ben Hutchings
Bug#569598: XFS corruption
On Fri, 2010-02-12 at 14:35 -0500, Philipp Weis wrote:
> Package: linux-source-2.6.26
> Version: 2.6.26-21
> Severity: normal
Since you are using a custom kernel, please send the build config you
used.
> I just experienced XFS corruption on one of my machines during a
> remote run of rdiff-backup. No indications of I/O errors or hardware
> problems. The XFS volume runs on top of an encrypted loop-aes device,
> on top of LVM, on top of software RAID 1. I've been using this setup
> for years without any problems.
[...]
So how old are the disks now?
Ben.
--
Ben Hutchings
It is easier to change the specification to fit the program than vice versa.
02-14-2010, 06:52 PM
Philipp Weis
Bug#569598: XFS corruption
On 2010-02-14 19:39, Ben Hutchings <ben@decadent.org.uk> wrote:
> On Fri, 2010-02-12 at 14:35 -0500, Philipp Weis wrote:
> > Package: linux-source-2.6.26
> > Version: 2.6.26-21
> > Severity: normal
>
> Since you are using a custom kernel, please send the build config you
> used.
Attached.
> > I just experienced XFS corruption on one of my machines during a
> > remote run of rdiff-backup. No indications of I/O errors or hardware
> > problems. The XFS volume runs on top of an encrypted loop-aes device,
> > on top of LVM, on top of software RAID 1. I've been using this setup
> > for years without any problems.
> [...]
>
> So how old are the disks now?
The three disks in the 3-way RAID-1 are 4 years, 2 years and 2 months
old. I keep replacing them as they break. There are no I/O errors or
mdadm messages in the syslog, so I don't think that's an issue here.
Thanks!
Philipp
02-16-2010, 12:54 AM
Ben Hutchings
Bug#569598: XFS corruption
On Sun, 2010-02-14 at 14:52 -0500, Philipp Weis wrote:
> On 2010-02-14 19:39, Ben Hutchings <ben@decadent.org.uk> wrote:
> > On Fri, 2010-02-12 at 14:35 -0500, Philipp Weis wrote:
> > > Package: linux-source-2.6.26
> > > Version: 2.6.26-21
> > > Severity: normal
> >
> > Since you are using a custom kernel, please send the build config you
> > used.
>
> Attached.
I notice you set CONFIG_PREEMPT_VOLUNTARY. The kernel image packages of
2.6.26 all use CONFIG_PREEMPT_NONE due to concern about the stability of
preemption. (This has been changed for more recent versions.) So it
may be worth changing this option.
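As a rough way to check which preemption model a given build config selects, the following Python sketch scans the text of a kernel .config for the three preemption options that exist in 2.6.26 (CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_VOLUNTARY, CONFIG_PREEMPT). The helper name preempt_model and the SAMPLE_CONFIG fragment are assumptions made for illustration; the fragment is not the config attached to this bug.

```python
import re

# The mutually exclusive preemption choices offered by the 2.6.26 Kconfig.
PREEMPT_OPTIONS = (
    "CONFIG_PREEMPT_NONE",
    "CONFIG_PREEMPT_VOLUNTARY",
    "CONFIG_PREEMPT",
)


def preempt_model(config_text):
    """Return the preemption option set to 'y' in config_text, or None."""
    # Lines like "# CONFIG_FOO is not set" do not match and are ignored.
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, flags=re.MULTILINE))
    for option in PREEMPT_OPTIONS:
        if option in enabled:
            return option
    return None


# Hypothetical fragment in the usual .config format (not the reporter's file).
SAMPLE_CONFIG = """\
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
"""

if __name__ == "__main__":
    print(preempt_model(SAMPLE_CONFIG))
```

To use it on a real build, read the .config from the kernel build tree into a string and pass that to preempt_model.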
When did you upgrade to kernel version 2.6.26? Which version were you
using before?
Ben.
--
Ben Hutchings
Humour is the best antidote to reality.
02-16-2010, 01:12 AM
Philipp Weis
Bug#569598: XFS corruption
On 2010-02-16 01:54, Ben Hutchings <ben@decadent.org.uk> wrote:
> I notice you set CONFIG_PREEMPT_VOLUNTARY. The kernel image packages of
> 2.6.26 all use CONFIG_PREEMPT_NONE due to concern about the stability of
> preemption. (This has been changed for more recent versions.) So it
> may be worth changing this option.
OK, thanks for the hint. I'll disable preemption for now.
> When did you upgrade to kernel version 2.6.26? Which version were you
> using before?
I was using 2.6.18 before, and switched to 2.6.26 sometime between
August and December of 2009. Sorry I can't be more specific.