10-20-2008, 02:35 AM
Robert Davidson

ext3 file system I/O blocks until reboot

Hi all,

We have a server that has a 580GB ext3 file system on it. Until
recently we ran around 15 virtual servers from this file system. It was
fine for at least a few months, then the file system would periodically
become inaccessible, getting more frequent as time went on. Eventually
we wouldn't even get through a 15-hour period without having to reboot
the server.

When the I/O got blocked, all processes accessing files on
/var/lib/vservers (its mount point) would get stuck waiting for I/O to
complete ("D" state) and I couldn't find any way to revive it apart from
rebooting the server. I tried sending various signals (TERM and KILL)
to some kernel threads but that didn't help at all.

The "kjournald" process also got stuck in the "D" state.
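
A minimal sketch of how tasks stuck in the "D" state could be listed from user space, assuming a Linux /proc file system is mounted. The helper names (parse_stat_state, blocked_tasks) are hypothetical and only for illustration; the wchan column is the kernel symbol the task is sleeping in, when the kernel exposes it.

```python
import os


def parse_stat_state(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' instead of on whitespace.
    left, _, right = stat_text.rpartition(")")
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm, wchan) for tasks in uninterruptible sleep ("D")."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "stat")) as f:
                comm, state = parse_stat_state(f.read())
        except OSError:
            continue  # the process exited while we were scanning
        if state != "D":
            continue
        try:
            with open(os.path.join(base, "wchan")) as f:
                wchan = f.read().strip()
        except OSError:
            wchan = "?"
        found.append((int(entry), comm, wchan))
    return sorted(found)
```

Calling blocked_tasks() with no arguments scans the live /proc. It only reads small files under /proc, so it should not itself touch the hung file system.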

The server is running kernel 2.6.22.19 with the Linux-Vserver patch
vs2.2.0.7, DRBD 8.2.6 and the Areca RAID driver updated to
1.20.0X.15-80603 which was the latest available from Areca at the time.
The OS is Debian etch.

As part of troubleshooting the problem I'd taken DRBD out of the mix,
tried updating the RAID driver in the kernel, replaced the RAID card
with another one with slightly later firmware, and also replaced the
power supply with a known-good one at the same time and disabled the
swap space. None of that helped.

What did help was copying the files from the existing file system to a
newly formatted ext3 file system. The newly formatted file system is
only around 320GB, but is also set up the same as the existing one (both
are hardware RAID-6, running on the same host, same controller, same
physical disks, etc).

When the file system would become inaccessible, there were no notices
from the kernel about any issue at all. We have a serial console on
this server and nothing was captured by the serial console when this
happened, nor is there anything in the system logs (which should have
been writable all this time as they are not on the broken file system).

I used 'dd' to check if I could read from the underlying device files
that the file system was on (/dev/sdc1 and /dev/drbd1), and there was
no problem doing that. I didn't test writes to these devices though since
I don't know of any safe way to do so, but using the SysRq feature, an
emergency sync would not complete, nor would an emergency umount, so I
assume writes were out of the question. Doing an 'ls' on
/var/lib/vservers just left me with yet another process stuck in the "D"
state.
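
A rough Python equivalent of that read-only 'dd' check, as a sketch only. The function name timed_read and its default sizes are assumptions made for illustration. The device is opened read-only, so nothing is written to it.

```python
import time


def timed_read(path, total_bytes=64 * 1024 * 1024, chunk=1024 * 1024):
    """Read up to total_bytes from path; return (bytes_read, seconds)."""
    done = 0
    start = time.monotonic()
    # Opened read-only and unbuffered: this never writes to the device.
    with open(path, "rb", buffering=0) as dev:
        while done < total_bytes:
            data = dev.read(min(chunk, total_bytes - done))
            if not data:
                break  # reached the end of the device or file
            done += len(data)
    return done, time.monotonic() - start
```

For example, timed_read("/dev/sdc1") would read the first 64 MiB of the partition. Some of that data may be served from the page cache rather than the disk, and if the device itself were hung, this process would block in the "D" state just like any other reader.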

A forced fsck of the file system (using a fresh build of e2fsprogs
1.41.3 with the matching libraries) provides no hint of any problems.
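
For reference, a small sketch of how a forced, read-only check could be scripted and its result interpreted. The exit status meanings are the ones documented in the e2fsck man page; the helper names and the choice of the -n (read-only, answer "no") flag are assumptions for illustration, not necessarily how the check above was run.

```python
# Exit status bits as documented in the e2fsck(8) man page.
E2FSCK_EXIT_BITS = {
    1: "file system errors corrected",
    2: "file system errors corrected, system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "e2fsck canceled by user request",
    128: "shared library error",
}


def e2fsck_command(device):
    """Build an argv list for a forced (-f), read-only (-n) check."""
    return ["e2fsck", "-f", "-n", device]


def decode_e2fsck_status(status):
    """Translate an e2fsck exit status into a list of descriptions."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in sorted(E2FSCK_EXIT_BITS.items())
            if status & bit]
```

The argv list can be passed to subprocess.run() against an unmounted file system, and the returncode fed to decode_e2fsck_status().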

The root file system is an ext3 file system as well, and there were no
problems reading/writing to that file system while the ext3 file system
on /var/lib/vservers was inaccessible. The root file system is also on
the same RAID card, physical disks, etc.

One reason I've not moved to a newer kernel yet is because there isn't a
stable linux-vserver patch for anything newer than 2.6.22.19, so I'm
kind of stuck with that kernel until there is. I made a start on
backporting the ext3 code from 2.6.26.5 to 2.6.22.19 but it's not
something I trust myself to get right, so I'd rather avoid that approach
if there is another way of doing that.

So my questions are:

Are there any further diagnostics I can perform on the old file system
to try and track down the problem? If so, what are they?

Is this a known bug/problem with ext3 or something related to it?

Is it likely that one of the 3 or so deadlocks that have been fixed in
kernels since 2.6.22.19 would have cured this problem, or would these
deadlocks have taken down the whole box and not just affected the one
file system?

Or could it even be this bug: http://bugzilla.kernel.org/show_bug.cgi?id=10882
(the softlockup part)? I think not, though, because I was able to copy
everything off that file system and on to a new one without having any
lockups or any other complaints from the kernel.

Thanks.

--
Regards,
Robert Davidson.
Obsidian Consulting Group.
Ph. 03-9355-7844
E-Mail: support@obsidian.com.au


_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
10-20-2008, 01:34 PM
Bruno Wolff III

On Mon, Oct 20, 2008 at 13:35:54 +1100,
Robert Davidson <rdavidson@obsidian.com.au> wrote:
>
> So my questions are:
>
> Are there any further diagnostics I can perform on the old file system
> to try and track down the problem? If so, what are they?
>
> Is this a known bug/problem with ext3 or something related to it?

I saw stuff like this happening starting with later 2.6.20 kernels that
wasn't fixed until the 2.6.24 kernels. (See bug 235043.) I wasn't using
VM's, so it might not be the same as the bug you are seeing. I do remember
seeing some other similar problems people were having that didn't appear
to be the same bug as I had when I did bugzilla searches. So you might
want to do your own bugzilla search to see what you can find.

I have also been getting disk IO lockups in F10, but in a more limited set
of circumstances. (Memory pressure on an X86_64 system.)

 
10-21-2008, 12:40 AM
Robert Davidson

Bruno Wolff III wrote:
> I saw stuff like this happening starting with later 2.6.20 kernels that
> wasn't fixed until the 2.6.24 kernels. (See bug 235043.) I wasn't using
> VM's, so it might not be the same as the bug you are seeing. I do remember
> seeing some other similar problems people were having that didn't appear
> to be the same bug as I had when I did bugzilla searches. So you might
> want to do your own bugzilla search to see what you can find.
>
> I have also been getting disk IO lockups in F10, but in a more limited set
> of circumstances. (Memory pressure on an X86_64 system.)
>

Hi Bruno,

I've had a look through bugzilla but couldn't find any similar bugs (the
closest I can find is 439548 but I doubt very much that that's it). Your
bug 235043 does sound rather different since it sounds like new
processes would be able to access the file system without a problem,
whereas on my system any new attempt to read (writing wasn't tested)
just resulted in one more process stuck in the "D" state.

I might try taking a byte-for-byte copy of the FS and see if I can find
a way to reliably reproduce the problem on a similar server.
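
A byte-for-byte copy along those lines could be sketched as below. The function name image_copy and the 4 MiB chunk size are assumptions for illustration. The SHA-256 digest of the source stream is returned so the image can be verified again after it has been moved to the other server.

```python
import hashlib


def image_copy(src_path, dst_path, chunk=4 * 1024 * 1024):
    """Copy src_path to dst_path in chunks; return (bytes, sha256 hex)."""
    digest = hashlib.sha256()
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            block = src.read(chunk)
            if not block:
                break  # end of the source device or file
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()
```

The source file system should be unmounted (or mounted read-only) while the copy runs, otherwise the image may capture a half-written state.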

--
Regards,
Robert Davidson.
Obsidian Consulting Group.
Ph. 03-9355-7844
E-Mail: support@obsidian.com.au


 
10-21-2008, 03:37 AM
Bruno Wolff III

On Tue, Oct 21, 2008 at 11:40:06 +1100,
Robert Davidson <rdavidson@obsidian.com.au> wrote:
>
> I've had a look through bugzilla but couldn't find any similar bugs (the
> closest I can find is 439548 but I doubt very much that that's it). Your
> bug 235043 does sound rather different since it sounds like new
> processes would be able to access the file system without a problem,
> whereas on my system any new attempt to read (writing wasn't tested)
> just resulted in one more process stuck in the "D" state.

For a while. Eventually everything would lock up.

 
10-25-2008, 11:22 PM
Christian Kujau

Probably too late anyway, but:

On Mon, 20 Oct 2008, Robert Davidson wrote:

> The "kjournald" process also got stuck in the "D" state.


Did you try a SysRq-w to show all blocked tasks? Or even -d, or -t. You
mentioned /var/log was on a different filesystem, so this information
might make it to the disks. If not, your serial console should catch
it. Maybe then we'll find out *why* these processes are in "D" state.
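
When no keyboard is attached, the same SysRq requests can be issued by writing the key to /proc/sysrq-trigger as root. A minimal sketch, assuming the kernel was built with magic SysRq support; the helper name trigger_sysrq is hypothetical:

```python
# Meaning of the keys mentioned above, per the kernel's sysrq documentation.
SYSRQ_KEYS = {
    "w": "dump tasks in uninterruptible (blocked) state",
    "t": "dump a list of all current tasks",
    "d": "show all locks that are held",
}


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Write a single SysRq key to the trigger file (needs root)."""
    if key not in SYSRQ_KEYS:
        raise ValueError("unsupported SysRq key: %r" % (key,))
    with open(trigger_path, "w") as f:
        f.write(key)
    return SYSRQ_KEYS[key]
```

trigger_sysrq("w") asks the kernel to print the blocked tasks to the kernel log, which is where the stack traces of the "D" state processes would show up.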


Christian.
--
BOFH excuse #25:

Decreasing electron flux

 
10-27-2008, 12:10 AM
Robert Davidson

Christian Kujau wrote:
> Probably too late anyway, but:
>
> On Mon, 20 Oct 2008, Robert Davidson wrote:
>> The "kjournald" process also got stuck in the "D" state.
>
> Did you try a SysRq-w to show all blocked tasks? Or even -d, or -t.
> You mentioned /var/log was on a different filesystem, so this
> information might make it to the disks. If not, your serial console
> should catch it. Maybe then we'll find out *why* these processes are in
> "D" state.

Hi Christian,

Not too late - this is an ongoing problem still. I'm currently trying
to see if I can get some newer vserver patches so I can build a newer
kernel and try that. Currently I'm stuck with 2.6.22.19.

I've tried doing various SysRq requests, none of them would give me
anything back on the serial console, but it seems that may have been my
own fault for having the console logging set too low. I've fixed that
up now.
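
The console log level can be checked by reading /proc/sys/kernel/printk, which holds four numbers. A small sketch of interpreting them; the helper names are hypothetical:

```python
def parse_printk(text):
    """Parse the contents of /proc/sys/kernel/printk into named levels."""
    names = ("console", "default_message", "minimum_console",
             "default_console")
    values = [int(v) for v in text.split()]
    if len(values) != len(names):
        raise ValueError("expected 4 log levels, got %d" % len(values))
    return dict(zip(names, values))


def console_shows(levels, message_level):
    """True if a message at message_level is printed on the console."""
    # Only messages with a level numerically lower than the console
    # log level reach the console.
    return message_level < levels["console"]
```

With a console level of 4, for example, informational messages (level 6) are not printed on the console, which would be consistent with the serial console staying silent.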

In any case, the responses you'd expect to see from the kernel for the
various SysRq commands never made it into the logs.

About a month ago when the server last had problems, I made a new ext3
filesystem and copied everything from the old filesystem to the new
one. I thought that worked but then last night we lost the same
filesystem again and had to reboot.

After copying everything off the original filesystem (also ext3) I ran a
forced fsck.ext3 on it and it didn't find any problems.

--
Regards,
Robert Davidson.
Obsidian Consulting Group.
Ph. 03-9355-7844
E-Mail: support@obsidian.com.au

