Mike Miller 06-17-2010 04:08 PM

kjournald blocked in D state
 
I have a system on which kjournald becomes blocked in D state quite often.
Looking at a core file we have 5 mounted ext3 filesystems:

crash> mount
VFSMOUNT SUPERBLK TYPE DEVNAME DIRNAME
10037e07b00 10037e4ec00 rootfs rootfs /
10037e07ec0 10037e4e400 proc /proc /proc
10037e07d40 102188abc00 tmpfs none /dev
10037e07e00 102188b2400 ext3 /dev/root /
10037e07200 102188abc00 tmpfs none /dev
10037e07140 10037e4e400 proc /proc /proc
1021652bc00 102188b1c00 usbfs /proc/bus/usb /proc/bus/usb
1021652bf00 10037e4c400 sysfs /sys /sys
1021652bb40 10006967400 devpts devpts /dev/pts
1021652b180 100dfeda400 ext3 /dev/cciss/c0d0p1 /boot
1021652b240 100dfecb800 ext3 /dev/sys/home /home
1021652b300 100dfecbc00 ext3 /dev/sys/tmp /tmp
1021652b3c0 100dfeda800 ext3 /dev/sys/var /var
1021652b480 100dfedac00 tmpfs tmpfs /dev/shm
1021652bcc0 100dfecb400 binfmt_misc none /proc/sys/fs/binfmt_misc

So we have 5 corresponding journal threads:

crash> ps | grep kjournald
626 1 2 10218109030 IN 0.0 0 0 [kjournald]
3015 1 0 102168f2030 IN 0.0 0 0 [kjournald]
3016 1 1 102168f27f0 UN 0.0 0 0 [kjournald]
3017 1 1 1021837b030 IN 0.0 0 0 [kjournald]
3018 1 7 10216fd0030 UN 0.0 0 0 [kjournald]
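With more tasks than this, picking out the blocked ones by eye gets tedious. The following is a minimal sketch, not part of the original session, that filters saved `ps` output for tasks in the UN state. It assumes the usual crash column order (PID, PPID, CPU, TASK, ST, %MEM, VSZ, RSS, COMM); the names `PS_OUTPUT` and `uninterruptible_tasks` are made up for illustration.

```python
# Sample copied from the `ps | grep kjournald` output above.
PS_OUTPUT = """\
626 1 2 10218109030 IN 0.0 0 0 [kjournald]
3015 1 0 102168f2030 IN 0.0 0 0 [kjournald]
3016 1 1 102168f27f0 UN 0.0 0 0 [kjournald]
3017 1 1 1021837b030 IN 0.0 0 0 [kjournald]
3018 1 7 10216fd0030 UN 0.0 0 0 [kjournald]
"""


def uninterruptible_tasks(ps_text):
    """Return (pid, task_address) for every line whose state column is UN.

    Assumed column order: PID PPID CPU TASK ST %MEM VSZ RSS COMM.
    """
    blocked = []
    for line in ps_text.splitlines():
        # crash marks active tasks with a leading '>', so drop it if present.
        fields = line.lstrip("> ").split()
        if len(fields) < 9:
            continue  # not a task line
        pid, task, state = fields[0], fields[3], fields[4]
        if state == "UN":
            blocked.append((int(pid), task))
    return blocked


if __name__ == "__main__":
    # Print a backtrace command for each blocked task.
    for pid, task in uninterruptible_tasks(PS_OUTPUT):
        print(f"bt -f {pid}    # task {task}")
```

Running it against the listing above should yield one `bt -f` line per UN task, which can then be pasted back into the crash prompt.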

Two are in the UNINTERRUPTIBLE state, but only PID 3018 shows __wait_on_buffer
in its stack:

crash> bt -f 3018
PID: 3018 TASK: 10216fd0030 CPU: 7 COMMAND: "kjournald"
-----snip-----
#2 [10215a83b30] __wait_on_buffer at ffffffff8017d504
10215a83b38: 000001005fa12ce8 0000000000000000
10215a83b48: 0000010216fd0030 ffffffff8017d38a
10215a83b58: 0000010215a83b88 0000010215a83b88
10215a83b68: 000001005fa12ce8 0000000000000000
10215a83b78: 0000010216fd0030 ffffffff8017d38a
10215a83b88: ffffffff804ac808 ffffffff804ac808
10215a83b98: 000001005fa12ce8 0000000000000001
10215a83ba8: 000001004f4e90e0 ffffffffa0080ffe
-----snip-----

I'm not a crash expert, so I then looked at the last address pushed onto its
stack and traced it down to the inode semaphore:

crash> struct file.f_dentry 000001005fa12ce8
f_dentry = 0x1021f4e5510,
crash> struct dentry.d_inode 0x1021f4e5510
d_inode = 0x100c95c17c0,
crash> struct inode.i_sem 0x100c95c17c0
i_sem = {
count = {
counter = -916711312 <-------------------- This looks wrong
},
sleepers = 256,
wait = {
lock = {
lock = 497690456,
magic = 258
},
task_list = {
next = 0x100000000000, <--------------- This also looks wrong
prev = 0x30f75c3
}
}
},
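One way to sanity-check values like these is to reinterpret them as raw 32-bit words and compare them with addresses already seen in the session. The sketch below does that for the two integer fields above. It is only an illustration under stated assumptions: the helper names are hypothetical, and comparing the upper 16 bits is an arbitrary heuristic, not a crash feature.

```python
# Address of the inode taken from the `struct dentry.d_inode` output above.
INODE_ADDR = 0x100c95c17c0

# Integer fields printed by `struct inode.i_sem`.
SUSPECT_FIELDS = {
    "i_sem.count.counter": -916711312,
    "i_sem.wait.lock.lock": 497690456,
}


def as_u32(value):
    """Reinterpret an integer as an unsigned 32-bit word."""
    return value & 0xFFFFFFFF


def shares_upper_half(word, address):
    """Heuristic: do the upper 16 bits of the 32-bit word match the
    upper 16 bits of the address's low 32 bits?"""
    return (as_u32(word) >> 16) == (as_u32(address) >> 16)


if __name__ == "__main__":
    for name, value in SUSPECT_FIELDS.items():
        marker = ""
        if shares_upper_half(value, INODE_ADDR):
            marker = "  <-- resembles the inode address"
        print(f"{name:22s} {value:>12d} = 0x{as_u32(value):08x}{marker}")
```

Read this way, the counter is 0xc95c1870, which is 0xb0 above the low 32 bits of the inode address 0x100c95c17c0. A semaphore count that looks like half of a nearby pointer may mean the memory is not being decoded as the right type at some step in the chain. One possibility, offered only as a guess, is that the word taken from the stack is not a struct file at all, since __wait_on_buffer operates on a struct buffer_head.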

At this point I'm not sure how to continue, or even whether I went down the
right path. From this info can anyone tell what's wrong? Or did I take a wrong
turn in reaching this conclusion?

-- mikem
In this case /home is a heavily accessed filesystem.

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Christian Kujau 06-20-2010 07:44 AM

kjournald blocked in D state
 
On Thu, 17 Jun 2010 at 11:08, Mike Miller wrote:
> I have a system on which kjournald becomes blocked in D state quite often.

Did this happen "just now", or after a kernel upgrade? Which kernel are
you using? Do other systems (with the same kernel?) show similar
behaviour?

Christian.
--
BOFH excuse #414:

tachyon emissions overloading the system


Mike Miller 06-21-2010 04:46 PM

kjournald blocked in D state
 
On Sun, Jun 20, 2010 at 12:44:37AM -0700, Christian Kujau wrote:
> On Thu, 17 Jun 2010 at 11:08, Mike Miller wrote:
> > I have a system on which kjournald becomes blocked in D state quite often.
>
> Did this happen "just now", or after a kernel upgrade? Which kernel are
> you using? Do other systems (with the same kernel?) show similar
> behaviour?

The kernel is a 2.6.9 variant. According to the user, 2.6.9-89 exhibits the
problem, while kernel 2.6.9-78 does not appear to.

Aside from that, I've seen the symptoms reported against 2.6.18 and 2.6.32
kernels. It's not easy to reproduce. The customer is using clusters of 50+
nodes all using internal storage. AFAIK, they are not sharing filesystems
between nodes.

The driver differences between the 2 kernels are minimal with nothing in the
main code path.

-- mikem



