FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Device-mapper Development

 
 
LinkBack Thread Tools
 
Old 03-10-2008, 12:46 AM
Christian Kujau
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Mon, 10 Mar 2008, David Chinner wrote:

Oh, dm-crypt. Well, I'd definitely start looking there. XFS has a
history of exposing dm-crypt bugs, and these hangs appear to be I/O
congestion/scheduling related and not XFS.


Yeah, I noticed that too, thanks for verifying this: during the 2nd bisect
run, the box locked up hard when I accessed the device-mapper. I'm using a
wrapper script to set up my luks/dm-crypt devices and still have to find
out which command exactly triggers the lockup. So, maybe the hard lockup
and the hangs are not so unrelated after all...oh well.



Also, we haven't changed
anything related to plug/unplug of block devices in XFS recently, so
that also points to some other change as well...


Thanks for your assistance, David, I really appreciate it. I'll try to
find out more about this dm-crypt thingy....


Christian.
--
BOFH excuse #396:

Mail server hit by UniSpammer.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-12-2008, 05:03 PM
Samuel Tardieu
 
Default INFO: task mount:11202 blocked for more than 120 seconds

>>>>> "David" == David Chinner <dgc@sgi.com> writes:

>> No, no I/O errors at all. See the kern.log above, I could even do
>> dd(1) from the md1 (dm-crypt on raid1), no errors either.

David> Oh, dm-crypt. Well, I'd definitely start looking there. XFS has
David> a history of exposing dm-crypt bugs, and these hangs appear to
David> be I/O congestion/scheduling related and not XFS. Also, we
David> haven't changed anything related to plug/unplug of block
David> devices in XFS recently, so that also points to some other
David> change as well...

For what it is worth, I notice the same error last week (with a kernel
of the day) on my laptop but was too busy at work to investigate. And
I use ext3... on dm-crypt.

Sam
--
Samuel Tardieu -- sam@rfc1149.net -- http://www.rfc1149.net/

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-12-2008, 06:53 PM
Chr
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Wednesday 12 March 2008 19:03:51 Samuel Tardieu wrote:
>
> David> Oh, dm-crypt. Well, I'd definitely start looking there. XFS has
> David> a history of exposing dm-crypt bugs, and these hangs appear to
> David> be I/O congestion/scheduling related and not XFS. Also, we
> David> haven't changed anything related to plug/unplug of block
> David> devices in XFS recently, so that also points to some other
> David> change as well...
>
me too...

http://lkml.org/lkml/2008/3/11/377
(BTW: I'm still bisecting it... can somebody please confirm that 2.6.24-git17
http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.24-git17.bz2
was the _last_ "good" one?)

BTW:
by pressing "SYSRQ-S (Emerg. Sync)" over and over again can unlock
some but not all "D" tasks...

Regards,
Chr.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-12-2008, 10:07 PM
Christian Kujau
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Wed, 12 Mar 2008, Chr wrote:

http://lkml.org/lkml/2008/3/11/377
(BTW: I'm still bisecting it... can somebody please confirm that 2.6.24-git17
http://www.kernel.org/pub/linux/kernel/v2.6/snapshots/patch-2.6.24-git17.bz2
was the _last_ "good" one?)


Hm, -git17 locked up hard when trying to access the device mapper
(just like -rc1), not even sysrq was usable any more. Sadly,
nothing was logged to the disks/network. Will try earlier snapshots
then. And yes, this resyncing of the md devices after an unclean shutdown
is a real pita right now...


Thanks for reporting!
Christian.
--
BOFH excuse #197:

I'm sorry a pentium won't do, you need an SGI to connect with us.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-13-2008, 12:45 PM
Christian Kujau
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Thu, 13 Mar 2008, Christian Kujau wrote:

Hm, -git17 locked up hard when trying to access the device mapper


and so does -git8 and -git9...sigh

C.
--
BOFH excuse #412:

Radial Telemetry Infiltration

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-13-2008, 08:33 PM
Chr
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Thursday 13 March 2008 14:45:39 Christian Kujau wrote:
> and so does -git8 and -git9...sigh

hmm, I can boot any of these -rcX-gitY kernels (but not without some stuck
tasks after about a hour or so...)

but back: I tried lockdep and ran into this: (debug blocker)

pktcdvd: writer pktcdvd0 mapped to sr0

=============================================
[ INFO: possible recursive locking detected ]
2.6.25-rc5-git3 #2
---------------------------------------------
scsi_id/8262 is trying to acquire lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

but task is already holding lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

other info that might help us debug this:
2 locks held by scsi_id/8262:
#0: (&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0
#1: (&ctl_mutex#2){--..}, at: [<ffffffff88231af2>] pkt_open+0x26/0x516
[pktcdvd]

stack backtrace:
Pid: 8262, comm: scsi_id Not tainted 2.6.25-rc5-git3 #2

Call Trace:
[<ffffffff8024ffe4>] __lock_acquire+0x8c9/0xc72
[<ffffffff804bb23e>] ? __mutex_unlock_slowpath+0xf9/0x105
[<ffffffff8025079f>] lock_acquire+0x5e/0x77
[<ffffffff802a8e51>] ? do_open+0x7b/0x2c0
[<ffffffff804bb336>] mutex_lock_nested+0xe1/0x270
[<ffffffff802a8e51>] ? do_open+0x7b/0x2c0
[<ffffffff802a8e51>] do_open+0x7b/0x2c0
[<ffffffff802a9116>] __blkdev_get+0x80/0x92
[<ffffffff802a9133>] blkdev_get+0xb/0xd
[<ffffffff88231b6b>] ktcdvdkt_open+0x9f/0x516
[<ffffffff802a8e84>] do_open+0xae/0x2c0
[<ffffffff802a92a4>] ? blkdev_open+0x0/0x6a
[<ffffffff804bcbab>] ? _spin_unlock+0x26/0x2a
[<ffffffff802a92a4>] ? blkdev_open+0x0/0x6a
[<ffffffff802a92db>] blkdev_open+0x37/0x6a
[<ffffffff80282571>] __dentry_open+0xe6/0x1c0
[<ffffffff802826e4>] nameidata_to_filp+0x2e/0x40
[<ffffffff8028272f>] do_filp_open+0x39/0x4b
[<ffffffff804bcbab>] ? _spin_unlock+0x26/0x2a
[<ffffffff8028246f>] ? get_unused_fd_flags+0x10d/0x11c
[<ffffffff80282792>] do_sys_open+0x51/0xd9
[<ffffffff80282843>] sys_open+0x1b/0x1d
[<ffffffff8020b22b>] system_call_after_swapgs+0x7b/0x80

soooo.....?

scsi_id/8262 is trying to acquire lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

but task is already holding lock:
(&bdev->bd_mutex){--..}, at: [<ffffffff802a8e51>] do_open+0x7b/0x2c0

sounds a bit strange? Is this a SMP related problem after all?
(*disabled pktcddvd for now... to see if still locks)

Regards,
Chr...

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-13-2008, 08:54 PM
Christian Kujau
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Thu, 13 Mar 2008, Chr wrote:

On Thursday 13 March 2008 14:45:39 Christian Kujau wrote:

and so does -git8 and -git9...sigh


hmm, I can boot any of these -rcX-gitY kernels (but not without some stuck
tasks after about a hour or so...)


Ah, when you said -git17 would be the last "good" kernel, I thought
everything earlier would not have the stuck tasks.



but back: I tried lockdep and ran into this: (debug blocker)


I have lockdep enabled as well, but no warnings so far...


sounds a bit strange? Is this a SMP related problem after all?


I don't think so, as I don't have an SMP system (and CONFIG_SMP is
disabled as well).


C.
--
BOFH excuse #441:

Hash table has woodworm

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-13-2008, 11:15 PM
Chr
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Thursday 13 March 2008 22:54:25 Christian Kujau wrote:
> On Thu, 13 Mar 2008, Chr wrote:
> > On Thursday 13 March 2008 14:45:39 Christian Kujau wrote:
> >> and so does -git8 and -git9...sigh
> >
> > hmm, I can boot any of these -rcX-gitY kernels (but not without some
> > stuck tasks after about a hour or so...)
>
> Ah, when you said -git17 would be the last "good" kernel, I thought
> everything earlier would not have the stuck tasks.
well, no... even 2.6.24.3 has _hangs_, however not forever (approx. 5-10 min)
and everything is back to normal...

> > sounds a bit strange? Is this a SMP related problem after all?
>
> I don't think so, as I don't have an SMP system (and CONFIG_SMP is
> disabled as well).
Then maybe scheduler? because these values looks suspecious:
(that's why I added Ingo to CC, because he probably knows what's NOT going on
;-) )

(full dmesg there:
http://www.pastebin.ca/941845

config here:
http://www.pastebin.ca/941856
)

the funny stuff is highlighted with a >:
cpu#1, 2211.332 MHz
.nr_running : 8
.load : 188884
.nr_switches : 9615826
.nr_load_updates : 1879402
> .nr_uninterruptible : -2053
.jiffies : 4296946718
.next_balance : 4296.946740
.curr->pid : 0
.clock : 9800558.756490
.idle_clock : 6521432.794621
.prev_clock_raw : 4476359.432375
.clock_warps : 0
.clock_overflows : 8676899
.clock_underflows : 688045
.clock_deep_idle_events : 0
.clock_max_delta : 4.000250
.cpu_load[0] : 0
.cpu_load[1] : 24
.cpu_load[2] : 112
.cpu_load[3] : 155
.cpu_load[4] : 177

cfs_rq
.exec_clock : 1353227.466775
.MIN_vruntime : 1808656.942015
.min_vruntime : 1808696.942015
.max_vruntime : 1808657.235424
.spread : 0.293409
> .spread0 : -267982.528057
.nr_running : 7
.load : 11362
.bkl_count : 60751
.nr_spread_over : 6234

runnable tasks:
task PID tree-key switches prio exec-runtime
sum-exec sum-sleep
----------------------------------------------------------------------------------------------------------
kcryptd 2266 1808656.942015 12778 115 1808656.942015
3689.257692 3125323.031084
hald-addon-inpu 5504 1808656.942015 622 120 1808656.942015
10.739211 3044831.003941
kcryptd 7312 1808656.960820 578891 115 1808656.960820
71166.387493 2625492.750280
Xorg 9564 1808656.942016 513338 120 1808656.942016
245777.204573 2587613.331559
pdflush 10059 1808656.942015 14575 120 1808656.942015
4400.355110 2985523.630368
pdflush 10696 1808656.942015 9433 120 1808656.942015
3607.302084 3103638.660388
xine 16773 1808657.235424 13455 120 1808657.235424
2135.923058 574348.774173
> watchdog 16959 -206866.028086 0 98 -206866.028086
0.000000 0.000000

(it's at the end of the dmesg, the kernel, X.org and probably everything else
was stuck...)

Thanks,
Christian.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-14-2008, 10:58 PM
Christian Kujau
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Fri, 14 Mar 2008, Milan Broz wrote:

Yes, there is bug in dm-crypt...
Please try if the patch here helps: http://lkml.org/lkml/2008/3/14/71


Hm, it seems to help the hangs, yes. Applied to today's -git a few hours
ago, the hangs are gone. However, when doing lots of disk I/O, the machine
locks up after a few (10-20) minutes. Sadly, netconsole got nothing


After the first lockup I tried again and shortly after bootup I got:

[ 866.681441] [ INFO: possible circular locking dependency detected ]
[ 866.681876] 2.6.25-rc5 #1
[ 866.682203] -------------------------------------------------------
[ 866.682637] kswapd0/132 is trying to acquire lock:
[ 866.683028] (&(&ip->i_iolock)->mr_lock){----}, at: [<c027a686>] xfs_ilock+0x96/0xb0
[ 866.683916]
[ 866.683917] but task is already holding lock:

[ 866.684582] (iprune_mutex){--..}, at: [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.685461]
[ 866.685462] which lock already depends on the new lock.
[ 866.685463]
[ 866.686440]
[ 866.686441] the existing dependency chain (in reverse order) is:
[ 866.687151]
[ 866.687152] -> #1 (iprune_mutex){--..}:

[ 866.687339] [<c0136914>] add_lock_to_list+0x44/0xc0
[ 866.687339] [<c01393a6>] __lock_acquire+0xc26/0x10b0
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c013890f>] __lock_acquire+0x18f/0x10b0
[ 866.687339] [<c013988e>] lock_acquire+0x5e/0x80
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c043fc79>] mutex_lock_nested+0x89/0x240
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c017b592>] shrink_icache_memory+0x72/0x220
[ 866.687339] [<c0150051>] shrink_slab+0x21/0x160
[ 866.687340] [<c0150131>] shrink_slab+0x101/0x160
[ 866.687340] [<c01502e2>] try_to_free_pages+0x152/0x230
[ 866.687340] [<c014f060>] isolate_pages_global+0x0/0x60
[ 866.687340] [<c014b95b>] __alloc_pages+0x14b/0x370
[ 866.687340] [<c04413a0>] _read_unlock_irq+0x20/0x30
[ 866.687340] [<c0146601>] __grab_cache_page+0x81/0xc0
[ 866.687340] [<c01896f6>] block_write_begin+0x76/0xe0
[ 866.687340] [<c029ec76>] xfs_vm_write_begin+0x46/0x50
[ 866.687340] [<c029f4c0>] xfs_get_blocks+0x0/0x30
[ 866.687340] [<c0147297>] generic_file_buffered_write+0x117/0x650
[ 866.687340] [<c027a65d>] xfs_ilock+0x6d/0xb0
[ 866.687340] [<c02a73cc>] xfs_write+0x7ac/0x8a0
[ 866.687340] [<c0174ac1>] core_sys_select+0x21/0x350
[ 866.687340] [<c02a32bc>] xfs_file_aio_write+0x5c/0x70
[ 866.687340] [<c0167bf5>] do_sync_write+0xd5/0x120
[ 866.687340] [<c012c630>] autoremove_wake_function+0x0/0x40
[ 866.687340] [<c019cfd5>] dnotify_parent+0x35/0x90
[ 866.687340] [<c0167b20>] do_sync_write+0x0/0x120
[ 866.687340] [<c016846f>] vfs_write+0x9f/0x140
[ 866.687340] [<c0168a21>] sys_write+0x41/0x70
[ 866.687340] [<c0102dee>] sysenter_past_esp+0x5f/0xa5
[ 866.687340] [<ffffffff>] 0xffffffff
[ 866.687340]
[ 866.687340] -> #0 (&(&ip->i_iolock)->mr_lock){----}:

[ 866.687340] [<c0136ba0>] print_circular_bug_entry+0x40/0x50

The box was running fine then for ~20 minutes, then it locked up again.

Full dmesg and .config: http://nerdbynature.de/bits/2.6.25-rc5/

Right now I'm back to 2.6.24.3...

Thanks,
Christian.
--
BOFH excuse #350:

paradigm shift...without a clutch

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 03-14-2008, 11:08 PM
Chr
 
Default INFO: task mount:11202 blocked for more than 120 seconds

On Friday 14 March 2008 10:27:07 Milan Broz wrote:
>
> > Oh, dm-crypt. Well, I'd definitely start looking there. XFS has a
> > history of exposing dm-crypt bugs, and these hangs appear to be I/O
> > congestion/scheduling related and not XFS. Also, we haven't changed
> > anything related to plug/unplug of block devices in XFS recently, so
> > that also points to some other change as well...
>
> Yes, there is bug in dm-crypt...
> Please try if the patch here helps: http://lkml.org/lkml/2008/3/14/71
>
hmm, :-/

stuck again... (tried the patch on top of 2.6.25-rc5 and 2.6.25-rc5-git4)

SYS-RQ (several times) T+W+Q
right here:
http://www.pastebin.ca/943145

Regards,
Christian

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 

Thread Tools




All times are GMT. The time now is 07:52 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org