On Tue, Feb 23, 2010 at 09:42:40PM +0100, Mikael Abrahamsson wrote:
> On Wed, 24 Feb 2010, Dave Chinner wrote:
> Thought I should send the dump only to you, would you like me to post to
> the list as well and include a link to the logfile (think attachments
> shouldn't be done too much to lkml, right?). Feel free to post any
> information from this letter or attachment back onto lkml if you feel
> it's appropriate.
Attachments under roughly 100kB (IIRC) are fine for LKML. This is
allowed specifically to let people attach log files and configs.
I've re-added the lkml CC to continue the discussion there. I also
added the DM mailing list, so that they know directly that barriers
are causing _significant_ system slowdowns. This is important,
because there have been several reports of this problem since the
start of the year to XFS forums as people are upgrading kernels.
>> Barriers are only recently supported across DM and MD, so it would be
>> worth checking your logs for the last mount of the filesystems to
> You're right, it doesn't say that anymore in 2.6.31, so I think I'm
> indeed running with barriers on.
And the stack traces confirm that. Every single blocked process
output set has this trace in it:
That is, it appears that DM is blocking waiting for a barrier to
complete. Everything else is backed up waiting for IO completion to
>>> Currently it's "lazy-count=0", so I'll change that setting tonight.
> I didn't do this before the test I'm referring to now.
>> When IO is really slow so we get a better idea of where things are
>> blocking. Running a few of these 30s apart will give a fair indication
>> of what is blocked and what is making progress....
> Attached to this email, logfile from yesterday and today.
> Some interesting parts as well, that didn't trigger from the above sysrq
> Feb 22 19:56:16 ub kernel: [201245.583915] INFO: task md0_raid5:425 blocked for more than 120 seconds.
> Feb 22 22:36:16 ub kernel: [210846.031875] INFO: task xfssyncd:3167 blocked for more than 120 seconds.
> Feb 23 18:10:18 ub kernel: [281287.492499] INFO: task kdmflush:3082 blocked for more than 120 seconds.
> Feb 23 18:12:18 ub kernel: [281407.491041] INFO: task kdmflush:3082 blocked for more than 120 seconds.
> Feb 23 21:36:18 ub kernel: [293647.665917] INFO: task md0_raid5:425 blocked for more than 120 seconds.
> Didn't really think md0_raid5 could be blocked like that...
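
To see at a glance which tasks keep tripping the hung-task detector, the
warnings above can be tallied per task. What follows is only a minimal
sketch: the function name count_blocked_tasks and the SAMPLE list are
made up for illustration, the regular expression assumes the
"INFO: task NAME:PID blocked for more than N seconds" format shown in
the quoted lines, and the sample data is just those five lines.

```python
import re
from collections import Counter

# Assumed format, taken from the quoted log lines:
#   INFO: task <name>:<pid> blocked for more than <N> seconds.
HUNG_RE = re.compile(
    r"INFO: task (?P<name>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_blocked_tasks(lines):
    """Count hung-task warnings per (task name, pid) pair (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = HUNG_RE.search(line)
        if match:
            counts[(match.group("name"), int(match.group("pid")))] += 1
    return counts


# The five warnings quoted above, used as sample input.
SAMPLE = [
    "Feb 22 19:56:16 ub kernel: [201245.583915] INFO: task md0_raid5:425 blocked for more than 120 seconds.",
    "Feb 22 22:36:16 ub kernel: [210846.031875] INFO: task xfssyncd:3167 blocked for more than 120 seconds.",
    "Feb 23 18:10:18 ub kernel: [281287.492499] INFO: task kdmflush:3082 blocked for more than 120 seconds.",
    "Feb 23 18:12:18 ub kernel: [281407.491041] INFO: task kdmflush:3082 blocked for more than 120 seconds.",
    "Feb 23 21:36:18 ub kernel: [293647.665917] INFO: task md0_raid5:425 blocked for more than 120 seconds.",
]

if __name__ == "__main__":
    # Print one line per task, most frequently blocked first.
    for (name, pid), hits in count_blocked_tasks(SAMPLE).most_common():
        print(f"{name}:{pid} blocked {hits} time(s)")
```

To run it against a full log instead of the sample, pass an open file
object (or any iterable of lines) to count_blocked_tasks.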