02-05-2009, 11:58 AM
Ralf Hildebrandt

Questions regarding journal replay

Today, I had to uncleanly shut down one of our machines due to an error
in 2.6.28.3. During the boot sequence, the ext4 partition /home
experienced a journal replay. /home looks like this:

/dev/mapper/volg1-logv1 on /home type ext4 (rw,noexec,nodev,noatime,errors=remount-ro)

Filesystem Size Used Avail Use% Mounted on
/dev/mapper/volg1-logv1 2,4T 1,4T 1022G 58% /home

Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/volg1-logv1 19519488 8793310 10726178 46% /home

The journal replay took quite a while: about 800 seconds.

# dumpe2fs -h /dev/mapper/volg1-logv1
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name: <none>
Last mounted on: <not available>
Filesystem UUID: 032613d3-6035-4872-bc0a-11db92feec5e
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal resize_inode dir_index filetype needs_recovery extent sparse_super large_file uninit_bg
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 19519488
Block count: 624605184
Reserved block count: 0
Free blocks: 267655114
Free inodes: 10726118
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 875
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 1024
Inode blocks per group: 32
Filesystem created: Tue May 8 21:04:31 2007
Last mount time: Thu Feb 5 11:08:27 2009
Last write time: Thu Feb 5 11:08:27 2009
Mount count: 12
Maximum mount count: -1
Last checked: Sat Dec 27 23:16:47 2008
Check interval: 0 (<none>)
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 17529831
Default directory hash: tea
Directory Hash Seed: 44337061-e542-44bb-afb9-40597ccf1c6d
Journal backup: inode blocks
Journal size: 128M

Questions:
==========

* Why does it take so long?
* What happens during that time?
* Is my journal maybe too big?

--
Ralf Hildebrandt Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
02-06-2009, 01:26 PM
Curtis Doty

Questions regarding journal replay

Yesterday Ralf Hildebrandt said:


The journal replay took quite a while. About 800 seconds.



Were there any other background iops on the underlying volume devices?
Like maybe raid reconstruction?


../C

02-06-2009, 01:28 PM
Ralf Hildebrandt

Questions regarding journal replay

* Curtis Doty <Curtis@GreenKey.net>:
> Yesterday Ralf Hildebrandt said:
>
>> The journal replay took quite a while. About 800 seconds.
>>
>
> Were there any other background iops on the underlying volume
> devices? Like maybe raid reconstruction?

I don't think so. The machine never powered off...

--
Ralf Hildebrandt Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

02-25-2009, 03:24 PM
Ralf Hildebrandt

Questions regarding journal replay

* Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:
> * Curtis Doty <Curtis@GreenKey.net>:
> > Yesterday Ralf Hildebrandt said:
> >
> >> The journal replay took quite a while. About 800 seconds.
> >>
> >
> > Were there any other background iops on the underlying volume
> > devices? Like maybe raid reconstruction?
>
> I don't think so. The machine never powered off...

Again, 2.6.28.7 failed us and now we're encountering another journal
replay. Taking ages. This sucks.

Questions:

How can I find out (during normal operation) HOW MUCH of the
journal is actually in use?

How can I resize the journal to be smaller, thus making a journal
replay faster?
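
I suppose shrinking means dropping the journal and re-creating a smaller
(internal) one, something like this (untested; filesystem unmounted and
clean):

# umount /home
# tune2fs -O ^has_journal /dev/mapper/volg1-logv1   (remove the existing journal)
# tune2fs -j -J size=128 /dev/mapper/volg1-logv1    (re-create it; size is in megabytes)
# mount /home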

--
Ralf Hildebrandt Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

02-25-2009, 03:31 PM
Eric Sandeen

Questions regarding journal replay

Ralf Hildebrandt wrote:
> * Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:
>> * Curtis Doty <Curtis@GreenKey.net>:
>>> Yesterday Ralf Hildebrandt said:
>>>
>>>> The journal replay took quite a while. About 800 seconds.
>>>>
>>> Were there any other background iops on the underlying volume
>>> devices? Like maybe raid reconstruction?
>> I don't think so. The machine never powered off...
>
> Again, 2.6.28.7 failed us and now we're encountering another journal
> replay. Taking ages. This sucks.
>
> Questions:
>
> How can I find out (during normal operation) HOW MUCH of the
> journal is actually in use?
>
> How can I resize the journal to be smaller, thus making a journal
> replay faster?
>

It'd be better to get to the bottom of the problem ... maybe iostat
while it's happening to see if IO is actually happening; run blktrace to
see where IO is going, do a few sysrq-t's to see where threads are at, etc.
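
Something like this, roughly (device names below are just examples):

# iostat -x 1                                 (is IO actually happening, and how busy is the device)
# blktrace -d /dev/sda -o - | blkparse -i -   (where the IO is actually going)
# echo t > /proc/sysrq-trigger                (sysrq-t; task states show up in dmesg)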

Can you find a way to reproduce this at will?

Journal replay should *never* take this long, AFAIK.

-Eric

02-25-2009, 04:23 PM
Ralf Hildebrandt

Questions regarding journal replay

* Eric Sandeen <sandeen@redhat.com>:
> Ralf Hildebrandt wrote:
> > * Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:
> >> * Curtis Doty <Curtis@GreenKey.net>:
> >>> Yesterday Ralf Hildebrandt said:
> >>>
> >>>> The journal replay took quite a while. About 800 seconds.
> >>>>
> >>> Were there any other background iops on the underlying volume
> >>> devices? Like maybe raid reconstruction?
> >> I don't think so. The machine never powered off...
> >
> > Again, 2.6.28.7 failed us and now we're encountering another journal
> > replay. Taking ages. This sucks.
> >
> > Questions:
> >
> > How can I find out (during normal operation) HOW MUCH of the
> > journal is actually in use?
> >
> > How can I resize the journal to be smaller, thus making a journal
> > replay faster?
> >
>
> It'd be better to get to the bottom of the problem ... maybe iostat
> while it's happening to see if IO is actually happening; run blktrace to
> see where IO is going, do a few sysrq-t's to see where threads are at, etc.

We had 24GB of reading from the journal device (or 12GB if it's
512byte blocks). I wonder why?

> Can you find a way to reproduce this at will?

Yes. My users will kill me, though.

> Journal replay should *never* take this long, AFAIK.

Amen

--
Ralf Hildebrandt Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

02-25-2009, 04:34 PM
Theodore Tso

Questions regarding journal replay

On Wed, Feb 25, 2009 at 10:31:42AM -0600, Eric Sandeen wrote:
>
> It'd be better to get to the bottom of the problem ... maybe iostat
> while it's happening to see if IO is actually happening; run blktrace to
> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
>
> Can you find a way to reproduce this at will?
>
> Journal replay should *never* take this long, AFAIK.

Indeed. The journal is 128 megs, as I recall. So even if the journal
was completely full, if it's taking 800 seconds, that's a write rate
of 0.16 MB/s (164 KB/second). That is indeed way too slow.

I assume this wasn't your boot partition, so the journal replay was
being done by e2fsck, right? Or are you guys skipping e2fsck and the
journal replay was happening when you mounted the partition? If the
journal replay is happening via e2fsck, is fsck running any other
filesystem checks in parallel?

Also, what is the geometry of your RAID? How many disks, what RAID
level, and what is the chunk size? The journal replay is done one
filesystem block at a time, so it could be that it's turning into a
large number of read-modify-writes, which is trashing your performance
if the chunk size is really large.
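
For md raid that's easy to check; a hardware controller needs its own
tool, but something like:

# cat /proc/mdstat          (raid level, member disks, chunk size for md arrays)
# mdadm --detail /dev/md0   (/dev/md0 is just an example device)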

The other thing that might explain the performance problem is if
somehow the number of multiple outstanding requests allowed by the
hard drive has been clamped down to a very small number, and so a
large number of small read/write requests is really killing
performance. The system dmesg log might have some hidden clues about
that.
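
You can check the current limits with something like the following
(exact sysfs paths depend on the driver):

# cat /sys/block/sda/device/queue_depth   (outstanding commands the device will accept)
# cat /sys/block/sda/queue/nr_requests    (block layer request queue depth)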

- Ted

02-25-2009, 04:36 PM
Eric Sandeen

Questions regarding journal replay

Ralf Hildebrandt wrote:
> * Eric Sandeen <sandeen@redhat.com>:
>> Ralf Hildebrandt wrote:
>>> * Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>:
>>>> * Curtis Doty <Curtis@GreenKey.net>:
>>>>> Yesterday Ralf Hildebrandt said:
>>>>>
>>>>>> The journal replay took quite a while. About 800 seconds.
>>>>>>
>>>>> Were there any other background iops on the underlying volume
>>>>> devices? Like maybe raid reconstruction?
>>>> I don't think so. The machine never powered off...
>>> Again, 2.6.28.7 failed us and now we're encountering another journal
>>> replay. Taking ages. This sucks.
>>>
>>> Questions:
>>>
>>> How can I find out (during normal operation) HOW MUCH of the
>>> journal is actually in use?
>>>
>>> How can I resize the journal to be smaller, thus making a journal
>>> replay faster?
>>>
>> It'd be better to get to the bottom of the problem ... maybe iostat
>> while it's happening to see if IO is actually happening; run blktrace to
>> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
>
> We had 24GB of reading from the journal device (or 12GB if it's
> 512byte blocks). I wonder why?

24GB of reading from the journal device (during that 800s of replay
during mount?), and your journal is 128M ... well that's odd.
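
As an aside: if that figure came from /proc/diskstats or iostat's
sector counters, those are always 512-byte units regardless of the
filesystem block size, so it would be 12GB. E.g. (sdc is just a
stand-in for whatever device the journal lives on):

# awk '$3 == "sdc" { print $6 * 512 / (1024*1024*1024), "GiB read" }' /proc/diskstats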

You say journal device; is this an external journal? I didn't think so
from your first email, but is it?

>> Can you find a way to reproduce this at will?
>
> Yes. My users will kill me, though.

No spare box, eh?

>> Journal replay should *never* take this long, AFAIK.
>
> Amen
>

so let's figure it out

-Eric

02-25-2009, 04:39 PM
Ralf Hildebrandt

Questions regarding journal replay

* Theodore Tso <tytso@mit.edu>:
> On Wed, Feb 25, 2009 at 10:31:42AM -0600, Eric Sandeen wrote:
> >
> > It'd be better to get to the bottom of the problem ... maybe iostat
> > while it's happening to see if IO is actually happening; run blktrace to
> > see where IO is going, do a few sysrq-t's to see where threads are at, etc.
> >
> > Can you find a way to reproduce this at will?
> >
> > Journal replay should *never* take this long, AFAIK.
>
> Indeed. The journal is 128 megs, as I recall. So even if the journal
> was completely full, if it's taking 800 seconds, that's a write rate
> of 0.16 Mb/S (164 kb/second). That is indeed way too slow.

The problem seems to be with the external journal, which I recently
switched to. It's a 32GB partition. My timings seem to indicate that
ALL OF IT was being replayed.

> I assume this wasn't your boot partition, so the journal replay was
> being done by e2fsck, right?

Yes

> Or are you guys skipping e2fsck and the journal replay was happening
> when you mounted the partition?

Both. We tried both ways.

> If the journal replay is happening via e2fsck, is fsck running any
> other filesystem checks in parallel?

No, it's running alone.

> Also, what is the geometry of your raid? How many disks, what RAID
> level, and what is the chunk size? The journal replay is done a
> filesystem block at a time, so it could be that it's turning into a
> large number of read-modify-writes, which is trashing your performance
> if the chunk size is really large.

The RAID is made up of one logical volume consisting of two drives,
sda and sdb, each of which is a hardware RAID5 array of 6 disks.

> The other thing that might explain the performan problem is if the
> somehow the number of multiple outstanding requests allowed by the
> hard drive has been clamped down to a very small number, and so a
> large number of small read/write requests is really killing
> performance. The system dmesg log might have some hidden clues about
> that.

dmesg is silent

--
Ralf Hildebrandt Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin

02-25-2009, 04:40 PM
Ralf Hildebrandt

Questions regarding journal replay

* Eric Sandeen <sandeen@redhat.com>:

> >> It'd be better to get to the bottom of the problem ... maybe iostat
> >> while it's happening to see if IO is actually happening; run blktrace to
> >> see where IO is going, do a few sysrq-t's to see where threads are at, etc.
> >
> > We had 24GB of reading from the journal device (or 12GB if it's
> > 512byte blocks). I wonder why?
>
> 24GB of reading from the journal device (during that 800s of replay
> during mount?), and your journal is 128M ... well that's odd.

After my initial report I removed the internal journal and created an
external journal on a 32GB partition, hoping it would be faster, since
according to the docs the journal size is limited to 128MB.
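
For the record, the switch went roughly like this (options from memory,
so treat them as approximate):

# tune2fs -O ^has_journal /dev/mapper/volg1-logv1                  (drop the internal journal)
# mke2fs -O journal_dev -b 4096 /dev/cciss/c0d0p6                  (format the partition as a journal device, same 4k block size)
# tune2fs -j -J device=/dev/cciss/c0d0p6 /dev/mapper/volg1-logv1   (attach it as the external journal)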

> You say journal device; is this an external journal? I didn't think so
> from your first email, but is it?

It is now.

# dumpe2fs -h /dev/cciss/c0d0p6
dumpe2fs 1.41.3 (12-Oct-2008)
Filesystem volume name: journal_device
Last mounted on: <not available>
Filesystem UUID: 1a7063f5-8965-40f2-9feb-e37d6ac467e9
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: journal_dev
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 0
Block count: 8488436
Reserved block count: 0
Free blocks: 0
Free inodes: 0
First block: 0
Block size: 4096
Fragment size: 4096
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 0
Inode blocks per group: 0
Filesystem created: Thu Feb 5 14:05:36 2009
Last mount time: n/a
Last write time: Thu Feb 5 14:15:26 2009
Mount count: 0
Maximum mount count: 30
Last checked: Thu Feb 5 14:05:36 2009
Check interval: 15552000 (6 months)
Next check after: Tue Aug 4 15:05:36 2009
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Default directory hash: half_md4
Directory Hash Seed: fddb247a-97df-4582-bfcd-816ef8c17ab2

Journal block size: 4096
Journal length: 8488436
Journal first block: 2
Journal sequence: 0x0027c611
Journal start: 2
Journal number of users: 1
Journal users: 032613d3-6035-4872-bc0a-11db92feec5e

--
Ralf Hildebrandt Ralf.Hildebrandt@charite.de
Charite - Universitätsmedizin Berlin Tel. +49 (0)30-450 570-155
Geschäftsbereich IT | Abt. Netzwerk Fax. +49 (0)30-450 570-962
Hindenburgdamm 30 | 12200 Berlin
