Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


06-11-2008, 05:33 AM
Eric Sandeen
 
Poor Performance When Number of Files > 1M

John Kalucki wrote:

> Performance seems to always map directly to the number of files in the
> ext3 filesystem.
>
> After some initial run-fast time, perhaps once dirty pages begin to be
> written aggressively, for every 5,000 files added, my files created per
> second tends to drop by about one. So, depending on the variables, say
> with 6 RAID10 spindles, I might start at ~700 files/sec, quickly drop,
> then more slowly drop to ~300 files/sec at perhaps 1 million files, then
> see 299 files/sec for the next 5,000 creations, 298 files/sec, etc. etc.
>
> As you'd expect, there isn't much CPU utilization, other than iowait,
> and some kjournald activity.
>
> Is this a known limitation of ext3? Is expecting to write to
> O(10^6)-O(10^7) files in something approaching constant time expecting
> too much from a filesystem? What, exactly, am I stressing to cause this
> unbounded performance degradation?

I think this is a linear search through the block groups for the new
inode allocation, which always starts at the parent directory's block
group and starts over from there each time. See find_group_other().

So if the parent's group is full and so are the next 1000 block groups,
it will search 1000 groups and find space in the 1001st. On the next
inode allocation it will re-search(!) those 1000 groups, and again find
space in the 1001st. And so on, until the 1001st is full; then it'll
search 1001 groups and find space in the 1002nd, etc. (If I'm
remembering/reading correctly; this does jibe with what you see.)
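
To make the cost of restarting the search concrete, here is a toy Python model of the behavior described above. It is a simplified sketch under stated assumptions, not ext3's find_group_other(): every group is assumed to start with the same number of free inodes, and each allocation scans forward linearly (with wrap-around) from the parent's group. The function name and parameters are hypothetical.

```python
def allocate_restarting(num_groups: int, inodes_per_group: int,
                        parent_group: int, count: int) -> list[int]:
    """Toy model: allocate `count` inodes, restarting every search at the
    parent directory's group. Returns how many groups each search probed.

    Hypothetical simplification: fixed-size groups, linear forward scan
    with wrap-around. This is not the kernel's actual code.
    """
    free = [inodes_per_group] * num_groups
    probes_per_alloc = []
    for _ in range(count):
        # Each allocation starts over at the parent's group.
        for step in range(num_groups):
            group = (parent_group + step) % num_groups
            if free[group] > 0:
                free[group] -= 1
                probes_per_alloc.append(step + 1)
                break
        else:
            raise RuntimeError("no free inodes left")
    return probes_per_alloc


# 8 groups of 4 inodes each, parent directory in group 0.
probes = allocate_restarting(8, 4, 0, 12)
print(probes)       # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
print(sum(probes))  # 24
```

In this model the first 4 allocations probe 1 group each, the next 4 probe 2, and the next 4 probe 3, so the per-allocation cost grows by one group every time another group fills. That matches the shape of the steady, step-by-step slowdown reported above, though the model says nothing about the real constants.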

I've toyed with keeping track (in the parent's inode) of where the last
successful child allocation happened, and starting the search there. I'm a
bit leery of how this might age, though, plus I'm not sure whether it
should be on-disk or just in memory. But this behavior clearly needs
some help. I should probably just get it sent out for comment.
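
The "remember the last successful group" idea can be sketched in a similar toy model (Python; the class and method names are hypothetical). The hint is kept in memory only, which is just one of the two options mentioned, and the model assumes a single parent directory with fixed-size groups.

```python
class HintedAllocator:
    """Toy model: remember (in memory) the group where the last child
    allocation succeeded and start the next search there.

    Hypothetical simplification: fixed-size groups, linear forward scan
    with wrap-around, a single parent directory. Not kernel code.
    """

    def __init__(self, num_groups: int, inodes_per_group: int, parent_group: int):
        self.num_groups = num_groups
        self.free = [inodes_per_group] * num_groups
        self.hint = parent_group  # the first search still starts at the parent

    def allocate(self) -> tuple[int, int]:
        """Allocate one inode; return (group, number of groups probed)."""
        for step in range(self.num_groups):
            group = (self.hint + step) % self.num_groups
            if self.free[group] > 0:
                self.free[group] -= 1
                self.hint = group  # resume the search here next time
                return group, step + 1
        raise RuntimeError("no free inodes left")


alloc = HintedAllocator(8, 4, 0)
probes = [alloc.allocate()[1] for _ in range(12)]
print(probes)       # [1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1]
print(sum(probes))  # 14
```

With 8 groups of 4 inodes, the 12 allocations now probe 14 groups in total instead of 24, and filling a group costs only one extra probe. The sketch also shows the aging concern: if inodes are later freed in groups behind the hint, this version does not reuse them until its scan wraps around.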

-Eric

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
06-11-2008, 10:04 PM
John Kalucki
 
Poor Performance When Number of Files > 1M


This is the best explanation I've read so far. There does indeed appear
to be some O(n) behavior that is exacerbated by having many directories
in the working set (not open, just referenced often) and perhaps
moderate fragmentation. I read up on ext3 inode allocation and its
attempt to place files in the same block group as their directory. Trying
to work with this system, I started on a fresh filesystem and flattened
the directory depth to just 4 levels. With that, I've managed to boost
performance greatly and flatten the degradation curve quite a bit.


I can get to about 2,800,000 files before performance starts to slowly
drop from a nearly constant ~1,700 files/sec. At ~4,000,000 files, I see
about ~1,500 files/sec, and afterwards I start to see the old behavior
of greater decline. By 5,500,000 files, it's down to 1,230 files/sec.
I've used 9% of the space and 8% of the inodes at this point.


Changing journal size and /proc/sys/fs/file-max had no effect. Even
dir_index had only marginal impact, as my directories have only about
300 files each.


I think the biggest factor in keeping performance nearly linear is the
number of directories in the working set. If this grows too large, the
linear allocation behavior is magnified, and performance drops. My
version of RHEL doesn't seem to allow tweaking of directory cache
behavior, perhaps a deprecated feature from the 2.4 days.


If I discover anything else, I'll be sure to update this thread.
-John








06-11-2008, 10:25 PM
John Kalucki
 
Poor Performance When Number of Files > 1M

Ric Wheeler wrote:

I run a very similar test, but normally with a synchronous write
workload (i.e., fsync before close). In my testing, you will see a
slow, gradual decline in the files/sec. For example, on a 1TB SATA
drive, the latest test run started off at a rate of 22 files/sec (each
file is 40k) and is currently chugging along at a bit over 17
files/sec now that it has hit 2.8 million files in one directory. I am
using the ext3 run to get a baseline for a similar run of xfs and btrfs.
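
A synchronous create loop of the kind described here (fsync before close, all files in one directory) can be sketched in Python. The roughly 40 KB file size comes from the description above and is assumed here to mean 40 KiB; the function name, file naming, batch size, and use of random data are illustrative assumptions, and this is not the actual test harness used in the thread.

```python
import os
import time


def create_files_synchronously(directory: str, count: int,
                               size: int = 40 * 1024,
                               report_every: int = 1000) -> list[float]:
    """Create `count` new files in `directory`, fsync-ing each one before
    close, and return the files/sec measured over each batch of
    `report_every` files.
    """
    payload = os.urandom(size)
    rates = []
    batch_start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"file{i:09d}")
        # "xb": exclusive create, so every iteration allocates a new inode.
        with open(path, "xb") as f:
            f.write(payload)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # then force the data to disk before close
        if (i + 1) % report_every == 0:
            now = time.perf_counter()
            rates.append(report_every / (now - batch_start))
            batch_start = now
    return rates
```

Plotting the returned rates against the cumulative file count is one way to observe a gradual decline like the one reported above.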


One other random tuning thought - you can help by writing into
separate directories, but you will need to make sure that you don't
produce a random write pattern when you select your target
subdirectory. I think the use case mentioned using a hashed
directory structure, which is fine, but you want to hash in a way that
writes into a shared subdirectory for some period of time (say, rotating
every X files or Y seconds). The easiest way to do this is to
use a GUID with a time stamp and hash on the time stamp bits.
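
Hashing on the time stamp bits can be sketched in Python, assuming a version-1 (time-based) UUID; in Python's uuid module, the `time` attribute of such a UUID is the 60-bit count of 100-nanosecond intervals since 1582-10-15. The function names, the rotation period, the number of directories, and the hex naming scheme are hypothetical choices. Every file created within the same window lands in the same subdirectory, and the target moves on when the window rolls over.

```python
import uuid


def time_bucket_dir(timestamp_s: int, rotate_seconds: int = 60,
                    num_dirs: int = 256) -> str:
    """Map a time stamp (in whole seconds) to a subdirectory name. All time
    stamps in the same rotate_seconds window share one directory, so writes
    stay in one subdirectory for a while instead of scattering randomly.
    """
    window = timestamp_s // rotate_seconds
    return f"{window % num_dirs:02x}"


def subdir_for_uuid1(u: uuid.UUID, rotate_seconds: int = 60,
                     num_dirs: int = 256) -> str:
    """Pick the subdirectory from the time stamp bits of a version-1 UUID."""
    if u.version != 1:
        raise ValueError("expected a time-based (version 1) UUID")
    # UUID.time counts 100-ns intervals; convert to whole seconds.
    return time_bucket_dir(u.time // 10_000_000, rotate_seconds, num_dirs)


u = uuid.uuid1()
print(u, "->", subdir_for_uuid1(u))
```

A count-based rotation (every X files) would work the same way, with a running file counter divided by X in place of the time stamp.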


Note that there is a multi-threaded performance bug in ext3 (Josef
Bacik had looked at fixing this) which throttles writes/sec down to
around 230 when you do synchronous transactions so you might be
hitting that as well.


ric


Unfortunately, I don't have the opportunity to limit the directories. My
application is taking random-ish data and organizing it into logical
groups for subsequent quick reading. But I did take your suggestion into
account, and it contains what seems to be the important nugget: too
many active directories make a bad situation worse.


But still, my test reaches a steady state of active directories pretty
quickly, or so I'd like to think. The performance does indeed continue
to creep downward.


I'm doing everything single-threaded. Introducing a second thread seems
to be an immediate disaster, even though I'm striped across 3 disks.
Unfortunate. Perhaps moving the journal to a separate device would
allow better multi-threaded throughput, but I'm not sure that this is
important to me.


xfs, zfs, btrfs, and reiser could be attractive for my use-case.

Thanks for your response,
John







All times are GMT.

Copyright 2007 - 2008, www.linux-archive.org