I am seeing similar problems to Sean McCauliff (2007-08-02) using ext3.
I have a simple test that times file creations in a hashed directory
structure. File creation time inexorably increases as the number of
files in the filesystem increases. Altering variables can change the
absolute performance, but I always see the steady performance degradation.
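For anyone who wants to reproduce this, a test of this kind can be sketched roughly as follows. This is a minimal illustration in Python, not the actual test program: the names hashed_path and time_creations, the MD5-based directory layout, and the default depth, batch size and file size are all assumptions made for the example.

```python
import hashlib
import os
import time


def hashed_path(root, name, depth=3, width=2):
    """Map a file name to root/aa/bb/cc/name using hex digits of its MD5 hash.

    The layout is an assumption for illustration; any stable hash would do.
    """
    digest = hashlib.md5(name.encode()).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(depth)]
    return os.path.join(root, *parts, name)


def time_creations(root, total, batch=5000, size=4096):
    """Create `total` files; return (files_so_far, files_per_sec) for each batch."""
    payload = b"\0" * size
    rates = []
    start = time.perf_counter()
    for i in range(1, total + 1):
        path = hashed_path(root, f"file{i:010d}")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(payload)
        if i % batch == 0:
            now = time.perf_counter()
            # Guard against a zero interval on very coarse clocks.
            rates.append((i, batch / max(now - start, 1e-9)))
            start = now
    return rates


if __name__ == "__main__":
    # Hypothetical target directory on the filesystem under test.
    for count, rate in time_creations("/mnt/test/hashed", 100_000):
        print(f"{count:>10} files  {rate:8.1f} files/sec")
```

Printing one rate per batch of 5,000 files makes a steady decline easy to spot when the output is plotted against the total file count.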
All of the following have no material effect on the steady drop in performance:
File length (1k, 4k, 16k)
Directory depth (5, 10, 15)
Average & Max files per directory (10, 20, 100)
Single or multi-threaded test
Moving test directory to a new name on same filesystem, restarting test.
RAID10 vs. simple disk
Linux version (RHE, Ubuntu)
System memory (32gig, 2gig)
Syncing after each close
Partition Age (old, perhaps fragmented, a bit dirty, new fs)
Performance seems to always map directly to the number of files in the filesystem.
After some initial run-fast time, perhaps once dirty pages begin to be
written aggressively, my creation rate tends to drop by about one file
per second for every 5,000 files added. So, depending on the variables, say
with 6 RAID10 spindles, I might start at ~700 files/sec, drop quickly at
first, then drop more slowly to ~300 files/sec at perhaps 1 million files,
then see 299 files/sec for the next 5,000 creations, then 298 files/sec,
and so on.
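The arithmetic of that trend can be written out as a small sketch. It assumes the drop stays linear past the measured range, which the numbers above do not establish; the function name projected_rate and the use of the ~300 files/sec at 1 million files point as the anchor are choices made for the example.

```python
def projected_rate(files, anchor_files=1_000_000, anchor_rate=300.0,
                   files_per_step=5_000):
    """Files/sec after `files` creations, losing 1 file/sec per `files_per_step`.

    Assumes the linear trend continues unchanged; this is an extrapolation,
    not a measurement.
    """
    return anchor_rate - (files - anchor_files) / files_per_step


print(projected_rate(1_005_000))  # 299.0
print(projected_rate(1_010_000))  # 298.0
print(projected_rate(2_000_000))  # 100.0
```

Under that assumption the rate would reach zero at 2.5 million files, which is why the degradation looks unbounded rather than merely slow.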
As you'd expect, there isn't much CPU utilization, other than iowait,
and some kjournald activity.
Is this a known limitation of ext3? Is it too much to expect a filesystem
to create O(10^6)-O(10^7) files in something approaching constant time per
file? What, exactly, am I stressing to cause this unbounded performance
degradation?
I plan on having about 100M files totaling about 8.5 TiB. To see
how ext3 would perform with large numbers of files I've written a test
program which creates a configurable number of files in a
number of directories, reads from those files, lists them and then
deletes them. Even up to 1M files ext3 seems to perform well and scale
linearly; the time to execute the program on 1M files is about double
the time it takes to execute on 0.5M files. But past 1M files it
seems to have n^2 scalability. Test details appear below.
Looking at the various options for ext3 nothing jumps out as the
one to use to improve performance.
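The linear versus n^2 distinction described above can be checked from any two timed runs by estimating the exponent p in t ~ n^p. The sketch below is only an illustration: the function name scaling_exponent is made up, and the timings in the usage lines are invented placeholders, not results from either test.

```python
import math


def scaling_exponent(n1, t1, n2, t2):
    """Estimate p in t ~ n**p from two (file count, elapsed seconds) runs."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Placeholder timings, NOT measured data:
# doubling the files doubles the time, so p is about 1 (linear).
print(scaling_exponent(500_000, 100.0, 1_000_000, 200.0))
# doubling the files quadruples the time, so p is about 2 (quadratic).
print(scaling_exponent(1_000_000, 200.0, 2_000_000, 800.0))
```

An exponent that climbs from about 1 toward 2 as the file count passes 1M would match the behaviour reported here.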
Ext3-users mailing list