On Mar 17, 2008 09:32 -0400, Theodore Ts'o wrote:
> On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote:
> > Theodore Tso,
> >
> > On a 64-bit system, the directory size cannot be bigger than 2GB?
>
> No, because the high 32 bits of i_size are overloaded to store the
> directory creation acl.

I think we should change the code (kernel and e2fsprogs) to allow
i_size_high for directories also.

> In practice, you really don't want to have a directory that huge
> anyway. Iterating through it all with readdir() gets horribly slow,
> and applications that try to do anything with really huge directories
> would be well advised to use a database, because they will get *much*
> better performance that way....

Actually, many HPC applications never do readdir at all.
The job creates 1 file/process and always uses a predefined filename
like {job}-{timestamp}-{process} that it will directly look up.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
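The direct-lookup pattern Andreas describes can be sketched roughly as follows. This is only an illustration: the function names `job_filename` and `lookup` and the sample job name are made up here, not taken from any HPC code. The point is that the file is found by its predefined name, so the directory is never enumerated with readdir().

```python
import os
import tempfile


def job_filename(job: str, timestamp: int, process: int) -> str:
    """Build the predefined name {job}-{timestamp}-{process}."""
    return f"{job}-{timestamp}-{process}"


def lookup(directory: str, job: str, timestamp: int, process: int) -> bool:
    """Check for one file by its exact name; the directory is never listed."""
    path = os.path.join(directory, job_filename(job, timestamp, process))
    return os.path.exists(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # One output file per process, as in the HPC case (hypothetical job "sim").
        for rank in range(4):
            open(os.path.join(d, job_filename("sim", 1205760000, rank)), "w").close()
        print(lookup(d, "sim", 1205760000, 2))  # a file that was created
        print(lookup(d, "sim", 1205760000, 9))  # a file that was not
```

How fast the single-name lookup is still depends on how the filesystem indexes the directory, which is what the rest of the thread is about.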
03-19-2008, 05:35 AM
"Stephen Samuel"
The maximum number of files under a folder
The OS will have to search the directory to see if the file already exists before creating it.

Well, if you hash it such that it splits up something like:
jobid(upper part)/jobid(lower part)[/-]timestamp-process,
you'll find that your access times will be much faster (especially if
you don't use H-Trees). This also applies if you're just creating a
file, because the whole directory has to be searched to see if that
filename already exists.

With regular directories, the time to search for an existing file
increases linearly with the number of entries. If you
hash on 3 levels with 8 bits per level, you'll have to open 2 or 3
extra inodes, but you'll cut your directory search times down by a
factor of about 20000 to 1 (= 2^24/256/3). You'll also skip having to
deal with any sort of directory-size limit.

I did something similar on a Solaris box which had 200000 emails in
the /var/spool/mqueue directory. That many messages were slowing the
system to a crawl. I hashed it into 100 directories with 2000 entries
each, and it sped things up enormously.
Stephen Samuel http://www.bcgreen.com
778-861-7641
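A minimal sketch of the split layout Stephen describes, under some assumptions that are not in his message: the job id is taken to be an integer, its low 16 bits are split into two 8-bit directory levels, and the full job id is kept in the file name so that jobs sharing those bits do not collide. The names `split_path` and `linear_scan_speedup` are hypothetical. The second function just reproduces the 2^24/256/3 estimate for linearly searched directories.

```python
import os


def split_path(root: str, job_id: int, timestamp: int, process: int) -> str:
    """Place a file under root/<upper 8 bits>/<lower 8 bits>/ of the job id."""
    upper = (job_id >> 8) & 0xFF  # assumed: bits 8..15 pick the first level
    lower = job_id & 0xFF         # assumed: bits 0..7 pick the second level
    name = f"{job_id}-{timestamp}-{process}"
    return os.path.join(root, f"{upper:02x}", f"{lower:02x}", name)


def linear_scan_speedup(total_entries: int, fanout: int, levels: int) -> float:
    """Entries scanned in one flat directory vs. `levels` directories of `fanout`."""
    return total_entries / (fanout * levels)


if __name__ == "__main__":
    path = split_path("spool", 0x1234, 1205760000, 7)
    print(path)
    # Creating the file needs the two directory levels to exist first.
    print(os.path.dirname(path))
    # 2^24 entries, 256 entries per level, 3 levels searched.
    print(round(linear_scan_speedup(2**24, 256, 3)))  # 21845
```

With H-Tree (dir_index) directories the lookup is no longer a linear scan, so the gain from splitting would be smaller than this estimate.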
03-19-2008, 11:16 AM
John Nelson
The maximum number of files under a folder
What does the h stand for in h-tree? Like the b in btree is
binary tree?
03-20-2008, 09:59 AM
"liuyue"
The maximum number of files under a folder
Thank you all.
I have now found a patch that can extend the ext3 subdirectory limit:
http://osdir.com/ml/file-systems.ext2.devel/2004-12/msg00026.html
03-20-2008, 10:28 AM
Theodore Tso
The maximum number of files under a folder
On Thu, Mar 20, 2008 at 06:59:59PM +0800, liuyue wrote:
> Thank you all.
>
> Now I find a patch which can extend ext3 subdirectory limit.
> http://osdir.com/ml/file-systems.ext2.devel/2004-12/msg00026.html
That's *subdirectories*, not files. The maximum number of files per
directory is basically limited as discussed in this thread. The
number of subdirectories was limited by the 16-bit i_nlink field.
Andreas' idea for extending this limit, as described above, is in
ext4.
Regards,
- Ted
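The arithmetic behind the subdirectory limit Ted mentions can be sketched as below. It assumes the usual convention that a directory's link count is 2 (its own "." entry plus its name in the parent) plus one for each subdirectory's ".." entry. The value 32000 is ext3's EXT3_LINK_MAX as I recall it from the kernel headers; treat it as an assumption and check your own source tree. The function names are made up for illustration.

```python
EXT3_LINK_MAX = 32000      # assumed: ext3's cap on i_nlink, from the kernel headers
NLINK_FIELD_MAX = 2**16 - 1  # largest value a 16-bit i_nlink field can hold


def dir_link_count(subdirs: int) -> int:
    """Link count of a directory: '.', its entry in the parent, and each child's '..'."""
    return 2 + subdirs


def max_subdirs(link_max: int) -> int:
    """Largest number of subdirectories whose link count still fits under link_max."""
    return link_max - 2


if __name__ == "__main__":
    print(max_subdirs(EXT3_LINK_MAX))    # 31998
    print(max_subdirs(NLINK_FIELD_MAX))  # 65533
```

Regular files do not add to the parent directory's link count, which is why this limit applies to subdirectories only.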