03-18-2008, 09:56 PM
Andreas Dilger
The maximum number of files under a folder

On Mar 17, 2008 09:32 -0400, Theodore Ts'o wrote:
> On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote:
> > Theodore Tso,
> >
> > In 64bit system, directory size can not be bigger than 2GB?
>
> No, because the high 32-bits for i_size are overloaded to store the
> directory creation acl.

I think we should change the code (kernel and e2fsprogs) to allow
i_size_high for directories also.

> In practice, you really don't want to have a directory that huge
> anyway. Iterating through it all with readdir() gets horribly slow,
> and applications that try to do anything with really huge directories
> would be well advised to use a database, because they will get *much*
> better performance that way....

Actually, many HPC applications never do readdir at all.
The job creates 1 file/process and always uses a predefined filename
like {job}-{timestamp}-{process} that it will directly look up.
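
A minimal sketch of that create-by-known-name pattern in C; the filename
layout and the open_job_file() helper are illustrative assumptions rather
than code from any real HPC application:

/*
 * Each process derives its own filename and opens it directly --
 * no readdir() of the (possibly huge) directory is ever needed.
 */
#include <stdio.h>
#include <time.h>
#include <fcntl.h>

int open_job_file(const char *dir, const char *job, int process)
{
    char path[4096];

    snprintf(path, sizeof(path), "%s/%s-%ld-%d",
             dir, job, (long)time(NULL), process);

    /* O_CREAT|O_EXCL: create the file, fail if it somehow already exists. */
    return open(path, O_CREAT | O_EXCL | O_WRONLY, 0644);
}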

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
 
03-19-2008, 05:35 AM
Stephen Samuel
The maximum number of files under a folder

The OS will have to search the directory to see if the file already exists before creating it.

Well, if you hash it such that it splits up something like:
jobid(upper part)/jobid(lower part)[/-]timestamp-process,

you'll find that your access times will be much faster (especially if
you don't use H-Trees). This also applies if you're just creating a
file, because you'll have to search the entire directory to see if that
filename exists.


With regular directories, the time to search through them to see if a file
already exists increases linearly with the number of entries. If you
hash on 3 levels with 8 bits per level, you'll have to open 2 or 3
extra inodes, but you'll cut your directory search times down by a
factor of roughly 20000 to 1 (= 2^24/256/3). You'll also skip having to
deal with any sort of directory-size limit.
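
A rough C sketch of this kind of 3-level, 8-bits-per-level hashing; the
hash function, directory layout and names here are illustrative
assumptions, not anything ext3 itself provides:

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

/* Simple string hash; any reasonably uniform hash will do. */
static unsigned long hash_name(const char *name)
{
    unsigned long h = 5381;
    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h;
}

/*
 * Build "base/aa/bb/cc/name" from three 8-bit slices of the hash,
 * creating the intermediate directories if they don't exist yet.
 * Each directory stays small instead of holding millions of entries.
 */
int hashed_path(char *out, size_t len, const char *base, const char *name)
{
    unsigned long h = hash_name(name);
    unsigned lvl0 = (h >> 16) & 0xff, lvl1 = (h >> 8) & 0xff, lvl2 = h & 0xff;
    char dir[4096];

    snprintf(dir, sizeof(dir), "%s/%02x", base, lvl0);
    if (mkdir(dir, 0755) && errno != EEXIST) return -1;
    snprintf(dir, sizeof(dir), "%s/%02x/%02x", base, lvl0, lvl1);
    if (mkdir(dir, 0755) && errno != EEXIST) return -1;
    snprintf(dir, sizeof(dir), "%s/%02x/%02x/%02x", base, lvl0, lvl1, lvl2);
    if (mkdir(dir, 0755) && errno != EEXIST) return -1;

    snprintf(out, len, "%s/%s", dir, name);
    return 0;
}

With 2^24 files, each leaf directory ends up with roughly 256 entries and a
lookup only touches the 3 levels, which is where the 20000-to-1 figure above
comes from.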


I did something similar on a Solaris box which had 200000 emails in
the /var/spool/mqueue directory. That many messages were slowing the
system to a crawl. I hashed them into 100 directories with 2000 entries
each, and it sped things up enormously.



--
Stephen Samuel http://www.bcgreen.com
778-861-7641
 
03-19-2008, 11:16 AM
John Nelson
The maximum number of files under a folder

What does the H stand for in H-tree? Like the B in B-tree stands for
binary tree.




 
03-20-2008, 09:59 AM
liuyue
The maximum number of files under a folder

Thank you all.

I have now found a patch which can extend the ext3 subdirectory limit:
http://osdir.com/ml/file-systems.ext2.devel/2004-12/msg00026.html

Regards,

liuyue
liuyue@ncic.ac.cn
2008-03-20


 
03-20-2008, 10:28 AM
Theodore Tso
The maximum number of files under a folder

On Thu, Mar 20, 2008 at 06:59:59PM +0800, liuyue wrote:
> Thank you all.
>
> Now I find a patch which can extend ext3 subdirectory limit.
> http://osdir.com/ml/file-systems.ext2.devel/2004-12/msg00026.html

That's *subdirectories*, not files. The maximum number of files per
directory is limited essentially as discussed earlier in this thread. The
number of subdirectories is limited by the 16-bit i_nlink field.
Andreas' idea for extending this limit, as described above, is in
ext4.
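
A rough way to see the subdirectory limit (as opposed to the file-count
question) in action is to call mkdir() until it fails; a minimal sketch,
assuming it is run inside an otherwise empty ext3 test directory:

#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    char name[64];
    long i;

    /* Keep creating subdirectories until the parent's link count is full. */
    for (i = 0; ; i++) {
        snprintf(name, sizeof(name), "sub%ld", i);
        if (mkdir(name, 0755) != 0) {
            if (errno == EMLINK)
                printf("hit the subdirectory limit after %ld mkdirs\n", i);
            else
                perror("mkdir");
            break;
        }
    }
    return 0;
}

On stock ext3 this stops at roughly 32000 subdirectories because of the
16-bit link count; regular files created with open() are not subject to
this particular limit, and ext4's dir_nlink feature lifts it, which matches
Ted's note above.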

Regards,

- Ted

 
