Old 08-01-2008, 09:43 AM
Thomas Trauner
 
duplicate entries on ext3 when using readdir/readdir64

Hello,


I have a problem with directories that contain more than 10000 entries
(Ubuntu 8.04.1) or with more than 70000 entries (RHEL 5.2). If you use
readdir(3) or readdir64(3) you get one entry twice, with the same name and
inode.

Some analysis showed that disabling dir_index avoids the problem, but
I think this is a bug in the ext3 code, as no other filesystem
shows this behavior.
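
For reference, turning dir_index off on an unmounted filesystem should
look roughly like this (the device name is just a placeholder):

tune2fs -O ^dir_index /dev/sdXX
e2fsck -fD /dev/sdXX

The e2fsck -fD pass then rebuilds the directories without the hash
trees.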

I've found the following reports of this bug, but nothing about
whether it has been fixed or whether a back-port for older 2.6 kernels exists.

<https://www.redhat.com/archives/ext3-users/2007-December/msg00004.html>
and
<http://episteme.arstechnica.com/eve/forums/a/tpc/f/96509133/m/199007643931?r=494000843931#494000843931>

On linux-fsdevel I found the following, but there the directory
entries are deleted between successive readdir calls.

<http://kerneltrap.org/mailarchive/linux-fsdevel/2005/9/15/310258>

Does anyone know where I could find more information or report this bug?

Thanks in advance!
Regards.
Tom Trauner

 
Old 08-01-2008, 12:16 PM
Theodore Tso
 
duplicate entries on ext3 when using readdir/readdir64

On Fri, Aug 01, 2008 at 11:43:40AM +0200, Thomas Trauner wrote:
>
> I have a problem with directories that contain more than 10000 entries
> (Ubuntu 8.04.1) or with more than 70000 entries (RHEL 5.2). If you use
> readdir(3) or readdir64(3) you get one entry twice, with the same name and
> inode.
>

How reproducible is this; can you reproduce it on this one filesystem?
Can you reproduce it on multiple filesystems? What sort of file names
are you using?

Also, are you testing by using "ls", or do you have your own program
getting the names of the files? If the latter, are you using
telldir()/seekdir() in any way?
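
For context: telldir()/seekdir() let a program remember a position in
the directory stream and jump back to it later, and replaying
positions is exactly where hashed directories can get confused. A
minimal sketch of the pattern I mean (the path is made up):

#include <dirent.h>
#include <stdio.h>

int main(void)
{
	DIR *dir = opendir("/some/bigdir");	/* made-up path */
	struct dirent *de;
	long pos;

	if (!dir)
		return 1;
	readdir(dir);			/* consume one entry */
	pos = telldir(dir);		/* remember this position */
	readdir(dir);			/* read past it... */
	seekdir(dir, pos);		/* ...then jump back */
	de = readdir(dir);		/* should repeat the same entry */
	if (de)
		printf("after seekdir: %s\n", de->d_name);
	closedir(dir);
	return 0;
}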

- Ted

 
Old 08-01-2008, 02:00 PM
Thomas Trauner
 
duplicate entries on ext3 when using readdir/readdir64

On Fri, 2008-08-01 at 08:16 -0400, Theodore Tso wrote:
> On Fri, Aug 01, 2008 at 11:43:40AM +0200, Thomas Trauner wrote:
> >
> > I have a problem with directories that contain more than 10000 entries
> > (Ubuntu 8.04.1) or with more than 70000 entries (RHEL 5.2). If you use
> > readdir(3) or readdir64(3) you get one entry twice, with the same name and
> > inode.
> >
>
> How reproducible is this; can you reproduce it on this one filesystem?
> Can you reproduce it on multiple filesystems? What sort of file names
> are you using?

Every time I tried. It is reproducible on the same filesystem, and also on other
systems with different filesystem sizes and usage patterns.
It showed up when one of our own scripts, working through a Subversion directory, failed.

File names are numbers, starting at "0" and counting up.

> Also, are you testing by using "ls", or do you have your own program
> getting the names of the files? If the latter, are you using
> telldir()/seekdir() in any way?

I'm testing with 'ls|sort -n|uniq -d' and also with a simple program
that counts how many entries readdir returns.
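
The counting part is roughly this sketch (the real test program only
adds some bookkeeping; the directory to scan is given on the command
line):

#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
	DIR *dir = opendir(argc > 1 ? argv[1] : ".");
	struct dirent *de;
	long count = 0;

	if (!dir) {
		perror("opendir");
		return 1;
	}
	while ((de = readdir(dir)) != NULL) {
		/* skip "." and ".." so the count matches the file count */
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			count++;
	}
	printf("readdir returned %ld entries\n", count);
	closedir(dir);
	return 0;
}

If it reports more entries than files were created, some entry must
have been returned twice.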

> - Ted
>

Tom

 
Old 08-01-2008, 02:47 PM
Eric Sandeen
 
duplicate entries on ext3 when using readdir/readdir64

Thomas Trauner wrote:
> On Fri, 2008-08-01 at 08:16 -0400, Theodore Tso wrote:
>> On Fri, Aug 01, 2008 at 11:43:40AM +0200, Thomas Trauner wrote:
>>> I have a problem with directories that contain more than 10000 entries
>>> (Ubuntu 8.04.1) or with more than 70000 entries (RHEL 5.2). If you use
>>> readdir(3) or readdir64(3) you get one entry twice, with the same name and
>>> inode.
>>>
>> How reproducible is this; can you reproduce it on this one filesystem?
>> Can you reproduce it on multiple filesystems? What sort of file names
>> are you using?
>
> Every time I tried. It is reproducible on the same filesystem, and also on other
> systems with different filesystem sizes and usage patterns.
> It showed up when one of our own scripts, working through a Subversion directory, failed.
>
> File names are numbers, starting at "0" and counting up.
>
>> Also, are you testing by using "ls", or do you have your own program
>> getting the names of the files? If the latter, are you using
>> telldir()/seekdir() in any way?
>
> I'm testing with 'ls|sort -n|uniq -d' and also with a simple program
> that counts how many entries readdir returns.
>

Hm, a bog-simple test here doesn't show any trouble:

[root@inode dirtest]# for I in `seq 0 10500`; do touch $I; done
[root@inode dirtest]# ls | sort -n | uniq -d
[root@inode dirtest]# ls | wc -l
10501

Does that reflect what you're doing? Do you have a testcase you can share?

-Eric

 
Old 08-04-2008, 07:48 AM
Thomas Trauner
 
duplicate entries on ext3 when using readdir/readdir64

On Fri, 2008-08-01 at 09:47 -0500, Eric Sandeen wrote:
> Hm, a bog-simple test here doesn't show any trouble:
>
> [root@inode dirtest]# for I in `seq 0 10500`; do touch $I; done
> [root@inode dirtest]# ls | sort -n | uniq -d
> [root@inode dirtest]# ls | wc -l
> 10501
>
> does that reflect what you're doing? Do you have a testcase you can share?

Yes, but I gave incorrect values, sorry. The real numbers are a bit higher;
a run of my program outputs this on 2.6.24-19-generic (ubuntu 8.04.1):

expected 11778 files, but readdir reports 11779
expected 11862 files, but readdir64 reports 11863

And on 2.6.18-92.1.6.el5 (rhel 5.2):
expected 72922 files, but readdir reports 72923
expected 73131 files, but readdir64 reports 73132

The testcase is here: <http://www.unet.univie.ac.at/~a9100884/readdir.c>
> -Eric

Tom

 
Old 08-05-2008, 10:53 AM
Thomas Trauner
 
duplicate entries on ext3 when using readdir/readdir64

On Fri, 2008-08-01 at 08:16 -0400, Theodore Tso wrote:
> On Fri, Aug 01, 2008 at 11:43:40AM +0200, Thomas Trauner wrote:
> >
> > I have a problem with directories that contain more than 10000 entries
> > (Ubuntu 8.04.1) or with more than 70000 entries (RHEL 5.2). If you use
> > readdir(3) or readdir64(3) you get one entry twice, with the same name and
> > inode.
> >
>
> How reproducible is this; can you reproduce it on this one filesystem?
> Can you reproduce it on multiple filesystems? What sort of file names
> are you using?

I made new tests with the code under
<http://www.unet.univie.ac.at/~a9100884/readdir.c> on a bunch of freshly
generated, empty filesystems, each about 38GB, of type fat
(aborted after about 22000 entries because it took too long), ext2, xfs,
jfs, and again ext3. All tests were made with 2.6.24-19-generic (Ubuntu
8.04.1).

I also tried minix fs, just for fun, but I could only create 126 files.

Ext3 shows the same effect as before, but at 103033 entries (readdir)
and 104136 entries (readdir64). 'ls|sort -n|uniq -d' output (ls uses
getdents64, so I assume it uses readdir64, but I haven't checked the ls
source):

root@darfnix:/readdir/ext3/testdir# ls|sort -n|uniq -d
102456
root@darfnix:/readdir/ext3/testdir#
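
(strace should be able to confirm that without reading the ls source,
e.g.:

strace -e trace=getdents64 ls >/dev/null

every directory read then shows up as a getdents64 call in the trace.)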

Can I do anything else?

Regards
Tom

 
Old 08-06-2008, 04:46 AM
Theodore Tso
 
duplicate entries on ext3 when using readdir/readdir64

On Tue, Aug 05, 2008 at 12:53:51PM +0200, Thomas Trauner wrote:
> > How reproducible is this; can you reproduce it on this one filesystem?
> > Can you reproduce it on multiple filesystems? What sort of file names
> > are you using?
>
> I made new tests with the code under
> <http://www.unet.univie.ac.at/~a9100884/readdir.c> on a bunch of freshly
> generated, empty filesystems, each about 38GB, of type fat
> (aborted after about 22000 entries because it took too long), ext2, xfs,
> jfs, and again ext3. All tests were made with 2.6.24-19-generic (Ubuntu
> 8.04.1).

I was able to reproduce it using ext3. It looks like it's caused by a
hash collision; but ext3 has code that's supposed to avoid returning a
directory entry doubled in this fashion. I'll have to look into it.

- Ted

 
Old 08-06-2008, 01:33 PM
Thomas Trauner
 
duplicate entries on ext3 when using readdir/readdir64

On Wed, 2008-08-06 at 00:46 -0400, Theodore Tso wrote:
> I was able to reproduce it using ext3. It looks like it's caused by a
> hash collision; but ext3 has code that's supposed to avoid returning a
> directory entry doubled in this fashion. I'll have to look into it.
>
> - Ted

Thank you.

Tom

 
Old 08-06-2008, 02:07 PM
Theodore Tso
 
duplicate entries on ext3 when using readdir/readdir64

On Tue, Aug 05, 2008 at 12:53:51PM +0200, Thomas Trauner wrote:
> On Fri, 2008-08-01 at 08:16 -0400, Theodore Tso wrote:
> > On Fri, Aug 01, 2008 at 11:43:40AM +0200, Thomas Trauner wrote:
> > >
> > > I have a problem with directories that contain more than 10000 entries
> > > (Ubuntu 8.04.1) or with more than 70000 entries (RHEL 5.2). If you use
> > > readdir(3) or readdir64(3) you get one entry twice, with the same name and
> > > inode.
> > >
> I made new tests with the code under
> <http://www.unet.univie.ac.at/~a9100884/readdir.c> on a bunch of freshly
> generated, empty filesystems, each about 38GB, of type fat
> (aborted after about 22000 entries because it took too long), ext2, xfs,
> jfs, and again ext3....

OK, I have a workaround for you. It appears there's a kernel bug
hiding here, since there shouldn't be duplicates returned by readdir()
even if we have hash collisions.

It turns out though that the TEA hash we are currently using as the
default is a really sucky hash. I can't remember who suggested it; I
may go looking in the archives just out of curiosity. My fault,
though; I should have tested it much more thoroughly, although it
*looked* good, and it was taken from the core of an encryption
algorithm, so I thought it would be OK.

The claim was that it was just as good for our purposes as the
cut-down md4 hash we were using, but it was faster (so it would burn
fewer CPU cycles). Unfortunately, (a) at least on modern hardware (I
tested on an X61s laptop) the TEA hash is in fact a little slower, and
(b) for small filenames with small hamming distances between them,
such as what you are using in your test, it's generating lots of
collisions.

Anyway, the workaround is as follows:

debugfs -w /dev/sdXXX
debugfs: set_super_value def_hash_version half_md4
debugfs: quit

Then completely delete any directories where you were having problems,
and recreate them. (You can do the "mkdir foo.new; mv foo/* foo.new;
rmdir foo; mv foo.new foo" trick if you want to preserve the files in
that directory.)

In any case, here's the test case which shows the hash collision
problem much more quickly. You can also use it for benchmarks, like
so:

time tst_hash -q -a tea -n 3000000
time tst_hash -q -a half_md4 -n 3000000

With the following options, we can also see that, with the right
filename lengths, the tea algorithm doesn't create any hash collisions,
so maybe whoever tested the algorithm before suggesting it just got
unlucky with the set of filenames that he/she chose:

tst_hash -p 0000 -a tea -n 3000000

In any case, unless someone comes up with a really good reason, I
probably will change the default hash algorithm for mke2fs to
half_md4, since it is both faster and a better hash function.


This doesn't change the fact that the kernel should do the right thing
with hash collisions, at least in the simple case without
telldir/seekdir. When I merged the htree code I had tested it with
the Douglas Adams hash (always returns a hash value of
0x00000042:00000000 no matter what its inputs), and it did the right
thing, so we must have regressed somewhere along the line...

- Ted

/*
 * tst_hash.c
 *
 * Copyright (C) 2008 by Theodore Ts'o.
 *
 * This file may be redistributed under the terms of the GNU Public
 * License, Version 2
 *
 * Compile command:
 * cc -g -O2 -o tst_hash tst_hash.c -lext2fs -lcom_err -luuid -le2p
 */

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <sys/types.h>
#include <getopt.h>

#include "ext2fs/ext2fs.h"
#include "uuid/uuid.h"
#include "et/com_err.h"
#include "e2p/e2p.h"			/* for e2p_string2hash() */

#define SEED "87fd5d61-4612-4147-8bf5-a21948e7e909"

struct hash {
	int num;
	ext2_dirhash_t hash, minor_hash;
};

/* Sort by major hash, then minor hash, so collisions end up adjacent */
static EXT2_QSORT_TYPE hash_cmp(const void *a, const void *b)
{
	const struct hash *db_a = (const struct hash *) a;
	const struct hash *db_b = (const struct hash *) b;

	if (db_a->hash != db_b->hash)
		return (int) (db_a->hash - db_b->hash);

	return (int) (db_a->minor_hash - db_b->minor_hash);
}

int main(int argc, char **argv)
{
	errcode_t errcode;
	int hash_alg = EXT2_HASH_TEA;
	char name[200], *tmp, prefix[100];
	unsigned char uuid[16];
	int i, c, quiet = 0, num_hashes = 300000;
	struct hash *hash_array;

	uuid_parse(SEED, uuid);
	prefix[0] = 0;

	while ((c = getopt(argc, argv, "s:a:n:qp:")) != EOF)
		switch (c) {
		case 's':
			uuid_parse(optarg, uuid);
			break;
		case 'a':
			hash_alg = e2p_string2hash(optarg);
			if (hash_alg < 0) {
				fprintf(stderr,
					"Invalid hash algorithm: %s\n",
					optarg);
				exit(1);
			}
			break;
		case 'n':
			num_hashes = strtoul(optarg, &tmp, 0);
			if (*tmp) {
				com_err(argv[0], 0, "count - %s", optarg);
				exit(1);
			}
			break;
		case 'p':
			if (strlen(optarg)+1 > sizeof(prefix)) {
				fprintf(stderr, "%s: prefix too large!\n",
					argv[0]);
				exit(1);
			}
			strcpy(prefix, optarg);
			break;
		case 'q':
			quiet = 1;
			break;
		default:
			fprintf(stderr, "Usage: %s [-q] [-s hash_seed] "
				"[-a hash_alg] [-n num_hashes]\n", argv[0]);
			exit(1);
		}

	hash_array = malloc(num_hashes * sizeof(struct hash));
	if (hash_array == NULL) {
		fprintf(stderr, "Couldn't allocate hash_array\n");
		exit(1);
	}

	/* Hash the names <prefix>0 ... <prefix>(num_hashes-1) */
	for (i = 0; i < num_hashes; i++) {
		sprintf(name, "%s%d", prefix, i);

		errcode = ext2fs_dirhash(hash_alg, name, strlen(name),
					 (__u32 *) uuid,
					 &hash_array[i].hash,
					 &hash_array[i].minor_hash);
		if (errcode) {
			com_err("ext2fs_dirhash", errcode,
				"while trying to hash '%s'", name);
			exit(1);
		}
		hash_array[i].num = i;
	}

	qsort(hash_array, (size_t) num_hashes, sizeof(struct hash), hash_cmp);

	/* Adjacent equal (hash, minor_hash) pairs are collisions */
	for (c = 0, i = 0; i < num_hashes-1; i++) {
		if ((hash_array[i].hash == hash_array[i+1].hash) &&
		    (hash_array[i].minor_hash == hash_array[i+1].minor_hash)) {
			c++;
			if (quiet)
				continue;
			printf("hash collision: %d, %d: %08x:%08x\n",
			       hash_array[i].num, hash_array[i+1].num,
			       hash_array[i].hash, hash_array[i].minor_hash);
		}
	}
	printf("%d collisions\n", c);

	exit(0);
}




 
Old 08-06-2008, 02:45 PM
Theodore Tso
 
duplicate entries on ext3 when using readdir/readdir64

On Mon, Aug 04, 2008 at 09:48:39AM +0200, Thomas Trauner wrote:
> Yes, but I gave incorrect values, sorry. The real numbers are a bit higher;
> a run of my program outputs this on 2.6.24-19-generic (ubuntu 8.04.1):
>
> expected 11778 files, but readdir reports 11779
> expected 11862 files, but readdir64 reports 11863
>
> And on 2.6.18-92.1.6.el5 (rhel 5.2):
> expected 72922 files, but readdir reports 72923
> expected 73131 files, but readdir64 reports 73132

BTW, I doubt the difference between what you saw on your Ubuntu and
RHEL systems has anything to do with the kernel version or the
distribution; it's just the luck of the draw. If you run 'dumpe2fs -h
/dev/sdXX | grep "Hash Seed"' on both systems, and then take that uuid
and feed it to the tst_hash program via the -s option, you'll probably
see that it is simply the different directory hash seed that determines
where the first collision happens:

%./tst_hash -s 27e0ed94-069c-44c0-bea0-044b1a8d7bcc
hash collision: 142886, 142987: 7104d654:131c0700
hash collision: 188030, 188131: aefe1dc2:f7517103
hash collision: 14020, 14031: fc717efa:87ce3eaa
hash collision: 120336, 120732: 34c3f1b6:cee72d50
4 collisions

vs.

% ./tst_hash -s 7089e459-07c2-43cc-b25f-bafdcce9cd05
hash collision: 167469, 167568: 4de08834:3fa2a17a
hash collision: 133356, 133752: ce1bfd8e:a1bce824
hash collision: 179218, 179319: ea71d5c8:43471df9
hash collision: 111503, 111701: fbfcea6c:760591e8
hash collision: 134034, 134135: 0ff24a86:f627f5a1
hash collision: 252452, 252553: 6631082a:43adb3f4
hash collision: 101107, 101305: a1a99e86:8d50e974
hash collision: 62302, 62313: 2689a56c:38ccd31d
hash collision: 60242, 60253: d9e3f444:f252b5f5
9 collisions

With the first hash seed, the first collision happened with the
filenames 14020 and 14031. With the second hash seed, you don't get a
collision until 60242 and 60253.

Regards,

- Ted

 
