Linux Archive: EXT3 Users
ext3 with maildir++ = huge disk latency and high load
(http://www.linux-archive.org/ext3-users/579106-ext3-maildir-huge-disk-latency-high-load.html)

Andrey 09-23-2011 09:52 AM

Thank you for the reply.

BTW, another web server has almost the same bonnie results (10283 ms and
5884 ms) on an ext3 partition with 45 GB of data (1.5 million files)?!


The hardware and the RAID5 (also hardware) are the same: an HP ProLiant
DL380 G4 with a Smart Array 6i controller (as far as I can see, it comes
with the 128MB BBWC enabler, but not the kit).


I have not tried mounting the fs with barriers disabled. Does that carry
any critical risks?


The bonnie tests were performed in the morning, when we have minimal user load.

But why was the same server with the same RAID (4 disks) much better under
FreeBSD+UFS? I guess the problem is in ext3 then?


With regards, Andrey.

On 23.09.2011 11:31, Janne Pikkarainen wrote:

Hello,

On 09/23/2011 08:51 AM, Andrey wrote:

Hello,

I have a production mail server with a maildir++ structure and about
250 GB (~10 million files) on an ext3 partition on RAID5. It's
mounted with the noatime option. This mail server is responsible for
local delivery and storage of mail messages.

The system runs Debian Squeeze, with Exim as the MDA and Dovecot as the
IMAP+POP3 server.

The bonnie results are terrible. Sequential output latencies for Block and
Rewrite are 10722 ms and 9232 ms. So if there are 1000 messages in the mail
queue, the load is extremely high, delivery time is very long, and the
server can hang. I did not see such problems with UFS on a FreeBSD server.

As I understand it, the ext3 file system is really bad for such
configurations with Maildir++ (many small files)? Is there a way to decrease
disk latency on ext3 or to speed it up?

With regards, Andrey



(replying off-list, so the ext3 developers will not start a flamewar)

In my opinion ext3 is a terrible file system for your kind of workload,
especially if you have lots of concurrent clients accessing their
mailboxes. Even though ext3 has evolved over the years and has gained
features such as directory indexes, it still is not good for tens of
millions of frequently changing small files with lots of concurrency.
Been there, done that, not gonna do it again. I administer servers with
50,000 to 100,000 user accounts and a couple of thousand active IMAP
connections.

Personally, I switched from ext3 to ReiserFS many years ago and happily
used it between 2004 and 2008; then, after ReiserFS development went
downhill, I switched to XFS during a server hardware refresh. ReiserFS
was excellent, but it really started to slow down once the file system
was more than 85% full, and it also got fragmented over time.

XFS has been rock-solid and fast for me since 2008, but it has an
Achilles heel of its own: if I need to remove lots of files from a huge
directory tree, the delete performance is quite sucky compared to other
file systems. This has been improved in later kernel versions with
the new delaylog mount option, but by how much, I've not yet tested.
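
(For illustration, not from the original message: on kernels where delayed
logging is not yet the default, it is selected with an XFS mount option; the
device and mount point here are hypothetical. Delayed logging became the
default around kernel 2.6.39.)

    mount -t xfs -o delaylog /dev/sdb1 /srv/mail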

All this said, the performance of ext3 should not be as bad as you are
describing. Was the bonnie result taken while the server was idle, or while
it had mail clients accessing it all the time? If you have hardware
RAID, is there a battery-backed write cache, and are you sure it's
enabled? Also, have you tried mounting your file system with barriers
disabled? What kind of server setup do you have?

Something is very wrong.

Best regards,

Janne Pikkarainen





Eric Sandeen 09-23-2011 02:43 PM

On 9/23/11 4:52 AM, Andrey wrote:
> I have not tried mounting the fs with barriers disabled. Does that
> carry any critical risks?

Yes. If you have write caches, on either the RAID controller or on
the disks behind it, which can be lost in a power outage, running
without barriers will potentially corrupt your filesystem if you lose
power, even though you have ext3's journaling.

Journaling depends on write guarantees which are lost if drive
write caches evaporate.

-Eric


Eric Sandeen 09-23-2011 02:48 PM

On 9/23/11 9:43 AM, Eric Sandeen wrote:
> Yes. If you have write caches, on either the RAID controller or on
> the disks behind it, which can be lost in a power outage, running
> without barriers will potentially corrupt your filesystem if you lose
> power, even though you have ext3's journaling.
>
> Journaling depends on write guarantees which are lost if drive
> write caches evaporate.

... evaporate unexpectedly, that is. Barriers manage that cache.

If write caches are battery-backed (or off), then nobarrier is safe.
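
(For illustration, not from the original message: on ext3 this trade-off is
set with the barrier mount option; the device and mount point here are
hypothetical.)

    # barriers on: the safe choice when write caches are volatile
    mount -o remount,barrier=1 /var/mail

    # barriers off: only with battery-backed (or disabled) write caches
    mount -o remount,barrier=0 /var/mail

    # equivalent /etc/fstab entry
    /dev/cciss/c0d0p2  /var/mail  ext3  noatime,barrier=0  0  2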

-Eric

> -Eric



Bob 09-23-2011 05:19 PM

My guess is that your problem is many files in one directory, not necessarily
having many files on the whole file system. In my experience, large
directories eat ext3's lunch. The introduction of indexing did help, but ext3
still fell behind some other file systems on performance. You may want to
make sure your file system has indexing turned on, though with the vintage of
your Debian I would assume it is on by default.
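
(For illustration, not from the original message: one way to check, and if
necessary enable, directory indexing on an existing ext3 filesystem; the
device path is hypothetical, and the e2fsck pass must run on an unmounted
filesystem.)

    # check whether dir_index is in the feature list
    tune2fs -l /dev/cciss/c0d0p2 | grep features

    # enable it, then rebuild indexes for existing directories
    tune2fs -O dir_index /dev/cciss/c0d0p2
    e2fsck -fD /dev/cciss/c0d0p2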

I ran into this problem many years ago (before indexing was an ext3 option).
It was even worse, as the Maildir storage was being accessed over NFS. We
ended up biting the bullet and moving to WAFL (NetApp).

My guess is that users trying to access these large directories via IMAP and
POP are also facing large delays, and possibly even timeouts.

Steven


Andrey 09-24-2011 05:46 PM

Sure, indexing is on by default on Debian ext3. I think I'll try to test
some cases and run bonnie++ on a fresh HP server with the same configuration.


I also have a maildir with more than 10000 messages and don't have
timeouts or access problems via IMAP to it, which is strange. Sometimes
I notice that copying a message to the Sent folder can take a moment, but
that is a rare issue; it may correlate with this, I agree. Also, I see in
the Exim logs that DT (delivery time) is more than 2 seconds even though the
user's maildir is almost empty, so I tend to think it is primarily a problem
of the whole ext3 system or of the RAID5 hardware.



"Ted Ts'o" 09-24-2011 07:04 PM

On Sat, Sep 24, 2011 at 09:46:49PM +0400, Andrey wrote:
> Sure, indexing is on by default on Debian ext3. I think I'll try to
> test some cases and run bonnie++ on a fresh HP server with the same
> configuration.

For really gargantuan directories, indexing definitely hurts when you
do a readdir+stat (i.e., /bin/ls -sF) or a readdir+unlink (i.e., rm -rf).

> I also have a maildir with more than 10000 messages and don't have
> timeouts or access problems via IMAP to it, which is strange.

That's probably because this problem can be worked around by doing a
readdir, then sorting by the inode number (d_ino), and then doing the
stat or unlink. Some programs, especially those that expressly deal
with Maildir directories, already have this optimization.
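
(For illustration, not from the original message: a minimal user-space
sketch of that workaround. It scans a directory, sorts the entries by
d_ino, then stats them in that order so inode table reads stay mostly
sequential. All names here are new for this example, not Ted's preload.)

    #include <dirent.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    /* Compare directory entries by inode number. */
    static int by_ino(const struct dirent **a, const struct dirent **b)
    {
            if ((*a)->d_ino < (*b)->d_ino)
                    return -1;
            return ((*a)->d_ino > (*b)->d_ino);
    }

    int main(int argc, char **argv)
    {
            const char *path = (argc > 1) ? argv[1] : ".";
            struct dirent **list;
            int i, n;

            /* scandir() does the readdir loop and sorts with by_ino. */
            n = scandir(path, &list, NULL, by_ino);
            if (n < 0) {
                    perror("scandir");
                    return 1;
            }
            for (i = 0; i < n; i++) {
                    char name[4096];
                    struct stat st;

                    snprintf(name, sizeof(name), "%s/%s", path,
                             list[i]->d_name);
                    if (stat(name, &st) == 0)   /* stat in inode order */
                            printf("%10lu %s\n",
                                   (unsigned long) st.st_ino,
                                   list[i]->d_name);
                    free(list[i]);
            }
            free(list);
            return 0;
    }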

I also have an LD_PRELOAD hack that can be used to demonstrate why
this optimization is a good idea. You can google for spd_readdir and find
it. I'll also put the latest version of it in the contrib directory
in e2fsprogs for the next release.

- Ted


"Ted Ts'o" 09-25-2011 01:41 AM

On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have an LD_PRELOAD hack that can be used to demonstrate why
> this optimization is a good idea. You can google for spd_readdir and find
> it. I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.

While I was looking at spd_readdir.c before including it in
e2fsprogs's contrib directory, I realized the last version I released
was pretty incomplete and didn't work with modern-day coreutils.

So I'll be including this version in the e2fsprogs git tree, but
since in the past I've distributed it by sending it to folks via e-mail,
here's an updated version of spd_readdir.c. Please use it in preference
to any older versions that you might find in mailing list archives.

Note that this preload is not going to work for all programs. In
particular, although it does supply readdir_r(), it is *not* thread
safe. So I can't recommend this as something to be dropped in
/etc/ld.so.preload.

- Ted

/*
 * readdir accelerator
 *
 * (C) Copyright 2003, 2004 by Theodore Ts'o.
 *
 * Compile using the command:
 *
 * gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
 *
 * Use it by setting the LD_PRELOAD environment variable:
 *
 * export LD_PRELOAD=/usr/local/sbin/spd_readdir.so
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 * %End-Header%
 *
 */

#define ALLOC_STEPSIZE  100
#define MAX_DIRSIZE     0

#define DEBUG

#ifdef DEBUG
#define DEBUG_DIR(x)    {if (do_debug) { x; }}
#else
#define DEBUG_DIR(x)
#endif

#define _GNU_SOURCE
#define __USE_LARGEFILE64

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <errno.h>
#include <dlfcn.h>

/* A cached copy of one directory entry. */
struct dirent_s {
        unsigned long long d_ino;
        long long d_off;
        unsigned short int d_reclen;
        unsigned char d_type;
        char *d_name;
};

/* Per-DIR state: the real DIR handle plus the sorted entry cache. */
struct dir_s {
        DIR *dir;
        int num;
        int max;
        struct dirent_s *dp;
        int pos;
        int direct;
        struct dirent ret_dir;
        struct dirent64 ret_dir64;
};

static int (*real_closedir)(DIR *dir) = 0;
static DIR *(*real_opendir)(const char *name) = 0;
static DIR *(*real_fdopendir)(int fd) = 0;
static void (*real_rewinddir)(DIR *dirp) = 0;
static struct dirent *(*real_readdir)(DIR *dir) = 0;
static int (*real_readdir_r)(DIR *dir, struct dirent *entry,
                             struct dirent **result) = 0;
static struct dirent64 *(*real_readdir64)(DIR *dir) = 0;
static off_t (*real_telldir)(DIR *dir) = 0;
static void (*real_seekdir)(DIR *dir, off_t offset) = 0;
static int (*real_dirfd)(DIR *dir) = 0;
static unsigned long max_dirsize = MAX_DIRSIZE;
static int num_open = 0;
#ifdef DEBUG
static int do_debug = 0;
#endif

/* Resolve the libc versions of the directory functions. */
static void setup_ptr()
{
        char *cp;

        real_opendir = dlsym(RTLD_NEXT, "opendir");
        real_fdopendir = dlsym(RTLD_NEXT, "fdopendir");
        real_closedir = dlsym(RTLD_NEXT, "closedir");
        real_rewinddir = dlsym(RTLD_NEXT, "rewinddir");
        real_readdir = dlsym(RTLD_NEXT, "readdir");
        real_readdir_r = dlsym(RTLD_NEXT, "readdir_r");
        real_readdir64 = dlsym(RTLD_NEXT, "readdir64");
        real_telldir = dlsym(RTLD_NEXT, "telldir");
        real_seekdir = dlsym(RTLD_NEXT, "seekdir");
        real_dirfd = dlsym(RTLD_NEXT, "dirfd");
        if ((cp = getenv("SPD_READDIR_MAX_SIZE")) != NULL) {
                max_dirsize = atol(cp);
        }
#ifdef DEBUG
        if (getenv("SPD_READDIR_DEBUG")) {
                printf("initialized!\n");
                do_debug++;
        }
#endif
}

static void free_cached_dir(struct dir_s *dirstruct)
{
        int i;

        if (!dirstruct->dp)
                return;

        for (i = 0; i < dirstruct->num; i++) {
                free(dirstruct->dp[i].d_name);
        }
        free(dirstruct->dp);
        dirstruct->dp = 0;
        dirstruct->max = dirstruct->num = 0;
}

/* Sort comparator: order entries by inode number, keeping "." and ".."
 * at the front. */
static int ino_cmp(const void *a, const void *b)
{
        const struct dirent_s *ds_a = (const struct dirent_s *) a;
        const struct dirent_s *ds_b = (const struct dirent_s *) b;
        ino_t i_a, i_b;

        i_a = ds_a->d_ino;
        i_b = ds_b->d_ino;

        if (ds_a->d_name[0] == '.') {
                if (ds_a->d_name[1] == 0)
                        i_a = 0;
                else if ((ds_a->d_name[1] == '.') && (ds_a->d_name[2] == 0))
                        i_a = 1;
        }
        if (ds_b->d_name[0] == '.') {
                if (ds_b->d_name[1] == 0)
                        i_b = 0;
                else if ((ds_b->d_name[1] == '.') && (ds_b->d_name[2] == 0))
                        i_b = 1;
        }

        /* Compare rather than subtract to avoid overflow on large inodes. */
        if (i_a < i_b)
                return -1;
        return (i_a > i_b);
}

struct dir_s *alloc_dirstruct(DIR *dir)
{
        struct dir_s *dirstruct;

        dirstruct = malloc(sizeof(struct dir_s));
        if (!dirstruct)
                return NULL;
        memset(dirstruct, 0, sizeof(struct dir_s));
        dirstruct->dir = dir;
        return dirstruct;
}

/* Slurp in the whole directory and sort the entries by inode number;
 * on memory failure, fall back to passing readdir straight through. */
void cache_dirstruct(struct dir_s *dirstruct)
{
        struct dirent_s *ds, *dnew;
        struct dirent64 *d;

        while ((d = (*real_readdir64)(dirstruct->dir)) != NULL) {
                if (dirstruct->num >= dirstruct->max) {
                        dirstruct->max += ALLOC_STEPSIZE;
                        DEBUG_DIR(printf("Reallocating to size %d\n",
                                         dirstruct->max));
                        dnew = realloc(dirstruct->dp,
                                       dirstruct->max * sizeof(struct dirent_s));
                        if (!dnew)
                                goto nomem;
                        dirstruct->dp = dnew;
                }
                ds = &dirstruct->dp[dirstruct->num++];
                ds->d_ino = d->d_ino;
                ds->d_off = d->d_off;
                ds->d_reclen = d->d_reclen;
                ds->d_type = d->d_type;
                if ((ds->d_name = malloc(strlen(d->d_name)+1)) == NULL) {
                        dirstruct->num--;
                        goto nomem;
                }
                strcpy(ds->d_name, d->d_name);
                DEBUG_DIR(printf("readdir: %lu %s\n",
                                 (unsigned long) d->d_ino, d->d_name));
        }
        qsort(dirstruct->dp, dirstruct->num, sizeof(struct dirent_s), ino_cmp);
        return;
nomem:
        DEBUG_DIR(printf("No memory, backing off to direct readdir\n"));
        free_cached_dir(dirstruct);
        dirstruct->direct = 1;
}

DIR *opendir(const char *name)
{
        DIR *dir;
        struct dir_s *dirstruct;
        struct stat st;

        if (!real_opendir)
                setup_ptr();

        DEBUG_DIR(printf("Opendir(%s) (%d open)\n", name, num_open++));
        dir = (*real_opendir)(name);
        if (!dir)
                return NULL;

        dirstruct = alloc_dirstruct(dir);
        if (!dirstruct) {
                (*real_closedir)(dir);
                errno = ENOMEM;
                return NULL;
        }

        /* Very large directories are passed through uncached. */
        if (max_dirsize && (stat(name, &st) == 0) &&
            (st.st_size > max_dirsize)) {
                DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
                                 st.st_size));
                dirstruct->direct = 1;
                return (DIR *) dirstruct;
        }

        cache_dirstruct(dirstruct);
        return ((DIR *) dirstruct);
}

DIR *fdopendir(int fd)
{
        DIR *dir;
        struct dir_s *dirstruct;
        struct stat st;

        if (!real_fdopendir)
                setup_ptr();

        DEBUG_DIR(printf("fdopendir(%d) (%d open)\n", fd, num_open++));
        dir = (*real_fdopendir)(fd);
        if (!dir)
                return NULL;

        dirstruct = alloc_dirstruct(dir);
        if (!dirstruct) {
                (*real_closedir)(dir);
                errno = ENOMEM;
                return NULL;
        }

        if (max_dirsize && (fstat(fd, &st) == 0) &&
            (st.st_size > max_dirsize)) {
                DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
                                 st.st_size));
                dirstruct->dir = dir;
                dirstruct->direct = 1;
                return (DIR *) dirstruct;
        }

        cache_dirstruct(dirstruct);
        return ((DIR *) dirstruct);
}

int closedir(DIR *dir)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;

        DEBUG_DIR(printf("Closedir (%d open)\n", --num_open));
        if (dirstruct->dir)
                (*real_closedir)(dirstruct->dir);

        free_cached_dir(dirstruct);
        free(dirstruct);
        return 0;
}

struct dirent *readdir(DIR *dir)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;
        struct dirent_s *ds;

        if (dirstruct->direct)
                return (*real_readdir)(dirstruct->dir);

        if (dirstruct->pos >= dirstruct->num)
                return NULL;

        ds = &dirstruct->dp[dirstruct->pos++];
        dirstruct->ret_dir.d_ino = ds->d_ino;
        dirstruct->ret_dir.d_off = ds->d_off;
        dirstruct->ret_dir.d_reclen = ds->d_reclen;
        dirstruct->ret_dir.d_type = ds->d_type;
        strncpy(dirstruct->ret_dir.d_name, ds->d_name,
                sizeof(dirstruct->ret_dir.d_name));

        return (&dirstruct->ret_dir);
}

int readdir_r(DIR *dir, struct dirent *entry, struct dirent **result)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;
        struct dirent_s *ds;

        if (dirstruct->direct)
                return (*real_readdir_r)(dirstruct->dir, entry, result);

        if (dirstruct->pos >= dirstruct->num) {
                *result = NULL;
                return 0;
        }

        ds = &dirstruct->dp[dirstruct->pos++];
        entry->d_ino = ds->d_ino;
        entry->d_off = ds->d_off;
        entry->d_reclen = ds->d_reclen;
        entry->d_type = ds->d_type;
        strncpy(entry->d_name, ds->d_name, sizeof(entry->d_name));
        *result = entry;

        return 0;
}

struct dirent64 *readdir64(DIR *dir)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;
        struct dirent_s *ds;

        if (dirstruct->direct)
                return (*real_readdir64)(dirstruct->dir);

        if (dirstruct->pos >= dirstruct->num)
                return NULL;

        ds = &dirstruct->dp[dirstruct->pos++];
        dirstruct->ret_dir64.d_ino = ds->d_ino;
        dirstruct->ret_dir64.d_off = ds->d_off;
        dirstruct->ret_dir64.d_reclen = ds->d_reclen;
        dirstruct->ret_dir64.d_type = ds->d_type;
        strncpy(dirstruct->ret_dir64.d_name, ds->d_name,
                sizeof(dirstruct->ret_dir64.d_name));

        return (&dirstruct->ret_dir64);
}

off_t telldir(DIR *dir)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;

        if (dirstruct->direct)
                return (*real_telldir)(dirstruct->dir);

        return ((off_t) dirstruct->pos);
}

void seekdir(DIR *dir, off_t offset)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;

        if (dirstruct->direct) {
                (*real_seekdir)(dirstruct->dir, offset);
                return;
        }

        dirstruct->pos = offset;
}

void rewinddir(DIR *dir)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;

        (*real_rewinddir)(dirstruct->dir);
        if (dirstruct->direct)
                return;

        dirstruct->pos = 0;
        free_cached_dir(dirstruct);
        cache_dirstruct(dirstruct);
}

int dirfd(DIR *dir)
{
        struct dir_s *dirstruct = (struct dir_s *) dir;
        int fd = (*real_dirfd)(dirstruct->dir);

        DEBUG_DIR(printf("dirfd %d, %p\n", fd, real_dirfd));
        return fd;
}
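
(For illustration, not from the original message: the preload honors the two
environment variables read in setup_ptr(), so a test run might look like
this; the install path is hypothetical.)

    gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
    export LD_PRELOAD=/usr/local/sbin/spd_readdir.so
    # optional: pass directories larger than this (bytes of directory file
    # size) straight through; the default of 0 means always cache and sort
    export SPD_READDIR_MAX_SIZE=1048576
    export SPD_READDIR_DEBUG=1
    ls -sF /var/mail/bigdir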


Andreas Dilger 09-25-2011 06:16 AM

On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have an LD_PRELOAD hack that can be used to demonstrate why
> this optimization is a good idea. You can google for spd_readdir and find
> it. I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.


What we've started doing in Lustre (which has to deal with network
latency, but the same problem in terms of htree vs. inode ordering)
is to detect if the application is doing readdir+stat on the dirents
in readdir order, and then fork a thread to statahead the entries
in the kernel.

It would be possible to do something like this in the ext4 readdir
code to do dirent readahead, sort, and then prefetch the inodes
in order (partially or completely, depending on the directory size),
but as yet we aren't working on anything at the ext4 level.

There was a patch to do something similar to this for btrfs as well,
with the DCACHE_NEED_LOOKUP flag. That avoids a lot of the complexity
of instantiating dcache entries from readdir without yet having
read the inode from disk.

The other proposal I've made in the past is to try to allocate inodes
from the inode table in roughly hash order, so that when it comes time
to do readdir+stat, the dirents and inodes are already partially in
the same order. That breaks down in the case of renames, but works well
for normal usage.

Cheers, Andreas






"Ted Ts'o" 09-25-2011 09:21 PM

On Sun, Sep 25, 2011 at 12:16:12AM -0600, Andreas Dilger wrote:
>
> It would be possible to do something like this in the ext4 readdir
> code to do dirent readahead, sort, and then prefetch the inodes
> in order (partially or completely, depending on the directory size),
> but as yet we aren't working on anything at the ext4 level.

What we have in ext4 right now is that if we need to do disk I/O to read
from the inode table, we will read in adjacent blocks from the inode
table, on the theory that the effort needed to read in 32k versus 4k
is pretty much the same. So if the inodes were allocated all at the
same time, they will be sequentially ordered, and so the inode table
readahead should help quite a lot.

I'll note that with really large maildirs, especially on a mail server
with many other maildirs, over time the inodes for each individual
file will get scattered all over the place, and so pretty much any
scheme that uses an inode table separate from the blocks where the
directory entries are stored is going to get hammered by this use
case.

Ultimately, the best way to solve this problem is a more intelligent
application that caches the contents of the key headers in a database,
so you don't need to scan the contents of the entire Maildir when
doing common IMAP operations.

- Ted


Andrey 09-29-2011 07:29 AM

OK. Here are the bonnie results on a freshly installed Debian with a 200GB
free ext3 /home partition (4 disks in RAID5 on an HP ProLiant DL380 G4 server):


Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
debian           2G   243  97 22555  10  8794   2  1810  97 120444 11 317.0  5
Latency             135ms     967ms     723ms     26526us   13143us    586ms


Latency is also very bad according to the results. What is the reason: the
hardware, or ext3 itself? I will try xfs and ext4 and then compare.
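
(For illustration, not from the original message: since the real workload is
many small files, bonnie++'s file create/stat/delete phase is closer to a
Maildir pattern than the sequential block tests; the directory and user
below are hypothetical.)

    # 2 GB sequential tests plus 128*1024 small files for create/read/delete
    bonnie++ -d /home/bonnie-test -s 2G -n 128 -u nobody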




