ext3 with maildir++ = huge disk latency and high load
Thank you for the reply.

BTW, another web server shows almost the same bonnie results (10283ms and
5884ms) on an ext3 partition with 45GB of data (1.5 million files)?!

The hardware and RAID5 (also hardware) are the same: HP ProLiant DL380 G4
with a Smart Array 6i controller (as I see, it comes with the 128MB BBWC
enabler but not the kit).

I have not tried mounting the fs with barriers disabled. Does it have any
critical risks?

The bonnie tests were performed in the morning, when we have minimal user
load.

But why was the same server with the same RAID (4 disks) but with
FreeBSD+UFS much better? I guess the problem is in ext3 then?

With regards, Andrey.

On 23.09.2011 11:31, Janne Pikkarainen wrote:
> Hello,
>
> On 09/23/2011 08:51 AM, Andrey wrote:
>> Hello,
>>
>> I have a production mail server with a maildir++ structure and about
>> 250GB (~10 million) of files on an ext3 partition on RAID5. It's
>> mounted with the noatime option. This mail server is responsible for
>> local delivery and for storing mail messages.
>>
>> The system has Debian Squeeze installed, with Exim as MDA and Dovecot
>> as IMAP+POP3 server.
>>
>> Bonnie results are terrible. Sequential output for Block and Rewrite
>> are 10722ms and 9232ms. So if there are 1000 messages in the mail
>> queue, load is extremely high, delivery time is very long and the
>> server can hang. I did not see such problems with UFS on a FreeBSD
>> server.
>>
>> As I understand it, the ext3 file system is really bad for such
>> configurations with Maildir++ (many small files)? Is there a way to
>> decrease disk latency on ext3 or speed it up?
>>
>> With regards, Andrey
>
> (replying off-list, so the ext3 developers will not start a flamewar)
>
> In my opinion ext3 is a terrible file system for your kind of workload,
> especially if you have lots of concurrent clients accessing their
> mailboxes. Even though ext3 has evolved over the years and has gained
> features such as directory indexes, it still is not good for tens of
> millions of frequently changing small files with lots of concurrency.
> Been there, done that, not gonna do it again. I administer servers with
> 50 000 - 100 000 user accounts, with a couple of thousand active IMAP
> connections.
>
> Personally I switched from ext3 to ReiserFS many years ago and happily
> used it between 2004-2008, then after things went downhill from the
> ReiserFS development point of view, I switched to XFS during a server
> hardware refresh. ReiserFS was excellent, but it really started to slow
> down if the file system was more than 85% full, and it also got
> fragmented over time.
>
> XFS has been rock-solid and fast since 2008 for me, but it has an
> Achilles heel of its own: if I need to remove lots of files from a huge
> directory tree, the delete performance is quite sucky compared to other
> file systems. This has been improved in later kernel versions with the
> new delaylog parameter, but by how much, I've not yet tested.
>
> All this said, the performance of ext3 should not be as bad as you are
> describing. Is the bonnie result done while the server is idle or while
> it has mail clients accessing it all the time? If you have hardware
> RAID, is there a battery-backed write cache and are you sure it's
> enabled? Also, have you tried to mount your file system with barriers
> disabled? What kind of server setup do you have?
>
> Something is very wrong.
>
> Best regards,
>
> Janne Pikkarainen

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
ext3 with maildir++ = huge disk latency and high load
On 9/23/11 4:52 AM, Andrey wrote:
> I have not tried mounting the fs with barriers disabled. Does it have any
> critical risks?

Yes. If you have write caches on either the RAID controller or on the
disks behind it which can be lost on a power outage, running without
barriers will potentially corrupt your filesystem if you lose power, even
though you have ext3's journaling.

Journaling depends on write guarantees which are lost if drive write
caches evaporate.

-Eric
ext3 with maildir++ = huge disk latency and high load
On 9/23/11 9:43 AM, Eric Sandeen wrote:
> Journaling depends on write guarantees which are lost if drive
> write caches evaporate.

... evaporate unexpectedly, that is. Barriers manage that cache. If write
caches are battery-backed (or off), then nobarrier is safe.

-Eric
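Before changing barrier settings it helps to know which options a file system is actually mounted with. The following is a minimal sketch, not anything posted in the thread: it uses the glibc mntent interface to look for an option string in a mount table. The function name mount_has_option is made up for illustration, and whether the kernel lists the barrier setting in /proc/mounts at all is an assumption that depends on the kernel version.

```c
#include <assert.h>
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread).
 * Returns 1 if the file system mounted on `mountpoint` lists `opt` among
 * its mount options in the mount table `table`, 0 if it does not, and -1
 * if the table cannot be read or the mount point is not found.
 */
int mount_has_option(const char *table, const char *mountpoint,
                     const char *opt)
{
    FILE *fp;
    struct mntent *m;
    int result = -1;

    assert(table != NULL && mountpoint != NULL && opt != NULL);

    fp = setmntent(table, "r");
    if (fp == NULL)
        return -1;

    /* A later entry for the same mount point overrides an earlier one. */
    while ((m = getmntent(fp)) != NULL) {
        if (strcmp(m->mnt_dir, mountpoint) == 0)
            result = (hasmntopt(m, opt) != NULL);
    }
    endmntent(fp);
    return result;
}
```

For example, mount_has_option("/proc/mounts", "/home", "barrier=0") would report whether /home shows barriers as disabled, provided the running kernel prints that option. A result of 0 only means the string is not listed, not that barriers are definitely on.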
ext3 with maildir++ = huge disk latency and high load
On 09/22/2011 11:51 PM, Andrey wrote:
> As I understand it, the ext3 file system is really bad for such
> configurations with Maildir++ (many small files)? Is there a way to
> decrease disk latency on ext3 or speed it up?

My guess is that your problem is many files in one directory, not
necessarily having many files on the whole file system. In my experience
large directories eat ext3's lunch. The introduction of indexing did help,
but it still fell behind on performance when compared to some other file
systems. You may want to make sure your file system has indexing turned
on, but with the vintage of your Debian I would assume it is on by default.

I ran into this problem many years ago (before indexing was an ext3
option). It was even worse as the Maildir storage was being accessed over
NFS. Ended up eventually biting the bullet and moving to WAFL (NetApp).

My guess is that users trying to access these large directories via IMAP
and POP are also facing large delays and possibly even timeouts.

Steven
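To test the "many files in one directory" guess, one would first want to know how large the individual maildir directories really are. Here is a small sketch, assuming nothing beyond POSIX opendir/readdir; the function name count_dir_entries is hypothetical and was not part of the thread.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Hypothetical helper (not from the thread).
 * Counts the entries of the directory `path`, excluding "." and "..".
 * Returns -1 if the directory cannot be opened.
 */
long count_dir_entries(const char *path)
{
    DIR *dir;
    struct dirent *de;
    long n = 0;

    assert(path != NULL);

    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        n++;
    }
    closedir(dir);
    return n;
}
```

Calling it on each user's cur/ and new/ directories would show whether a few very large directories stand out. Note that the count itself walks the whole directory, so on a loaded server it is better run off-peak.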
ext3 with maildir++ = huge disk latency and high load
Sure, indexing is on by default on Debian ext3. I think I'll try to test
some cases and run bonnie++ on a fresh HP server with the same
configuration.

Also, I have a maildir with more than 10000 messages and don't have
timeouts or access problems via IMAP with it, which is strange. Sometimes
I notice that copying a message to the Sent folder can take a little
while. It is rare, but it may correlate with this, I agree.

Also, I see in the Exim logs that DT (delivery time) is more than
2 seconds even though the user's maildir is almost empty, so I tend to
think that the primary problem is in ext3 as a whole or in the RAID5
hardware.

On 23.09.2011 21:19, Bob wrote:
> You may want to make sure your file system has indexing turned on, but
> with the vintage of your Debian I would assume it is on by default.
> [...]
> My guess is that users trying to access these large directories via
> IMAP and POP are also facing large delays and possibly even timeouts.
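The 2-second delivery time into an almost empty maildir can be checked in isolation from Exim. The sketch below times a single maildir-style delivery (write into tmp/, fsync, rename into new/) in a scratch maildir. It is an illustration under assumptions, not how Exim is implemented: the function names make_maildir and timed_delivery and the scratch path are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: create path, path/tmp, path/new and path/cur. */
int make_maildir(const char *path)
{
    static const char *sub[] = { "", "/tmp", "/new", "/cur" };
    char buf[4096];
    size_t i;
    int n;

    assert(path != NULL);
    for (i = 0; i < sizeof(sub) / sizeof(sub[0]); i++) {
        n = snprintf(buf, sizeof(buf), "%s%s", path, sub[i]);
        if (n < 0 || (size_t) n >= sizeof(buf))
            return -1;
        if (mkdir(buf, 0700) != 0 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/*
 * Hypothetical helper: deliver `msg` as file `name` (caller-chosen, unique)
 * and return the elapsed wall-clock seconds, or -1.0 on failure.
 */
double timed_delivery(const char *maildir, const char *name, const char *msg)
{
    char tmp[4096], dst[4096];
    struct timespec t0, t1;
    size_t len;
    int fd, n;

    assert(maildir != NULL && name != NULL && msg != NULL);
    n = snprintf(tmp, sizeof(tmp), "%s/tmp/%s", maildir, name);
    if (n < 0 || (size_t) n >= sizeof(tmp))
        return -1.0;
    n = snprintf(dst, sizeof(dst), "%s/new/%s", maildir, name);
    if (n < 0 || (size_t) n >= sizeof(dst))
        return -1.0;
    len = strlen(msg);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1.0;
    /* fsync forces the data to disk, which is where journal and
     * write-cache behaviour show up in the timing. */
    if (write(fd, msg, len) != (ssize_t) len || fsync(fd) != 0) {
        close(fd);
        unlink(tmp);
        return -1.0;
    }
    if (close(fd) != 0 || rename(tmp, dst) != 0) {
        unlink(tmp);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double) (t1.tv_sec - t0.tv_sec)
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Running this a few hundred times against a scratch maildir on the same file system, with and without other load, would show whether a single small synchronous write is already slow, which would point at the RAID write path rather than at directory size.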
ext3 with maildir++ = huge disk latency and high load
On Sat, Sep 24, 2011 at 09:46:49PM +0400, Andrey wrote:
> Sure, indexing is on by default on Debian ext3. I think I'll try to
> test some cases and run bonnie++ on a fresh HP server with the same
> configuration.

For really gargantuan directories, indexing definitely hurts when you do
a readdir+stat (i.e. /bin/ls -sF) or readdir+unlink (i.e., rm -rf).

> Also, I have a maildir with more than 10000 messages and don't have
> timeouts or access problems via IMAP with it, which is strange.

That's probably because this problem can be worked around by doing a
readdir, then sorting by the inode number (d_ino), and then doing the
stat or unlink. Some programs, especially those that expressly deal with
Maildir directories, have this optimization already there.

I also have an LD_PRELOAD hack that can be used to demonstrate why
putting this in is a good idea. You can google for spd_readdir and find
it. I'll also put the latest version of it in the contrib directory in
e2fsprogs for the next release.

- Ted
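The readdir, sort by d_ino, then stat sequence described above can be sketched directly in an application, without any preload. This is an illustrative version written for this purpose, not code from any mail program: the names struct entry, by_inode and stat_in_inode_order are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

struct entry {
    ino_t ino;
    char *name;
};

static int by_inode(const void *a, const void *b)
{
    const struct entry *ea = a, *eb = b;

    return (ea->ino > eb->ino) - (ea->ino < eb->ino);
}

/*
 * Hypothetical helper: stat every entry of `path` in inode order rather
 * than readdir (hash) order. Returns the number of entries stat'ed
 * successfully, or -1 on error. If `total_size` is non-NULL it receives
 * the sum of the st_size fields.
 */
long stat_in_inode_order(const char *path, long long *total_size)
{
    DIR *dir;
    struct dirent *de;
    struct entry *list = NULL, *grown;
    size_t n = 0, cap = 0, i;
    long ok = 0;
    long long sum = 0;
    struct stat st;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    /* Pass 1: collect names and inode numbers; no inode is read yet. */
    while ((de = readdir(dir)) != NULL) {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        if (n == cap) {
            cap = cap ? cap * 2 : 128;
            grown = realloc(list, cap * sizeof(*list));
            if (grown == NULL)
                goto fail;
            list = grown;
        }
        list[n].name = strdup(de->d_name);
        if (list[n].name == NULL)
            goto fail;
        list[n].ino = de->d_ino;
        n++;
    }

    /* Pass 2: sort so that inode-table reads happen in ascending order. */
    if (n > 1)
        qsort(list, n, sizeof(*list), by_inode);

    /* Pass 3: stat in sorted order, relative to the open directory. */
    for (i = 0; i < n; i++) {
        if (fstatat(dirfd(dir), list[i].name, &st, AT_SYMLINK_NOFOLLOW) == 0) {
            ok++;
            sum += st.st_size;
        }
        free(list[i].name);
    }
    free(list);
    closedir(dir);
    if (total_size != NULL)
        *total_size = sum;
    return ok;

fail:
    for (i = 0; i < n; i++)
        free(list[i].name);
    free(list);
    closedir(dir);
    return -1;
}
```

The same three passes work for unlink instead of stat. The cost is holding all names in memory at once, which is the trade-off the LD_PRELOAD approach makes as well.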
ext3 with maildir++ = huge disk latency and high load
On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have an LD_PRELOAD hack that can be used to demonstrate why
> putting this in is a good idea. You can google for spd_readdir and find
> it. I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.

While I was looking at spd_readdir.c before including it in e2fsprogs's
contrib directory, I realized the last version I released was pretty
incomplete, and didn't work with modern-day coreutils. So I'll be
including this version in the e2fsprogs git tree, but since in the past
I've distributed it by sending it to folks via e-mail, here's an updated
version of spd_readdir.c. Please use this in preference to any older
versions that you might find in mailing list archives.

Note that this preload is not going to work for all programs. In
particular, although it does supply readdir_r(), it is *not* thread safe.
So I can't recommend this as something to be dropped in
/etc/ld.so.preload.

- Ted

/*
 * readdir accelerator
 *
 * (C) Copyright 2003, 2004 by Theodore Ts'o.
 *
 * Compile using the command:
 *
 * gcc -o spd_readdir.so -fPIC -shared spd_readdir.c -ldl
 *
 * Use it by setting the LD_PRELOAD environment variable:
 *
 * export LD_PRELOAD=/usr/local/sbin/spd_readdir.so
 *
 * %Begin-Header%
 * This file may be redistributed under the terms of the GNU Public
 * License, version 2.
 * %End-Header%
 *
 */

#define ALLOC_STEPSIZE	100
#define MAX_DIRSIZE	0

#define DEBUG

#ifdef DEBUG
#define DEBUG_DIR(x)	{if (do_debug) { x; }}
#else
#define DEBUG_DIR(x)
#endif

#define _GNU_SOURCE
#define __USE_LARGEFILE64

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <errno.h>
#include <dlfcn.h>

/* One cached directory entry; d_name is heap-allocated. */
struct dirent_s {
	unsigned long long d_ino;
	long long d_off;
	unsigned short int d_reclen;
	unsigned char d_type;
	char *d_name;
};

/* Wrapper handed back to the application in place of the real DIR. */
struct dir_s {
	DIR	*dir;
	int	num;
	int	max;
	struct dirent_s *dp;
	int	pos;
	int	direct;
	struct dirent ret_dir;
	struct dirent64 ret_dir64;
};

static int (*real_closedir)(DIR *dir) = 0;
static DIR *(*real_opendir)(const char *name) = 0;
static DIR *(*real_fdopendir)(int fd) = 0;
/* rewinddir() returns void (the posted version declared void *) */
static void (*real_rewinddir)(DIR *dirp) = 0;
static struct dirent *(*real_readdir)(DIR *dir) = 0;
static int (*real_readdir_r)(DIR *dir, struct dirent *entry,
			     struct dirent **result) = 0;
static struct dirent64 *(*real_readdir64)(DIR *dir) = 0;
static off_t (*real_telldir)(DIR *dir) = 0;
static void (*real_seekdir)(DIR *dir, off_t offset) = 0;
static int (*real_dirfd)(DIR *dir) = 0;
static unsigned long max_dirsize = MAX_DIRSIZE;
static int num_open = 0;
#ifdef DEBUG
static int do_debug = 0;
#endif

/* Look up the real libc functions that this preload overrides. */
static void setup_ptr()
{
	char *cp;

	real_opendir = dlsym(RTLD_NEXT, "opendir");
	real_fdopendir = dlsym(RTLD_NEXT, "fdopendir");
	real_closedir = dlsym(RTLD_NEXT, "closedir");
	real_rewinddir = dlsym(RTLD_NEXT, "rewinddir");
	real_readdir = dlsym(RTLD_NEXT, "readdir");
	real_readdir_r = dlsym(RTLD_NEXT, "readdir_r");
	real_readdir64 = dlsym(RTLD_NEXT, "readdir64");
	real_telldir = dlsym(RTLD_NEXT, "telldir");
	real_seekdir = dlsym(RTLD_NEXT, "seekdir");
	real_dirfd = dlsym(RTLD_NEXT, "dirfd");
	if ((cp = getenv("SPD_READDIR_MAX_SIZE")) != NULL) {
		max_dirsize = atol(cp);
	}
#ifdef DEBUG
	if (getenv("SPD_READDIR_DEBUG")) {
		printf("initialized!\n");
		do_debug++;
	}
#endif
}

static void free_cached_dir(struct dir_s *dirstruct)
{
	int i;

	if (!dirstruct->dp)
		return;

	for (i=0; i < dirstruct->num; i++) {
		free(dirstruct->dp[i].d_name);
	}
	free(dirstruct->dp);
	dirstruct->dp = 0;
	dirstruct->max = dirstruct->num = 0;
}

/* Sort by inode number, keeping "." and ".." first. */
static int ino_cmp(const void *a, const void *b)
{
	const struct dirent_s *ds_a = (const struct dirent_s *) a;
	const struct dirent_s *ds_b = (const struct dirent_s *) b;
	ino_t i_a, i_b;

	i_a = ds_a->d_ino;
	i_b = ds_b->d_ino;

	if (ds_a->d_name[0] == '.') {
		if (ds_a->d_name[1] == 0)
			i_a = 0;
		else if ((ds_a->d_name[1] == '.') && (ds_a->d_name[2] == 0))
			i_a = 1;
	}
	if (ds_b->d_name[0] == '.') {
		if (ds_b->d_name[1] == 0)
			i_b = 0;
		else if ((ds_b->d_name[1] == '.') && (ds_b->d_name[2] == 0))
			i_b = 1;
	}

	/* compare rather than subtract: (i_a - i_b) can overflow an int */
	return (i_a > i_b) - (i_a < i_b);
}

struct dir_s *alloc_dirstruct(DIR *dir)
{
	struct dir_s *dirstruct;

	dirstruct = malloc(sizeof(struct dir_s));
	if (!dirstruct)
		return NULL;	/* posted version dereferenced NULL here */
	memset(dirstruct, 0, sizeof(struct dir_s));
	dirstruct->dir = dir;
	return dirstruct;
}

/* Read the whole directory into memory and sort it by inode number. */
void cache_dirstruct(struct dir_s *dirstruct)
{
	struct dirent_s *ds, *dnew;
	struct dirent64 *d;

	while ((d = (*real_readdir64)(dirstruct->dir)) != NULL) {
		if (dirstruct->num >= dirstruct->max) {
			dirstruct->max += ALLOC_STEPSIZE;
			DEBUG_DIR(printf("Reallocating to size %d\n",
					 dirstruct->max));
			/* element type is dirent_s (posted: sizeof dir_s) */
			dnew = realloc(dirstruct->dp,
				       dirstruct->max * sizeof(struct dirent_s));
			if (!dnew)
				goto nomem;
			dirstruct->dp = dnew;
		}
		ds = &dirstruct->dp[dirstruct->num++];
		ds->d_ino = d->d_ino;
		ds->d_off = d->d_off;
		ds->d_reclen = d->d_reclen;
		ds->d_type = d->d_type;
		if ((ds->d_name = malloc(strlen(d->d_name)+1)) == NULL) {
			dirstruct->num--;
			goto nomem;
		}
		strcpy(ds->d_name, d->d_name);
		DEBUG_DIR(printf("readdir: %lu %s\n",
				 (unsigned long) d->d_ino, d->d_name));
	}
	qsort(dirstruct->dp, dirstruct->num, sizeof(struct dirent_s), ino_cmp);
	return;
nomem:
	DEBUG_DIR(printf("No memory, backing off to direct readdir\n"));
	free_cached_dir(dirstruct);
	dirstruct->direct = 1;
}

DIR *opendir(const char *name)
{
	DIR *dir;
	struct dir_s *dirstruct;
	struct stat st;

	if (!real_opendir)
		setup_ptr();

	DEBUG_DIR(printf("Opendir(%s) (%d open)\n", name, num_open++));
	dir = (*real_opendir)(name);
	if (!dir)
		return NULL;

	dirstruct = alloc_dirstruct(dir);
	if (!dirstruct) {
		(*real_closedir)(dir);
		errno = ENOMEM;		/* errno values are positive */
		return NULL;
	}

	if (max_dirsize && (stat(name, &st) == 0) &&
	    (st.st_size > max_dirsize)) {
		DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
				 (long) st.st_size));
		dirstruct->direct = 1;
		return (DIR *) dirstruct;
	}

	cache_dirstruct(dirstruct);
	return ((DIR *) dirstruct);
}

DIR *fdopendir(int fd)
{
	DIR *dir;
	struct dir_s *dirstruct;
	struct stat st;

	if (!real_fdopendir)
		setup_ptr();

	DEBUG_DIR(printf("fdopendir(%d) (%d open)\n", fd, num_open++));
	dir = (*real_fdopendir)(fd);
	if (!dir)
		return NULL;

	dirstruct = alloc_dirstruct(dir);
	if (!dirstruct) {
		(*real_closedir)(dir);
		errno = ENOMEM;
		return NULL;
	}

	if (max_dirsize && (fstat(fd, &st) == 0) &&
	    (st.st_size > max_dirsize)) {
		DEBUG_DIR(printf("Directory size %ld, using direct readdir\n",
				 (long) st.st_size));
		dirstruct->dir = dir;
		dirstruct->direct = 1;
		return (DIR *) dirstruct;
	}

	cache_dirstruct(dirstruct);
	return ((DIR *) dirstruct);
}

int closedir(DIR *dir)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;

	DEBUG_DIR(printf("Closedir (%d open)\n", --num_open));
	if (dirstruct->dir)
		(*real_closedir)(dirstruct->dir);

	free_cached_dir(dirstruct);
	free(dirstruct);
	return 0;
}

struct dirent *readdir(DIR *dir)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->direct)
		return (*real_readdir)(dirstruct->dir);

	if (dirstruct->pos >= dirstruct->num)
		return NULL;

	ds = &dirstruct->dp[dirstruct->pos++];
	dirstruct->ret_dir.d_ino = ds->d_ino;
	dirstruct->ret_dir.d_off = ds->d_off;
	dirstruct->ret_dir.d_reclen = ds->d_reclen;
	dirstruct->ret_dir.d_type = ds->d_type;
	strncpy(dirstruct->ret_dir.d_name, ds->d_name,
		sizeof(dirstruct->ret_dir.d_name));

	return (&dirstruct->ret_dir);
}

int readdir_r(DIR *dir, struct dirent *entry, struct dirent **result)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->direct)
		return (*real_readdir_r)(dirstruct->dir, entry, result);

	if (dirstruct->pos >= dirstruct->num) {
		*result = NULL;
		return 0;
	}

	ds = &dirstruct->dp[dirstruct->pos++];
	entry->d_ino = ds->d_ino;
	entry->d_off = ds->d_off;
	entry->d_reclen = ds->d_reclen;
	entry->d_type = ds->d_type;
	strncpy(entry->d_name, ds->d_name, sizeof(entry->d_name));
	*result = entry;

	return 0;
}

struct dirent64 *readdir64(DIR *dir)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;
	struct dirent_s *ds;

	if (dirstruct->direct)
		return (*real_readdir64)(dirstruct->dir);

	if (dirstruct->pos >= dirstruct->num)
		return NULL;

	ds = &dirstruct->dp[dirstruct->pos++];
	dirstruct->ret_dir64.d_ino = ds->d_ino;
	dirstruct->ret_dir64.d_off = ds->d_off;
	dirstruct->ret_dir64.d_reclen = ds->d_reclen;
	dirstruct->ret_dir64.d_type = ds->d_type;
	strncpy(dirstruct->ret_dir64.d_name, ds->d_name,
		sizeof(dirstruct->ret_dir64.d_name));

	return (&dirstruct->ret_dir64);
}

off_t telldir(DIR *dir)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;

	if (dirstruct->direct)
		return (*real_telldir)(dirstruct->dir);

	return ((off_t) dirstruct->pos);
}

void seekdir(DIR *dir, off_t offset)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;

	if (dirstruct->direct) {
		(*real_seekdir)(dirstruct->dir, offset);
		return;
	}

	dirstruct->pos = offset;
}

void rewinddir(DIR *dir)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;

	(*real_rewinddir)(dirstruct->dir);
	if (dirstruct->direct)
		return;

	dirstruct->pos = 0;
	free_cached_dir(dirstruct);
	cache_dirstruct(dirstruct);
}

int dirfd(DIR *dir)
{
	struct dir_s *dirstruct = (struct dir_s *) dir;
	int fd = (*real_dirfd)(dirstruct->dir);

	DEBUG_DIR(printf("dirfd %d, %p\n", fd, real_dirfd));
	return fd;
}
ext3 with maildir++ = huge disk latency and high load
On Sat, Sep 24, 2011 at 03:04:47PM -0400, Ted Ts'o wrote:
> I also have an LD_PRELOAD hack that can be used to demonstrate why
> putting this in is a good idea. You can google for spd_readdir and find
> it. I'll also put the latest version of it in the contrib directory
> in e2fsprogs for the next release.

What we've started doing in Lustre (which has to deal with network
latency, but has the same problem in terms of htree vs. inode ordering)
is to detect if the application is doing readdir+stat on the dirents in
readdir order, and then fork a thread to statahead the entries in the
kernel.

It would be possible to do something like this in the ext4 readdir code
to do dirent readahead, sort, and then prefetch the inodes in order
(partially or completely, depending on the directory size), but as yet we
aren't working on anything at the ext4 level.

There was a patch to do something similar to this for btrfs as well, with
the DCACHE_NEED_LOOKUP flag. That avoids a lot of the complexity between
instantiating dcache entries from readdir without yet having read the
inode from disk.

The other proposal I've made in the past is to try and allocate inodes
from the inode table in roughly hash order, so that when it comes time to
do readdir+stat the dirents and inodes are already partially in the same
order. That breaks down in case of renames, but works well for normal
usage.

Cheers, Andreas
ext3 with maildir++ = huge disk latency and high load
On Sun, Sep 25, 2011 at 12:16:12AM -0600, Andreas Dilger wrote:
> It would be possible to do something like this in the ext4 readdir
> code to do dirent readahead, sort, and then prefetch the inodes
> in order (partially or completely, depending on the directory size),
> but as yet we aren't working on anything at the ext4 level.

What we have in ext4 right now is if we need to do disk i/o to read from
the inode table, we will read in adjacent blocks from the inode table, on
the theory that the effort needed to read in 32k versus 4k is pretty much
the same. So if the inodes were allocated all at the same time, they will
be sequentially ordered, and so the inode table readahead should help
quite a lot.

I'll note that with really large maildirs, especially on a mail server
with many other maildirs, over time the inodes for each individual file
will get scattered all over the place, and so pretty much any scheme that
uses an inode table separate from the blocks where the directory entries
are stored is going to get hammered by this use case.

Ultimately, the best way to solve this problem is a more intelligent
application that caches the contents of the key headers in a database, so
you don't need to scan the contents of the entire Maildir when doing
common IMAP operations.

- Ted
ext3 with maildir++ = huge disk latency and high load
OK. Here are bonnie results on a freshly installed Debian with a 200GB
FREE ext3 /home partition (4 disks in RAID5 on an HP ProLiant DL380 G4
server):

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
debian           2G   243  97 22555  10  8794   2  1810  97 120444  11 317.0   5
Latency               135ms     967ms     723ms   26526us   13143us     586ms

Latency is also very bad according to these results. What is the reason?
The hardware or ext3 itself? I will try with xfs and ext4 and compare
then.

On 26.09.2011 01:21, Ted Ts'o wrote:
> Ultimately, the best way to solve this problem is a more intelligent
> application that caches the contents of the key headers in a database,
> so you don't need to scan the contents of the entire Maildir when doing
> common IMAP operations.