Old 12-25-2009, 06:32 PM
Christian Kujau
 
benchmark results

On Fri, 25 Dec 2009 at 10:56, Christian Kujau wrote:
> Thanks for the hint, I could find sys/vm/drop-caches documented in
------------------------------^ not, was what I meant to say,
but it's all there, as "drop_caches" in Documentation/sysctl/vm.txt

Christian.

> Documentation/ but it's good to know there's a way to flush all these
> caches via this knob. Maybe I should add this to those "generic" tests to be
> more comparable to the other benchmarks.
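
[ As a side note: flushing the caches via that knob is a one-liner. A
minimal sketch, assuming root and the interface documented in
Documentation/sysctl/vm.txt - dirty pages must be synced first, since
only clean caches can be dropped: ]

sync                                # write dirty pages back first
echo 1 > /proc/sys/vm/drop_caches   # drop the page cache
echo 2 > /proc/sys/vm/drop_caches   # drop dentries and inodes
echo 3 > /proc/sys/vm/drop_caches   # drop both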
--
BOFH excuse #129:

The ring needs another token

 
Old 12-26-2009, 06:06 PM
Christian Kujau
 
benchmark results

On 26.12.09 08:00, jim owens wrote:
>> I was using "sync" to make sure that the data "should" be on the disks
>
> Good, but not good enough for many tests... info sync
[...]
> On Linux, sync is only guaranteed to schedule the dirty blocks for
> writing; it can actually take a short time before all the blocks are
> finally written.

Noted, many times already. That's why I wrote "should be" - but in this
special scenario (filesystem speed tests) I don't care about file
integrity: if I pull the plug after "sync" and some data didn't make it
to the disks, I'll only check whether the test script got all the timestamps
and move on to the next test. I'm not testing for "filesystem integrity
after someone pulls the plug" here. And remember, I'm doing "sync" for
all the filesystems tested, so the comparison still stands.

Christian.

 
Old 12-26-2009, 06:19 PM
tytso@mit.edu
 
benchmark results

On Sat, Dec 26, 2009 at 11:00:59AM -0500, jim owens wrote:
> Christian Kujau wrote:
>
> > I was using "sync" to make sure that the data "should" be on the disks
>
> Good, but not good enough for many tests... info sync
>
> CONFORMING TO
> POSIX.2
>
> NOTES
> On Linux, sync is only guaranteed to schedule the dirty blocks for
> writing; it can actually take a short time before all the blocks are
> finally written.
>
> This is consistent with all the feels-like-unix OSes I have used.

Actually, Linux's sync does more than just schedule the writes; it has
for quite some time:

static void sync_filesystems(int wait)
{
	...
}

SYSCALL_DEFINE0(sync)
{
	wakeup_flusher_threads(0);
	sync_filesystems(0);
	sync_filesystems(1);
	if (unlikely(laptop_mode))
		laptop_sync_completion();
	return 0;
}

At least for ext3 and ext4, we will even do a device barrier operation
as a result of a call to sb->s_op->sync_fs() --- which is called by
__sync_filesystem, which is called in turn by sync_filesystems().
This isn't done for all file systems, though, as near as I can tell.
(Ext2 at least doesn't.)

But for quite some time, under Linux the sync(2) system call will wait
for the blocks to be flushed out to the HBA, although we currently don't
wait for the blocks to have been committed to the platters (at least
not for all file systems).

Applications shouldn't depend on this, of course, since POSIX and
other legacy Unix systems don't guarantee this. But in terms of
knowing what Linux does, the man page is a bit out of date.
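
[ For completeness: a test harness that really wants data on the
platters cannot rely on sync(2) alone. A sketch of one approach,
assuming the drive honors hdparm's write-cache flag (-W) and cache
flush command (-F), and that /dev/sdb is the disk under test: ]

hdparm -W 0 /dev/sdb   # disable the drive's write cache for the run
sync                   # flush dirty pages; ext3/4 also issue a barrier
hdparm -F /dev/sdb     # explicitly flush the on-drive cache buffer
hdparm -W 1 /dev/sdb   # re-enable the write cache afterwards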

Best regards,

- Ted

 
Old 12-27-2009, 08:55 PM
Christian Kujau
 
benchmark results

On Sun, 27 Dec 2009 at 14:50, jim owens wrote:
> And I don't even care about comparing 2 filesystems, I only care about
> timing 2 versions of code in the single filesystem I am working on,
> and forgetting about hardware cache effects has screwed me there.

Not me, I'm comparing filesystems - and when the HBA or whatever plays
tricks and "sync" doesn't flush all the data, it'll do so for every tested
filesystem. Of course, filesystems could handle "sync" differently, and
they probably do, hence the different times they take to complete. That's
what my tests are about: timing comparison (does that still fall under
the "benchmark" category?), not functional comparison. That's left as a
task for the reader of these results: "hm, filesystem xy is so much faster
when doing foo, why is that? And am I willing to sacrifice e.g. proper
syncs to gain more speed?"

> So unless you are sure you have no hardware cache effects...
> "the comparison still stands" is *false*.

Again, I don't argue with "hardware caches will have effects", but that's
not the point of these tests. Of course hardware is different, but
filesystems are too and I'm testing filesystems (on the same hardware).

Christian.
--
BOFH excuse #278:

The Dilithium Crystals need to be rotated.

 
Old 12-27-2009, 09:33 PM
tytso@mit.edu
 
benchmark results

On Sun, Dec 27, 2009 at 01:55:26PM -0800, Christian Kujau wrote:
> On Sun, 27 Dec 2009 at 14:50, jim owens wrote:
> > And I don't even care about comparing 2 filesystems, I only care about
> > timing 2 versions of code in the single filesystem I am working on,
> > and forgetting about hardware cache effects has screwed me there.
>
> Not me, I'm comparing filesystems - and when the HBA or whatever plays
> tricks and "sync" doesn't flush all the data, it'll do so for every tested
> filesystem. Of course, filesystems could handle "sync" differently, and
> they probably do, hence the different times they take to complete. That's
> what my tests are about: timing comparison (does that still fall under
> the "benchmark" category?), not functional comparison. That's left as a
> task for the reader of these results: "hm, filesystem xy is so much faster
> when doing foo, why is that? And am I willing to sacrifice e.g. proper
> syncs to gain more speed?"

Yes, but given that many of the file systems have almost *exactly* the same
bandwidth measurement for the "cp" test, and said bandwidth
measurement is 5 times the disk bandwidth as measured by hdparm, it
makes me suspect that you are doing this:

/bin/time /bin/cp -r /source/tree /filesystem-under-test
sync
/bin/time /bin/rm -rf /filesystem-under-test/tree
sync

etc.

It is *a* measurement, but the question is whether it's a useful
comparison. Consider two different file systems: one does a very
good job of making sure that file writes are done contiguously on
disk, minimizing seek overhead --- and the other is really crappy at disk
allocation, and writes the files to random locations all over the disk.
If you are only measuring the "cp", then the fact that filesystem 'A'
has a very good layout, and is able to write things to disk very
efficiently, while filesystem 'B' has files written in a really
horrible way, won't be measured by your test. This is especially true
if, for example, you have 8GB of memory and you are copying 4GB worth
of data.

You might notice it if you include the "sync" in the timing, i.e.:

/bin/time /bin/sh -c "/bin/cp -r /source/tree /filesystem-under-test;/bin/sync"
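
[ Extended into a per-filesystem harness, that suggestion might look
like the following sketch - the device, mount point, and source tree
are made up for illustration: ]

for fs in ext3 ext4 xfs; do
    mkfs -t $fs /dev/sdb1                  # fresh filesystem each round
    mount /dev/sdb1 /mnt/test
    /bin/time /bin/sh -c "/bin/cp -r /source/tree /mnt/test; /bin/sync"
    umount /mnt/test                       # flushes and invalidates cached
done                                       # data before the next run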

> Again, I don't argue with "hardware caches will have effects", but that's
> not the point of these tests. Of course hardware is different, but
> filesystems are too and I'm testing filesystems (on the same hardware).

The question is whether your tests are doing the best job of measuring
how good the filesystem really is. If your workload is one where you
will only be copying file sets much smaller than your memory, and you
don't care about when the data actually hits the disk, only when
"/bin/cp" returns, then sure, do whatever you want. But if you want
the tests to have meaning if, for example, you have 2GB of memory and
you are copying 8GB of data, or if later on you will be continuously
streaming data to the disk, and sooner or later the need to write data
to the disk will start slowing down your real-life workload, then not
including the time to do the sync in the time to copy your file set
may cause you to assume that filesystems 'A' and 'B' are identical in
performance, and then your filesystem comparison will end up
misleading you.

The bottom line is that it's very hard to do good comparisons that are
useful in the general case.

Best regards,

- Ted

 
Old 12-28-2009, 12:24 AM
Christian Kujau
 
benchmark results

On Sun, 27 Dec 2009 at 17:33, tytso@mit.edu wrote:
> Yes, but given many of the file systems have almost *exactly* the same

"Almost" indeed - but curiously enough some filesystem are *not* the same,
although they should. Again: we have 8GB RAM, I'm copying ~3GB of data, so
why _are_ there differences? (Answer: because filesystems are different).
That's the only point of this test. Also note the disclaimer[0] I added to
the results page a few days ago.

> measurement is 5 times the disk bandwidth as measured by hdparm, it
> makes me suspect that you are doing this:
> /bin/time /bin/cp -r /source/tree /filesystem-under-test
> sync

No, I'm not - see the test script[1] - I'm taking the time for cp/rm/tar
*and* sync. But even if I took the time only for, say, "cp" and not the
sync part, it would still be a valid comparison across filesystems (the
same operation for every filesystem), though not a very realistic one -
because in the real world I *want* to make sure my data is on the disk.
But that's as far as I go in these tests; I'm not even messing around
with disk caches or HBA caches - that's not the scope of these tests.

> You might notice it if you include the "sync" in the timing, i.e.:
> /bin/time /bin/sh -c "/bin/cp -r /source/tree /filesystem-under-test;/bin/sync"

Yes, that's exactly what the tests do.

> "/bin/cp" returns, then sure, do whatever you want. But if you want
> the tests to have meaning if, for example, you have 2GB of memory and
> you are copying 8GB of data,

For the bonnie++ tests I chose a file size (16GB) so that disk performance
will matter here. As the generic tests shuffle around much smaller data,
it's not disk performance but filesystem performance that is measured (and
compared to other filesystems) - well aware of the fact that caches *are*
being used. Why would I want to discard caches? My daily usage pattern
(opening web browsers, terminal windows, spreadsheets) deals with much smaller
datasets, and I'm happy that Linux is so hungry for cache - yet some
filesystems do not seem to utilize this opportunity as well as others do.
That's the whole point of this particular test. But after constantly explaining
my point over and over again, I see what I have to do: I shall run the
generic tests again with much bigger datasets, so that disk performance is
also reflected, as people do seem to care about this (I don't - I can
switch filesystems more easily than disks).
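
[ For reference, a bonnie++ invocation sized along those lines might
look like this sketch - file size (-s) and claimed RAM size (-r) are
in megabytes, and -u names the user to run as when started as root: ]

bonnie++ -d /mnt/test -s 16384 -r 8192 -u nobody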

> The bottom line is that it's very hard to do good comparisons that are
> useful in the general case.

And it's difficult to find out what's a "useful comparison" for the
general public :-)

Christian.

[0] http://nerdbynature.de/benchmarks/v40z/2009-12-22/
[1] http://nerdbynature.de/benchmarks/v40z/2009-12-22/env/fs-bench.sh.txt
--
BOFH excuse #292:

We ran out of dial tone and we're waiting for the phone company to deliver another bottle.

 
Old 12-28-2009, 01:08 PM
Larry McVoy
 
benchmark results

> The bottom line is that it's very hard to do good comparisons that are
> useful in the general case.

It has always amazed me watching people go about benchmarking. I should
have a blog called "you're doing it wrong" or something.

Personally, I use benchmarks to validate what I already believe to be true.
So before I start I have a prediction as to what the answer should be,
based on my understanding of the system being measured. Back when I
was doing this a lot, I was always within a factor of 10 (not a big
deal) and usually within a factor of 2 (quite a bit bigger deal).
When things didn't match up that was a clue that either

- the benchmark was broken
- the code was broken
- the hardware was broken
- my understanding was broken

If you start a benchmark and you don't know what the answer should be,
at the very least within a factor of 10 and ideally within a factor of 2,
you shouldn't be running the benchmark. Well, maybe you should, they
are fun. But you sure as heck shouldn't be publishing results unless
you know they are correct.

This is why lmbench, to toot my own horn, measures what it does. If you
go run that and memorize the results, you can tell yourself "well, this machine
has sustained memory copy bandwidth of 3.2GB/sec, the disk I'm using
can read at 60MB/sec and write at 52MB/sec (on the outer zone where I'm
going to run my tests), it does small seeks in about 6 milliseconds,
I'm doing sequential I/O, the bcopy is in the noise, the blocks are big
enough that the seeks are hidden, so I'd like to see a steady 50MB/sec
or so on a sustained copy test".

If you have a mental model for how the bits of the system work, you
can decompose the benchmark into its parts, predict the result, run
it, and compare. It'll match - or, Lucy, you have some 'splainin to do.
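
[ As a worked example of that sort of prediction, using the numbers
Larry quotes above - plain shell arithmetic: ]

# 4096MB copied sequentially at ~50MB/sec (write bandwidth, with the
# bcopy and the seeks in the noise) should take on the order of:
echo $(( 4096 / 50 ))   # => 81 seconds; being off by 10x means the
                        # benchmark, code, hardware, or model is broken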
--
Larry McVoy lm at bitmover.com http://www.bitkeeper.com

 
Old 01-04-2010, 03:27 PM
Chris Mason
 
benchmark results

On Fri, Dec 25, 2009 at 11:11:46AM -0500, tytso@mit.edu wrote:
> On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > > [1] http://samba.org/ftp/tridge/dbench/README
> >
> > I was not able to resist writing a small note: no matter what,
> > whatever benchmark is running, it _does_ show system behaviour in one
> > condition or another. And when the system behaves rather badly, it is quite
> > a common comment that the benchmark was useless. But it did show that
> > the system has a problem, even if a rarely triggered one.
>
> If people are using benchmarks to improve file systems, and a benchmark
> shows a problem, then trying to remedy the performance issue is a good
> thing to do, of course. Sometimes, though, the case which is
> demonstrated by a poor benchmark is an extremely rare corner case that
> doesn't accurately reflect common real-life workloads --- and if
> addressing it results in a tradeoff which degrades much more common
> real-life situations, then that would be a bad thing.
>
> In situations where benchmarks are used competitively, it's rare that
> it's actually a *problem*. Instead it's much more common that a
> developer is trying to prove that their file system is *better* to
> gullible users who think that a single one-dimensional number is
> enough for them to choose file system X over file system Y.

[ Look at all this email from my vacation...sorry for the delay ]

It's important that people take benchmarks from filesystem developers
with a big grain of salt, which is one reason the boxacle.net results
are so nice. Steve is more than willing to take patches and experiment to
improve a given FS's results, but his business is a fair representation of
performance, and it shows.

>
> For example, if I wanted to play that game and tell people that ext4
> is better, I'd might pick this graph:
>
> http://btrfs.boxacle.net/repository/single-disk/2.6.29-rc2/2.6.29-rc2/2.6.29-rc2_Mail_server_simulation._num_threads=32.html
>
> On the other hand, this one shows ext4 as the worst compared to all
> other file systems:
>
> http://btrfs.boxacle.net/repository/single-disk/2.6.29-rc2/2.6.29-rc2/2.6.29-rc2_Large_file_random_writes_odirect._num_threads=8.html
>
> Benchmarking, like statistics, can be extremely deceptive, and if
> people do things like carefully order a tar file so the files are
> optimal for a file system, it's fair to ask whether that's a common
> thing for people to be doing (either unpacking tarballs or unpacking
> tarballs whose files have been carefully ordered for a particular file
> systems).

I tend to use compilebench for testing the ability to create lots of
small files, which puts the file names into FS native order (by
unpacking and then readdiring the results) before it does any timings.
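
[ A typical compilebench run, for reference - a sketch; -i is the
number of initial directory trees to create and -r the number of
random operations to time, per the tool's usage text: ]

./compilebench -D /mnt/test -i 10 -r 30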

I'd agree with Larry that benchmarking is most useful to test a theory:
here's a patch that is supposed to do xyz; is that actually true? With
that said, we should also be trying to write benchmarks that show the
worst case... we know some of our design weaknesses and should be able to
show numbers for how bad it really is (see the random write
btrfs.boxacle.net tests for that one).

-chris

 
Old 01-04-2010, 05:57 PM
Michael Rubin
 
benchmark results

Google is currently in the middle of upgrading from ext2 to a more
up-to-date file system. We ended up choosing ext4. This thread touches
upon many of the issues we wrestled with, so I thought it would be
interesting to share. We should be sending out more details soon.

The driving performance reason to upgrade is that while ext2 had been "good
enough" for a very long time, the metadata arrangement on a stale file
system was leading to what we call "read inflation". This is where we
end up doing many seeks to read one block of data. In general, latency
from poor block allocation was causing performance hiccups.

We spent a lot of time with standard unix benchmarks (dbench, compilebench,
et al.) on xfs, ext4, and jfs to try to see which one was going to
perform the best. In the end we mostly ended up using the benchmarks
to validate our assumptions and do functional testing. Larry is
completely right, IMHO. These benchmarks were instrumental in helping
us understand how the file systems worked in controlled situations and
gain confidence from our customers.

For our workloads we saw ext4 and xfs as "close enough" in performance
in the areas we cared about. The fact that we had a much smoother
upgrade path with ext4 clinched the deal. The only upgrade option we
have is online. ext4 is already moving the bottleneck away from the
storage stack for some of our most intensive applications.

It was not until we moved from benchmarks to customer workload that we
were able to make detailed performance comparisons and find bugs in
our implementation.

"Iterate often" seems to be the winning strategy for SW dev. But when
it involves rebooting a cloud of systems and making a one way
conversion of their data it can get messy. That said I see benchmarks
as tools to build confidence before running traffic on redundant live
systems.

mrubin

PS: For some reason "dbench" holds mythical power over many folks I
have met. They just believe it's the most trusted and standard
benchmark for file systems. In my experience it often acts as a random
number generator. It has found some bugs in our code as it exercises
the VFS layer very well.
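
[ For the record, a typical dbench invocation - a sketch; the client
count is the main knob, and -t limits the runtime in seconds: ]

dbench -D /mnt/test -t 60 32   # 32 clients hammering /mnt/test for 60s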

 
Old 01-04-2010, 11:41 PM
Dave Chinner
 
benchmark results

On Mon, Jan 04, 2010 at 11:27:48AM -0500, Chris Mason wrote:
> On Fri, Dec 25, 2009 at 11:11:46AM -0500, tytso@mit.edu wrote:
> > On Fri, Dec 25, 2009 at 02:46:31AM +0300, Evgeniy Polyakov wrote:
> > > > [1] http://samba.org/ftp/tridge/dbench/README
> > >
> > > I was not able to resist writing a small note: no matter what,
> > > whatever benchmark is running, it _does_ show system behaviour in one
> > > condition or another. And when the system behaves rather badly, it is quite
> > > a common comment that the benchmark was useless. But it did show that
> > > the system has a problem, even if a rarely triggered one.
> >
> > If people are using benchmarks to improve file systems, and a benchmark
> > shows a problem, then trying to remedy the performance issue is a good
> > thing to do, of course. Sometimes, though, the case which is
> > demonstrated by a poor benchmark is an extremely rare corner case that
> > doesn't accurately reflect common real-life workloads --- and if
> > addressing it results in a tradeoff which degrades much more common
> > real-life situations, then that would be a bad thing.
> >
> > In situations where benchmarks are used competitively, it's rare that
> > it's actually a *problem*. Instead it's much more common that a
> > developer is trying to prove that their file system is *better* to
> > gullible users who think that a single one-dimensional number is
> > enough for them to choose file system X over file system Y.
>
> [ Look at all this email from my vacation...sorry for the delay ]
>
> It's important that people take benchmarks from filesystem developers
> with a big grain of salt, which is one reason the boxacle.net results
> are so nice. Steve is more than willing to take patches and experiment to
> improve a given FS's results, but his business is a fair representation of
> performance, and it shows.

Just looking at the results there, I notice that the RAID system XFS
mailserver results dropped by an order of magnitude between
2.6.29-rc2 and 2.6.31. The single disk results are pretty
much identical across the two kernels.

IIRC, in 2.6.31 RAID0 started passing barriers through, so I suspect
this is the issue. However, seeing as dmesg is not collected by
the scripts after the run and the mount table output does
not show default options, I cannot tell if this is the case. This
might be worth checking by running XFS with the "nobarrier" mount
option....
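
[ i.e., something along these lines - a sketch, assuming the XFS
filesystem sits on the md RAID0 device: ]

mount -t xfs -o nobarrier /dev/md0 /mnt/test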

FWIW, is it possible to get these benchmarks run on each filesystem for
each kernel release so ext/xfs/btrfs all get some regular basic
performance regression test coverage?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

 
