01-27-2012, 12:06 AM
Richard Sharpe

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

Hi,

Perhaps I am doing something stupid, but I would like to understand
why there is a difference in the following situation.

I have defined a stripe device thusly:

"echo 0 17560535040 striped 9 8 /dev/sdd 0 /dev/sde 0 /dev/sdf 0
/dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0 /dev/sdk 0 /dev/sdl 0 |
dmsetup create stripe_dev"
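(For reference, the striped target table format is "start length striped <#stripes> <chunk_size> <dev> <offset> ...", so this builds a 9-way stripe with an 8-sector, i.e. 4 KiB, chunk per disk. A minimal sketch for double-checking what was created, using standard dmsetup/blockdev calls:

dmsetup table stripe_dev                   # prints the table back, including the 8-sector chunk size
blockdev --getsz /dev/mapper/stripe_dev    # total size in 512-byte sectors; should match 17560535040
)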

Then I did the following:

dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000

and I got 880 MB/s

However, when I changed that command to:

dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
oflag=direct

I get 210 MB/s reliably.

The system in question is a 16-core (probably two CPUs) Intel Xeon
E5620 @ 2.40 GHz with 64GB of memory and 12 7200 RPM SATA drives
connected to an LSI SAS controller but set up as a JBOD of 12 drives.

Why do I see such a big performance difference? Does writing to the
device also use the page cache if I don't specify DIRECT IO?

--
Regards,
Richard Sharpe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 05:54 AM
Hannes Reinecke

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On 01/27/2012 02:06 AM, Richard Sharpe wrote:
> Hi,
>
> Perhaps I am doing something stupid, but I would like to understand
> why there is a difference in the following situation.
>
> I have defined a stripe device thusly:
>
> "echo 0 17560535040 striped 9 8 /dev/sdd 0 /dev/sde 0 /dev/sdf 0
> /dev/sdg 0 /dev/sdh 0 /dev/sdi 0 /dev/sdj 0 /dev/sdk 0 /dev/sdl 0 |
> dmsetup create stripe_dev"
>
> Then I did the following:
>
> dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
>
> and I got 880 MB/s
>
> However, when I changed that command to:
>
> dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000
> oflag=direct
>
> I get 210 MB/s reliably.
>
> The system in question is a 16-core (probably two CPUs) Intel Xeon
> E5620 @ 2.40 GHz with 64GB of memory and 12 7200 RPM SATA drives
> connected to an LSI SAS controller but set up as a JBOD of 12 drives.
>
> Why do I see such a big performance difference? Does writing to the
> device also use the page cache if I don't specify DIRECT IO?
>
Yes. All I/O done through read/write calls goes through the page cache.
The only way to circumvent this is to use DIRECT_IO.
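A quick way to see the buffered path in action - a minimal sketch, using a smaller count than above so it finishes quickly - is to watch the Dirty/Writeback counters in /proc/meminfo grow while the non-direct dd runs:

# buffered write: data lands in the page cache first and is written back later
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=100000 &
watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'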

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 07:52 AM
Christoph Hellwig

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
> Why do I see such a big performance difference? Does writing to the
> device also use the page cache if I don't specify DIRECT IO?

Yes. Try adding conv=fdatasync to both versions to get more
realistic results.
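Concretely - a sketch based on the commands quoted above - that would be:

# buffered, but dd calls fdatasync() at the end, so flushing the page cache is included in the reported rate
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000 conv=fdatasync

# O_DIRECT, bypassing the page cache entirely
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=262144 count=1000000 oflag=direct conv=fdatasync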

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 02:03 PM
Richard Sharpe

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>> Why do I see such a big performance difference? Does writing to the
>> device also use the page cache if I don't specify DIRECT IO?
>
> Yes. Try adding conv=fdatasync to both versions to get more
> realistic results.

Thank you for that advice. I am comparing btrfs vs rolling my own
thing using the new dm thin-provisioning approach to get something
with resilient metadata, but I need to support two different types of
IO, one that uses directio and one that can take advantage of the page
cache.

So far, btrfs gives me around 800MB/s with a similar setup (can't get
exactly the same setup) without DIRECTIO and 450MB/s with DIRECTIO. A
dm striped setup is giving me about 10% better throughput without
DIRECTIO but only about 45% of the performance with DIRECTIO.

Anyway, I now understand. I will run my scripts with conv=fdatasync as well.

--
Regards,
Richard Sharpe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 02:16 PM
Zdenek Kabelac

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On 27.1.2012 16:03, Richard Sharpe wrote:
> On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org> wrote:
>> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>>> Why do I see such a big performance difference? Does writing to the
>>> device also use the page cache if I don't specify DIRECT IO?
>>
>> Yes. Try adding conv=fdatasync to both versions to get more
>> realistic results.
>
> Thank you for that advice. I am comparing btrfs vs rolling my own
> thing using the new dm thin-provisioning approach to get something
> with resilient metadata, but I need to support two different types of
> IO, one that uses directio and one that can take advantage of the page
> cache.
>
> So far, btrfs gives me around 800MB/s with a similar setup (can't get
> exactly the same setup) without DIRECTIO and 450MB/s with DIRECTIO. A
> dm striped setup is giving me about 10% better throughput without
> DIRECTIO but only about 45% of the performance with DIRECTIO.

You've mentioned you are using a thinp device with striping - do you have
the stripes properly aligned on the data-block-size of the thinp device?
(I think 9 disks are probably quite hard to align on a 3.2 kernel,
since the data block size needs to be a power of 2 - I think 3.3 will have
this relaxed to a page-size boundary.)

Zdenek


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 02:28 PM
Richard Sharpe

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On Fri, Jan 27, 2012 at 7:16 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> Dne 27.1.2012 16:03, Richard Sharpe napsal(a):
>
>> On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org>
>> wrote:
>>>
>>> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>>>>
>>>> Why do I see such a big performance difference? Does writing to the
>>>> device also use the page cache if I don't specify DIRECT IO?
>>>
>>>
>>> Yes. Try adding conv=fdatasync to both versions to get more
>>> realistic results.
>>
>>
>> Thank you for that advice. I am comparing btrfs vs rolling my own
>> thing using the new dm thin-provisioning approach to get something
>> with resilient metadata, but I need to support two different types of
>> IO, one that uses directio and one that can take advantage of the page
>> cache.
>>
>> So far, btrfs gives me around 800MB/s with a similar setup (can't get
>> exactly the same setup) without DIRECTIO and 450MB/s with DIRECTIO. A
>> dm striped setup is giving me about 10% better throughput without
>> DIRECTIO but only about 45% of the performance with DIRECTIO.
>>
>
> You've mentioned you are using a thinp device with striping - do you have
> the stripes properly aligned on the data-block-size of the thinp device?
> (I think 9 disks are probably quite hard to align on a 3.2 kernel,
> since the data block size needs to be a power of 2 - I think 3.3 will have
> this relaxed to a page-size boundary.)

Actually, so far I have not used any thinp devices, since from reading
the documentation it seemed that, for what I am doing, I need to give
thinp a mirrored device for its metadata and a striped device for its
data, so I thought I would try just a striped device.

Actually, I can cut that back to 8 devices in the stripe. I am using
4kiB block sizes and writing 256kiB blocks in the dd requests and
there is no parity involved so there should be no read-modify-write
cycles.

I imagine that if I push the write sizes up to a MB or more at a time
throughput will get better because at the moment each device is being
given 32kiB or 16kiB (a few devices) with DIRECTIO and with a larger
write size they will get more data at a time.
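One way to check this - a sketch, assuming sysstat's iostat is available; the block size and count below are arbitrary picks - is to watch the average request size each member disk actually sees while the direct-I/O dd runs, then retry with a larger block size:

# per-disk request sizes while the test runs (avgrq-sz is in 512-byte sectors)
iostat -x 1 sdd sde sdf sdg sdh sdi sdj sdk sdl

# same test with a much larger block size, so each member disk gets more data per dd write
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=4M count=10000 oflag=direct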

--
Regards,
Richard Sharpe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 04:24 PM
Zdenek Kabelac

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On 27.1.2012 16:28, Richard Sharpe wrote:
> On Fri, Jan 27, 2012 at 7:16 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>> On 27.1.2012 16:03, Richard Sharpe wrote:
>>> On Fri, Jan 27, 2012 at 12:52 AM, Christoph Hellwig <hch@infradead.org>
>>> wrote:
>>>> On Thu, Jan 26, 2012 at 05:06:42PM -0800, Richard Sharpe wrote:
>>>>> Why do I see such a big performance difference? Does writing to the
>>>>> device also use the page cache if I don't specify DIRECT IO?
>>>>
>>>> Yes. Try adding conv=fdatasync to both versions to get more
>>>> realistic results.
>>>
>>> Thank you for that advice. I am comparing btrfs vs rolling my own
>>> thing using the new dm thin-provisioning approach to get something
>>> with resilient metadata, but I need to support two different types of
>>> IO, one that uses directio and one that can take advantage of the page
>>> cache.
>>>
>>> So far, btrfs gives me around 800MB/s with a similar setup (can't get
>>> exactly the same setup) without DIRECTIO and 450MB/s with DIRECTIO. A
>>> dm striped setup is giving me about 10% better throughput without
>>> DIRECTIO but only about 45% of the performance with DIRECTIO.
>>
>> You've mentioned you are using a thinp device with striping - do you have
>> the stripes properly aligned on the data-block-size of the thinp device?
>> (I think 9 disks are probably quite hard to align on a 3.2 kernel,
>> since the data block size needs to be a power of 2 - I think 3.3 will have
>> this relaxed to a page-size boundary.)
>
> Actually, so far I have not used any thinp devices, since from reading
> the documentation it seemed that, for what I am doing, I need to give
> thinp a mirrored device for its metadata and a striped device for its
> data, so I thought I would try just a striped device.
>
> Actually, I can cut that back to 8 devices in the stripe. I am using
> 4kiB block sizes and writing 256kiB blocks in the dd requests and
> there is no parity involved so there should be no read-modify-write
> cycles.
>
> I imagine that if I push the write sizes up to a MB or more at a time
> throughput will get better because at the moment each device is being
> given 32kiB or 16kiB (a few devices) with DIRECTIO and with a larger
> write size they will get more data at a time.


Well, I cannot tell how big an influence proper alignment has in your case,
but it would be good to measure it.

Do you use a data_block_size equal to the stripe size (256KiB, i.e. 512 sectors)?

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 04:48 PM
Richard Sharpe

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On Fri, Jan 27, 2012 at 9:24 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
> On 27.1.2012 16:28, Richard Sharpe wrote:
>> Actually, so far I have not used any thinp devices, since from reading
>> the documentation it seemed that, for what I am doing, I need to give
>> thinp a mirrored device for its metadata and a striped device for its
>> data, so I thought I would try just a striped device.
>>
>> Actually, I can cut that back to 8 devices in the stripe. I am using
>> 4kiB block sizes and writing 256kiB blocks in the dd requests and
>> there is no parity involved so there should be no read-modify-write
>> cycles.
>>
>> I imagine that if I push the write sizes up to a MB or more at a time
>> throughput will get better because at the moment each device is being
>> given 32kiB or 16kiB (a few devices) with DIRECTIO and with a larger
>> write size they will get more data at a time.
>>
>
> Well, I cannot tell how big an influence proper alignment has in your case,
> but it would be good to measure it.
> Do you use a data_block_size equal to the stripe size (256KiB, i.e. 512 sectors)?

I suspect not :-) However, I am not sure what you are asking. I
believe that the stripe size is 9 * 8 * 512B, or 36kiB because I think
I told it to use 8 sectors per device. This might be sub-optimal.

Based on that, I think it will take my write blocks, of 256kiB, and
write sectors that are (offset/512 + 256) mod 9 = {0, 1, 2, ... 8} to
{disk 0, disk 1, disk 2, ... disk 8}.

If I wanted perfectly stripe-aligned writes then I think I should write
something like 32*9kiB rather than the 32*8kiB I am currently writing.

Is that what you are asking me?
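For what it's worth, the arithmetic works out like this (a sketch; the bs value is just one convenient multiple of the full stripe width):

# full stripe width = 9 disks * 8 sectors * 512 B = 36 KiB
# 256 KiB is not a multiple of 36 KiB, but 288 KiB (= 32 * 9 KiB = 8 full stripes) is
dd if=/dev/zero of=/dev/mapper/stripe_dev bs=288k count=100000 oflag=direct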

--
Regards,
Richard Sharpe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-27-2012, 05:06 PM
Zdenek Kabelac

dd to a striped device with 9 disks gets much lower throughput when oflag=direct used

On 27.1.2012 18:48, Richard Sharpe wrote:
> On Fri, Jan 27, 2012 at 9:24 AM, Zdenek Kabelac <zkabelac@redhat.com> wrote:
>> On 27.1.2012 16:28, Richard Sharpe wrote:
>>> Actually, so far I have not used any thinp devices, since from reading
>>> the documentation it seemed that, for what I am doing, I need to give
>>> thinp a mirrored device for its metadata and a striped device for its
>>> data, so I thought I would try just a striped device.
>>>
>>> Actually, I can cut that back to 8 devices in the stripe. I am using
>>> 4kiB block sizes and writing 256kiB blocks in the dd requests and
>>> there is no parity involved so there should be no read-modify-write
>>> cycles.
>>>
>>> I imagine that if I push the write sizes up to a MB or more at a time
>>> throughput will get better because at the moment each device is being
>>> given 32kiB or 16kiB (a few devices) with DIRECTIO and with a larger
>>> write size they will get more data at a time.
>>
>> Well, I cannot tell how big an influence proper alignment has in your case,
>> but it would be good to measure it.
>> Do you use a data_block_size equal to the stripe size (256KiB, i.e. 512 sectors)?
>
> I suspect not :-) However, I am not sure what you are asking. I
> believe that the stripe size is 9 * 8 * 512B, or 36kiB because I think
> I told it to use 8 sectors per device. This might be sub-optimal.
>
> Based on that, I think it will take my write blocks, of 256kiB, and
> write sectors that are (offset/512 + 256) mod 9 = {0, 1, 2, ... 8} to
> {disk 0, disk 1, disk 2, ... disk 8}.
>
> If I wanted perfectly stripe-aligned writes then I think I should write
> something like 32*9kiB rather than the 32*8kiB I am currently writing.
>
> Is that what you are asking me?



There are surely a number of things to test to get optimal performance from
a striped array, and you probably need to run several experiments yourself to
figure out the best settings.


I'd suggest using 32KiB on each disk and combining them (8 x 32) into a
256KiB stripe. Then use a data_block_size of 512 sectors for thinp creation.


You may as well try just 4KiB on each drive to get a 64KiB stripe, and
use 128 sectors as the data_block_size for thinp.

For 9 disks it's hard to say what the 'optimal' number is with a 3.2 kernel
and thinp - so it will need some playtime.

Maybe 32KiB on each disk - and use a 128KiB data_block_size on the 288KiB stripe.
(Though the data block size heavily depends on the use case.)
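To make the first suggestion concrete - a minimal sketch only, with a hypothetical metadata device name, following the thin-pool target's documented table format "start length thin-pool <metadata dev> <data dev> <data_block_size> <low_water_mark>":

# assumed device names: the striped data device from earlier and a small (ideally mirrored) metadata device
DATA=/dev/mapper/stripe_dev
META=/dev/mapper/thinp_meta
SECTORS=$(blockdev --getsz "$DATA")

# data_block_size = 512 sectors = 256 KiB, matching the 8 x 32 KiB stripe suggested above;
# the low_water_mark of 32768 blocks is just a placeholder value
dmsetup create pool --table "0 $SECTORS thin-pool $META $DATA 512 32768"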

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
