05-12-2011, 08:47 AM
Milan Broz

dm-crypt on RAID5/6 write performance - cause & proposed solutions

Hi,

On 05/11/2011 09:11 PM, Chris Lais wrote:
> I've recently installed a system with dm-crypt placed over a software
> RAID5 array, and have noticed some very severe issues with write
> performance due to the way dm-crypt works.
>
> Almost all of these problems are caused by dm-crypt re-ordering bios
> to an extreme degree (as shown by blktrace), such that it is very hard
> for the raid layer to merge them into full stripes, leading to many
> extra reads and writes. There are minor problems with losing
> io_context and seeking for CFQ, but they have far less impact.

There is no explicit reordering of bios in dm-crypt.

There are basically two situations where dm-crypt can reorder requests:

The first is when the crypto layer processes requests asynchronously
(probably not the case here; according to your system spec you are
probably using AES-NI, right?)

The second possible reordering can happen if you run a 2.6.38 or later
kernel, where the encryption always runs on the CPU core which
submitted the request.
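
To check which of the two cases applies, a quick sketch (plain procfs
reads, nothing dm-crypt specific):

# a non-zero count means the CPU advertises AES-NI
grep -cw aes /proc/cpuinfo
# shows which AES implementations the crypto API has registered
grep -A3 '^name.*aes' /proc/crypto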

The first thing is to check what's really going on in your system and why.

- What's the I/O pattern here? Do several applications issue writes
in parallel? Can you provide the commands you used to test it?

- Can you test an older kernel (2.6.37) and check blktrace? Does it
behave differently? (It should: no reordering, but all encryption on
just one core.) (A capture sketch follows this list.)

- Also, 2.6.39-rc (with the flush changes) can have an influence here;
if you can test whether the problem is still there, it would be nice
(any fix will be based on this version).
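
For the capture, a minimal sketch (the dm device names are examples
from this thread; substitute whatever 'dmsetup ls' reports on your
system):

# trace the dm-crypt device and the layer below it for 30 seconds
blktrace -w 30 -d /dev/dm-5 -o dm-5 &
blktrace -w 30 -d /dev/dm-3 -o dm-3 &
wait
# decode the per-cpu binary traces into text
blkparse -i dm-5 > dm-5.txt
blkparse -i dm-3 > dm-3.txt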

Anyway, we need to find out what's really going on before suggesting any fix.

> Using RAID5/6 without dm-crypt does /not/ have these problems in my
> setup, even with standard queue sizes, because the raid layer can
> handle the stripe merging when the bios are not so far out of order.
> Using lower RAID levels even with dm-crypt also does not have these
> problems to such an extreme degree, because they don't need
> read-parity-write cycles for partial stripes.

Ah, so you are suggesting that the problem is caused by read/write
interleaving (parity blocks)?
Or are you talking about degraded mode as well?

Milan

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
05-12-2011, 01:07 PM
Chris Lais

dm-crypt on RAID5/6 write performance - cause & proposed solutions

On Thu, May 12, 2011 at 3:47 AM, Milan Broz <mbroz@redhat.com> wrote:
> Hi,
>
> On 05/11/2011 09:11 PM, Chris Lais wrote:
>> I've recently installed a system with dm-crypt placed over a software
>> RAID5 array, and have noticed some very severe issues with write
>> performance due to the way dm-crypt works.
>>
>> Almost all of these problems are caused by dm-crypt re-ordering bios
>> to an extreme degree (as shown by blktrace), such that it is very hard
>> for the raid layer to merge them into full stripes, leading to many
>> extra reads and writes. There are minor problems with losing
>> io_context and seeking for CFQ, but they have far less impact.
>
> There is no explicit reordering of bios in dm-crypt.
>
> There are basically two situations where dm-crypt can reorder requests:
>
> The first is when the crypto layer processes requests asynchronously
> (probably not the case here; according to your system spec you are
> probably using AES-NI, right?)

No, the i7-870 does not have AES-NI.

>
> The second possible reordering can happen if you run a 2.6.38 or later
> kernel, where the encryption always runs on the CPU core which
> submitted the request.
>
> The first thing is to check what's really going on in your system and why.
>
> - What's the I/O pattern here? Do several applications issue writes
> in parallel? Can you provide the commands you used to test it?
>

The I/O pattern is a single dd command, using a block size of 1M or 2M
(the choice does not make a substantial difference). And before you ask,
this /is/ one of the major intended workloads, not a failed attempt at
a benchmark.

For the purposes of testing, I'm inputting from /dev/zero, but
normally it will be from an attached drive, which will sometimes be
slower than 180MB/s, and sometimes faster, but will always be
substantially faster than 30MB/s.

The I/O is being submitted by the dirty-page flusher thread, which
jumps cores periodically (and whose CPU affinity I don't think I can
set reliably).

I don't know why the caches aren't able to cope without very large
cache sizes (and *still* fail to assemble full stripes frequently),
unless the core switching is happening very often and is splitting
writes between stripes (very likely, with a stripe size of 1MB).
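
To confirm the geometry this splitting works against, a quick sketch
(assuming an md array at /dev/md0; adjust the name for your setup):

# chunk size and member count determine the full-stripe width
mdadm --detail /dev/md0 | grep -Ei 'chunk|raid devices'
cat /proc/mdstat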

Even with perfect splitting (as in the case of a parallel workload
with no reordering), the cache size needed for merging stripes will be
at least stripe_size * threads. I have to think we'd get far better
performance (for any media with large physical block sizes) by keeping
the bios for each block/stripe together starting from the upper-most
block layer, but the system doesn't seem to be designed in a way that
makes this easy at all.
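
For what it's worth, the md stripe cache is tunable; a sketch, assuming
the array is /dev/md0 (each cache entry costs roughly a page per member
disk, so raising it trades memory for merging headroom):

cat /sys/block/md0/md/stripe_cache_size    # default is 256
echo 8192 > /sys/block/md0/md/stripe_cache_size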


dd if=/dev/zero of=test bs=1048576:

submitted to dm-crypt layer (top-level) [dm-5]:
254,5 5 419 1.019698208 1533 Q W 761892040 + 8 [flush-254:5]
254,5 5 420 1.019699440 1533 Q W 761892048 + 8 [flush-254:5]
254,5 5 421 1.019700449 1533 Q W 761892056 + 8 [flush-254:5]
254,5 5 422 1.019701510 1533 Q W 761892064 + 8 [flush-254:5]
254,5 5 423 1.019702466 1533 Q W 761892072 + 8 [flush-254:5]
254,5 5 424 1.019703528 1533 Q W 761892080 + 8 [flush-254:5]
[snip]
254,5 1 418 1.030607158 1533 Q W 761959960 + 8 [flush-254:5]
254,5 1 419 1.030608679 1533 Q W 761959968 + 8 [flush-254:5]
254,5 1 420 1.030610084 1533 Q W 761959976 + 8 [flush-254:5]
254,5 1 421 1.030611534 1533 Q W 761959984 + 8 [flush-254:5]
254,5 1 422 1.030612991 1533 Q W 761959992 + 8 [flush-254:5]
254,5 1 423 1.030614446 1533 Q W 761960000 + 8 [flush-254:5]
[snip]
254,5 3 423 1.062605245 1533 Q W 762049928 + 8 [flush-254:5]
254,5 3 424 1.062606044 1533 Q W 762049936 + 8 [flush-254:5]
254,5 3 425 1.062606853 1533 Q W 762049944 + 8 [flush-254:5]
254,5 3 426 1.062607616 1533 Q W 762049952 + 8 [flush-254:5]
254,5 3 427 1.062609579 1533 Q W 762049960 + 8 [flush-254:5]
254,5 3 428 1.062610503 1533 Q W 762049968 + 8 [flush-254:5]
254,5 3 429 1.062611306 1533 Q W 762049976 + 8 [flush-254:5]
254,5 3 430 1.062612079 1533 Q W 762049984 + 8 [flush-254:5]
254,5 3 431 1.062612851 1533 Q W 762049992 + 8 [flush-254:5]

submitted to LVM2 logical volume layer (directly below dm-5) [dm-3]:
254,3 1 34 1.055642427 6282 Q W 761959960 + 8 [kworker/1:2]
254,3 3 39 1.055676830 6402 Q W 762049928 + 8 [kworker/3:0]
254,3 5 35 1.055707355 6349 Q W 761892040 + 8 [kworker/5:1]
254,3 3 40 1.055720657 6402 Q W 762049936 + 8 [kworker/3:0]
254,3 1 35 1.055720737 6282 Q W 761959968 + 8 [kworker/1:2]
254,3 3 41 1.055768875 6402 Q W 762049944 + 8 [kworker/3:0]
254,3 5 36 1.055782164 6349 Q W 761892048 + 8 [kworker/5:1]
254,3 1 36 1.055798939 6282 Q W 761959976 + 8 [kworker/1:2]
254,3 3 42 1.055813807 6402 Q W 762049952 + 8 [kworker/3:0]
254,3 5 37 1.055858505 6349 Q W 761892056 + 8 [kworker/5:1]
254,3 3 43 1.055858595 6402 Q W 762049960 + 8 [kworker/3:0]
254,3 1 37 1.055873828 6282 Q W 761959984 + 8 [kworker/1:2]
254,3 3 44 1.055906790 6402 Q W 762049968 + 8 [kworker/3:0]
254,3 5 38 1.055937878 6349 Q W 761892064 + 8 [kworker/5:1]
254,3 3 45 1.055950798 6402 Q W 762049976 + 8 [kworker/3:0]
254,3 1 38 1.055950939 6282 Q W 761959992 + 8 [kworker/1:2]
254,3 3 46 1.055999370 6402 Q W 762049984 + 8 [kworker/3:0]
254,3 5 39 1.056011893 6349 Q W 761892072 + 8 [kworker/5:1]
254,3 1 39 1.056028144 6282 Q W 761960000 + 8 [kworker/1:2]
254,3 3 47 1.056044505 6402 Q W 762049992 + 8 [kworker/3:0]
254,3 5 40 1.056088439 6349 Q W 761892080 + 8 [kworker/5:1]
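
(For reading the traces: the blkparse columns are major,minor, CPU,
sequence number, timestamp, PID, action, RWBS flags, then sector +
size [process]. Note how the sequential per-range bursts queued by
flush-254:5 at the top level come out of dm-crypt interleaved across
three per-CPU kworkers.)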

http://zenthought.org/tmp/dm-crypt+raid5/dm-5,dm-3.single-thread.dd.zero.1M.tar.gz

> - Can you test an older kernel (2.6.37) and check blktrace? Does it
> behave differently? (It should: no reordering, but all encryption on
> just one core.)
>
> - Also, 2.6.39-rc (with the flush changes) can have an influence here;
> if you can test whether the problem is still there, it would be nice
> (any fix will be based on this version).

I will test both of these when I'm able (should be in the next few
days), but I suspect 2.6.37 will perform much better if it's doing it
on one core with no re-ordering.

I'll have to let you know on 2.6.39-rc*.

>
> Anyway, we need to find out what's really going on before suggesting any fix.
>
>> Using RAID5/6 without dm-crypt does /not/ have these problems in my
>> setup, even with standard queue sizes, because the raid layer can
>> handle the stripe merging when the bios are not so far out of order.
>> Using lower RAID levels even with dm-crypt also does not have these
>> problems to such an extreme degree, because they don't need
>> read-parity-write cycles for partial stripes.
>
> Ah, so you are suggesting that the problem is caused by read/write
> interleaving (parity blocks)?
> Or are you talking about degraded mode as well?

Yes, it seems to be caused almost entirely by multiple partial stripe
writes to the same stripes, leading to unnecessary extra reads and
parity calculations (I suspect the reads themselves have the greater
impact on this system, however).
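
An easy way to see those extra reads from the outside (the member-disk
names below are examples) is to watch for read traffic on the md
members during a pure write workload:

# any sustained r/s here while dd is writing is read-modify-write traffic
iostat -x /dev/sd[b-f] 1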

I'm not talking about degraded mode (I don't expect that to perform well).

>
> Milan
>
>

--
Chris

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
