This patch should be applied after
dm-thin-support-for-non-power-of-2-pool-blocksize.patch. It optimizes
the power-of-two block size case.
Mikulas
---
dm-thin: optimize power of two block size
dm-thin will most likely be used with a block size that is a power of
two, so it should be optimized for this case.
This patch changes division and modulo operations to shifts and bit
masks when the block size is a power of two.
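The following is a minimal userspace sketch of the shift-and-mask idea, not the code from the patch. The names `pool_geom`, `block_shift`, `block_of` and `offset_in_block` are hypothetical, and `sector_t` is a stand-in typedef for the kernel type. It assumes the shift is computed once at setup time and that a negative shift marks a block size that is not a power of two.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;      /* stand-in for the kernel's sector_t */

/* Hypothetical pool geometry; field names are illustrative only. */
struct pool_geom {
	uint32_t sectors_per_block;
	int block_shift;        /* log2(sectors_per_block), or -1 if not a power of two */
};

/* Compute the shift once, so the I/O path only has to test its sign. */
static void pool_geom_init(struct pool_geom *g, uint32_t sectors_per_block)
{
	assert(sectors_per_block != 0);
	g->sectors_per_block = sectors_per_block;
	g->block_shift = -1;
	if ((sectors_per_block & (sectors_per_block - 1)) == 0) {
		int shift = 0;

		while ((1u << shift) != sectors_per_block)
			shift++;
		g->block_shift = shift;
	}
}

/* Block number containing sector s: shift if possible, divide otherwise. */
static sector_t block_of(const struct pool_geom *g, sector_t s)
{
	if (g->block_shift >= 0)
		return s >> g->block_shift;
	return s / g->sectors_per_block;
}

/* Offset of sector s within its block: mask if possible, modulo otherwise. */
static sector_t offset_in_block(const struct pool_geom *g, sector_t s)
{
	if (g->block_shift >= 0)
		return s & (g->sectors_per_block - 1);
	return s % g->sectors_per_block;
}
```

Both paths return the same values for a power-of-two block size; the shift path simply avoids a 64-bit division on every bio.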
The test that bi_sector is divisible by the block size is removed from
io_overlaps_block. Device mapper never sends bios that span a block
boundary. Consequently, if we have tested that bi_size is equal to the
block size, bi_sector must already be on a block boundary.
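The argument above can be illustrated with a small userspace sketch. It is an assumption-laden model, not the kernel code: `fake_bio` is a hypothetical stand-in for `struct bio` carrying only the two fields discussed, and `spans_block_boundary` is an illustrative helper that does not exist in dm-thin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9          /* 512-byte sectors */

typedef uint64_t sector_t;      /* stand-in for the kernel's sector_t */

/* Hypothetical, simplified stand-in for struct bio. */
struct fake_bio {
	sector_t bi_sector;     /* start of the I/O, in sectors */
	uint32_t bi_size;       /* length of the I/O, in bytes */
};

/* Illustrative helper: does the bio cross from one block into the next? */
static bool spans_block_boundary(const struct fake_bio *bio,
				 uint32_t sectors_per_block)
{
	sector_t first, last;

	assert(sectors_per_block != 0);
	assert(bio->bi_size >= (1u << SECTOR_SHIFT));
	first = bio->bi_sector;
	last = first + (bio->bi_size >> SECTOR_SHIFT) - 1;
	return first / sectors_per_block != last / sectors_per_block;
}

/*
 * Reduced check: size only. A block-sized bio that starts off a block
 * boundary would necessarily span two blocks, and such bios are assumed
 * never to arrive, so no alignment test is needed here.
 */
static bool io_overlaps_block(const struct fake_bio *bio,
			      uint32_t sectors_per_block)
{
	return bio->bi_size == (sectors_per_block << SECTOR_SHIFT);
}
```

With a 128-sector block, a 64k bio starting at sector 128 stays inside one block, while the same bio starting at sector 129 would cross into the next block and so would never be submitted in the first place.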
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
06-18-2012, 04:35 PM
Joe Thornber
dm-thin: optimize power of two block size
On Mon, Jun 18, 2012 at 10:09:56AM -0400, Mikulas Patocka wrote:
> Hi
>
> This patch should be applied after
> dm-thin-support-for-non-power-of-2-pool-blocksize.patch. It optimizes
> power-of-two blocksize.
I'm going to nack this unless you can provide a benchmark that shows
it measurably improves performance for some architecture somewhere.
And a real benchmark, with io going through all the devices, not just
a micro benchmark of the 'if' in a tight loop.
- Joe
06-25-2012, 01:53 AM
Mikulas Patocka
dm-thin: optimize power of two block size
On Mon, 18 Jun 2012, Joe Thornber wrote:
> On Mon, Jun 18, 2012 at 10:09:56AM -0400, Mikulas Patocka wrote:
> > Hi
> >
> > This patch should be applied after
> > dm-thin-support-for-non-power-of-2-pool-blocksize.patch. It optimizes
> > power-of-two blocksize.
>
> I'm going to nack this unless you can provide a benchmark that shows
> it measurably improves performance for some architecture somewhere.
> And a real benchmark, with io going through all the devices, not just
> a micro benchmark of the 'if' in a tight loop.
>
> - Joe
Hi
Here are some tests run on my collection of computers.
This is a do_div benchmark, the source is here:
http://people.redhat.com/~mpatocka/testcases/do_div_benchmark.c
For the "bignum" test, I replaced 0x12345678 with 0xff12345678LL (so that
do_div divides real 64-bit numbers).
do_div is especially slow on PA-RISC and Alpha because they don't have a
divide instruction.
On that PA-RISC machine, I set up a dm-stripe target consisting of two
stripes on a ramdisk, with a 4k stripe size. I ran
dd if=/dev/mapper/stripe of=/dev/null bs=512 count=100000 iflag=direct
With the optimization patches: 38.2-38.5 MB/s
Without the optimization patches: 35.3-35.6 MB/s
With a larger I/O size:
dd if=/dev/mapper/stripe of=/dev/null bs=1M count=200 iflag=direct
With the optimization patches: 269-272 MB/s
Without the optimization patches: 250-253 MB/s
Tests with dm-thin on PA-RISC:
A device with a 512MB pool and 512MB of metadata on ramdisks, with a 64k chunk size.
Overwrite for the first time with
dd if=/dev/zero of=/dev/mapper/thin bs=1M oflag=direct
Without the optimization patches: 91.0-91.4 MB/s
With the optimization patches: 90.6-91.6 MB/s
Subsequent overwrite with
dd if=/dev/zero of=/dev/mapper/thin bs=1M oflag=direct
Without the optimization patches: 104 MB/s
With the optimization patches: 104 MB/s
Read the overwritten device with
dd if=/dev/mapper/thin of=/dev/null bs=1M iflag=direct
Without the optimization patches: 252-254 MB/s
With the optimization patches: 257-258 MB/s
So the conclusion is that the divide instruction degrades transfer
speed, especially on dm-stripe with a 4k stripe size. On dm-thin the
difference is smaller, and measurable only with raw reads, because dm-thin
has a minimum chunk size of 64k.
The question is why do you want to avoid such an optimization? If it is
because of source code clarity, we can create a #define sector_div_optimized
that optimizes the common case of a power-of-two divisor, and the code would
be no more complicated than with sector_div. Or do you have some other
reasons?
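A minimal userspace sketch of what such a helper could look like follows. Nothing here is taken from an actual patch: `sector_div_optimized` is written as an inline function rather than a #define, `sector_t` is a stand-in typedef, and the calling convention is only assumed to mirror sector_div (divide in place, return the remainder). The shift is found with a plain loop for portability; kernel code would more likely use a bit-scan helper or a precomputed shift.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;      /* stand-in for the kernel's sector_t */

/*
 * Hypothetical helper: divide *n by div in place and return the
 * remainder. A power-of-two divisor takes the mask-and-shift path;
 * anything else falls back to a real division.
 */
static inline uint32_t sector_div_optimized(sector_t *n, uint32_t div)
{
	uint32_t rem;

	assert(div != 0);
	if ((div & (div - 1)) == 0) {
		unsigned int shift = 0;

		/* Portable log2 of a power of two; illustrative only. */
		while ((1u << shift) != div)
			shift++;
		rem = (uint32_t)(*n & (div - 1));
		*n >>= shift;
	} else {
		rem = (uint32_t)(*n % div);
		*n /= div;
	}
	return rem;
}
```

A caller would use it exactly where it would otherwise call a dividing helper, so the call site stays a single line regardless of whether the divisor is a power of two.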
BTW. when unloading the dm-thin device with debugging enabled (the tests
were done with debugging disabled), I got this message:
device-mapper: space map checker: free block counts differ, checker
131060, sm-disk:130991
So presumably there is some bug? The kernel is 3.4.3.
Mikulas
06-25-2012, 02:09 PM
Joe Thornber
dm-thin: optimize power of two block size
On Sun, Jun 24, 2012 at 09:53:22PM -0400, Mikulas Patocka wrote:
> So the conclusion is that the divide instruction degrades
> transfer speed, especially on dm-stripe with 4k stripe size (on dm-thin it
> is measurable only with raw read, the difference is smaller because it has
> a minimum chunk size 64k).
>
>
> The question is why do you want to avoid such optimization?
You've convinced me. I just wanted proof, which you've provided very
nicely. Thank you.
> BTW. when unloading the dm-thin device with debugging enabled (the tests
> were done with debugging disabled), I got this message:
> device-mapper: space map checker: free block counts differ, checker
> 131060, sm-disk:130991
> --- so there is supposedly some bug? The kernel is 3.4.3.
That message is ok. I'm going to remove the sm-checker in 3.6. It's
not earning its keep.
- Joe