Linux Archive


Mikulas Patocka 06-18-2012 02:09 PM

dm-thin: optimize power of two block size
 
Hi

This patch should be applied after
dm-thin-support-for-non-power-of-2-pool-blocksize.patch. It optimizes
power-of-two blocksize.

Mikulas

---

dm-thin: optimize power of two block size

dm-thin will most likely be used with a block size that is a power of
two, so it should be optimized for this case.

This patch changes division and modulo operations to shifts and bit
masks if block size is a power of two.

The test that bi_sector is divisible by the block size is removed from
io_overlaps_block. Device mapper never sends bios that span a block
boundary. Consequently, if bi_size is equal to the block size,
bi_sector must already be on a block boundary.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
drivers/md/dm-thin.c | 27 +++++++++++++++++++--------
1 file changed, 19 insertions(+), 8 deletions(-)

Index: linux-3.4.2-fast/drivers/md/dm-thin.c
===================================================================
--- linux-3.4.2-fast.orig/drivers/md/dm-thin.c 2012-06-18 15:38:53.000000000 +0200
+++ linux-3.4.2-fast/drivers/md/dm-thin.c 2012-06-18 16:06:15.000000000 +0200
@@ -512,6 +512,7 @@ struct pool {

dm_block_t low_water_blocks;
uint32_t sectors_per_block;
+ int sectors_per_block_shift;

struct pool_features pf;
unsigned low_water_triggered:1; /* A dm event has been sent */
@@ -678,7 +679,10 @@ static dm_block_t get_bio_block(struct t
{
sector_t block_nr = bio->bi_sector;

- (void) sector_div(block_nr, tc->pool->sectors_per_block);
+ if (tc->pool->sectors_per_block_shift < 0)
+ (void) sector_div(block_nr, tc->pool->sectors_per_block);
+ else
+ block_nr >>= tc->pool->sectors_per_block_shift;

return block_nr;
}
@@ -689,8 +693,12 @@ static void remap(struct thin_c *tc, str
sector_t bi_sector = bio->bi_sector;

bio->bi_bdev = tc->pool_dev->bdev;
- bio->bi_sector = (block * pool->sectors_per_block) +
- sector_div(bi_sector, pool->sectors_per_block);
+ if (tc->pool->sectors_per_block_shift < 0)
+ bio->bi_sector = (block * pool->sectors_per_block) +
+ sector_div(bi_sector, pool->sectors_per_block);
+ else
+ bio->bi_sector = (block << pool->sectors_per_block_shift) |
+ (bi_sector & (pool->sectors_per_block - 1));
}

static void remap_to_origin(struct thin_c *tc, struct bio *bio)
@@ -935,10 +943,7 @@ static void process_prepared(struct pool
*/
static int io_overlaps_block(struct pool *pool, struct bio *bio)
{
- sector_t bi_sector = bio->bi_sector;
-
- return !sector_div(bi_sector, pool->sectors_per_block) &&
- (bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT));
+ return bio->bi_size == (pool->sectors_per_block << SECTOR_SHIFT);
}

static int io_overwrites_block(struct pool *pool, struct bio *bio)
@@ -1241,7 +1246,9 @@ static void process_discard(struct thin_
* part of the discard that is in a subsequent
* block.
*/
- sector_t offset = bio->bi_sector - (block * pool->sectors_per_block);
+ sector_t offset = pool->sectors_per_block_shift >= 0 ?
+ bio->bi_sector & (pool->sectors_per_block - 1) :
+ bio->bi_sector - block * pool->sectors_per_block;
unsigned remaining = (pool->sectors_per_block - offset) << SECTOR_SHIFT;
bio->bi_size = min(bio->bi_size, remaining);

@@ -1718,6 +1725,10 @@ static struct pool *pool_create(struct m

pool->pmd = pmd;
pool->sectors_per_block = block_size;
+ if (block_size & (block_size - 1))
+ pool->sectors_per_block_shift = -1;
+ else
+ pool->sectors_per_block_shift = __ffs(block_size);
pool->low_water_blocks = 0;
pool_features_init(&pool->pf);
pool->prison = prison_create(PRISON_CELLS);

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
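
For readers following the patch, here is a minimal userspace sketch of the idea it describes: precompute a shift when the block size is a power of two, then use shift and mask instead of divide and modulo. It is not the kernel code. The names pool_geom, pool_geom_init, sector_to_block, sector_offset and remap_sector are hypothetical, and plain C division stands in for the kernel's sector_div.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two pool fields used in the patch. */
struct pool_geom {
	uint32_t sectors_per_block;
	int sectors_per_block_shift;	/* -1 if the block size is not a power of two */
};

/* Record the block size and, if it is a power of two, its log2. */
void pool_geom_init(struct pool_geom *g, uint32_t block_size)
{
	assert(block_size > 0);
	g->sectors_per_block = block_size;
	if (block_size & (block_size - 1)) {
		g->sectors_per_block_shift = -1;
	} else {
		int shift = 0;

		/* Portable log2 of a power of two (the patch uses __ffs). */
		while ((block_size >> shift) != 1)
			shift++;
		g->sectors_per_block_shift = shift;
	}
}

/* Block number containing a sector: shift on the fast path, divide otherwise. */
uint64_t sector_to_block(const struct pool_geom *g, uint64_t sector)
{
	if (g->sectors_per_block_shift < 0)
		return sector / g->sectors_per_block;
	return sector >> g->sectors_per_block_shift;
}

/* Offset of a sector within its block: mask on the fast path, modulo otherwise. */
uint64_t sector_offset(const struct pool_geom *g, uint64_t sector)
{
	if (g->sectors_per_block_shift < 0)
		return sector % g->sectors_per_block;
	return sector & (g->sectors_per_block - 1);
}

/* Sector on the data device for a given data block, keeping the in-block offset. */
uint64_t remap_sector(const struct pool_geom *g, uint64_t data_block,
		      uint64_t sector)
{
	uint64_t off = sector_offset(g, sector);

	if (g->sectors_per_block_shift < 0)
		return data_block * g->sectors_per_block + off;
	return (data_block << g->sectors_per_block_shift) | off;
}
```

With a block size of 128 sectors, sector 1000 falls in block 7 at offset 104 on the shift path. With a block size of 96 sectors, the same sector falls in block 10 at offset 40 on the divide path.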

Joe Thornber 06-18-2012 04:35 PM

On Mon, Jun 18, 2012 at 10:09:56AM -0400, Mikulas Patocka wrote:
> Hi
>
> This patch should be applied after
> dm-thin-support-for-non-power-of-2-pool-blocksize.patch. It optimizes
> power-of-two blocksize.

I'm going to nack this unless you can provide a benchmark that shows
it measurably improves performance for some architecture somewhere.
And a real benchmark, with io going through all the devices, not just
a micro benchmark of the 'if' in a tight loop.

- Joe


Mikulas Patocka 06-25-2012 01:53 AM

On Mon, 18 Jun 2012, Joe Thornber wrote:

> On Mon, Jun 18, 2012 at 10:09:56AM -0400, Mikulas Patocka wrote:
> > Hi
> >
> > This patch should be applied after
> > dm-thin-support-for-non-power-of-2-pool-blocksize.patch. It optimizes
> > power-of-two blocksize.
>
> I'm going to nack this unless you can provide a benchmark that shows
> it measurably improves performance for some architecture somewhere.
> And a real benchmark, with io going through all the devices, not just
> a micro benchmark of the 'if' in a tight loop.
>
> - Joe

Hi

Here are some tests run on my collection of computers.

This is a do_div benchmark, the source is here:
http://people.redhat.com/~mpatocka/testcases/do_div_benchmark.c
For the "bignum" test, I replaced 0x12345678 with 0xff12345678LL (so that
do_div divides real 64-bit numbers).

It is especially slow on PA-RISC and Alpha because they don't have a
divide instruction.

PA-RISC 900MHz 64-bit:
shift+mask: 4 ticks (4.4ns)
shift+mask bignum: 4 ticks (4.4ns)
do_div: 825 ticks (917ns)
do_div bignum: 825 ticks (917ns)

UltraSparc2 440MHz 64-bit:
shift+mask: 3 ticks (6.8ns)
shift+mask bignum: 3 ticks (6.8ns)
do_div: 87 ticks (198ns)
do_div bignum: 93 ticks (211ns)

Alpha ev45 233MHz 64-bit:
shift+mask: 7 ticks (30ns)
shift+mask bignum: 8 ticks (34ns)
do_div: 598 ticks (2563ns)
do_div bignum: 897 ticks (3844ns)

Pentium 3 850MHz:
shift+mask: 12.25 ticks (14ns)
shift+mask bignum: 16 ticks (19ns)
do_div: 63.5 ticks (75ns)
do_div bignum: 94 ticks (111ns)

Core2 Xeon 1600MHz 64-bit:
shift+mask: 3.2 ticks (2ns)
shift+mask bignum: 3.4 ticks (2.1ns)
do_div: 64 ticks (40ns)
do_div bignum: 64 ticks (40ns)

K10 Opteron 2300MHz 64-bit:
shift+mask: 3 ticks (1.3ns)
shift+mask bignum: 3 ticks (1.3ns)
do_div: 46 ticks (20ns)
do_div bignum: 57 ticks (28ns)
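
The nanosecond figures in the list above appear to be the tick counts divided by the clock frequency. A small sketch of that conversion, assuming one tick is one clock cycle (the helper name ticks_to_ns is hypothetical and is not part of the benchmark source):

```c
#include <assert.h>

/*
 * Convert a cycle count to nanoseconds for a given clock frequency in MHz.
 * Assumes one tick equals one clock cycle.
 */
double ticks_to_ns(double ticks, double clock_mhz)
{
	assert(clock_mhz > 0.0);
	return ticks * 1000.0 / clock_mhz;
}
```

For example, 825 ticks at 900 MHz comes to roughly 917 ns, which matches the PA-RISC do_div entry.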

---

On that PA-RISC machine, I set up a dm-stripe target consisting of two
stripes on a ramdisk, with a 4k stripe size. I performed
dd if=/dev/mapper/stripe of=/dev/null bs=512 count=100000 iflag=direct
With the optimization patches: 38.2-38.5 MB/s
Without the optimization patches: 35.3-35.6 MB/s

With larger io size:
dd if=/dev/mapper/stripe of=/dev/null bs=1M count=200 iflag=direct
With the optimization patches: 269-272 MB/s
Without the optimization patches: 250-253 MB/s


Tests with dm-thin on PA-RISC:
A device with a 512MB pool and 512MB of metadata on ramdisks, with a 64k chunk size.

Overwrite the first time with
dd if=/dev/zero of=/dev/mapper/thin bs=1M oflag=direct
Without the optimization patches: 91.0-91.4 MB/s
With the optimization patches: 90.6-91.6 MB/s

Subsequent overwrite with
dd if=/dev/zero of=/dev/mapper/thin bs=1M oflag=direct
Without the optimization patches: 104 MB/s
With the optimization patches: 104 MB/s

Read the overwritten device with
dd if=/dev/mapper/thin of=/dev/null bs=1M iflag=direct
Without the optimization patches: 252-254 MB/s
With the optimization patches: 257-258 MB/s

So the conclusion is that the divide instruction degrades
transfer speed, especially on dm-stripe with a 4k stripe size (on dm-thin it
is measurable only with raw reads; the difference is smaller because dm-thin
has a minimum chunk size of 64k).


The question is: why do you want to avoid such an optimization? If it is
because of source code clarity, we can create a #define sector_div_optimized
that optimizes the common case of a power-of-two divisor, and the code would
be no more complicated than with sector_div. Or do you have some other
reasons?
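
As an illustration only, a helper along those lines might look like the userspace sketch below. It is written as a function rather than a #define, the signature is an assumption, and it mirrors the sector_div convention of dividing the value in place and returning the remainder. The shift argument is assumed to be precomputed (log2 of the divisor, or -1 when the divisor is not a power of two).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace analogue of the proposed sector_div_optimized.
 * Divides *n by div in place and returns the remainder.
 * shift must be log2(div) when div is a power of two, or -1 otherwise.
 */
uint32_t sector_div_optimized(uint64_t *n, uint32_t div, int shift)
{
	uint32_t rem;

	assert(div != 0);
	if (shift >= 0) {
		/* Fast path: the divisor is a power of two. */
		assert(div == (UINT32_C(1) << shift));
		rem = (uint32_t)(*n & (div - 1));
		*n >>= shift;
	} else {
		/* General path: real division and modulo. */
		rem = (uint32_t)(*n % div);
		*n /= div;
	}
	return rem;
}
```

A caller would keep the precomputed shift next to the divisor, as the patch does with sectors_per_block_shift, so that each call site stays a single line.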


BTW, when unloading the dm-thin device with debugging enabled (the tests
were done with debugging disabled), I got this message:
device-mapper: space map checker: free block counts differ, checker
131060, sm-disk:130991
--- so there is supposedly some bug? The kernel is 3.4.3.

Mikulas


Joe Thornber 06-25-2012 02:09 PM

On Sun, Jun 24, 2012 at 09:53:22PM -0400, Mikulas Patocka wrote:
> So the conclusion is that the divide instruction degrades
> transfer speed, especially on dm-stripe with a 4k stripe size (on dm-thin it
> is measurable only with raw reads; the difference is smaller because dm-thin
> has a minimum chunk size of 64k).
>
>
> The question is: why do you want to avoid such an optimization?

You've convinced me. I just wanted proof, which you've provided very
nicely. Thank you.

> BTW, when unloading the dm-thin device with debugging enabled (the tests
> were done with debugging disabled), I got this message:
> device-mapper: space map checker: free block counts differ, checker
> 131060, sm-disk:130991
> --- so there is supposedly some bug? The kernel is 3.4.3.

That message is OK. I'm going to remove the sm-checker in 3.6. It's
not earning its keep.

- Joe


