Old 07-12-2010, 03:46 AM
Arcady Genkin
 
Very slow LVM performance

I'm seeing a 10-fold performance hit when using an LVM2 logical volume
that sits on top of a RAID0 stripe. Using dd to read directly from
the stripe (i.e. a large sequential read) I get speeds over 600MB/s.
Reading from the logical volume using the same method only gives
around 57MB/s. I am new to LVM, and I need it for the snapshots.
Could anyone suggest where to start looking for the problem?

The server runs the amd64 version of Lenny. Most packages (including
lvm2) are stock from Lenny, but we had to upgrade the kernel to the
one from lenny-backports (2.6.32).

There are ten RAID1 triplets: md0 through md9 (that's 30 physical
disks arranged into ten 3-way mirrors), connected over iSCSI from six
targets. The ten triplets are then striped together into a RAID0
stripe /dev/md10. I don't think we have any issues with the MD
layers, because each of them seems to perform fairly well; it's when
we add LVM into the soup that the speeds drop.
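
For reference, the chunk size and member devices at each layer can be
confirmed with the usual md tools:

cat /proc/mdstat
mdadm --detail /dev/md10
mdadm --detail /dev/md0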

test4:~# uname -a
Linux test4 2.6.32-bpo.4-amd64 #1 SMP Thu Apr 8 10:20:24 UTC 2010
x86_64 GNU/Linux

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/md10
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 33.4619 s, 612 MB/s

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 354.951 s, 57.7 MB/s
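
For anyone reproducing these numbers: to keep runs comparable it can help
to flush the page cache first (2.6.16 and later kernels):

sync
echo 3 > /proc/sys/vm/drop_caches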

I used the following commands to create the volume group:

pvcreate /dev/md10
vgcreate vg0 /dev/md10
lvcreate -l 102389 vg0

Here's what LVM reports of its devices:

test4:~# pvdisplay
--- Physical volume ---
PV Name /dev/md10
VG Name vg0
PV Size 399.96 GB / not usable 4.00 MB
Allocatable yes (but full)
PE Size (KByte) 4096
Total PE 102389
Free PE 0
Allocated PE 102389
PV UUID ocIGdd-cqcy-GNQl-jxRo-FHmW-THMi-fqofbd

test4:~# vgdisplay
--- Volume group ---
VG Name vg0
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 2
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 1
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 399.96 GB
PE Size 4.00 MB
Total PE 102389
Alloc PE / Size 102389 / 399.96 GB
Free PE / Size 0 / 0
VG UUID o2TeAm-gPmZ-VvJc-OSfU-quvW-OB3a-y1pQaB

test4:~# lvdisplay
--- Logical volume ---
LV Name /dev/vg0/lvol0
VG Name vg0
LV UUID Q3nA6w-0jgw-ImWY-IYJK-kvMJ-aybW-GAdoOs
LV Write Access read/write
LV Status available
# open 0
LV Size 399.96 GB
Current LE 102389
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 256
Block device 254:0
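
(The "Read ahead sectors" figure above is in 512-byte sectors; for
comparison, the effective read-ahead of each device in the stack can be
read with "blockdev --getra /dev/md10" and "blockdev --getra /dev/vg0/lvol0".)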

Many thanks in advance for any pointers!
--
Arcady Genkin


 
Old 07-12-2010, 06:05 AM
Stan Hoeppner
 
Very slow LVM performance

Arcady Genkin put forth on 7/11/2010 10:46 PM:

> lvcreate -l 102389 vg0

Should be:

lvcreate -i 10 -I [stripe_size] -l 102389 vg0

I believe you're losing 10x performance because you have a 10 "disk" mdadm
stripe but you didn't inform lvcreate about this fact. Delete the LV, and
then recreate it with the above command line, specifying 64 for the stripe
size (the mdadm default). If performance is still lacking, recreate it again
with 640 for the stripe size. (I'm not exactly sure of the relationship
between mdadm chunk size and lvm stripe size--it's either equal, or it's mdadm
stripe width * mdadm chunk size)

If you specified a chunk size when you created the mdadm RAID 0 stripe, then
use that chunk size for the lvcreate stripe_size. Again, if performance is
still lacking, recreate with whatever chunk size you specified in mdadm and
multiply that by 10.
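
Concretely, something along these lines (the -n here is only to keep the
old LV name; if you did set an mdadm chunk size, that becomes the -I value):

lvremove /dev/vg0/lvol0
lvcreate -i 10 -I 64 -l 102389 -n lvol0 vg0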

Hope this helps. Let us know.

--
Stan


 
Old 07-12-2010, 04:52 PM
Arcady Genkin
 
Very slow LVM performance

On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner <stan@hardwarefreak.com> wrote:

> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>
> I believe you're losing 10x performance because you have a 10 "disk" mdadm
> stripe but you didn't inform lvcreate about this fact.

Hi, Stan:

I believe that the -i and -I options are for using *LVM* to do the
striping, am I wrong? In our case (when LVM sits on top of one RAID0
MD stripe) the option -i does not seem to make sense:

test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
Number of stripes (10) must not exceed number of physical volumes (1)

My understanding is that LVM should be agnostic of what's underlying
it as the physical storage, so it should treat the MD stripe as one
large disk, and thus let the MD device handle the load balancing
(which it seems to be doing fine).

Besides, the speed we are getting from the LVM volume is less than
half of what an individual component of the RAID10 stripe can do. Even
if we assume that LVM somehow manages to distribute its data so that it
always hits only one physical disk (a disk triplet in our case), there
would still be the question of why it is *that* slow. It's 57 MB/s vs
the 134 MB/s that an individual triplet can do:

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/md0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 153.084 s, 134 MB/s

> If you specified a chunk size when you created the mdadm RAID 0 stripe, then
> use that chunk size for the lvcreate stripe_size. Again, if performance is
> still lacking, recreate with whatever chunk size you specified in mdadm and
> multiply that by 10.

We are using a chunk size of 1024 (i.e. 1 MB) with the MD devices. For
the record, we used the following commands to create the md devices:

For N in 0 through 9:
mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10
--layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048
--chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Then the big stripe:
mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe
--metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}
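
The first command, spelled out as a shell loop (sdX/sdY/sdZ stand in for
each triplet's three member disks, which of course differ for each N):

for N in $(seq 0 9); do
    mdadm --create /dev/md$N -v --raid-devices=3 --level=raid10 \
        --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048 \
        --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ
done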

Thanks,
--
Arcady Genkin


 
Old 07-12-2010, 05:45 PM
Arcady Genkin
 
Very slow LVM performance

I just tried to use LVM for striping the RAID1 triplets together
(instead of MD). Using the following three commands to create the
logical volume, I get 550 MB/s sequential read speed, which is much
faster than before, but still about 10% slower than what the plain MD
RAID0 stripe can do with the same disks (612 MB/s).

pvcreate /dev/md{0,5,1,6,2,7,3,8,4,9}
vgcreate vg0 /dev/md{0,5,1,6,2,7,3,8,4,9}
lvcreate -i 10 -I 1024 -l 102390 vg0

test4:~# dd of=/dev/null bs=8K count=2500000 if=/dev/vg0/lvol0
2500000+0 records in
2500000+0 records out
20480000000 bytes (20 GB) copied, 37.2381 s, 550 MB/s
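
(The stripe layout LVM actually created can be double-checked with
"lvdisplay -m /dev/vg0/lvol0"; the segment listing shows the segment type,
the number of stripes and the stripe size.)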

I would still like to know why LVM on top of RAID0 performs so poorly
in our case.
--
Arcady Genkin


 
Old 07-12-2010, 06:54 PM
Aaron Toponce
 
Very slow LVM performance

On 7/12/2010 11:45 AM, Arcady Genkin wrote:
> I would still like to know why LVM on top of RAID0 performs so poorly
> in our case.

Can you provide the commands from start to finish when building the volume?

fdisk ...
mdadm ...
pvcreate ...
vgcreate ...
lvcreate ...

etc.

My experience has been that LVM will introduce about a 1-2% performance
hit compared to not using it, in many different situations, whether it
sits on top of software/hardware RAID or on plain disks/partitions. So, I'm
curious what command-line options you're passing to each of your
commands, how you partitioned/built your disks, and so forth. That might
help troubleshoot why you're seeing such a hit.

On a side note, I've never seen any reason to increase or decrease the
chunk size with software RAID. However, you may want to match your chunk
size with '-c' for 'lvcreate'.

 
Old 07-12-2010, 07:45 PM
Arcady Genkin
 
Very slow LVM performance

On Mon, Jul 12, 2010 at 14:54, Aaron Toponce <aaron.toponce@gmail.com> wrote:
> Can you provide the commands from start to finish when building the volume?
>
> fdisk ...
> mdadm ...
> pvcreate ...
> vgcreate ...
> lvcreate ...

Hi, Aaron, I already provided all of the above commands in earlier
messages (except for fdisk, since we are giving the entire disks to
MD, not partitions). I'll repeat them here for your convenience:

Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10
--layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048
--chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Then the big stripe:
mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe
--metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

Then the LVM business:
pvcreate /dev/md10
vgcreate vg0 /dev/md10
lvcreate -l 102389 vg0

Note that the file system is not being created on top of LVM at this
point, and I ran the test by simply dd-ing /dev/vg0/lvol0.

> My experience has been that LVM will introduce about a 1-2% performance
> hit compared to not using it

This is what we were expecting; that's encouraging.

> On a side note, I've never seen any reason to increase or decrease the
> chunk size with software RAID. However, you may want to match your chunk
> size with '-c' for 'lvcreate'.

We have tested a variety of chunk sizes (from 64K to 4MB) with
bonnie++ and found that 1MB chunks worked the best for our usage,
which is a general purpose NFS server, so it's mainly small random
reads. In this scenario it's best to tune the chunk size to increase
the probability that a small read from the stripe would result in only
one read from the disk. If the chunk size is too small, then a 1KB
read has a pretty high chance of being split across two chunks, and
thus of requiring two I/Os instead of one to service (and, most likely,
two drive head seeks instead of one). Modern commodity drives can only
do about 100-120 seeks per second. But this is a side note to your
side note. :)
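
Back of the envelope: for a read of r bytes starting at a uniformly random
offset within c-byte chunks, the chance of straddling a chunk boundary is
roughly (r-1)/c -- about 1.6% for 1KB reads on 64KB chunks, but only about
0.1% with 1MB chunks.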

From the man page for 'lvcreate' it seems that the -c option sets the
chunk size for something snapshot-related, so it should have no
bearing on our performance testing, which involved no snapshots. Am I
misreading the man page?

Thanks!
--
Arcady Genkin


 
Old 07-12-2010, 08:45 PM
Aaron Toponce
 
Very slow LVM performance

On 7/12/2010 1:45 PM, Arcady Genkin wrote:
> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10
> --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048
> --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ
>
> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe
> --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

I must admit that I haven't seen a software RAID implementation where
you create multiple devices from the same set of disks, then stripe
across those devices. As such, when using LVM, I'm not exactly sure how
the kernel will handle that -- mostly whether it will see the appropriate
amount of disk, and what physical extents it will use to place the data.
So for me, this is uncharted territory.

But your commands look sound. I might suggest changing the default PE
size from 4MB to 1MB. That might help. Worth testing anyway. The PE size
can be changed with 'vgcreate -s 1M'.
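
Roughly (untested), keeping everything else the same:

lvremove /dev/vg0/lvol0
vgremove vg0
vgcreate -s 1M vg0 /dev/md10
lvcreate -l <new Total PE> vg0

The extent count roughly quadruples with 1MB extents, so take the new
Total PE from vgdisplay rather than reusing 102389.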

However, do you really want --bitmap with your mdadm command? I
understand the benefits, but using 'internal' does come with a
performance hit.

> From the man page to 'lvcreate' it seems that the -c option sets the
> chunk size for something snapshot-related, so it should have no
> bearing in our performance testing, which involved no snapshots. Am I
> misreading the man page?

Ah yes, you are correct. I should probably pull up the man page before
replying.


 
Old 07-12-2010, 09:00 PM
Mike Bird
 
Very slow LVM performance

On Mon July 12 2010 12:45:57 Arcady Genkin wrote:
> Creating the ten 3-way RAID1 triplets - for N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10
> --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048
> --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

RAID 10 with three devices?

--Mike Bird


 
Old 07-12-2010, 10:13 PM
Stan Hoeppner
 
Very slow LVM performance

Arcady Genkin put forth on 7/12/2010 11:52 AM:
> On Mon, Jul 12, 2010 at 02:05, Stan Hoeppner <stan@hardwarefreak.com> wrote:
>
>> lvcreate -i 10 -I [stripe_size] -l 102389 vg0
>>
>> I believe you're losing 10x performance because you have a 10 "disk" mdadm
>> stripe but you didn't inform lvcreate about this fact.
>
> Hi, Stan:
>
> I believe that the -i and -I options are for using *LVM* to do the
> striping, am I wrong?

If this were the case, lvcreate would require the set of physical or pseudo
(mdadm) device IDs to stripe across, wouldn't it? There are no options in
lvcreate to specify physical or pseudo devices. The only input to lvcreate is
a volume group ID. Therefore, lvcreate is ignorant of the physical devices
underlying it, is it not?

> In our case (when LVM sits on top of one RAID0
> MD stripe) the option -i does not seem to make sense:
>
> test4:~# lvcreate -i 10 -I 1024 -l 102380 vg0
> Number of stripes (10) must not exceed number of physical volumes (1)

It makes sense once you accept that lvcreate is ignorant of the underlying
disk device count/configuration. The -i option is what allows one to educate
lvcreate that there are, in your case, 10 devices underlying it across which
one desires to stripe data; I believe that is exactly why the option exists.

> My understanding is that LVM should be agnostic of what's underlying
> it as the physical storage, so it should treat the MD stripe as one
> large disk, and thus let the MD device to handle the load balancing
> (which it seems to be doing fine).

If lvcreate is agnostic of the underlying structure, why does it have stripe
width and stripe size options at all? As a parallel example of this,
filesystems such as XFS are ignorant of underlying disk structure as well.
mkfs.xfs has no fewer than four sub-options to optimize its performance atop RAID
stripes. One of its options, sw, specifies stripe width, which is the number
of physical or logical devices in the RAID stripe. In your case, if you use
XFS, this would be "-d sw=10". These options in lvcreate serve the same
function as those in mkfs.xfs, which is to optimize their performance atop a
RAID stripe.
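
For example, with the 1MB mdadm chunk and 10-way stripe in your setup, the
analogous mkfs.xfs invocation would look something like:

mkfs.xfs -d su=1024k,sw=10 /dev/vg0/lvol0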

> Besides, the speed we are getting from the LVM volume is less than
> half of what an individual component of the RAID10 stripe can do. Even
> if we assume that LVM somehow manages to distribute its data so that it
> always hits only one physical disk (a disk triplet in our case), there
> would still be the question of why it is *that* slow. It's 57 MB/s vs
> the 134 MB/s that an individual triplet can do:

Forget comparing performance to one of your single mdadm mirror sets. What's
key here, and why I suggested "lvcreate -i 10 .." to begin with, is the fact
that your lvm performance is almost exactly 10 times lower than the underlying
mdadm device, which has exactly 10 physical stripes. Isn't that more than
just a bit coincidental? The 10x drop only occurs when talking to the lvm
device. Put on your Sherlock Holmes hat for a minute.

> We are using chunk size of 1024 (i.e. 1MB) with the MD devices. For
> the record, we used the following commands to create the md devices:
>
> For N in 0 through 9:
> mdadm --create /dev/mdN -v --raid-devices=3 --level=raid10
> --layout=n3 --metadata=0 --bitmap=internal --bitmap-chunk=2048
> --chunk=1024 /dev/sdX /dev/sdY /dev/sdZ

Is that a typo, or are you turning those 3-disk mdadm sets into RAID10 as
shown above, instead of the 3-way mirror sets you stated previously? RAID 10
requires a minimum of 4 disks; you have 3. Something isn't right here...

> Then the big stripe:
> mdadm --create /dev/md10 -v --raid-devices=10 --level=stripe
> --metadata=1.0 --chunk=1024 /dev/md{0,5,1,6,2,7,3,8,4,9}

And I'm pretty sure this is the stripe lvcreate needs to know about to fix the
10x performance drop issue. Create a new lvm test volume with the lvcreate
options I've mentioned, and see how it performs against the current 400GB test
volume that's running slow.

--
Stan


 
Old 07-12-2010, 10:16 PM
Aaron Toponce
 
Very slow LVM performance

On 7/12/2010 4:13 PM, Stan Hoeppner wrote:
> Is that a typo, or are you turning those 3 disk mdadm sets into RAID10 as
> shown above, instead of the 3-way mirror sets you stated previously? RAID 10
> requires a minimum of 4 disks, you have 3. Something isn't right here...

Incorrect. The Linux RAID implementation can do level 10 across 3 disks.
In fact, it can even do it across 2 disks.

http://en.wikipedia.org/wiki/Non-standard_RAID_levels#Linux_MD_RAID_10

 
