Old 04-28-2011, 03:55 PM
Adam Chasen
 
Default Round-robin performance limit

I am very pleased with the features, and even some of the
documentation (once I found the Red Hat docs), surrounding the latest
0.4.9 multipath-tools.

I would expect the multipath driver to try to max out the links
available to it, but I am not seeing that behavior. I am unable to
achieve aggregate bandwidth greater than that of a single one of the
four links. Is this the expected behavior?

It does balance the traffic across the links, and when a path fails
the per-link bandwidth increases proportionately on the remaining
links. I originally thought this might be a problem outside of
multipath, but accessing the devices directly allows me to max out all
of my links.

If there is a more appropriate venue for this question, I would
appreciate a redirection.

The current setup is as follows:
* an iSCSI target with 4 portals and two LUNs defined
* the server connected to each portal over 4 gigabit ports (a 1-to-1
mapping of ports to portals), yielding 4 devices for each LUN, 8
devices in total

There is one device per LUN per portal connection. Multipathing is
configured with the multibus path grouping policy, so the round-robin
path selector can use all 4 devices available per LUN.
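
For context, the sessions behind this layout are the standard
open-iscsi setup, roughly as sketched below; the portal addresses are
placeholders, not the actual values from this setup:

# discover the target on each of the four portals, then log in to all
# discovered nodes (one session per portal, i.e. per gigabit port)
iscsiadm -m discovery -t sendtargets -p 192.168.10.1
iscsiadm -m discovery -t sendtargets -p 192.168.10.2
iscsiadm -m discovery -t sendtargets -p 192.168.10.3
iscsiadm -m discovery -t sendtargets -p 192.168.10.4
iscsiadm -m node --login

# should now list four sessions; each LUN appears once per session
iscsiadm -m session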

I have tested the following scenarios, using dd (reading from the
device) and bmon (a network interface monitor) for all of them. Note
that the total bandwidth never exceeds 113MB/s.
* direct (no multipath)
** all links fully saturated
** bandwidth close to theoretical max of the gigabit connection (113MB/s).
* all 4 devices active (multipath)
** all links equally balanced
** links show 1/4 saturation (~30MB/s)
** bandwidth ~113MB/s
* 3 of 4 devices active (multipath)
** remaining links equally balanced
** remaining links show 1/3 saturation (~40MB/s)
** bandwidth ~113MB/s
* 2 of 4 devices active (multipath)
** active links equally balanced
** active links show 1/2 saturation (~60MB/s)
** bandwidth ~113MB/s
* 1 of 4 devices active (multipath)
** active links equally balanced
** active link shows full saturation (~113MB/s)
** bandwidth ~113MB/s

To rule out the transport and the backing storage, I also ran dd
against one of the direct (non-multipath) devices during the tests
where the links were not fully saturated, and that dd was able to
fully saturate its link.
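
The read tests were of roughly this form; the block size, count and
iflag=direct are illustrative choices, and the device names are taken
from the multipath -ll output below:

# read through the multipath device (round-robin across all four paths)
dd if=/dev/mapper/3600c0ff000111346d473554d01000000 of=/dev/null bs=1M count=8192 iflag=direct

# read one path device directly, bypassing dm-multipath
dd if=/dev/sdd of=/dev/null bs=1M count=8192 iflag=direct

# in another terminal, watch per-NIC throughput
bmon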

[root@zed ~]# multipath -ll
3600c0ff000111346d473554d01000000 dm-3 DotHill,DH3000
size=1.1T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 88:0:0:0 sdd 8:48 active ready running
|- 86:0:0:0 sdc 8:32 active ready running
|- 89:0:0:0 sdg 8:96 active ready running
`- 87:0:0:0 sdf 8:80 active ready running
3600c0ff00011148af973554d01000000 dm-2 DotHill,DH3000
size=1.1T features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
|- 89:0:0:1 sdk 8:160 active ready running
|- 88:0:0:1 sdi 8:128 active ready running
|- 86:0:0:1 sdh 8:112 active ready running
`- 87:0:0:1 sdl 8:176 active ready running

/etc/multipath.conf
defaults {
        path_grouping_policy    multibus
        rr_min_io               100
}
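
To confirm what was actually loaded into the kernel, the device-mapper
table for each map can be dumped; the round-robin arguments in the
output include the per-path repeat count, so this shows whether the
rr_min_io value from multipath.conf took effect:

dmsetup table 3600c0ff000111346d473554d01000000
dmsetup table 3600c0ff00011148af973554d01000000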

multipath-tools v0.4.9 (05/33, 2016)
2.6.35.11-2-fl.smp.gcc4.4.x86_64

Thanks,
Adam
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 05-02-2011, 07:25 AM
Pasi Kärkkäinen
 
Default Round-robin performance limit

On Thu, Apr 28, 2011 at 11:55:55AM -0400, Adam Chasen wrote:
>
> [root@zed ~]# multipath -ll
> 3600c0ff000111346d473554d01000000 dm-3 DotHill,DH3000
> size=1.1T features='0' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
> |- 88:0:0:0 sdd 8:48 active ready running
> |- 86:0:0:0 sdc 8:32 active ready running
> |- 89:0:0:0 sdg 8:96 active ready running
> `- 87:0:0:0 sdf 8:80 active ready running
> 3600c0ff00011148af973554d01000000 dm-2 DotHill,DH3000
> size=1.1T features='0' hwhandler='0' wp=rw
> `-+- policy='round-robin 0' prio=1 status=active
> |- 89:0:0:1 sdk 8:160 active ready running
> |- 88:0:0:1 sdi 8:128 active ready running
> |- 86:0:0:1 sdh 8:112 active ready running
> `- 87:0:0:1 sdl 8:176 active ready running
>
> /etc/multipath.conf
> defaults {
> path_grouping_policy multibus
> rr_min_io 100
> }

Did you try a lower value for rr_min_io ?

-- Pasi

>
> multipath-tools v0.4.9 (05/33, 2016)
> 2.6.35.11-2-fl.smp.gcc4.4.x86_64
>
> Thanks,
> Adam
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 05-02-2011, 01:36 PM
Adam Chasen
 
Default Round-robin performance limit

Lowering rr_min_io provides only marginal improvement: I see about
6MB/s more at an rr_min_io of 3 than at 100. I had previously
experimented with values all the way down to 1; people seem to settle
on 3. Still, I am not seeing the bandwidth I would expect from 4
aggregated links.
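
For anyone reproducing this, the change amounts to roughly the
following (a sketch, not a recommendation):

defaults {
        path_grouping_policy    multibus
        rr_min_io               3
}

# flush and rebuild the maps (they must be unused for -F) so the new
# value takes effect
multipath -F
multipath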

Some additional information: if I read from my two multipath devices
simultaneously (different LUNs, but the same iSCSI connections), I can
pull more data through each link (50MB/s vs 27-30MB/s per link).

Adam

This is a response to a direct email I sent to someone who had a
similar issue on this list a while back:
Date: Sat, 30 Apr 2011 00:13:20 +0200
From: Bart Coninckx <bart.coninckx@telenet.be>
Hi Adam,

I believe setting rr_min_io to 3 instead of 100 improved things
significantly.
What is still an unexplained issue, though, is that dd-ing to the
multipath device is very slow while reading from it is very fast.
Doing the same piped over SSH to the original devices on the iSCSI
server was OK, so it seems like either an iSCSI or still a multipath
issue.

But I definitely remember that lowering rr_min_io helped quite a bit.
I think the paths are switched faster this way, resulting in more
throughput.

Good luck,

b.


On Mon, May 2, 2011 at 3:25 AM, Pasi Kärkkäinen <pasik@iki.fi> wrote:
> Did you try a lower value for rr_min_io ?
>
> -- Pasi
<snip>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 05-02-2011, 10:27 PM
"John A. Sullivan III"
 
Default Round-robin performance limit

On Mon, 2011-05-02 at 09:36 -0400, Adam Chasen wrote:
> Lowering rr_min_io provides marginal improvement. I see 6MB/s
> improvement at an rr_min_io of 3 vs 100. I played around with it
> before all the way down to 1. People seems to settle on 3. Still, I am
> not seeing the bandwidth I assume it should be (4 aggregated links).
>
> Some additional information. If I attempt to pull from my two
> multipath devices simultaneously (different LUNs, but same iSCSI
> connections) then I can pull additional data (50MB/s vs 27-30MB/s
> from each link).
>
> Adam
<snip>
I'm quite curious to see what you ultimately find on this, as we have
a similar setup (four paths to an iSCSI SAN) and have struggled quite
a bit. We had settled on using multipath for failover and software
RAID0 across the four devices for load balancing. That seemed to
provide more even scaling under various IO patterns, until we realized
we could not take a transactionally consistent snapshot of the SAN
because we would not know which RAID transaction had been committed at
the time of the snapshot. Thus, we are planning to implement multibus.

What scheduler are you using? We found that the default cfq scheduler
in our kernel versions (2.6.28 and 2.6.29) did not scale at all to the
number of parallel iSCSI sessions, while deadline or noop scaled
almost linearly. We then realized that our SAN (Nexenta running ZFS)
was doing its own optimization of writes to the physical media (which
we assume is what the scheduler is for), so we had no need for the
overhead of any scheduler and set ours to noop for everything except
local disks.
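
For reference, switching the elevator can be done at runtime along
these lines; the sdX names below are just the path devices from the
multipath -ll output earlier in the thread:

# set noop on each iSCSI path device
for d in sdc sdd sdf sdg sdh sdi sdk sdl; do
        echo noop > /sys/block/$d/queue/scheduler
done

# verify: the active scheduler is shown in brackets
cat /sys/block/sdd/queue/scheduler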

I'm also very curious about your findings on rr_min_io. I cannot find
my benchmarks but we tested various settings heavily. I do not recall
if we saw more even scaling with 10 or 100. I remember being surprised
that performance with it set to 1 was poor. I would have thought that,
in a bonded environment, changing paths per iSCSI command would give
optimal performance. Can anyone explain why it does not?

We speculated that it either added too much overhead to manage the
constant switching or that it was simply the nature of iSCSI. Does
each iSCSI command need to be acknowledged before the next one can be
sent? If so, does multibus not increase the throughput of any
individual iSCSI stream, but only help as we multiplex multiple iSCSI
streams?

If that is the case, it would exacerbate the already significant
problem of Linux, iSCSI, and latency. We have found that any Linux
disk IO that goes through the Linux file system gives quite poor iSCSI
performance, because it is latency bound due to the maximum 4KB page
size. I'm only parroting what others have told me, so correct me if I
am wrong. Since iSCSI can only commit 4KB at a time in Linux (unless
one bypasses the file system with raw devices, dd, or direct writes in
something like Oracle), since each write needs to be acknowledged
before the next is sent, and because sending 4KB down a high-speed
pipe like 10Gbps or even 1Gbps comes nowhere near saturating the link,
Linux iSCSI IO is latency bound, and no amount of additional bandwidth
or bonded channels will increase the throughput of an individual iSCSI
stream. Only minimizing latency will.

I hope some of that might have helped and look forward to hearing about
your optimization of multibus. Thanks - John


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 05-03-2011, 05:04 AM
Malahal Naineni
 
Default Round-robin performance limit

John A. Sullivan III [jsullivan@opensourcedevel.com] wrote:
> I'm also very curious about your findings on rr_min_io. I cannot find
> my benchmarks but we tested various settings heavily. I do not recall
> if we saw more even scaling with 10 or 100. I remember being surprised
> that performance with it set to 1 was poor. I would have thought that,
> in a bonded environment, changing paths per iSCSI command would give
> optimal performance. Can anyone explain why it does not?

rr_min_io of 1 will give poor performance if your multipath kernel
module doesn't support request-based multipath. With BIO-based
multipath, the multipath layer receives 4KB BIOs, and such requests
can't be coalesced if they are sent down different paths.

Thanks, Malahal.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 05-03-2011, 10:12 AM
"John A. Sullivan III"
 
Default Round-robin performance limit

On Mon, 2011-05-02 at 22:04 -0700, Malahal Naineni wrote:
> John A. Sullivan III [jsullivan@opensourcedevel.com] wrote:
> > I'm also very curious about your findings on rr_min_io. I cannot find
> > my benchmarks but we tested various settings heavily. I do not recall
> > if we saw more even scaling with 10 or 100. I remember being surprised
> > that performance with it set to 1 was poor. I would have thought that,
> > in a bonded environment, changing paths per iSCSI command would give
> > optimal performance. Can anyone explain why it does not?
>
> rr_min_io of 1 will give poor performance if your multipath kernel
> module doesn't support request based multipath. In those BIO based
> multipath, multipath receives 4KB requests. Such requests can't be
> coalesced if they are sent on different paths.
<snip>
Ah, that makes perfect sense, and it explains why 3 seems to be the
magic number in Linux (roughly 4096 bytes per page / 1460 bytes of TCP
payload per frame). Does that change with jumbo frames? In fact, how
would that be optimized in Linux?
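
Spelling that arithmetic out, assuming a 4KB page and a standard
1500-byte MTU:

    payload per frame:  ~1460 bytes (1500 minus IP and TCP headers)
    frames per 4KB BIO: 4096 / 1460 ~= 2.8, i.e. about 3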

9KB seems to be a common jumbo frame value across vendors, and that
should hold two pages, but I would guess Linux can't take advantage of
it since each block must be independently acknowledged. Is that
correct? Would a frame size of a little over 4KB therefore be optimal
for Linux?
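
For what it is worth, testing that only requires raising the MTU end
to end; the interface name below is a placeholder, and the switch and
target must be configured to match:

ip link set dev eth2 mtu 9000
ip link show eth2        # verify the new MTU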

Would that mean that an rr_min_io of 1 becomes optimal? However, if
each block needs to be acknowledged before the next is sent, I would
think we are still latency bound: even if I can send four requests
down four separate paths, I cannot send the second until the first has
been acknowledged. And since I can easily place four packets on the
same path within the latency period of four packets, multibus gives me
no performance advantage for a single iSCSI stream and only proves
useful as I start multiplexing multiple iSCSI streams.

Is that analysis correct? If so, what constitutes a separate iSCSI
stream? Are two separate file requests from the same file system to
the same iSCSI device considered two iSCSI streams, which can be
multiplexed and so benefit from multipath, or are they considered part
of the same iSCSI stream? If they are considered one, do they become
two if they reside on different partitions and thus different file
systems? If not, do we only see multibus performance gains between a
single file system host and a single iSCSI host when we use
virtualization, with each virtual machine having its own iSCSI
connection (as opposed to making the iSCSI connections in the
underlying host and exposing them to the virtual machines as local
storage)?

I hope I'm not hijacking this thread and realize I've asked some
convoluted questions but optimizing multibus through bonded links for
single large hosts is still a bit of a mystery to me. Thanks - John

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-04-2011, 03:08 AM
Adam Chasen
 
Default Round-robin performance limit

Malahal,
After your mention of BIO-based vs request-based multipath, I
attempted to determine whether my kernel contains the request-based
mpath code. It seems that in 2.6.31 dm-multipath was switched to
request-based. I am running kernels newer than 2.6.31 (2.6.35 and
2.6.38), so I believe I have request-based mpath.

All,
There also appears to be a new multipath configuration option
documented in the RHEL 6 beta documentation:
rr_min_io_rq: Specifies the number of I/O requests to route to a path
before switching to the next path in the current path group, using
request-based device-mapper-multipath. This setting should be used on
systems running current kernels. On systems running kernels older than
2.6.31, use rr_min_io. The default value is 1.

http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6-Beta/html/DM_Multipath/config_file_multipath.html

I have not tested using this setting vs rr_min_io yet or even if my
system supports the configuration directive.
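
If the directive is supported, the defaults section would presumably
just become something like the following (untested here, as noted
above):

defaults {
        path_grouping_policy    multibus
        rr_min_io_rq            1
}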

If I trust the claims of several VMware ESX iSCSI multipath setups,
it is possible (perhaps with different software) to gain a
multiplicative increase in throughput by adding Ethernet links. This
makes me hopeful that we can do the same with open-iscsi and
dm-multipath.

It could be something obvious I am missing, but it appears a lot of
people hit this same issue.

Thanks,
Adam

On Tue, May 3, 2011 at 6:12 AM, John A. Sullivan III
<jsullivan@opensourcedevel.com> wrote:
<snip>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-04-2011, 08:19 PM
Adam Chasen
 
Default Round-robin performance limit

Unfortunately, even after playing around with various settings,
queues, and other techniques, I was never able to exceed the bandwidth
of a single Ethernet link when accessing a single multipathed LUN.

When communicating with two different multipathed LUNs, which present
as two different multipath devices, I can saturate two links, but it
is still a one-to-one ratio of multipath devices to saturated links.

After further research on multipathing, it appears some people are
using md RAID to achieve multipathed devices. My initial testing with
an md RAID0 device produces the behavior I expect of a multipathed
device: I can easily saturate both links during read operations.
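
A minimal sketch of that kind of md RAID0 test, assuming a stripe over
the two multipathed LUNs shown earlier in the thread; striping over
the multipath maps rather than the raw sdX devices, the chunk size,
and the dd parameters are all assumptions here:

mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=64 \
    /dev/mapper/3600c0ff000111346d473554d01000000 \
    /dev/mapper/3600c0ff00011148af973554d01000000

# sequential read across the stripe; both underlying LUNs (and hence
# multiple links) should be hit at once
dd if=/dev/md0 of=/dev/null bs=1M count=8192 iflag=direct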

I feel using md-raid is a less elegant solution than using
dm-multipath, but it will have to suffice until someone can provide me
some additional guidance.

Thanks,
Adam

On Mon, Oct 3, 2011 at 11:08 PM, Adam Chasen <adam@chasen.name> wrote:
<snip>

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

 
Old 10-05-2011, 03:07 AM
"John A. Sullivan III"
 
Default Round-robin performance limit

On Tue, 2011-10-04 at 16:19 -0400, Adam Chasen wrote:
> Unfortunately even with playing around with various settings, queues,
> and other techniques, I was never able to exceed the bandwidth of more
> than one of the Ethernet links when accessing a single multipathed
> LUN.
>
> When communicating with two different multipathed LUNs, which present
> as two different multipath devices, I can saturate two links, but it
> is still a one to one ratio of multipath devices to link saturation.
>
> After further research on multipathing, it appears people are using md
> raid to achieve multipathed devices. My initial testing of using raid0
> md-raid device produces the behavior I expect of multipathed devices.
> I can easily saturate both links during read operations.
>
> I feel using md-raid is a less elegant solution than using
> dm-multipath, but it will have to suffice until someone can provide me
> some additional guidance.
>
> Thanks,
> Adam
We recently changed from the RAID0 approach to multipath multibus.
RAID0 did seem to give more even performance over a variety of IO
patterns but it had a critical flaw. We could not use the snapshot
capabilities of the SAN because we could never be certain of
snapshotting the RAID0 disks in a transactionally consistent state. If
I have four disks in a RAID0 array and snapshot them all, how can I be
assured that I have not written, say, two of three stripes but not the
third? This was our singular reason for discarding RAID0 over iSCSI in
favor of multipath multibus - John


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-05-2011, 07:54 PM
Adam Chasen
 
Default Round-robin performance limit

John,
I am limited in a similar fashion. I would much prefer to use
multibus multipath, but I was unable to achieve bandwidth exceeding a
single link, even though the traffic was spread over the 4 available
links. Were you able to get performance from multibus multipath
comparable to that of the RAID0 setup?

Thanks,
Adam

On Tue, Oct 4, 2011 at 11:07 PM, John A. Sullivan III
<jsullivan@opensourcedevel.com> wrote:
> On Tue, 2011-10-04 at 16:19 -0400, Adam Chasen wrote:
>> Unfortunately even with playing around with various settings, queues,
>> and other techniques, I was never able to exceed the bandwidth of more
>> than one of the Ethernet links when accessing a single multipathed
>> LUN.
>>
>> When communicating with two different multipathed LUNs, which present
>> as two different multipath devices, I can saturate two links, but it
>> is still a one to one ratio of multipath devices to link saturation.
>>
>> After further research on multipathing, it appears people are using md
>> raid to achieve multipathed devices. My initial testing of using raid0
>> md-raid device produces the behavior I expect of multipathed devices.
>> I can easily saturate both links during read operations.
>>
>> I feel using md-raid is a less elegant solution than using
>> dm-multipath, but it will have to suffice until someone can provide me
>> some additional guidance.
>>
>> Thanks,
>> Adam
> We recently changed from the RAID0 approach to multipath multibus.
> RAID0 did seem to give more even performance over a variety of IO
> patterns but it had a critical flaw. We could not use the snapshot
> capabilities of the SAN because we could never be certain of
> snapshotting the RAID0 disks in a transactionally consistent state. If
> I have four disks in a RAID0 array and snapshot them all, how can I be
> assured that I have not written, say, two of three stripes but not the
> third? This was our singular reason for discarding RAID0 over iSCSI in
> favor of multipath multibus - John

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
