On 06.07.2011 09:13, Robert Collins wrote:
> On Wed, Jul 6, 2011 at 6:58 PM, Stefan Bader <firstname.lastname@example.org> wrote:
>> Should have been a reply all but only went to the ml, sorry.
> No worries; I'm not subscribed so thanks for forwarding to me
>> -------- Original Message --------
>> Subject: Re: request for feedback - I have a load balancing patch for dm-raid1.c
>> Date: Tue, 05 Jul 2011 15:44:46 +0200
>> From: Stefan Bader <email@example.com>
>> To: firstname.lastname@example.org
>> On 05.07.2011 13:59, Robert Collins wrote:
>>> Hi, I recently realised
>>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/361733 was
>>> affecting me - I have my desktop machine setup with an onboard BIOS
>>> supported raid setup which ends up running with dm-raid1 targets.
>>> Anyhow, doing a load-balancing heuristic seemed pretty approachable,
>>> so I put one together; I looked at reusing the md raid1.c logic but
>>> the two implementations are pretty far apart. I did borrow what seemed
>>> to be the most significant heuristic - reading from the same target if
>>> the prior read was adjacent to it.
>>> I didn't do the markup-on-read-complete, because with ahci tagging and
>>> delayed reads it was more complex than I wanted to reason about
>>> Anyhow, I'm sure that this is improvable further, but it seems like an
>>> unqualified improvement over the status quo: it load balances reads,
>>> sequential IO is still plenty fast, and it makes it a little clearer
>>> for folk wanting to hack on this whats going on.
>>> I would love a review for correctness and concurrency safety, if
>>> someone has the time.
>>> my patch: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/361733/+attachment/2192144/+files/raid1-loadbalance.patch
>> Hi Robert,
>> I had a quick look at your patch, so here is what comes to my mind. Generally
>> the printk's will probably not be liked much. Even with going to debug they will
>> be emitted quite often. And that uses more CPU, slows down processing and could
>> cause logs to grow.
>> The idea of sector distance is not bad but maybe a combination of just not
>> switching paths every request and using a merge function would be preferred
>> (there is dm-crypt and dm-stripe which do this). There is also dm-mpath which
>> was changed from using bios to use requests, which may serve a similar benefit.
>> In the end, if possible any read sent to one path should be as large as
>> possible. Writes would benefit there as well as the drives could optimize.
>> But for a minimal intrusive/effort approach it maybe helps to change to switch
>> the mirror path every x requests to get a sort of grouping of them...
> Yeah, i could see doing those things perhaps helping. I've make the
> printks disable like other dm- modules do now, I will refresh the
> patch after any other feedback.
> The current code results in merges happening after the mirror
> selection. I think even with a merge function we'd need mapping still
> - so the question in my mind is, if we had a merge fn, would the
> mapping be a lot simpler?
I thought of that a bit and I think the merge fn actually targets a different
problem or at least is not as efficient as making the target mapping requests
instead of bios.
> It might be I guess: if the merged IO'd are big enough (specifically
> if they are cylinder at a time) then you can elevator seek cylinder to
> cylinder in a round-robin fashion and get concurrent reading going on
> - great for kernel readahead. OTOH I'm not sure how to make sure that
> happens, and if the merge is too big we'd end up just dispatching
> everything to one drive anyway.
This needs to be verified, but I think that there is a limit to the size of a
request as well. So it should get large read or writes requests but still be
broken up if something is trying to do lots more.
> A larger concern is when concurrent reads are happening from two
> sources: there the optimal thing is to satisfy one sources reads from
> one disk, and the other from the other disk.
Question is whether request merging is bound to a source. At least for
asynchronous writes those are coming from the cache and are done by a kernel
thread. So maybe there is not so much to worry.
> I think a thing that is needed is something to prevent one drive being
> ignored because someone reads sector 2Billion or something. I don't
> think its needed in a first cut (because it degrades to the current
> Anyhow, we're deep into tradeoff territory here; what I was most
> worried about was style and correctness which your comments on the
> printk issue have helped with a lot.
Just from the coding I think the path lookup would be doable and maybe even
simpler to do in just one loop. But I thought that would be a secondary worry.
kernel-team mailing list