Old 04-11-2011, 12:11 PM
Jens Axboe
 
Default block: remove per-queue plugging

On 2011-04-11 14:05, NeilBrown wrote:
> On Mon, 11 Apr 2011 13:37:20 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>
>> On 2011-04-11 13:26, NeilBrown wrote:
>>> On Mon, 11 Apr 2011 13:04:26 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>>>
>>>>>
>>>>> I'm sure one of us is missing something (probably both) but I'm not
>>>>> sure what.
>>>>>
>>>>> The callback is central.
>>>>>
>>>>> It exists simply so that md can use plugging.
>>>>> Just like blk-core, md will notice that a blk_plug is active and will put
>>>>> requests aside. I then need something to call in to md when blk_finish_plug
>>>>
>>>> But this is done in __make_request(), so md devices should not be
>>>> affected at all. This is the part of your explanation that I do not
>>>> connect with the code.
>>>>
>>>> If md itself is putting things on the plug list, why is it doing that?
>>>
>>> Yes. Exactly. md itself wants to put things aside on some list.
>>> e.g. in RAID1 when using a write-intent bitmap I want to gather as many write
>>> requests as possible so I can update the bits for all of them at once.
>>> So when a plug is in effect I just queue the bios somewhere and record the
>>> bits that need to be set.
>>> Then when the unplug happens I write out the bitmap updates in a single write
>>> and when that completes, I write out the data (to all devices).
>>>
>>> Also in RAID5 it is good if I can wait for lots of write requests to arrive
>>> before committing any of them, to increase the likelihood of getting a
>>> full-stripe write.
>>>
>>> Previously I used ->unplug_fn to release the queued requests. Now that it has
>>> gone, I need a different way to register a callback when an unplug happens.
>>
>> Ah, so this is what I was hinting at. But why use the task->plug for
>> that? Seems a bit counter-intuitive. Why can't you just store these
>> internally?
>>
>>>
>>>>
>>>>> is called so that put-aside requests can be released.
>>>>> As md can be built as a module, that call must be a call-back of some sort.
>>>>> blk-core doesn't need to register blk_plug_flush because that is never in a
>>>>> module, so it can be called directly. But the md equivalent could be in a
>>>>> module, so I need to be able to register a callback.
>>>>>
>>>>> Does that help?
>>>>
>>>> Not really. Is the problem that _you_ would like to stash things aside,
>>>> not the fact that __make_request() puts things on a task plug list?
>>>>
>>>
>>> Yes, exactly. I (in md) want to stash things aside.
>>>
>>> (I don't actually put the stashed things on the blk_plug, though it might
>>> make sense to do that later in some cases - I'm not sure. Currently I stash
>>> things in my own internal lists and just need a call back to say "ok, flush
>>> those lists now").
>>
>> So we are making some progress... The thing I still don't understand is
>> why you want it associated with the plug. You don't seem to have
>> any scheduling restrictions, in which case just storing them in md
>> seems like a much better option.
>>
>
> Yes. But I need to know when to release the requests that I have stored.
> I need to know when ->write_pages or ->read_pages or whatever has finished
> submitting a pile of pages so that I can start processing the request that I
> have put aside. So I need a callback from blk_finish_plug.

OK fair enough, I'll add your callback patch.

--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
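To make the mechanism being discussed concrete, here is a minimal sketch of the kind of unplug callback Neil is asking for. The names (blk_plug_cb, cb_list, md_check_plugged, md_unplug) are illustrative assumptions, not necessarily the patch that was eventually merged.

/* Illustrative sketch only -- not the actual patch under discussion. */
struct blk_plug_cb {
	struct list_head list;
	void (*callback)(struct blk_plug_cb *cb);
};

/* md's hook: write out the bitmap bits, then submit the bios that were put aside */
static void md_unplug(struct blk_plug_cb *cb);

/*
 * md (or any stacked driver) hangs a callback on the current task's plug
 * while it stashes requests internally.  The caller is assumed to have
 * initialised cb->list with INIT_LIST_HEAD().
 */
static void md_check_plugged(struct blk_plug_cb *cb)
{
	struct blk_plug *plug = current->plug;

	if (plug && list_empty(&cb->list)) {
		cb->callback = md_unplug;
		list_add(&cb->list, &plug->cb_list);	/* cb_list assumed added to blk_plug */
	}
}

/* Called from blk_finish_plug()/blk_flush_plug() so put-aside requests are released. */
static void flush_plug_callbacks(struct blk_plug *plug)
{
	while (!list_empty(&plug->cb_list)) {
		struct blk_plug_cb *cb = list_first_entry(&plug->cb_list,
							  struct blk_plug_cb, list);

		list_del_init(&cb->list);
		cb->callback(cb);	/* e.g. write the bitmap, then submit the queued bios */
	}
}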
 
Old 04-11-2011, 12:12 PM
Jens Axboe
 
Default block: remove per-queue plugging

On 2011-04-11 13:55, NeilBrown wrote:
> On Mon, 11 Apr 2011 20:59:28 +1000 NeilBrown <neilb@suse.de> wrote:
>
>> On Mon, 11 Apr 2011 11:19:58 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>>
>>> On 2011-04-11 06:50, NeilBrown wrote:
>>
>>>> The only explanation I can come up with is that very occasionally schedule() on
>>>> two separate CPUs calls blk_flush_plug for the same task. I don't understand
>>>> the scheduler nearly well enough to know if or how that can happen.
>>>> However, with this patch in place I can write to a RAID1 constantly for half
>>>> an hour, and without it, the write rarely lasts for 3 minutes.
>>>
>>> Or perhaps if the request_fn blocks, that would be problematic. So the
>>> patch is likely a good idea even for that case.
>>>
>>> I'll merge it, changing it to list_splice_init() as I think that would
>>> be more clear.
>>
>> OK - though I'm not 100% sure the patch fixes the problem - just that it hides the
>> symptom for me.
>> I might try instrumenting the code a bit more and see if I can find exactly
>> where it is re-entering flush_plug_list - as that seems to be what is
>> happening.
>
> OK, I found how it re-enters.
>
> The request_fn doesn't exactly block, but when scsi_request_fn calls
> spin_unlock_irq, this calls preempt_enable which can call schedule, which is
> a recursive call.
>
> The patch I provided will stop that from recursing again as the blk_plug.list
> will be empty.
>
> So it is almost what you suggested, except that the request_fn doesn't block, it
> just enables preemption.
>
>
> So the comment I would put at the top of that patch would be something like:

Ah, so it was pretty close. That does explain it. I've already queued up
the patch; I'll amend the commit message.

--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
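For reference, the shape of flush_plug_list() with the list_splice_init() change Jens describes is roughly the following (a simplified sketch; the per-queue dispatch loop is elided). Splicing onto a local list first is what makes the re-entry harmless: a nested call finds plug->list empty and returns.

static void flush_plug_list(struct blk_plug *plug)
{
	LIST_HEAD(head);

	BUG_ON(plug->magic != PLUG_MAGIC);

	if (list_empty(&plug->list))
		return;

	/*
	 * Move everything to a local list.  If dispatching these requests
	 * ends up calling schedule() (e.g. scsi_request_fn dropping the
	 * queue lock and enabling preemption), a recursive call into this
	 * function sees an empty plug->list and bails out immediately.
	 */
	list_splice_init(&plug->list, &head);

	if (plug->should_sort)
		list_sort(NULL, &head, plug_rq_cmp);
	plug->should_sort = 0;

	/* ... dispatch the requests on 'head', queue by queue ... */
}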
 
Old 04-11-2011, 12:36 PM
NeilBrown
 
Default block: remove per-queue plugging

On Mon, 11 Apr 2011 14:11:58 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:

> > Yes. But I need to know when to release the requests that I have stored.
> > I need to know when ->write_pages or ->read_pages or whatever has finished
> > submitting a pile of pages so that I can start processing the request that I
> > have put aside. So I need a callback from blk_finish_plug.
>
> OK fair enough, I'll add your callback patch.
>

Thanks. I'll queue up my md fixes to follow it once it gets to -linus.

NeilBrown

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 04-11-2011, 12:48 PM
Jens Axboe
 
Default block: remove per-queue plugging

On 2011-04-11 14:36, NeilBrown wrote:
> On Mon, 11 Apr 2011 14:11:58 +0200 Jens Axboe <jaxboe@fusionio.com> wrote:
>
>>> Yes. But I need to know when to release the requests that I have stored.
>>> I need to know when ->write_pages or ->read_pages or whatever has finished
>>> submitting a pile of pages so that I can start processing the request that I
>>> have put aside. So I need a callback from blk_finish_plug.
>>
>> OK fair enough, I'll add your callback patch.
>>
>
> Thanks. I'll queue up my md fixes to follow it once it gets to -linus.

Great, once you do that and XFS kills the blk_flush_plug() calls too,
then we can remove that export and make it internal only.

--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 04-11-2011, 04:59 PM
"hch@infradead.org"
 
Default block: remove per-queue plugging

On Mon, Apr 11, 2011 at 02:50:22PM +1000, NeilBrown wrote:
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 273d60b..903ce8d 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -2674,19 +2674,23 @@ static void flush_plug_list(struct blk_plug *plug)
>  	struct request_queue *q;
>  	unsigned long flags;
>  	struct request *rq;
> +	struct list_head head;
>
>  	BUG_ON(plug->magic != PLUG_MAGIC);
>
>  	if (list_empty(&plug->list))
>  		return;
> +	list_add(&head, &plug->list);
> +	list_del_init(&plug->list);
>
>  	if (plug->should_sort)
> -		list_sort(NULL, &plug->list, plug_rq_cmp);
> +		list_sort(NULL, &head, plug_rq_cmp);
> +	plug->should_sort = 0;

As Jens mentioned this should be list_splice_init. But looking over
flush_plug_list the code there seems strange to me.

What does the local_irq_save in flush_plug_list protect? Why don't
we need it over the list_sort? And do we still need it when first
splicing the list to a local one?

It's one of these cases where I'd really like to see more comments
explaining why the code is doing what it's doing.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 04-11-2011, 09:14 PM
NeilBrown
 
Default block: remove per-queue plugging

On Mon, 11 Apr 2011 12:59:23 -0400 "hch@infradead.org" <hch@infradead.org>
wrote:

> On Mon, Apr 11, 2011 at 02:50:22PM +1000, NeilBrown wrote:
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 273d60b..903ce8d 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -2674,19 +2674,23 @@ static void flush_plug_list(struct blk_plug *plug)
> >  	struct request_queue *q;
> >  	unsigned long flags;
> >  	struct request *rq;
> > +	struct list_head head;
> >
> >  	BUG_ON(plug->magic != PLUG_MAGIC);
> >
> >  	if (list_empty(&plug->list))
> >  		return;
> > +	list_add(&head, &plug->list);
> > +	list_del_init(&plug->list);
> >
> >  	if (plug->should_sort)
> > -		list_sort(NULL, &plug->list, plug_rq_cmp);
> > +		list_sort(NULL, &head, plug_rq_cmp);
> > +	plug->should_sort = 0;
>
> As Jens mentioned this should be list_splice_init. But looking over
> flush_plug_list the code there seems strange to me.
>
> What does the local_irq_save in flush_plug_list protect? Why don't
> we need it over the list_sort? And do we still need it when first
> splicing the list to a local one?
>
> It's one of these cases where I'd really like to see more comments
> explaining why the code is doing what it's doing.

My understanding of that was that the calling requirement of
__elv_add_request is that the queue spinlock is held and that interrupts are
disabled.
So rather than possibly enabling and disabling interrupts several times as
different queues are handled, the code just disables interrupts once, and
then takes the spinlock once for each different queue.

The whole point of the change to plugging was to take locks less often.
Disabling interrupts less often is presumably an analogous goal.

Though I agree that a comment would help.

	q = NULL;
+	/* Disable interrupts just once rather than using the
+	 * spin_lock_irq/spin_unlock_irq variants.
+	 */
	local_irq_save(flags);


assuming my analysis is correct.

NeilBrown

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
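A rough sketch of the dispatch loop Neil is describing, with the single irq-disable he is proposing to comment (simplified; running each queue as it is left is omitted here):

	struct request_queue *q = NULL;
	struct request *rq;
	unsigned long flags;

	/*
	 * __elv_add_request() requires the queue lock held and interrupts
	 * disabled, so disable interrupts once for the whole flush instead
	 * of using the spin_lock_irq()/spin_unlock_irq() variants per request.
	 */
	local_irq_save(flags);
	while (!list_empty(&head)) {
		rq = list_entry(head.next, struct request, queuelist);
		list_del_init(&rq->queuelist);

		if (rq->q != q) {
			/* Only the per-queue lock is juggled between queues. */
			if (q)
				spin_unlock(q->queue_lock);
			q = rq->q;
			spin_lock(q->queue_lock);
		}
		__elv_add_request(q, rq, ELEVATOR_INSERT_SORT);
	}
	if (q)
		spin_unlock(q->queue_lock);
	local_irq_restore(flags);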
 
Old 04-11-2011, 10:58 PM
"hch@infradead.org"
 
Default block: remove per-queue plugging

Looking at the patch
(http://git.kernel.dk/?p=linux-2.6-block.git;a=commitdiff;h=761e433f3de6fb8e369af9e5c08beb86286d023f)

I'm not sure it's an optimal design. The flush callback really
is a per-queue thing. Why isn't it a function pointer in the request
queue, invoked around the blk_run_queue call once we're done with a given
queue, before moving on to the next one?

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
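As a rough illustration of the alternative Christoph is suggesting (the unplugged_fn name and placement are hypothetical, not an existing interface): the hook would hang off the request_queue instead of the task's plug, and flush_plug_list() would invoke it as it finishes with each queue.

struct request_queue {
	/* ... existing members ... */
	void (*unplugged_fn)(struct request_queue *q);	/* hypothetical per-queue hook */
};

/*
 * Called by flush_plug_list() when it is finished with a given queue,
 * around the blk_run_queue call, before moving on to the next queue.
 */
static void queue_unplugged(struct request_queue *q)
{
	if (q->unplugged_fn)
		q->unplugged_fn(q);	/* e.g. md releases its stashed bios here */
}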
 
Old 04-11-2011, 10:59 PM
"hch@infradead.org"
 
Default block: remove per-queue plugging

On Tue, Apr 12, 2011 at 07:14:28AM +1000, NeilBrown wrote:
>
> My understanding of that was that the calling requirement of
> __elv_add_request is that the queue spinlock is held and that interrupts are
> disabled.
> So rather than possibly enabling and disabling interrupts several times as
> different queues are handled, the code just disables interrupts once, and
> then takes the spinlock once for each different queue.
>
> The whole point of the change to plugging was to take locks less often.
> Disabling interrupts less often is presumably an analogous goal.
>
> Though I agree that a comment would help.
>
> 	q = NULL;
> +	/* Disable interrupts just once rather than using the
> +	 * spin_lock_irq/spin_unlock_irq variants.
> +	 */
> 	local_irq_save(flags);
>
>
> assuming my analysis is correct.

Your explanation does make sense to me now that you explain it. I
didn't even think of that variant before.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 04-12-2011, 01:12 AM
"hch@infradead.org"
 
Default block: remove per-queue plugging

On Mon, Apr 11, 2011 at 02:48:45PM +0200, Jens Axboe wrote:
> Great, once you do that and XFS kills the blk_flush_plug() calls too,
> then we can remove that export and make it internal only.

Linus pulled the tree, so they are gone now. Btw, there are still some
bits in the area that confuse me:

- what's the point of queue_sync_plugs? It has a lot of comments
that seem to pre-date the on-stack plugging, but except for that
it's a trivial wrapper around blk_flush_plug, with an argument
that is not used.
- is there a good reason for the existence of __blk_flush_plug? You'd
get one additional instruction in the inlined version of
blk_flush_plug when open-coding, but avoid the need for chained
function calls.
- Why is having a plug in blk_flush_plug marked unlikely? Note that
unlikely is the static branch prediction hint to mark the case
extremely unlikely and is even used for hot/cold partitioning. But
when we call it we usually check beforehand if we actually have
plugs, so it's actually likely to happen.
- what is the point of blk_finish_plug? All callers have
the plug on stack, and there's no good reason for adding the NULL
check. Note that blk_start_plug doesn't have the NULL check either.
- Why does __blk_flush_plug call __blk_finish_plug which might clear
tsk->plug, just to set it back after the call? When manually inlining
__blk_finish_plug into __blk_flush_plug it looks like:

void __blk_flush_plug(struct task_struct *tsk, struct blk_plug *plug)
{
	flush_plug_list(plug);
	if (plug == tsk->plug)
		tsk->plug = NULL;
	tsk->plug = plug;
}

it would seem much smarter to just call flush_plug_list directly.
In fact it seems like the tsk->plug is not necessary at all and
all remaining __blk_flush_plug callers could be replaced with
flush_plug_list.

- and of course the remaining issue of why io_schedule needs an
explicit blk_flush_plug when schedule() already does one in
case it actually needs to schedule.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 04-12-2011, 06:18 AM
Jens Axboe
 
Default block: remove per-queue plugging

On 2011-04-11 23:14, NeilBrown wrote:
> On Mon, 11 Apr 2011 12:59:23 -0400 "hch@infradead.org" <hch@infradead.org>
> wrote:
>
>> On Mon, Apr 11, 2011 at 02:50:22PM +1000, NeilBrown wrote:
>>> diff --git a/block/blk-core.c b/block/blk-core.c
>>> index 273d60b..903ce8d 100644
>>> --- a/block/blk-core.c
>>> +++ b/block/blk-core.c
>>> @@ -2674,19 +2674,23 @@ static void flush_plug_list(struct blk_plug *plug)
>>>  	struct request_queue *q;
>>>  	unsigned long flags;
>>>  	struct request *rq;
>>> +	struct list_head head;
>>>
>>>  	BUG_ON(plug->magic != PLUG_MAGIC);
>>>
>>>  	if (list_empty(&plug->list))
>>>  		return;
>>> +	list_add(&head, &plug->list);
>>> +	list_del_init(&plug->list);
>>>
>>>  	if (plug->should_sort)
>>> -		list_sort(NULL, &plug->list, plug_rq_cmp);
>>> +		list_sort(NULL, &head, plug_rq_cmp);
>>> +	plug->should_sort = 0;
>>
>> As Jens mentioned this should be list_splice_init. But looking over
>> flush_plug_list the code there seems strange to me.
>>
>> What does the local_irq_save in flush_plug_list protect? Why don't
>> we need it over the list_sort? And do we still need it when first
>> splicing the list to a local one?
>>
>> It's one of these cases where I'd really like to see more comments
>> explaining why the code is doing what it's doing.
>
> My understanding of that was that the calling requirement of
> __elv_add_request is that the queue spinlock is held and that interrupts are
> disabled.
> So rather than possibly enabling and disabling interrupts several times as
> different queues are handled, the code just disables interrupts once, and
> then takes the spinlock once for each different queue.
>
> The whole point of the change to plugging was to take locks less often.
> Disabling interrupts less often is presumably an analogous goal.
>
> Though I agree that a comment would help.
>
> 	q = NULL;
> +	/* Disable interrupts just once rather than using the
> +	 * spin_lock_irq/spin_unlock_irq variants.
> +	 */
> 	local_irq_save(flags);
>
>
> assuming my analysis is correct.

Yep that is correct, it's to avoid juggling irq on and off for multiple
queues. I will put a comment there.


--
Jens Axboe

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
