Old 10-03-2008, 03:08 PM
Kiyoshi Ueda
 
Default dm: request-based dm-multipath

Hi Alasdair,

This patch-set is the updated version of request-based dm-multipath.
The changes from the previous version (*) are to follow up the change
of the interface to export lld's busy state (PATCH 5).

(*) http://lkml.org/lkml/2008/9/12/100

All necessary interfaces to support request stacking drivers are now
in the for-2.6.28 branch of Jens' linux-2.6-block git repository.
This patch-set depends on those interfaces.

This patch-set is created on top of 2.6.27-rc8 + your patches shown
between "NEXT_PATCHES_START" and "NEXT_PATCHES_END" of the series file:
ftp://sources.redhat.com/pub/dm/patches/2.6-unstable/editing-quilt/patches/series.html

Please review and apply.


Summary of the patches:
1/8: dm core: remove unused DM_WQ_FLUSH_ALL
2/8: dm core: tidy local_init
3/8: dm core: add kmem_cache for request-based dm
4/8: dm core: add target interfaces for request-based dm
5/8: dm core: add core functions for request-based dm
6/8: dm core: enable request-based dm
7/8: dm core: reject I/O violating new queue limits
8/8: dm-mpath: convert to request-based
(PATCH 1 is also included in Milan's barrier support patch-set)

drivers/md/dm-ioctl.c | 13
drivers/md/dm-mpath.c | 192 +++++---
drivers/md/dm-table.c | 82 +++
drivers/md/dm.c | 956 +++++++++++++++++++++++++++++++++++++++---
drivers/md/dm.h | 17
include/linux/device-mapper.h | 24 +
6 files changed, 1161 insertions(+), 123 deletions(-)

Thanks,
Kiyoshi Ueda

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 01-28-2009, 02:40 PM
Alasdair G Kergon
 
Default dm: request-based dm-multipath

On Fri, Oct 03, 2008 at 11:08:25AM -0400, Kiyoshi Ueda wrote:
> This patch-set is the updated version of request-based dm-multipath.
> The changes from the previous version (*) are to follow up the change
> of the interface to export lld's busy state (PATCH 5).

I've fixed them up to compile again, but haven't thoroughly checked for
side-effects.

Alasdair
--
agk@redhat.com

 
Old 01-29-2009, 06:18 AM
Kiyoshi Ueda
 
Default dm: request-based dm-multipath

Hi Alasdair,

On 01/29/2009 12:40 AM +0900, Alasdair G Kergon wrote:
> On Fri, Oct 03, 2008 at 11:08:25AM -0400, Kiyoshi Ueda wrote:
>> This patch-set is the updated version of request-based dm-multipath.
>> The changes from the previous version (*) are to follow up the change
>> of the interface to export lld's busy state (PATCH 5).
>
> I've fixed them up to compile again, but haven't thoroughly checked for
> side-effects.

I found the problems below in my patches and am now considering
how to fix them:
o An unnecessary EIO is returned to the application if it submits
a bio during table swapping.
o A kernel panic occurs with frequent table swapping during heavy I/O.

I'll post the fixed version after my vacation.

Thanks,
Kiyoshi Ueda

 
Old 01-29-2009, 09:41 AM
Hannes Reinecke
 
Default dm: request-based dm-multipath

On Thu, Jan 29, 2009 at 04:18:59PM +0900, Kiyoshi Ueda wrote:
> Hi Alasdair,
>
> On 01/29/2009 12:40 AM +0900, Alasdair G Kergon wrote:
> > On Fri, Oct 03, 2008 at 11:08:25AM -0400, Kiyoshi Ueda wrote:
> >> This patch-set is the updated version of request-based dm-multipath.
> >> The changes from the previous version (*) are to follow up the change
> >> of the interface to export lld's busy state (PATCH 5).
> >
> > I've fixed them up to compile again, but haven't thoroughly checked for
> > side-effects.
>
> I found some problems below in my patches and now considering
> how to fix them:
> o Unnecessary EIO is returned to application if it submits
> a bio during table swapping.
Yes, I've noticed that. Problem is this:

--- linux-2.6.27.orig/drivers/md/dm.c
+++ linux-2.6.27/drivers/md/dm.c
@@ -1304,7 +1304,11 @@ static int dm_make_request(struct reques
return 0;
}

- if (unlikely(!md->map)) {
+ /*
+ * Submitting to a stopped queue with no map is okay;
+ * might happen during reconfiguration.
+ */
+ if (unlikely(!md->map) && !blk_queue_stopped(q)) {
bio_endio(bio, -EIO);
return 0;
}

The make_request callback should never return EIO if there's any
chance at all to get this request done.

> o kernel panic occurs by frequent table swapping during heavy I/Os.
>
That's probably fixed by this patch:

--- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23 15:59:22.741461315 +0100
+++ linux-2.6.27/drivers/md/dm.c 2009-01-26 09:03:02.787605723 +0100
@@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
struct bio *bio;
- struct dm_clone_bio_info *info;

while ((bio = clone->bio) != NULL) {
clone->bio = bio->bi_next;

- info = bio->bi_private;
- free_bio_info(md, info);
+ if (bio->bi_private) {
+ struct dm_clone_bio_info *info = bio->bi_private;
+ free_bio_info(md, info);
+ }

bio->bi_private = md->bs;
bio_put(bio);

The info field is not necessarily filled here, so we have to check for it
explicitly.

With these two patches request-based multipathing has survived all
stress-tests so far, except on mainframe (zfcp), but that's more a
driver-related thing.

Good work!

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

 
Old 01-29-2009, 01:32 PM
Alasdair G Kergon
 
Default dm: request-based dm-multipath

On Thu, Jan 29, 2009 at 11:41:47AM +0100, Hannes Reinecke wrote:
> Yes, I've noticed that. Problem is this:

Thanks - I've applied those two fixes.

Alasdair
--
agk@redhat.com

 
Old 01-30-2009, 07:05 AM
Kiyoshi Ueda
 
Default dm: request-based dm-multipath

Hi Hannes,

Thank you for the comments and patches.
See my comments below.

On 01/29/2009 07:41 PM +0900, Hannes Reinecke wrote:
> On Thu, Jan 29, 2009 at 04:18:59PM +0900, Kiyoshi Ueda wrote:
>> Hi Alasdair,
>>
>> On 01/29/2009 12:40 AM +0900, Alasdair G Kergon wrote:
>>> On Fri, Oct 03, 2008 at 11:08:25AM -0400, Kiyoshi Ueda wrote:
>>>> This patch-set is the updated version of request-based dm-multipath.
>>>> The changes from the previous version (*) are to follow up the change
>>>> of the interface to export lld's busy state (PATCH 5).
>>> I've fixed them up to compile again, but haven't thoroughly checked for
>>> side-effects.
>> I found some problems below in my patches and now considering
>> how to fix them:
>> o Unnecessary EIO is returned to application if it submits
>> a bio during table swapping.
> Yes, I've noticed that. Problem is this:
>
> --- linux-2.6.27.orig/drivers/md/dm.c
> +++ linux-2.6.27/drivers/md/dm.c
> @@ -1304,7 +1304,11 @@ static int dm_make_request(struct reques
> return 0;
> }
>
> - if (unlikely(!md->map)) {
> + /*
> + * Submitting to a stopped queue with no map is okay;
> + * might happen during reconfiguration.
> + */
> + if (unlikely(!md->map) && !blk_queue_stopped(q)) {
> bio_endio(bio, -EIO);
> return 0;
> }
>
> The make_request callback should never return EIO if there's any
> chance at all to get this request done.

Exactly this part has a race condition with table swapping.
Although your patch fixes the race condition, I think I need
more careful consideration here.
(e.g. Is it really OK to let bios go through __make_request() with
the old queue restrictions during table swapping?)

I'll work on this problem after my vacation.


>> o kernel panic occurs by frequent table swapping during heavy I/Os.
>>
> That's probably fixed by this patch:
>
> --- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23 15:59:22.741461315 +0100
> +++ linux-2.6.27/drivers/md/dm.c 2009-01-26 09:03:02.787605723 +0100
> @@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
> struct dm_rq_target_io *tio = clone->end_io_data;
> struct mapped_device *md = tio->md;
> struct bio *bio;
> - struct dm_clone_bio_info *info;
>
> while ((bio = clone->bio) != NULL) {
> clone->bio = bio->bi_next;
>
> - info = bio->bi_private;
> - free_bio_info(md, info);
> + if (bio->bi_private) {
> + struct dm_clone_bio_info *info = bio->bi_private;
> + free_bio_info(md, info);
> + }
>
> bio->bi_private = md->bs;
> bio_put(bio);
>
> The info field is not necessarily filled here, so we have to check for it
> explicitly.
>
> With these two patches request-based multipathing have survived all stress-tests
> so far. Except on mainframe (zfcp), but that's more a driver-related thing.

Hmm, it seems I'm hitting a different problem from the one you hit,
because I still see my problem even with your patch.
I'll investigate further after my vacation.

Thanks,
Kiyoshi Ueda

 
Old 03-10-2009, 05:10 AM
Kiyoshi Ueda
 
Default dm: request-based dm-multipath

Hi Hannes,

On 2009/01/30 17:05 +0900, Kiyoshi Ueda wrote:
>>> o kernel panic occurs by frequent table swapping during heavy I/Os.
>>>
>> That's probably fixed by this patch:
>>
>> --- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23 15:59:22.741461315 +0100
>> +++ linux-2.6.27/drivers/md/dm.c 2009-01-26 09:03:02.787605723 +0100
>> @@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
>> struct dm_rq_target_io *tio = clone->end_io_data;
>> struct mapped_device *md = tio->md;
>> struct bio *bio;
>> - struct dm_clone_bio_info *info;
>>
>> while ((bio = clone->bio) != NULL) {
>> clone->bio = bio->bi_next;
>>
>> - info = bio->bi_private;
>> - free_bio_info(md, info);
>> + if (bio->bi_private) {
>> + struct dm_clone_bio_info *info = bio->bi_private;
>> + free_bio_info(md, info);
>> + }
>>
>> bio->bi_private = md->bs;
>> bio_put(bio);
>>
>> The info field is not necessarily filled here, so we have to check for it
>> explicitly.
>>
>> With these two patches request-based multipathing have survived all stress-tests
>> so far. Except on mainframe (zfcp), but that's more a driver-related thing.

My problem was different from that one, and I have fixed my problem.

Do you hit some problem without the patch above?
If so, that should be a programming bug and we need to fix it. Otherwise,
we would be leaking memory (since every cloned bio should always have
the dm_clone_bio_info structure in ->bi_private).

Thanks,
Kiyoshi Ueda

 
Old 03-10-2009, 06:17 AM
Hannes Reinecke
 
Default dm: request-based dm-multipath

Hi Kiyoshi,

Kiyoshi Ueda wrote:

Hi Hannes,

On 2009/01/30 17:05 +0900, Kiyoshi Ueda wrote:

o kernel panic occurs by frequent table swapping during heavy I/Os.


That's probably fixed by this patch:

--- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23 15:59:22.741461315 +0100
+++ linux-2.6.27/drivers/md/dm.c 2009-01-26 09:03:02.787605723 +0100
@@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
struct bio *bio;
- struct dm_clone_bio_info *info;

while ((bio = clone->bio) != NULL) {

clone->bio = bio->bi_next;

- info = bio->bi_private;

- free_bio_info(md, info);
+ if (bio->bi_private) {
+ struct dm_clone_bio_info *info = bio->bi_private;
+ free_bio_info(md, info);
+ }

bio->bi_private = md->bs;

bio_put(bio);

The info field is not necessarily filled here, so we have to check for it
explicitly.

With these two patches request-based multipathing have survived all stress-tests
so far. Except on mainframe (zfcp), but that's more a driver-related thing.


My problem was different from that one, and I have fixed my problem.


What was this? Was it something specific to your setup, or something within
the request-based multipathing code?
If the latter, I'd be _very_ much interested in seeing the patch. Naturally.


Do you hit some problem without the patch above?
If so, that should be a programming bug and we need to fix it. Otherwise,
we should be leaking a memory (since all cloned bio should always have
the dm_clone_bio_info structure in ->bi_private).


Yes, I've found that one later on.
The real problem was in clone_setup_bios(), which might end up dereferencing
an invalid end_io_data pointer. Patch is attached.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
From: Hannes Reinecke <hare@suse.de>
Subject: Kernel oops in free_bio_clone()
References: bnc#472360

Bug is here:

static int setup_clone(struct request *clone, struct request *rq,
struct dm_rq_target_io *tio)
{
int r;

blk_rq_init(NULL, clone);

r = clone_request_bios(clone, rq, tio->md);
if (r)
return r;

copy_request_info(clone, rq);
clone->start_time = jiffies;
clone->end_io = end_clone_request;
clone->end_io_data = tio;

return 0;
}

clone_request_bios() might end up calling free_bio_clone(), which references:

static void free_bio_clone(struct request *clone)
{
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
...

but end_io_data will be set only _after_ the call to clone_request_bios().
So we should be passing the 'md' argument directly here to avoid this
bug and several pointless dereferences.

Signed-off-by: Hannes Reinecke <hare@suse.de>

--- linux-2.6.27-SLE11_BRANCH/drivers/md/dm.c.orig 2009-02-04 10:33:22.656627650 +0100
+++ linux-2.6.27-SLE11_BRANCH/drivers/md/dm.c 2009-02-05 11:03:35.843251773 +0100
@@ -709,10 +709,8 @@ static void end_clone_bio(struct bio *cl
blk_update_request(tio->orig, 0, nr_bytes);
}

-static void free_bio_clone(struct request *clone)
+static void free_bio_clone(struct request *clone, struct mapped_device *md)
{
- struct dm_rq_target_io *tio = clone->end_io_data;
- struct mapped_device *md = tio->md;
struct bio *bio;

while ((bio = clone->bio) != NULL) {
@@ -743,7 +741,7 @@ static void dm_unprep_request(struct req
rq->special = NULL;
rq->cmd_flags &= ~REQ_DONTPREP;

- free_bio_clone(clone);
+ free_bio_clone(clone, tio->md);
dec_rq_pending(tio);
free_rq_tio(tio->md, tio);
}
@@ -820,7 +818,7 @@ static void dm_end_request(struct reques
rq->sense_len = clone->sense_len;
}

- free_bio_clone(clone);
+ free_bio_clone(clone, tio->md);
dec_rq_pending(tio);
free_rq_tio(tio->md, tio);

@@ -1406,7 +1404,7 @@ static int clone_request_bios(struct req
return 0;

free_and_out:
- free_bio_clone(clone);
+ free_bio_clone(clone, md);

return -ENOMEM;
}
 
Old 03-10-2009, 07:17 AM
Kiyoshi Ueda
 
Default dm: request-based dm-multipath

Hi Hannes,

On 03/10/2009 04:17 PM +0900, Hannes Reinecke wrote:
> Hi Kiyoshi,
>
> Kiyoshi Ueda wrote:
>> Hi Hannes,
>>
>> On 2009/01/30 17:05 +0900, Kiyoshi Ueda wrote:
>>>>> o kernel panic occurs by frequent table swapping during heavy I/Os.
>>>>>
>>>> That's probably fixed by this patch:
>>>>
>>>> --- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23
>>>> 15:59:22.741461315 +0100
>>>> +++ linux-2.6.27/drivers/md/dm.c 2009-01-26
>>>> 09:03:02.787605723 +0100
>>>> @@ -714,13 +714,14 @@ static void free_bio_clone(struct reques
>>>> struct dm_rq_target_io *tio = clone->end_io_data;
>>>> struct mapped_device *md = tio->md;
>>>> struct bio *bio;
>>>> - struct dm_clone_bio_info *info;
>>>>
>>>> while ((bio = clone->bio) != NULL) {
>>>> clone->bio = bio->bi_next;
>>>>
>>>> - info = bio->bi_private;
>>>> - free_bio_info(md, info);
>>>> + if (bio->bi_private) {
>>>> + struct dm_clone_bio_info *info =
>>>> bio->bi_private;
>>>> + free_bio_info(md, info);
>>>> + }
>>>>
>>>> bio->bi_private = md->bs;
>>>> bio_put(bio);
>>>>
>>>> The info field is not necessarily filled here, so we have to check
>>>> for it
>>>> explicitly.
>>>>
>>>> With these two patches request-based multipathing have survived all
>>>> stress-tests
>>>> so far. Except on mainframe (zfcp), but that's more a driver-related
>>>> thing.
>>
>> My problem was different from that one, and I have fixed my problem.
>>
> What was this? Was is something specific to your setup or some within the
> request-based multipathing code?
> If the latter, I'd be _very_ much interested in seeing the patch.
> Naturally.

Suspend was broken.
dm_suspend() recognized the suspend as complete while some requests
were still in flight, so we could swap/free the in-use table while
requests were still pending.
The patch is like the attached one, although it is not finalized and
I'm testing now.
I'll post an updated patch-set including the attached patch
this week or next week.


>> Do you hit some problem without the patch above?
>> If so, that should be a programming bug and we need to fix it.
>> Otherwise,
>> we should be leaking a memory (since all cloned bio should always have
>> the dm_clone_bio_info structure in ->bi_private).
>>
> Yes, I've found that one later on.
> The real problem was in clone_setup_bios(), which might end up calling an
> invalid end_io_data pointer. Patch is attached.

Thank you for the patch.
I'll see it soon.

Thanks,
Kiyoshi Ueda


---
drivers/md/dm.c | 236 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 144 insertions(+), 92 deletions(-)

Index: 2.6.29-rc2/drivers/md/dm.c
===================================================================
--- 2.6.29-rc2.orig/drivers/md/dm.c
+++ 2.6.29-rc2/drivers/md/dm.c
@@ -701,11 +701,17 @@ static void free_bio_clone(struct reques
}
}

-static void dec_rq_pending(struct dm_rq_target_io *tio)
+/*
+ * XXX: Not taking queue lock for efficiency.
+ * For correctness, waiters will check that again with queue lock held.
+ * No false negative because this function will be called everytime
+ * in_flight is decremented.
+ */
+static void rq_completed(struct mapped_device *md)
{
- if (!atomic_dec_return(&tio->md->pending))
+ if (!md->queue->in_flight)
/* nudge anyone waiting on suspend queue */
- wake_up(&tio->md->wait);
+ wake_up(&md->wait);
}

static void dm_unprep_request(struct request *rq)
@@ -717,7 +723,6 @@ static void dm_unprep_request(struct req
rq->cmd_flags &= ~REQ_DONTPREP;

free_bio_clone(clone);
- dec_rq_pending(tio);
free_rq_tio(tio->md, tio);
}

@@ -727,6 +732,7 @@ static void dm_unprep_request(struct req
void dm_requeue_request(struct request *clone)
{
struct dm_rq_target_io *tio = clone->end_io_data;
+ struct mapped_device *md = tio->md;
struct request *rq = tio->orig;
struct request_queue *q = rq->q;
unsigned long flags;
@@ -738,6 +744,8 @@ void dm_requeue_request(struct request *
blk_plug_device(q);
blk_requeue_request(q, rq);
spin_unlock_irqrestore(q->queue_lock, flags);
+
+ rq_completed(md);
}
EXPORT_SYMBOL_GPL(dm_requeue_request);

@@ -776,6 +784,7 @@ static void start_queue(struct request_q
static void dm_end_request(struct request *clone, int error)
{
struct dm_rq_target_io *tio = clone->end_io_data;
+ struct mapped_device *md = tio->md;
struct request *rq = tio->orig;
struct request_queue *q = rq->q;
unsigned int nr_bytes = blk_rq_bytes(rq);
@@ -794,12 +803,12 @@ static void dm_end_request(struct reques
}

free_bio_clone(clone);
- dec_rq_pending(tio);
free_rq_tio(tio->md, tio);

if (unlikely(blk_end_request(rq, error, nr_bytes)))
BUG();

+ rq_completed(md);
blk_run_queue(q);
}

@@ -1397,7 +1406,7 @@ static int setup_clone(struct request *c
return 0;
}

-static inline int dm_flush_suspending(struct mapped_device *md)
+static inline int dm_rq_flush_suspending(struct mapped_device *md)
{
return !md->suspend_rq.data;
}
@@ -1411,23 +1420,11 @@ static int dm_prep_fn(struct request_que
struct dm_rq_target_io *tio;
struct request *clone;

- if (unlikely(rq == &md->suspend_rq)) { /* Flush suspend marker */
- if (dm_flush_suspending(md)) {
- if (q->in_flight)
- return BLKPREP_DEFER;
- else {
- /* This device should be quiet now */
- __stop_queue(q);
- smp_mb();
- BUG_ON(atomic_read(&md->pending));
- wake_up(&md->wait);
- return BLKPREP_KILL;
- }
- } else
- /*
- * The suspend process was interrupted.
- * So no need to suspend now.
- */
+ if (unlikely(rq == &md->suspend_rq)) {
+ if (dm_rq_flush_suspending(md))
+ return BLKPREP_OK;
+ else
+ /* The flush suspend was interrupted */
return BLKPREP_KILL;
}

@@ -1436,11 +1433,6 @@ static int dm_prep_fn(struct request_que
return BLKPREP_KILL;
}

- if (unlikely(!dm_request_based(md))) {
- DMWARN("Request was queued into bio-based device");
- return BLKPREP_KILL;
- }
-
tio = alloc_rq_tio(md); /* Only one for each original request */
if (!tio)
/* -ENOMEM */
@@ -1473,7 +1465,6 @@ static void map_request(struct dm_target
struct dm_rq_target_io *tio = clone->end_io_data;

tio->ti = ti;
- atomic_inc(&md->pending);
r = ti->type->map_rq(ti, clone, &tio->info);
switch (r) {
case DM_MAPIO_SUBMITTED:
@@ -1511,19 +1502,35 @@ static void dm_request_fn(struct request
struct request *rq;

/*
- * The check for blk_queue_stopped() needs here, because:
- * - device suspend uses blk_stop_queue() and expects that
- * no I/O will be dispatched any more after the queue stop
- * - generic_unplug_device() doesn't call q->request_fn()
- * when the queue is stopped, so no problem
- * - but underlying device drivers may call q->request_fn()
- * without the check through blk_run_queue()
+ * The check of blk_queue_stopped() needs here, because we want to
+ * complete noflush suspend quickly:
+ * - noflush suspend stops the queue in dm_suspend() and expects
+ * that no I/O will be dispatched any more after the queue stop
+ * - but if the queue stop is done while the loop below and
+ * there is no check for the queue stop, I/O dispatching
+ * may not stop until all remaining I/Os in the queue are
+ * dispatched. For noflush suspend, we shouldn't want
+ * this behavior.
*/
while (!blk_queue_plugged(q) && !blk_queue_stopped(q)) {
rq = elv_next_request(q);
if (!rq)
goto plug_and_out;

+ if (unlikely(rq == &md->suspend_rq)) { /* Flush suspend marker */
+ if (q->in_flight)
+ /* Not quiet yet. Wait more */
+ goto plug_and_out;
+
+ /* This device should be quiet now */
+ __stop_queue(q);
+ blkdev_dequeue_request(rq);
+ if (unlikely(__blk_end_request(rq, 0, 0)))
+ BUG();
+ wake_up(&md->wait);
+ goto out;
+ }
+
ti = dm_table_find_target(map, rq->sector);
if (ti->type->busy && ti->type->busy(ti))
goto plug_and_out;
@@ -1996,15 +2003,20 @@ EXPORT_SYMBOL_GPL(dm_put);
static int dm_wait_for_completion(struct mapped_device *md)
{
int r = 0;
+ struct request_queue *q = md->queue;
+ unsigned long flags;

while (1) {
set_current_state(TASK_INTERRUPTIBLE);

smp_mb();
if (dm_request_based(md)) {
- if (!atomic_read(&md->pending) &&
- blk_queue_stopped(md->queue))
+ spin_lock_irqsave(q->queue_lock, flags);
+ if (!q->in_flight && blk_queue_stopped(q)) {
+ spin_unlock_irqrestore(q->queue_lock, flags);
break;
+ }
+ spin_unlock_irqrestore(q->queue_lock, flags);
} else if (!atomic_read(&md->pending))
break;

@@ -2107,86 +2119,75 @@ out:
return r;
}

-static inline void dm_invalidate_flush_suspend(struct mapped_device *md)
+static inline void dm_rq_invalidate_suspend_marker(struct mapped_device *md)
{
md->suspend_rq.data = (void *)0x1;
}

-static void dm_abort_suspend(struct mapped_device *md, int noflush)
+/*
+ * For noflush suspend, starting the queue is enough because noflush suspend
+ * only stops the queue.
+ *
+ * For flush suspend, we also need to take care of the marker.
+ * We could remove the marker from the queue forcibly using list_del_init(),
+ * but it would break the block-layer. To follow the block-layer manner,
+ * we just put an invalidated mark on the marker here and wait for it to be
+ * completed by the normal way.
+ */
+static void dm_rq_abort_suspend(struct mapped_device *md, int noflush)
{
struct request_queue *q = md->queue;
unsigned long flags;

- /*
- * For flush suspend, invalidation and queue restart must be protected
- * by a single queue lock to prevent a race with dm_prep_fn().
- */
spin_lock_irqsave(q->queue_lock, flags);
if (!noflush)
- dm_invalidate_flush_suspend(md);
+ dm_rq_invalidate_suspend_marker(md);
__start_queue(q);
spin_unlock_irqrestore(q->queue_lock, flags);
}

-/*
- * Additional suspend work for request-based dm.
- *
- * In request-based dm, stopping request_queue prevents mapping.
- * Even after stopping the request_queue, submitted requests from upper-layer
- * can be inserted to the request_queue. So original (unmapped) requests are
- * kept in the request_queue during suspension.
- */
-static void dm_start_suspend(struct mapped_device *md, int noflush)
+static void dm_rq_start_suspend(struct mapped_device *md, int noflush)
{
struct request *rq = &md->suspend_rq;
struct request_queue *q = md->queue;
- unsigned long flags;

- if (noflush) {
+ if (noflush)
stop_queue(q);
- return;
+ else {
+ blk_rq_init(q, rq);
+ blk_insert_request(q, rq, 0, NULL);
}
+}

- /*
- * For flush suspend, we need a marker to indicate the border line
- * between flush needed I/Os and deferred I/Os, since all I/Os are
- * queued in the request_queue during suspension.
- *
- * This marker must be inserted after setting DMF_BLOCK_IO,
- * because dm_prep_fn() considers no DMF_BLOCK_IO to be
- * a suspend interruption.
- */
+static int dm_rq_suspend_unavailable(struct mapped_device *md, int noflush)
+{
+ int r = 0;
+ struct request *rq = &md->suspend_rq;
+ struct request_queue *q = md->queue;
+ unsigned long flags;
+
+ if (noflush)
+ return 0;
+
+ /* The marker must be protected by queue lock if it is in use */
spin_lock_irqsave(q->queue_lock, flags);
if (unlikely(rq->ref_count)) {
/*
- * This can happen when the previous suspend was interrupted,
- * the inserted suspend_rq for the previous suspend has still
- * been in the queue and this suspend has been invoked.
- *
- * We could re-insert the suspend_rq by deleting it from
- * the queue forcibly using list_del_init(&rq->queuelist).
- * But it would break the block-layer easily.
- * So we don't re-insert the suspend_rq again in such a case.
- * The suspend_rq should be already invalidated during
- * the previous suspend interruption, so just wait for it
- * to be completed.
- *
- * This suspend will never complete, so warn the user to
- * interrupt this suspend and retry later.
+ * This can happen, when the previous flush suspend was
+ * interrupted, the marker is still in the queue and
+ * this flush suspend has been invoked, because we don't
+ * remove the marker at the time of suspend interruption.
+ * We have only one marker per mapped_device, so we can't
+ * start another flush suspend while it is in use.
*/
- BUG_ON(!rq->data);
- spin_unlock_irqrestore(q->queue_lock, flags);
-
- DMWARN("Invalidating the previous suspend is still in"
- " progress. This suspend will be never done."
- " Please interrupt this suspend and retry later.");
- return;
+ BUG_ON(!rq->data); /* The marker should be invalidated */
+ DMWARN("Invalidating the previous flush suspend is still in"
+ " progress. Please retry later.");
+ r = 1;
}
spin_unlock_irqrestore(q->queue_lock, flags);

- /* Now no user of the suspend_rq */
- blk_rq_init(q, rq);
- blk_insert_request(q, rq, 0, NULL);
+ return r;
}

/*
@@ -2231,6 +2232,52 @@ static void unlock_fs(struct mapped_devi
* dm_bind_table, dm_suspend must be called to flush any in
* flight bios and ensure that any further io gets deferred.
*/
+/*
+ * Suspend mechanism in request-based dm.
+ *
+ * After the suspend starts, (remaining requests in the request_queue are
+ * flushed if it is flush suspend, and) further incoming requests are kept
+ * in the request_queue and deferred.
+ * The suspend completes when the following conditions have been satisfied,
+ * so wait for it:
+ * 1. q->in_flight is 0 (which means no in_flight request)
+ * 2. queue has been stopped (which means no request dispatching)
+ *
+ *
+ * Noflush suspend
+ * ---------------
+ * Noflush suspend doesn't need to dispatch remaining requests.
+ * So stop the queue immediately. Then, wait for all in_flight requests
+ * to be completed or requeued.
+ *
+ * To abort noflush suspend, start the queue.
+ *
+ *
+ * Flush suspend
+ * -------------
+ * Flush suspend needs to dispatch remaining requests. So stop the queue
+ * after the remaining requests are completed. (Requeued request must be also
+ * re-dispatched and completed. Until then, we can't stop the queue.)
+ *
+ * During flushing the remaining requests, further incoming requests are also
+ * inserted to the same queue. To distinguish which requests are needed to be
+ * flushed, we insert a marker request to the queue at the time of starting
+ * flush suspend, like a barrier.
+ * The dispatching is blocked when the marker is found on the top of the queue.
+ * And the queue is stopped when all in_flight requests are completed, since
+ * that means the remaining requests are completely flushed.
+ * Then, the marker is removed from the queue.
+ *
+ * To abort flush suspend, we also need to take care of the marker, not only
+ * starting the queue.
+ * We could remove the marker forcibly from the queue, but it would break
+ * the block-layer. Instead, we put an invalidated mark on the marker.
+ * When the invalidated marker is found on the top of the queue, it is
+ * immediately removed from the queue, so it doesn't block dispatching.
+ * Because we have only one marker per mapped_device, we can't start another
+ * flush suspend until the invalidated marker is removed from the queue.
+ * So fail and return with -EBUSY in such a case.
+ */
int dm_suspend(struct mapped_device *md, unsigned suspend_flags)
{
struct dm_table *map = NULL;
@@ -2246,6 +2293,11 @@ int dm_suspend(struct mapped_device *md,
goto out_unlock;
}

+ if (dm_request_based(md) && dm_rq_suspend_unavailable(md, noflush)) {
+ r = -EBUSY;
+ goto out_unlock;
+ }
+
map = dm_get_table(md);

/*
@@ -2288,7 +2340,7 @@ int dm_suspend(struct mapped_device *md,
up_write(&md->io_lock);

if (dm_request_based(md))
- dm_start_suspend(md, noflush);
+ dm_rq_start_suspend(md, noflush);

/* unplug */
if (map)
@@ -2316,7 +2368,7 @@ int dm_suspend(struct mapped_device *md,
dm_queue_flush(md, DM_WQ_FLUSH_DEFERRED, NULL);

if (dm_request_based(md))
- dm_abort_suspend(md, noflush);
+ dm_rq_abort_suspend(md, noflush);

unlock_fs(md);
goto out; /* pushback list is already flushed, so skip flush */

 
Old 03-11-2009, 11:28 AM
Hannes Reinecke
 
Default dm: request-based dm-multipath

Hi Kiyoshi,

Kiyoshi Ueda wrote:

Hi Hannes,


[ .. ]


Suspend was broken.
dm_suspend() recognized that suspend completed while some requests
were still in flight. So we could swap/free the in-use table while
there was in_flight request.
The patch is like the attached one, although it is not finalized and
I'm testing now.
I'll post an updated patch-set including the attached patch
this week or next week.


---
drivers/md/dm.c | 236 ++++++++++++++++++++++++++++++++++----------------------
1 file changed, 144 insertions(+), 92 deletions(-)

Index: 2.6.29-rc2/drivers/md/dm.c
===================================================================
--- 2.6.29-rc2.orig/drivers/md/dm.c
+++ 2.6.29-rc2/drivers/md/dm.c
@@ -701,11 +701,17 @@ static void free_bio_clone(struct reques
}
}

-static void dec_rq_pending(struct dm_rq_target_io *tio)

+/*
+ * XXX: Not taking queue lock for efficiency.
+ * For correctness, waiters will check that again with queue lock held.
+ * No false negative because this function will be called everytime
+ * in_flight is decremented.
+ */
+static void rq_completed(struct mapped_device *md)
{
- if (!atomic_dec_return(&tio->md->pending))
+ if (!md->queue->in_flight)
/* nudge anyone waiting on suspend queue */
- wake_up(&tio->md->wait);
+ wake_up(&md->wait);
}


Hmm, I don't think that's a good idea. Either take the spinlock here,
or in_flight should be atomic.

Apart from this I'll give it a shot.

Cheers,

Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)

 
