Mike Snitzer 08-30-2010 01:28 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On Mon, Aug 30 2010 at 5:58am -0400,
Tejun Heo <tj@kernel.org> wrote:

> This patch converts request-based dm to support the new REQ_FLUSH/FUA.
...
> This patch rips out the special flush code path and handles
> REQ_FLUSH/FUA requests the same way as other requests. The only
> special treatment is that REQ_FLUSH requests use block address 0
> when finding the target, which is enough for now.

Looks very comparable to the patch I prepared, but I have two observations
below (based on my findings from testing my patch).

> @@ -1562,22 +1483,15 @@ static int setup_clone(struct request *clone, struct request *rq,
> {
> int r;
>
> - if (dm_rq_is_flush_request(rq)) {
> - blk_rq_init(NULL, clone);
> - clone->cmd_type = REQ_TYPE_FS;
> - clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
> - } else {
> - r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
> - dm_rq_bio_constructor, tio);
> - if (r)
> - return r;
> -
> - clone->cmd = rq->cmd;
> - clone->cmd_len = rq->cmd_len;
> - clone->sense = rq->sense;
> - clone->buffer = rq->buffer;
> - }
> + r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
> + dm_rq_bio_constructor, tio);
> + if (r)
> + return r;
>
> + clone->cmd = rq->cmd;
> + clone->cmd_len = rq->cmd_len;
> + clone->sense = rq->sense;
> + clone->buffer = rq->buffer;
> clone->end_io = end_clone_request;
> clone->end_io_data = tio;

blk_rq_prep_clone() of a REQ_FLUSH request will result in a
rq_data_dir(clone) of read.

I still had the following:

    if (rq->cmd_flags & REQ_FLUSH) {
        blk_rq_init(NULL, clone);
        clone->cmd_type = REQ_TYPE_FS;
        /* without this the clone has a rq_data_dir of 0 */
        clone->cmd_flags |= WRITE_FLUSH;
    } else {
        r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
                              dm_rq_bio_constructor, tio);
    ...

Request-based DM's REQ_FLUSH still works without this special casing, but
I figured I'd raise it to ask: what is the proper rq_data_dir() for
a REQ_FLUSH?
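(For reference: in this kernel the data direction of a request is derived
from the REQ_WRITE bit in cmd_flags, which is why a clone built without
that bit reports READ. The following standalone C sketch only models that
behaviour; the flag values are placeholders, not the kernel's definitions.)

/*
 * Standalone illustration only -- not kernel code.  Models how the
 * request data direction falls out of the REQ_WRITE bit, so a cloned
 * REQ_FLUSH without REQ_WRITE shows up as a read.
 */
#include <stdio.h>

#define REQ_WRITE  (1u << 0)    /* placeholder bit positions */
#define REQ_FLUSH  (1u << 1)

struct request { unsigned int cmd_flags; };

/* direction is derived purely from the WRITE bit: 1 = write, 0 = read */
#define rq_data_dir(rq) (((rq)->cmd_flags & REQ_WRITE) ? 1 : 0)

int main(void)
{
    struct request bare_flush  = { .cmd_flags = REQ_FLUSH };
    struct request write_flush = { .cmd_flags = REQ_FLUSH | REQ_WRITE };

    printf("bare flush:  rq_data_dir = %d (read)\n", rq_data_dir(&bare_flush));
    printf("write flush: rq_data_dir = %d (write)\n", rq_data_dir(&write_flush));
    return 0;
}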

> @@ -1709,15 +1621,12 @@ static void dm_request_fn(struct request_queue *q)
> if (!rq)
> goto plug_and_out;
>
> - if (unlikely(dm_rq_is_flush_request(rq))) {
> - BUG_ON(md->flush_request);
> - md->flush_request = rq;
> - blk_start_request(rq);
> - queue_work(md->wq, &md->barrier_work);
> - goto out;
> - }
> + /* always use block 0 to find the target for flushes for now */
> + pos = 0;
> + if (!(rq->cmd_flags & REQ_FLUSH))
> + pos = blk_rq_pos(rq);
>
> - ti = dm_table_find_target(map, blk_rq_pos(rq));
> + ti = dm_table_find_target(map, pos);

I added the following here: BUG_ON(!dm_target_is_valid(ti));

> if (ti->type->busy && ti->type->busy(ti))
> goto plug_and_out;

I also needed to avoid the ->busy call for REQ_FLUSH:

    if (!(rq->cmd_flags & REQ_FLUSH)) {
        ti = dm_table_find_target(map, blk_rq_pos(rq));
        BUG_ON(!dm_target_is_valid(ti));
        if (ti->type->busy && ti->type->busy(ti))
            goto plug_and_out;
    } else {
        /* rq-based only ever has one target! leverage this for FLUSH */
        ti = dm_table_get_target(map, 0);
    }

If I allowed ->busy to be called for REQ_FLUSH it would result in a
deadlock. I haven't identified where/why yet.

Other than these remaining issues this patch looks good.

Thanks,
Mike

Tejun Heo 08-30-2010 01:59 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
Hello,

On 08/30/2010 03:28 PM, Mike Snitzer wrote:
>> + clone->cmd = rq->cmd;
>> + clone->cmd_len = rq->cmd_len;
>> + clone->sense = rq->sense;
>> + clone->buffer = rq->buffer;
>> clone->end_io = end_clone_request;
>> clone->end_io_data = tio;
>
> blk_rq_prep_clone() of a REQ_FLUSH request will result in a
> rq_data_dir(clone) of read.

Hmmm... why? blk_rq_prep_clone() copies all REQ_* flags in
REQ_CLONE_MASK and REQ_WRITE is definitely there. I'll check.

> I still had the following:
>
>     if (rq->cmd_flags & REQ_FLUSH) {
>         blk_rq_init(NULL, clone);
>         clone->cmd_type = REQ_TYPE_FS;
>         /* without this the clone has a rq_data_dir of 0 */
>         clone->cmd_flags |= WRITE_FLUSH;
>     } else {
>         r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
>                               dm_rq_bio_constructor, tio);
>     ...
>
> Request-based DM's REQ_FLUSH still works without this special casing, but
> I figured I'd raise it to ask: what is the proper rq_data_dir() for
> a REQ_FLUSH?

Technically, the block layer doesn't care one way or the other, but it
should definitely be WRITE. Maybe it would be a good idea to enforce
that from the block layer.
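(One way such enforcement could look, sketched as standalone C rather than
a real block-layer patch; the helper name and flag values are invented for
the illustration: the submission path would simply OR REQ_WRITE into any
request that carries REQ_FLUSH.)

/*
 * Standalone illustration only -- not the actual block-layer change.
 * Shows the idea of normalizing flush direction at submission time:
 * any request carrying REQ_FLUSH gets REQ_WRITE forced on, so drivers
 * never see a "read" flush.
 */
#include <assert.h>

#define REQ_WRITE  (1u << 0)    /* placeholder bit positions */
#define REQ_FLUSH  (1u << 1)

struct request { unsigned int cmd_flags; };

/* hypothetical normalization hook in the submission path */
static void enforce_flush_is_write(struct request *rq)
{
    if (rq->cmd_flags & REQ_FLUSH)
        rq->cmd_flags |= REQ_WRITE;
}

int main(void)
{
    struct request rq = { .cmd_flags = REQ_FLUSH };

    enforce_flush_is_write(&rq);
    assert(rq.cmd_flags & REQ_WRITE);   /* flush now counts as a write */
    return 0;
}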

>> @@ -1709,15 +1621,12 @@ static void dm_request_fn(struct request_queue *q)
>> if (!rq)
>> goto plug_and_out;
>>
>> - if (unlikely(dm_rq_is_flush_request(rq))) {
>> - BUG_ON(md->flush_request);
>> - md->flush_request = rq;
>> - blk_start_request(rq);
>> - queue_work(md->wq, &md->barrier_work);
>> - goto out;
>> - }
>> + /* always use block 0 to find the target for flushes for now */
>> + pos = 0;
>> + if (!(rq->cmd_flags & REQ_FLUSH))
>> + pos = blk_rq_pos(rq);
>>
>> - ti = dm_table_find_target(map, blk_rq_pos(rq));
>> + ti = dm_table_find_target(map, pos);
>
> I added the following here: BUG_ON(!dm_target_is_valid(ti));

I'll add it.

>> if (ti->type->busy && ti->type->busy(ti))
>> goto plug_and_out;
>
> I also needed to avoid the ->busy call for REQ_FLUSH:
>
>     if (!(rq->cmd_flags & REQ_FLUSH)) {
>         ti = dm_table_find_target(map, blk_rq_pos(rq));
>         BUG_ON(!dm_target_is_valid(ti));
>         if (ti->type->busy && ti->type->busy(ti))
>             goto plug_and_out;
>     } else {
>         /* rq-based only ever has one target! leverage this for FLUSH */
>         ti = dm_table_get_target(map, 0);
>     }
>
> If I allowed ->busy to be called for REQ_FLUSH it would result in a
> deadlock. I haven't identified where/why yet.

Ah... that's probably from "if (!elv_queue_empty(q))" check below,
flushes are on a separate queue but I forgot to update
elv_queue_empty() to check the flush queue. elv_queue_empty() can
return %true spuriously in which case the queue won't be plugged and
restarted later leading to queue hang. I'll fix elv_queue_empty().

Thanks.

--
tejun

Tejun Heo 08-30-2010 03:07 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On 08/30/2010 03:59 PM, Tejun Heo wrote:
> Ah... that's probably from "if (!elv_queue_empty(q))" check below,
> flushes are on a separate queue but I forgot to update
> elv_queue_empty() to check the flush queue. elv_queue_empty() can
> return %true spuriously in which case the queue won't be plugged and
> restarted later leading to queue hang. I'll fix elv_queue_empty().

I think I was too quick to blame elv_queue_empty(). Can you please
test whether the following patch fixes the hang?

Thanks.

---
block/blk-flush.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

Index: block/block/blk-flush.c
===================================================================
--- block.orig/block/blk-flush.c
+++ block/block/blk-flush.c
@@ -28,7 +28,8 @@ unsigned blk_flush_cur_seq(struct reques
}

static struct request *blk_flush_complete_seq(struct request_queue *q,
- unsigned seq, int error)
+ unsigned seq, int error,
+ bool from_end_io)
{
struct request *next_rq = NULL;

@@ -51,6 +52,13 @@ static struct request *blk_flush_complet
if (!list_empty(&q->pending_flushes)) {
next_rq = list_entry_rq(q->pending_flushes.next);
list_move(&next_rq->queuelist, &q->queue_head);
+ /*
+ * Moving a request silently to queue_head may
+ * stall the queue, kick the queue if we
+ * aren't in the issue path already.
+ */
+ if (from_end_io)
+ __blk_run_queue(q);
}
}
return next_rq;
@@ -59,19 +67,19 @@ static struct request *blk_flush_complet
static void pre_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_PREFLUSH, error);
+ blk_flush_complete_seq(rq->q, QUEUE_FSEQ_PREFLUSH, error, true);
}

static void flush_data_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_DATA, error);
+ blk_flush_complete_seq(rq->q, QUEUE_FSEQ_DATA, error, true);
}

static void post_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_POSTFLUSH, error);
+ blk_flush_complete_seq(rq->q, QUEUE_FSEQ_POSTFLUSH, error, true);
}

static void init_flush_request(struct request *rq, struct gendisk *disk)
@@ -165,7 +173,7 @@ struct request *blk_do_flush(struct requ
skip |= QUEUE_FSEQ_DATA;
if (!do_postflush)
skip |= QUEUE_FSEQ_POSTFLUSH;
- return blk_flush_complete_seq(q, skip, 0);
+ return blk_flush_complete_seq(q, skip, 0, false);
}

static void bio_end_flush(struct bio *bio, int err)

Tejun Heo 08-30-2010 03:45 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
This patch converts request-based dm to support the new REQ_FLUSH/FUA.

The original request-based flush implementation depended on
request_queue blocking other requests while a barrier sequence is in
progress, which is no longer true for the new REQ_FLUSH/FUA.

In general, request-based dm doesn't have infrastructure for cloning
one source request to multiple targets, but the original flush
implementation had a special mostly independent path which can issue
flushes to multiple targets and sequence them. However, the
capability isn't currently in use and adds a lot of complexity.
Moreover, it's unlikely to be useful in its current form as it
doesn't make sense to be able to send out flushes to multiple targets
when write requests can't be.

This patch rips out the special flush code path and handles
REQ_FLUSH/FUA requests the same way as other requests. The only
special treatment is that REQ_FLUSH requests use block address 0
when finding the target, which is enough for now.

* added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
suggested by Mike Snitzer

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Mike Snitzer <snitzer@redhat.com>
---
Here's a version w/ BUG_ON() added. Once the queue hang issue is
tracked down, I'll refresh the whole series and repost.

Thanks.

drivers/md/dm.c | 206 +++++---------------------------------------------------
1 file changed, 22 insertions(+), 184 deletions(-)

Index: block/drivers/md/dm.c
===================================================================
--- block.orig/drivers/md/dm.c
+++ block/drivers/md/dm.c
@@ -143,20 +143,9 @@ struct mapped_device {
spinlock_t deferred_lock;

/*
- * Protect barrier_error from concurrent endio processing
- * in request-based dm.
- */
- spinlock_t barrier_error_lock;
- int barrier_error;
-
- /*
- * Processing queue (flush/barriers)
+ * Processing queue (flush)
*/
struct workqueue_struct *wq;
- struct work_struct barrier_work;
-
- /* A pointer to the currently processing pre/post flush request */
- struct request *flush_request;

/*
* The current mapping.
@@ -732,23 +721,6 @@ static void end_clone_bio(struct bio *cl
blk_update_request(tio->orig, 0, nr_bytes);
}

-static void store_barrier_error(struct mapped_device *md, int error)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&md->barrier_error_lock, flags);
- /*
- * Basically, the first error is taken, but:
- * -EOPNOTSUPP supersedes any I/O error.
- * Requeue request supersedes any I/O error but -EOPNOTSUPP.
- */
- if (!md->barrier_error || error == -EOPNOTSUPP ||
- (md->barrier_error != -EOPNOTSUPP &&
- error == DM_ENDIO_REQUEUE))
- md->barrier_error = error;
- spin_unlock_irqrestore(&md->barrier_error_lock, flags);
-}
-
/*
* Don't touch any member of the md after calling this function because
* the md may be freed in dm_put() at the end of this function.
@@ -786,13 +758,11 @@ static void free_rq_clone(struct request
static void dm_end_request(struct request *clone, int error)
{
int rw = rq_data_dir(clone);
- int run_queue = 1;
- bool is_barrier = clone->cmd_flags & REQ_HARDBARRIER;
struct dm_rq_target_io *tio = clone->end_io_data;
struct mapped_device *md = tio->md;
struct request *rq = tio->orig;

- if (rq->cmd_type == REQ_TYPE_BLOCK_PC && !is_barrier) {
+ if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
rq->errors = clone->errors;
rq->resid_len = clone->resid_len;

@@ -806,15 +776,8 @@ static void dm_end_request(struct reques
}

free_rq_clone(clone);
-
- if (unlikely(is_barrier)) {
- if (unlikely(error))
- store_barrier_error(md, error);
- run_queue = 0;
- } else
- blk_end_request_all(rq, error);
-
- rq_completed(md, rw, run_queue);
+ blk_end_request_all(rq, error);
+ rq_completed(md, rw, true);
}

static void dm_unprep_request(struct request *rq)
@@ -839,16 +802,6 @@ void dm_requeue_unmapped_request(struct
struct request_queue *q = rq->q;
unsigned long flags;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
- /*
- * Barrier clones share an original request.
- * Leave it to dm_end_request(), which handles this special
- * case.
- */
- dm_end_request(clone, DM_ENDIO_REQUEUE);
- return;
- }
-
dm_unprep_request(rq);

spin_lock_irqsave(q->queue_lock, flags);
@@ -938,19 +891,6 @@ static void dm_complete_request(struct r
struct dm_rq_target_io *tio = clone->end_io_data;
struct request *rq = tio->orig;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
- /*
- * Barrier clones share an original request. So can't use
- * softirq_done with the original.
- * Pass the clone to dm_done() directly in this special case.
- * It is safe (even if clone->q->queue_lock is held here)
- * because there is no I/O dispatching during the completion
- * of barrier clone.
- */
- dm_done(clone, error, true);
- return;
- }
-
tio->error = error;
rq->completion_data = clone;
blk_complete_request(rq);
@@ -967,17 +907,6 @@ void dm_kill_unmapped_request(struct req
struct dm_rq_target_io *tio = clone->end_io_data;
struct request *rq = tio->orig;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
- /*
- * Barrier clones share an original request.
- * Leave it to dm_end_request(), which handles this special
- * case.
- */
- BUG_ON(error > 0);
- dm_end_request(clone, error);
- return;
- }
-
rq->cmd_flags |= REQ_FAILED;
dm_complete_request(clone, error);
}
@@ -1507,14 +1436,6 @@ static int dm_request(struct request_que
return _dm_request(q, bio);
}

-static bool dm_rq_is_flush_request(struct request *rq)
-{
- if (rq->cmd_flags & REQ_FLUSH)
- return true;
- else
- return false;
-}
-
void dm_dispatch_request(struct request *rq)
{
int r;
@@ -1562,22 +1483,15 @@ static int setup_clone(struct request *c
{
int r;

- if (dm_rq_is_flush_request(rq)) {
- blk_rq_init(NULL, clone);
- clone->cmd_type = REQ_TYPE_FS;
- clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
- } else {
- r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
- dm_rq_bio_constructor, tio);
- if (r)
- return r;
-
- clone->cmd = rq->cmd;
- clone->cmd_len = rq->cmd_len;
- clone->sense = rq->sense;
- clone->buffer = rq->buffer;
- }
+ r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
+ dm_rq_bio_constructor, tio);
+ if (r)
+ return r;

+ clone->cmd = rq->cmd;
+ clone->cmd_len = rq->cmd_len;
+ clone->sense = rq->sense;
+ clone->buffer = rq->buffer;
clone->end_io = end_clone_request;
clone->end_io_data = tio;

@@ -1618,9 +1532,6 @@ static int dm_prep_fn(struct request_que
struct mapped_device *md = q->queuedata;
struct request *clone;

- if (unlikely(dm_rq_is_flush_request(rq)))
- return BLKPREP_OK;
-
if (unlikely(rq->special)) {
DMWARN("Already has something in rq->special.");
return BLKPREP_KILL;
@@ -1697,6 +1608,7 @@ static void dm_request_fn(struct request
struct dm_table *map = dm_get_live_table(md);
struct dm_target *ti;
struct request *rq, *clone;
+ sector_t pos;

/*
* For suspend, check blk_queue_stopped() and increment
@@ -1709,15 +1621,14 @@ static void dm_request_fn(struct request
if (!rq)
goto plug_and_out;

- if (unlikely(dm_rq_is_flush_request(rq))) {
- BUG_ON(md->flush_request);
- md->flush_request = rq;
- blk_start_request(rq);
- queue_work(md->wq, &md->barrier_work);
- goto out;
- }
+ /* always use block 0 to find the target for flushes for now */
+ pos = 0;
+ if (!(rq->cmd_flags & REQ_FLUSH))
+ pos = blk_rq_pos(rq);
+
+ ti = dm_table_find_target(map, pos);
+ BUG_ON(!dm_target_is_valid(ti));

- ti = dm_table_find_target(map, blk_rq_pos(rq));
if (ti->type->busy && ti->type->busy(ti))
goto plug_and_out;

@@ -1888,7 +1799,6 @@ out:
static const struct block_device_operations dm_blk_dops;

static void dm_wq_work(struct work_struct *work);
-static void dm_rq_barrier_work(struct work_struct *work);

static void dm_init_md_queue(struct mapped_device *md)
{
@@ -1943,7 +1853,6 @@ static struct mapped_device *alloc_dev(i
mutex_init(&md->suspend_lock);
mutex_init(&md->type_lock);
spin_lock_init(&md->deferred_lock);
- spin_lock_init(&md->barrier_error_lock);
rwlock_init(&md->map_lock);
atomic_set(&md->holders, 1);
atomic_set(&md->open_count, 0);
@@ -1966,7 +1875,6 @@ static struct mapped_device *alloc_dev(i
atomic_set(&md->pending[1], 0);
init_waitqueue_head(&md->wait);
INIT_WORK(&md->work, dm_wq_work);
- INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
init_waitqueue_head(&md->eventq);

md->disk->major = _major;
@@ -2220,8 +2128,6 @@ static int dm_init_request_based_queue(s
blk_queue_softirq_done(md->queue, dm_softirq_done);
blk_queue_prep_rq(md->queue, dm_prep_fn);
blk_queue_lld_busy(md->queue, dm_lld_busy);
- /* no flush support for request based dm yet */
- blk_queue_flush(md->queue, 0);

elv_register_queue(md->queue);

@@ -2421,73 +2327,6 @@ static void dm_queue_flush(struct mapped
queue_work(md->wq, &md->work);
}

-static void dm_rq_set_target_request_nr(struct request *clone, unsigned request_nr)
-{
- struct dm_rq_target_io *tio = clone->end_io_data;
-
- tio->info.target_request_nr = request_nr;
-}
-
-/* Issue barrier requests to targets and wait for their completion. */
-static int dm_rq_barrier(struct mapped_device *md)
-{
- int i, j;
- struct dm_table *map = dm_get_live_table(md);
- unsigned num_targets = dm_table_get_num_targets(map);
- struct dm_target *ti;
- struct request *clone;
-
- md->barrier_error = 0;
-
- for (i = 0; i < num_targets; i++) {
- ti = dm_table_get_target(map, i);
- for (j = 0; j < ti->num_flush_requests; j++) {
- clone = clone_rq(md->flush_request, md, GFP_NOIO);
- dm_rq_set_target_request_nr(clone, j);
- atomic_inc(&md->pending[rq_data_dir(clone)]);
- map_request(ti, clone, md);
- }
- }
-
- dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
- dm_table_put(map);
-
- return md->barrier_error;
-}
-
-static void dm_rq_barrier_work(struct work_struct *work)
-{
- int error;
- struct mapped_device *md = container_of(work, struct mapped_device,
- barrier_work);
- struct request_queue *q = md->queue;
- struct request *rq;
- unsigned long flags;
-
- /*
- * Hold the md reference here and leave it at the last part so that
- * the md can't be deleted by device opener when the barrier request
- * completes.
- */
- dm_get(md);
-
- error = dm_rq_barrier(md);
-
- rq = md->flush_request;
- md->flush_request = NULL;
-
- if (error == DM_ENDIO_REQUEUE) {
- spin_lock_irqsave(q->queue_lock, flags);
- blk_requeue_request(q, rq);
- spin_unlock_irqrestore(q->queue_lock, flags);
- } else
- blk_end_request_all(rq, error);
-
- blk_run_queue(q);
-
- dm_put(md);
-}
-
/*
* Swap in a new table, returning the old one for the caller to destroy.
*/
@@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
up_write(&md->io_lock);

/*
- * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
- * can be kicked until md->queue is stopped. So stop md->queue before
- * flushing md->wq.
+ * Stop md->queue before flushing md->wq in case request-based
+ * dm defers requests to md->wq from md->queue.
*/
if (dm_request_based(md))
stop_queue(md->queue);

Mike Snitzer 08-30-2010 07:08 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On Mon, Aug 30 2010 at 11:07am -0400,
Tejun Heo <tj@kernel.org> wrote:

> On 08/30/2010 03:59 PM, Tejun Heo wrote:
> > Ah... that's probably from "if (!elv_queue_empty(q))" check below,
> > flushes are on a separate queue but I forgot to update
> > elv_queue_empty() to check the flush queue. elv_queue_empty() can
> > return %true spuriously in which case the queue won't be plugged and
> > restarted later leading to queue hang. I'll fix elv_queue_empty().
>
> I think I was too quick to blame elv_queue_empty(). Can you please
> test whether the following patch fixes the hang?

It does, thanks!

Tested-by: Mike Snitzer <snitzer@redhat.com>

Mike Snitzer 08-30-2010 07:18 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On Mon, Aug 30 2010 at 11:45am -0400,
Tejun Heo <tj@kernel.org> wrote:

> This patch converts request-based dm to support the new REQ_FLUSH/FUA.
>
> The original request-based flush implementation depended on
> request_queue blocking other requests while a barrier sequence is in
> progress, which is no longer true for the new REQ_FLUSH/FUA.
>
> In general, request-based dm doesn't have infrastructure for cloning
> one source request to multiple targets, but the original flush
> implementation had a special mostly independent path which can issue
> flushes to multiple targets and sequence them. However, the
> capability isn't currently in use and adds a lot of complexity.
> Moreover, it's unlikely to be useful in its current form as it
> doesn't make sense to be able to send out flushes to multiple targets
> when write requests can't be.
>
> This patch rips out the special flush code path and handles
> REQ_FLUSH/FUA requests the same way as other requests. The only
> special treatment is that REQ_FLUSH requests use block address 0
> when finding the target, which is enough for now.
>
> * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
> suggested by Mike Snitzer
>
> Signed-off-by: Tejun Heo <tj@kernel.org>

Looks good.

Acked-by: Mike Snitzer <snitzer@redhat.com>

Junichi and/or Kiyoshi,
Could you please review this patch and add your Acked-by if it is OK?
(Alasdair will want to see NEC's Ack to accept this patch).

Thanks,
Mike

Mike Snitzer 08-30-2010 09:28 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On Mon, Aug 30 2010 at 3:08pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Mon, Aug 30 2010 at 11:07am -0400,
> Tejun Heo <tj@kernel.org> wrote:
>
> > On 08/30/2010 03:59 PM, Tejun Heo wrote:
> > > Ah... that's probably from "if (!elv_queue_empty(q))" check below,
> > > flushes are on a separate queue but I forgot to update
> > > elv_queue_empty() to check the flush queue. elv_queue_empty() can
> > > return %true spuriously in which case the queue won't be plugged and
> > > restarted later leading to queue hang. I'll fix elv_queue_empty().
> >
> > I think I was too quick to blame elv_queue_empty(). Can you please
> > test whether the following patch fixes the hang?
>
> It does, thanks!

Hmm, but unfortunately I was too quick to say the patch fixed the hang.

It is much more rare, but I can still get a hang. I just got the
following running vgcreate against a DM mpath (rq-based) device:

INFO: task vgcreate:3517 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgcreate D ffff88003d677a00 5168 3517 3361 0x00000080
ffff88003d677998 0000000000000046 ffff880000000000 ffff88003d677fd8
ffff880039c84860 ffff88003d677fd8 00000000001d3880 ffff880039c84c30
ffff880039c84c28 00000000001d3880 00000000001d3880 ffff88003d677fd8
Call Trace:
[<ffffffff81389308>] io_schedule+0x73/0xb5
[<ffffffff811c7304>] get_request_wait+0xef/0x17d
[<ffffffff810642be>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811c7890>] __make_request+0x333/0x467
[<ffffffff810251e5>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff811c5e91>] generic_make_request+0x342/0x3bf
[<ffffffff81074714>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff81069df2>] ? local_clock+0x41/0x5a
[<ffffffff811c5fe9>] submit_bio+0xdb/0xf8
[<ffffffff810754a4>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff811381a6>] dio_bio_submit+0x7b/0x9c
[<ffffffff81138dbe>] __blockdev_direct_IO+0x7f3/0x97d
[<ffffffff810251e5>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff81136d7a>] blkdev_direct_IO+0x57/0x59
[<ffffffff81135f58>] ? blkdev_get_blocks+0x0/0x90
[<ffffffff810ce301>] generic_file_aio_read+0xed/0x5b4
[<ffffffff81077932>] ? lock_release_non_nested+0xd5/0x23b
[<ffffffff810e40f8>] ? might_fault+0x5c/0xac
[<ffffffff810251e5>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff8110e131>] do_sync_read+0xcb/0x108
[<ffffffff81074688>] ? trace_hardirqs_off_caller+0x1f/0x9e
[<ffffffff81389a99>] ? __mutex_unlock_slowpath+0x120/0x132
[<ffffffff8119d805>] ? fsnotify_perm+0x4a/0x50
[<ffffffff8119d86c>] ? security_file_permission+0x2e/0x33
[<ffffffff8110e7a3>] vfs_read+0xab/0x107
[<ffffffff81075473>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff8110e8c2>] sys_read+0x4d/0x74
[<ffffffff81002c32>] system_call_fastpath+0x16/0x1b
no locks held by vgcreate/3517.

I was then able to reproduce it after reboot and another ~5 attempts
(all against 2.6.36-rc2 + your latest FLUSH+FUA patchset and DM
patches).

crash> bt -l 3893
PID: 3893 TASK: ffff88003e65a430 CPU: 0 COMMAND: "vgcreate"
#0 [ffff88003a5298d8] schedule at ffffffff813891d3
/root/git/linux-2.6/kernel/sched.c: 2873
#1 [ffff88003a5299a0] io_schedule at ffffffff81389308
/root/git/linux-2.6/kernel/sched.c: 5128
#2 [ffff88003a5299c0] get_request_wait at ffffffff811c7304
/root/git/linux-2.6/block/blk-core.c: 879
#3 [ffff88003a529a50] __make_request at ffffffff811c7890
/root/git/linux-2.6/block/blk-core.c: 1301
#4 [ffff88003a529ac0] generic_make_request at ffffffff811c5e91
/root/git/linux-2.6/block/blk-core.c: 1536
#5 [ffff88003a529b70] submit_bio at ffffffff811c5fe9
/root/git/linux-2.6/block/blk-core.c: 1632
#6 [ffff88003a529bc0] dio_bio_submit at ffffffff811381a6
/root/git/linux-2.6/fs/direct-io.c: 375
#7 [ffff88003a529bf0] __blockdev_direct_IO at ffffffff81138dbe
/root/git/linux-2.6/fs/direct-io.c: 1087
#8 [ffff88003a529cd0] blkdev_direct_IO at ffffffff81136d7a
/root/git/linux-2.6/fs/block_dev.c: 177
#9 [ffff88003a529d10] generic_file_aio_read at ffffffff810ce301
/root/git/linux-2.6/mm/filemap.c: 1303
#10 [ffff88003a529df0] do_sync_read at ffffffff8110e131
/root/git/linux-2.6/fs/read_write.c: 282
#11 [ffff88003a529f00] vfs_read at ffffffff8110e7a3
/root/git/linux-2.6/fs/read_write.c: 310
#12 [ffff88003a529f40] sys_read at ffffffff8110e8c2
/root/git/linux-2.6/fs/read_write.c: 388
#13 [ffff88003a529f80] system_call_fastpath at ffffffff81002c32
/root/git/linux-2.6/arch/x86/kernel/entry_64.S: 488
RIP: 0000003b602d41a0 RSP: 00007fff55d5b928 RFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffffff81002c32 RCX: 00007fff55d5b960
RDX: 0000000000001000 RSI: 00007fff55d5a000 RDI: 0000000000000005
RBP: 0000000000000000 R8: 0000000000494ecd R9: 0000000000001000
R10: 000000315c41c160 R11: 0000000000000246 R12: 00007fff55d5a000
R13: 00007fff55d5b0a0 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: 0000000000000000 CS: 0033 SS: 002b

Tejun Heo 08-31-2010 10:29 AM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On 08/30/2010 11:28 PM, Mike Snitzer wrote:
> On Mon, Aug 30 2010 at 3:08pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
>> On Mon, Aug 30 2010 at 11:07am -0400,
>> Tejun Heo <tj@kernel.org> wrote:
>>
>>> On 08/30/2010 03:59 PM, Tejun Heo wrote:
>>>> Ah... that's probably from "if (!elv_queue_empty(q))" check below,
>>>> flushes are on a separate queue but I forgot to update
>>>> elv_queue_empty() to check the flush queue. elv_queue_empty() can
>>>> return %true spuriously in which case the queue won't be plugged and
>>>> restarted later leading to queue hang. I'll fix elv_queue_empty().
>>>
>>> I think I was too quick to blame elv_queue_empty(). Can you please
>>> test whether the following patch fixes the hang?
>>
>> It does, thanks!
>
> Hmm, but unfortunately I was too quick to say the patch fixed the hang.
>
> It is much more rare, but I can still get a hang. I just got the
> following running vgcreate against a DM mpath (rq-based) device:

Can you please try this one instead?

Thanks.

---
block/blk-flush.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)

Index: block/block/blk-flush.c
===================================================================
--- block.orig/block/blk-flush.c
+++ block/block/blk-flush.c
@@ -56,22 +56,38 @@ static struct request *blk_flush_complet
return next_rq;
}

+static void blk_flush_complete_seq_end_io(struct request_queue *q,
+ unsigned seq, int error)
+{
+ bool was_empty = elv_queue_empty(q);
+ struct request *next_rq;
+
+ next_rq = blk_flush_complete_seq(q, seq, error);
+
+ /*
+ * Moving a request silently to empty queue_head may stall the
+ * queue. Kick the queue in those cases.
+ */
+ if (next_rq && was_empty)
+ __blk_run_queue(q);
+}
+
static void pre_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_PREFLUSH, error);
+ blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_PREFLUSH, error);
}

static void flush_data_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_DATA, error);
+ blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_DATA, error);
}

static void post_flush_end_io(struct request *rq, int error)
{
elv_completed_request(rq->q, rq);
- blk_flush_complete_seq(rq->q, QUEUE_FSEQ_POSTFLUSH, error);
+ blk_flush_complete_seq_end_io(rq->q, QUEUE_FSEQ_POSTFLUSH, error);
}

static void init_flush_request(struct request *rq, struct gendisk *disk)


--
tejun

Mike Snitzer 08-31-2010 01:02 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
On Tue, Aug 31 2010 at 6:29am -0400,
Tejun Heo <tj@kernel.org> wrote:

> On 08/30/2010 11:28 PM, Mike Snitzer wrote:
> > Hmm, but unfortunately I was too quick to say the patch fixed the hang.
> >
> > It is much more rare, but I can still get a hang. I just got the
> > following running vgcreate against a DM mpath (rq-based) device:
>
> Can you please try this one instead?

Still hit the hang on the 5th iteration of my test:
while true ; do ./test_dm_discard_mpath.sh && sleep 1 ; done

Would you like me to (re)send my test script offlist?

INFO: task vgcreate:2617 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
vgcreate D ffff88007bf7ba00 4688 2617 2479 0x00000080
ffff88007bf7b998 0000000000000046 ffff880000000000 ffff88007bf7bfd8
ffff88005542a430 ffff88007bf7bfd8 00000000001d3880 ffff88005542a800
ffff88005542a7f8 00000000001d3880 00000000001d3880 ffff88007bf7bfd8
Call Trace:
[<ffffffff81389338>] io_schedule+0x73/0xb5
[<ffffffff811c7304>] get_request_wait+0xef/0x17d
[<ffffffff810642be>] ? autoremove_wake_function+0x0/0x39
[<ffffffff811c7890>] __make_request+0x333/0x467
[<ffffffff810251e5>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff811c5e91>] generic_make_request+0x342/0x3bf
[<ffffffff81074714>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff81069df2>] ? local_clock+0x41/0x5a
[<ffffffff811c5fe9>] submit_bio+0xdb/0xf8
[<ffffffff810754a4>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff811381a6>] dio_bio_submit+0x7b/0x9c
[<ffffffff81138dbe>] __blockdev_direct_IO+0x7f3/0x97d
[<ffffffff810251e5>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff81136d7a>] blkdev_direct_IO+0x57/0x59
[<ffffffff81135f58>] ? blkdev_get_blocks+0x0/0x90
[<ffffffff810ce301>] generic_file_aio_read+0xed/0x5b4
[<ffffffff81077932>] ? lock_release_non_nested+0xd5/0x23b
[<ffffffff810e40f8>] ? might_fault+0x5c/0xac
[<ffffffff810251e5>] ? pvclock_clocksource_read+0x50/0xb9
[<ffffffff8110e131>] do_sync_read+0xcb/0x108
[<ffffffff81074688>] ? trace_hardirqs_off_caller+0x1f/0x9e
[<ffffffff81389ac9>] ? __mutex_unlock_slowpath+0x120/0x132
[<ffffffff8119d805>] ? fsnotify_perm+0x4a/0x50
[<ffffffff8119d86c>] ? security_file_permission+0x2e/0x33
[<ffffffff8110e7a3>] vfs_read+0xab/0x107
[<ffffffff81075473>] ? trace_hardirqs_on_caller+0x11d/0x141
[<ffffffff8110e8c2>] sys_read+0x4d/0x74
[<ffffffff81002c32>] system_call_fastpath+0x16/0x1b
no locks held by vgcreate/2617.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Tejun Heo 08-31-2010 01:14 PM

dm: implement REQ_FLUSH/FUA support for request-based dm
 
Hello,

On 08/31/2010 03:02 PM, Mike Snitzer wrote:
> On Tue, Aug 31 2010 at 6:29am -0400,
> Tejun Heo <tj@kernel.org> wrote:
>
>> On 08/30/2010 11:28 PM, Mike Snitzer wrote:
>>> Hmm, but unfortunately I was too quick to say the patch fixed the hang.
>>>
>>> It is much more rare, but I can still get a hang. I just got the
>>> following running vgcreate against a DM mpath (rq-based) device:
>>
>> Can you please try this one instead?
>
> Still hit the hang on the 5th iteration of my test:
> while true ; do ./test_dm_discard_mpath.sh && sleep 1 ; done
>
> Would you like me to (re)send my test script offlist?

Yes, please. Thanks.

--
tejun
