
Christoph Hellwig 08-03-2010 06:51 PM

dm: support REQ_FLUSH directly
 
Adapt device-mapper to the new world order where even bio based devices
get simple REQ_FLUSH requests for cache flushes, and need to submit
them downwards for implementing barriers.

Note that I've removed the unlikely statements around the REQ_FLUSH
checks. While these generally aren't as common as normal reads/writes,
they are common enough that statically mispredicting them is a really
bad idea.

Tested with simple linear LVM volumes only so far.
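
For illustration, here is the shape of the check that this patch changes
throughout the bio-based targets, as a minimal stand-alone model (stand-in
types and flag values, for illustration only; the real definitions live in
the block layer headers):

#include <stdbool.h>

/* Stand-in flag values and bio fields, for illustration only. */
#define REQ_HARDBARRIER (1u << 0)
#define REQ_DISCARD     (1u << 1)
#define REQ_FLUSH       (1u << 2)

struct bio_model {
        unsigned long bi_rw;    /* request flags */
        unsigned int  bi_size;  /* payload size in bytes */
};

/* Old world: a cache flush was an "empty barrier", i.e. a barrier bio
 * that carries no data and is not a discard. */
static bool is_cache_flush_old(const struct bio_model *bio)
{
        return (bio->bi_rw & REQ_HARDBARRIER) &&
               bio->bi_size == 0 &&
               !(bio->bi_rw & REQ_DISCARD);
}

/* New world: the block layer tags cache flushes explicitly, so a target
 * only has to test the flag and remap the bio downwards. */
static bool is_cache_flush_new(const struct bio_model *bio)
{
        return bio->bi_rw & REQ_FLUSH;
}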


Index: linux-2.6/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-crypt.c 2010-08-03 20:26:49.629254174 +0200
+++ linux-2.6/drivers/md/dm-crypt.c 2010-08-03 20:36:59.279003929 +0200
@@ -1249,7 +1249,7 @@ static int crypt_map(struct dm_target *t
struct dm_crypt_io *io;
struct crypt_config *cc;

- if (unlikely(bio_empty_barrier(bio))) {
+ if (bio->bi_rw & REQ_FLUSH) {
cc = ti->private;
bio->bi_bdev = cc->dev->bdev;
return DM_MAPIO_REMAPPED;
Index: linux-2.6/drivers/md/dm-raid1.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-raid1.c 2010-08-03 20:26:49.641003999 +0200
+++ linux-2.6/drivers/md/dm-raid1.c 2010-08-03 20:36:59.280003649 +0200
@@ -629,7 +629,7 @@ static void do_write(struct mirror_set *
struct dm_io_region io[ms->nr_mirrors], *dest = io;
struct mirror *m;
struct dm_io_request io_req = {
- .bi_rw = WRITE | (bio->bi_rw & WRITE_BARRIER),
+ .bi_rw = WRITE | (bio->bi_rw & (WRITE_BARRIER|REQ_FLUSH)),
.mem.type = DM_IO_BVEC,
.mem.ptr.bvec = bio->bi_io_vec + bio->bi_idx,
.notify.fn = write_callback,
@@ -670,7 +670,7 @@ static void do_writes(struct mirror_set
bio_list_init(&requeue);

while ((bio = bio_list_pop(writes))) {
- if (unlikely(bio_empty_barrier(bio))) {
+ if (bio->bi_rw & REQ_FLUSH) {
bio_list_add(&sync, bio);
continue;
}
@@ -1199,12 +1199,14 @@ static int mirror_end_io(struct dm_targe
struct dm_bio_details *bd = NULL;
struct dm_raid1_read_record *read_record = map_context->ptr;

+ if (bio->bi_rw & REQ_FLUSH)
+ return error;
+
/*
* We need to dec pending if this was a write.
*/
if (rw == WRITE) {
- if (likely(!bio_empty_barrier(bio)))
- dm_rh_dec(ms->rh, map_context->ll);
+ dm_rh_dec(ms->rh, map_context->ll);
return error;
}

Index: linux-2.6/drivers/md/dm-region-hash.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-region-hash.c 2010-08-03 20:26:49.650023346 +0200
+++ linux-2.6/drivers/md/dm-region-hash.c 2010-08-03 20:36:59.285025649 +0200
@@ -399,7 +399,7 @@ void dm_rh_mark_nosync(struct dm_region_
region_t region = dm_rh_bio_to_region(rh, bio);
int recovering = 0;

- if (bio_empty_barrier(bio)) {
+ if (bio->bi_rw & REQ_FLUSH) {
rh->barrier_failure = 1;
return;
}
@@ -524,7 +524,7 @@ void dm_rh_inc_pending(struct dm_region_
struct bio *bio;

for (bio = bios->head; bio; bio = bio->bi_next) {
- if (bio_empty_barrier(bio))
+ if (bio->bi_rw & REQ_FLUSH)
continue;
rh_inc(rh, dm_rh_bio_to_region(rh, bio));
}
Index: linux-2.6/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-snap.c 2010-08-03 20:26:49.656003091 +0200
+++ linux-2.6/drivers/md/dm-snap.c 2010-08-03 20:36:59.290023135 +0200
@@ -1581,7 +1581,7 @@ static int snapshot_map(struct dm_target
chunk_t chunk;
struct dm_snap_pending_exception *pe = NULL;

- if (unlikely(bio_empty_barrier(bio))) {
+ if (bio->bi_rw & REQ_FLUSH) {
bio->bi_bdev = s->cow->bdev;
return DM_MAPIO_REMAPPED;
}
@@ -1685,7 +1685,7 @@ static int snapshot_merge_map(struct dm_
int r = DM_MAPIO_REMAPPED;
chunk_t chunk;

- if (unlikely(bio_empty_barrier(bio))) {
+ if (bio->bi_rw & REQ_FLUSH) {
if (!map_context->flush_request)
bio->bi_bdev = s->origin->bdev;
else
@@ -2123,7 +2123,7 @@ static int origin_map(struct dm_target *
struct dm_dev *dev = ti->private;
bio->bi_bdev = dev->bdev;

- if (unlikely(bio_empty_barrier(bio)))
+ if (bio->bi_rw & REQ_FLUSH)
return DM_MAPIO_REMAPPED;

/* Only tell snapshots if this is a write */
Index: linux-2.6/drivers/md/dm-stripe.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-stripe.c 2010-08-03 20:26:49.663003301 +0200
+++ linux-2.6/drivers/md/dm-stripe.c 2010-08-03 20:36:59.295005744 +0200
@@ -214,7 +214,7 @@ static int stripe_map(struct dm_target *
sector_t offset, chunk;
uint32_t stripe;

- if (unlikely(bio_empty_barrier(bio))) {
+ if (bio->bi_rw & REQ_FLUSH) {
BUG_ON(map_context->flush_request >= sc->stripes);
bio->bi_bdev = sc->stripe[map_context->flush_request].dev->bdev;
return DM_MAPIO_REMAPPED;
Index: linux-2.6/drivers/md/dm.c
===================================================================
--- linux-2.6.orig/drivers/md/dm.c 2010-08-03 20:26:49.676004139 +0200
+++ linux-2.6/drivers/md/dm.c 2010-08-03 20:36:59.301005325 +0200
@@ -633,7 +633,7 @@ static void dec_pending(struct dm_io *io
io_error = io->error;
bio = io->bio;

- if (bio->bi_rw & REQ_HARDBARRIER) {
+ if (bio == &md->barrier_bio) {
/*
* There can be just one barrier request so we use
* a per-device variable for error reporting.
@@ -851,7 +851,7 @@ void dm_requeue_unmapped_request(struct
struct request_queue *q = rq->q;
unsigned long flags;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
+ if (clone->cmd_flags & REQ_HARDBARRIER) {
/*
* Barrier clones share an original request.
* Leave it to dm_end_request(), which handles this special
@@ -950,7 +950,7 @@ static void dm_complete_request(struct r
struct dm_rq_target_io *tio = clone->end_io_data;
struct request *rq = tio->orig;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
+ if (clone->cmd_flags & REQ_HARDBARRIER) {
/*
* Barrier clones share an original request. So can't use
* softirq_done with the original.
@@ -979,7 +979,7 @@ void dm_kill_unmapped_request(struct req
struct dm_rq_target_io *tio = clone->end_io_data;
struct request *rq = tio->orig;

- if (unlikely(clone->cmd_flags & REQ_HARDBARRIER)) {
+ if (clone->cmd_flags & REQ_HARDBARRIER) {
/*
* Barrier clones share an original request.
* Leave it to dm_end_request(), which handles this special
@@ -1208,7 +1208,7 @@ static int __clone_and_map(struct clone_
sector_t len = 0, max;
struct dm_target_io *tio;

- if (unlikely(bio_empty_barrier(bio)))
+ if (bio->bi_rw & REQ_FLUSH)
return __clone_and_map_empty_barrier(ci);

ti = dm_table_find_target(ci->map, ci->sector);
@@ -1308,7 +1308,7 @@ static void __split_and_process_bio(stru

ci.map = dm_get_live_table(md);
if (unlikely(!ci.map)) {
- if (!(bio->bi_rw & REQ_HARDBARRIER))
+ if (bio != &md->barrier_bio)
bio_io_error(bio);
else
if (!md->barrier_error)
@@ -1326,7 +1326,7 @@ static void __split_and_process_bio(stru
spin_lock_init(&ci.io->endio_lock);
ci.sector = bio->bi_sector;
ci.sector_count = bio_sectors(bio);
- if (unlikely(bio_empty_barrier(bio)))
+ if (bio->bi_rw & REQ_FLUSH)
ci.sector_count = 1;
ci.idx = bio->bi_idx;

@@ -1421,7 +1421,7 @@ static int _dm_request(struct request_qu
* we have to queue this io for later.
*/
if (unlikely(test_bit(DMF_QUEUE_IO_TO_THREAD, &md->flags)) ||
- unlikely(bio->bi_rw & REQ_HARDBARRIER)) {
+ unlikely(bio->bi_rw & (REQ_HARDBARRIER|REQ_FLUSH))) {
up_read(&md->io_lock);

if (unlikely(test_bit(DMF_BLOCK_IO_FOR_SUSPEND, &md->flags)) &&
@@ -1462,14 +1462,6 @@ static int dm_request(struct request_que
return _dm_request(q, bio);
}

-static bool dm_rq_is_flush_request(struct request *rq)
-{
- if (rq->cmd_flags & REQ_FLUSH)
- return true;
- else
- return false;
-}
-
void dm_dispatch_request(struct request *rq)
{
int r;
@@ -1517,10 +1509,10 @@ static int setup_clone(struct request *c
{
int r;

- if (dm_rq_is_flush_request(rq)) {
+ if (rq->cmd_flags & REQ_FLUSH) {
blk_rq_init(NULL, clone);
clone->cmd_type = REQ_TYPE_FS;
- clone->cmd_flags |= (REQ_HARDBARRIER | WRITE);
+ clone->cmd_flags |= (WRITE_SYNC | REQ_FLUSH);
} else {
r = blk_rq_prep_clone(clone, rq, tio->md->bs, GFP_ATOMIC,
dm_rq_bio_constructor, tio);
@@ -1573,7 +1565,7 @@ static int dm_prep_fn(struct request_que
struct mapped_device *md = q->queuedata;
struct request *clone;

- if (unlikely(dm_rq_is_flush_request(rq)))
+ if (rq->cmd_flags & REQ_FLUSH)
return BLKPREP_OK;

if (unlikely(rq->special)) {
@@ -1664,7 +1656,7 @@ static void dm_request_fn(struct request
if (!rq)
goto plug_and_out;

- if (unlikely(dm_rq_is_flush_request(rq))) {
+ if (rq->cmd_flags & REQ_FLUSH) {
BUG_ON(md->flush_request);
md->flush_request = rq;
blk_start_request(rq);
@@ -2239,7 +2231,7 @@ static void dm_flush(struct mapped_devic

bio_init(&md->barrier_bio);
md->barrier_bio.bi_bdev = md->bdev;
- md->barrier_bio.bi_rw = WRITE_BARRIER;
+ md->barrier_bio.bi_rw = WRITE_SYNC | REQ_FLUSH;
__split_and_process_bio(md, &md->barrier_bio);

dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
@@ -2250,19 +2242,8 @@ static void process_barrier(struct mappe
md->barrier_error = 0;

dm_flush(md);
-
- if (!bio_empty_barrier(bio)) {
- __split_and_process_bio(md, bio);
- dm_flush(md);
- }
-
- if (md->barrier_error != DM_ENDIO_REQUEUE)
- bio_endio(bio, md->barrier_error);
- else {
- spin_lock_irq(&md->deferred_lock);
- bio_list_add_head(&md->deferred, bio);
- spin_unlock_irq(&md->deferred_lock);
- }
+ __split_and_process_bio(md, bio);
+ dm_flush(md);
}

/*
Index: linux-2.6/include/linux/bio.h
===================================================================
--- linux-2.6.orig/include/linux/bio.h 2010-08-03 20:32:11.951274008 +0200
+++ linux-2.6/include/linux/bio.h 2010-08-03 20:36:59.303005325 +0200
@@ -241,10 +241,6 @@ enum rq_flag_bits {
#define bio_offset(bio) bio_iovec((bio))->bv_offset
#define bio_segments(bio) ((bio)->bi_vcnt - (bio)->bi_idx)
#define bio_sectors(bio) ((bio)->bi_size >> 9)
-#define bio_empty_barrier(bio) \
- ((bio->bi_rw & REQ_HARDBARRIER) && \
- !bio_has_data(bio) && \
- !(bio->bi_rw & REQ_DISCARD))

static inline unsigned int bio_cur_bytes(struct bio *bio)
{
Index: linux-2.6/drivers/md/dm-io.c
===================================================================
--- linux-2.6.orig/drivers/md/dm-io.c 2010-08-03 20:26:49.685023485 +0200
+++ linux-2.6/drivers/md/dm-io.c 2010-08-03 20:36:59.308004417 +0200
@@ -364,7 +364,7 @@ static void dispatch_io(int rw, unsigned
*/
for (i = 0; i < num_regions; i++) {
*dp = old_pages;
- if (where[i].count || (rw & REQ_HARDBARRIER))
+ if (where[i].count || (rw & REQ_FLUSH))
do_region(rw, i, where + i, dp, io);
}


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Kiyoshi Ueda 08-04-2010 04:57 AM

dm: support REQ_FLUSH directly
 
Hi Christoph,

On 08/04/2010 03:51 AM +0900, Christoph Hellwig wrote:
> Adapt device-mapper to the new world order where even bio based devices
> get simple REQ_FLUSH requests for cache flushes, and need to submit
> them downwards for implementing barriers.
<snip>
> Index: linux-2.6/drivers/md/dm.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm.c 2010-08-03 20:26:49.676004139 +0200
> +++ linux-2.6/drivers/md/dm.c 2010-08-03 20:36:59.301005325 +0200
<snip>
> @@ -1573,7 +1565,7 @@ static int dm_prep_fn(struct request_que
> struct mapped_device *md = q->queuedata;
> struct request *clone;
>
> - if (unlikely(dm_rq_is_flush_request(rq)))
> + if (rq->cmd_flags & REQ_FLUSH)
> return BLKPREP_OK;
>
> if (unlikely(rq->special)) {
> @@ -1664,7 +1656,7 @@ static void dm_request_fn(struct request
> if (!rq)
> goto plug_and_out;
>
> - if (unlikely(dm_rq_is_flush_request(rq))) {
> + if (rq->cmd_flags & REQ_FLUSH) {
> BUG_ON(md->flush_request);
> md->flush_request = rq;
> blk_start_request(rq);

Current request-based device-mapper's flush code depends on
the block-layer's barrier behavior which dispatches only one request
at a time when flush is needed.
In other words, current request-based device-mapper can't handle
other requests while a flush request is in progress.
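
Put differently, here is a stand-alone model of why the current completion
check only works under that assumption (stand-in types and field names, for
illustration only, not dm code):

#include <stdbool.h>

/* Stand-in for the relevant per-device state. */
struct md_model {
        int pending;            /* all in-flight clones, flush or not */
        int flush_pending;      /* in-flight flush clones only */
};

/* Valid only while the block layer dispatches a flush on its own: with
 * nothing else in flight, a total count of zero means the flush clones
 * have completed. */
static bool flush_done_assuming_barrier_sequencing(const struct md_model *md)
{
        return md->pending == 0;
}

/* Once a flush can be dispatched alongside normal requests, its
 * completion has to be tracked with a dedicated counter instead. */
static bool flush_done_with_concurrent_io(const struct md_model *md)
{
        return md->flush_pending == 0;
}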

I'll take a look at how I can fix the request-based device-mapper to
cope with it. I think it'll take time for careful investigation.

Thanks,
Kiyoshi Ueda

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Christoph Hellwig 08-04-2010 08:54 AM

dm: support REQ_FLUSH directly
 
On Wed, Aug 04, 2010 at 01:57:37PM +0900, Kiyoshi Ueda wrote:
> > - if (unlikely(dm_rq_is_flush_request(rq))) {
> > + if (rq->cmd_flags & REQ_FLUSH) {
> > BUG_ON(md->flush_request);
> > md->flush_request = rq;
> > blk_start_request(rq);
>
> Current request-based device-mapper's flush code depends on
> the block-layer's barrier behavior which dispatches only one request
> at a time when flush is needed.
> In other words, current request-based device-mapper can't handle
> other requests while a flush request is in progress.
>
> I'll take a look at how I can fix the request-based device-mapper to
> cope with it. I think it'll take time for careful investigation.

Given that request based device mapper doesn't even look at the
block numbers from what I can see just removing any special casing
for REQ_FLUSH should probably do it.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

"Jun'ichi Nomura" 08-05-2010 02:16 AM

dm: support REQ_FLUSH directly
 
Hi Christoph,

(08/04/10 17:54), Christoph Hellwig wrote:
> On Wed, Aug 04, 2010 at 01:57:37PM +0900, Kiyoshi Ueda wrote:
>>> - if (unlikely(dm_rq_is_flush_request(rq))) {
>>> + if (rq->cmd_flags & REQ_FLUSH) {
>>> BUG_ON(md->flush_request);
>>> md->flush_request = rq;
>>> blk_start_request(rq);
>>
>> Current request-based device-mapper's flush code depends on
>> the block-layer's barrier behavior which dispatches only one request
>> at a time when flush is needed.
>> In other words, current request-based device-mapper can't handle
>> other requests while a flush request is in progress.
>>
>> I'll take a look at how I can fix the request-based device-mapper to
>> cope with it. I think it'll take time for careful investigation.
>
> Given that request based device mapper doesn't even look at the
> block numbers from what I can see just removing any special casing
> for REQ_FLUSH should probably do it.

Special casing is necessary because device-mapper may have to
send multiple copies of a REQ_FLUSH request to multiple
targets, while a normal request is just sent to a single target.
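
As a rough stand-alone illustration (stand-in types, for illustration only,
not dm code; num_flush_requests is the per-target number of flush
destinations): a normal request is routed to exactly one target by its
sector, but a flush has no sector, so one clone has to go to every flush
destination of every target.

struct target_model {
        unsigned num_flush_requests;    /* flush clones this target needs */
};

struct table_model {
        unsigned num_targets;
        struct target_model *targets;
};

/* A flush cannot be routed by address, so the number of clones to issue
 * is the sum over all targets rather than 1 as for a normal request. */
static unsigned flush_clones_needed(const struct table_model *t)
{
        unsigned i, total = 0;

        for (i = 0; i < t->num_targets; i++)
                total += t->targets[i].num_flush_requests;
        return total;
}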

Thanks,
--
Jun'ichi Nomura, NEC Corporation

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Mike Snitzer 08-26-2010 10:50 PM

dm: support REQ_FLUSH directly
 
On Wed, Aug 04 2010 at 10:16pm -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Christoph,
>
> (08/04/10 17:54), Christoph Hellwig wrote:
> > On Wed, Aug 04, 2010 at 01:57:37PM +0900, Kiyoshi Ueda wrote:
> >>> - if (unlikely(dm_rq_is_flush_request(rq))) {
> >>> + if (rq->cmd_flags & REQ_FLUSH) {
> >>> BUG_ON(md->flush_request);
> >>> md->flush_request = rq;
> >>> blk_start_request(rq);
> >>
> >> Current request-based device-mapper's flush code depends on
> >> the block-layer's barrier behavior which dispatches only one request
> >> at a time when flush is needed.
> >> In other words, current request-based device-mapper can't handle
> >> other requests while a flush request is in progress.
> >>
> >> I'll take a look at how I can fix the request-based device-mapper to
> >> cope with it. I think it'll take time for careful investigation.
> >
> > Given that request based device mapper doesn't even look at the
> > block numbers from what I can see just removing any special casing
> > for REQ_FLUSH should probably do it.
>
> Special casing is necessary because device-mapper may have to
> send multiple copies of a REQ_FLUSH request to multiple
> targets, while a normal request is just sent to a single target.

Yes, request-based DM is meant to have all the same capabilities as
bio-based DM. So in theory it should support multiple targets but in
practice it doesn't. DM's multipath target is the only consumer of
request-based DM and it only ever clones a single flush request
(num_flush_requests = 1).

So why not remove all of request-based DM's barrier infrastructure and
simply rely on the revised block layer to sequence the FLUSH+WRITE
request for request-based DM?

Given that we do not have a request-based DM target that requires
cloning multiple FLUSH requests, it's unused code that is delaying DM
support for the new FLUSH+FUA work (NOTE: bio-based DM obviously still
needs work in this area).

Once we have a need for using request-based DM for something other than
multipath we can take a fresh look at implementing rq-based FLUSH+FUA.

Mike

p.s. I know how hard NEC worked on request-based DM's barrier support;
so I'm not suggesting this lightly. For me it just seems like we're
carrying complexity in DM that hasn't ever been required.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Mike Snitzer 08-27-2010 12:40 AM

dm: support REQ_FLUSH directly
 
On Thu, Aug 26 2010 at 6:50pm -0400,
Mike Snitzer <snitzer@redhat.com> wrote:

> Once we have a need for using request-based DM for something other than
> multipath we can take a fresh look at implementing rq-based FLUSH+FUA.
>
> Mike
>
> p.s. I know how hard NEC worked on request-based DM's barrier support;
> so I'm not suggesting this lightly. For me it just seems like we're
> carrying complexity in DM that hasn't ever been required.

To be clear: the piece that I was saying wasn't required is the need
for request-based DM to clone a FLUSH to send to multiple targets
(saying as much was just a confusing distraction... please ignore that).

Anyway, my previous email's question still stands.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Jamie Lokier 08-27-2010 01:20 AM

dm: support REQ_FLUSH directly
 
Mike Snitzer wrote:
> On Thu, Aug 26 2010 at 6:50pm -0400,
> Mike Snitzer <snitzer@redhat.com> wrote:
>
> > Once we have a need for using request-based DM for something other than
> > multipath we can take a fresh look at implementing rq-based FLUSH+FUA.
> >
> > Mike
> >
> > p.s. I know how hard NEC worked on request-based DM's barrier support;
> > so I'm not suggesting this lightly. For me it just seems like we're
> > carrying complexity in DM that hasn't ever been required.
>
> To be clear: the piece that I was saying wasn't required is the need
> for request-based DM to clone a FLUSH to send to multiple targets
> (saying as much was just a confusing distraction... please ignore that).
>
> Anyway, my previous email's question still stands.

On a slightly related note: DM suggests a reason for the lower layer, or the
request queues, to implement the trivial optimisation of discarding
FLUSHes if there's been no WRITE since the previous FLUSH.

That was mentioned elsewhere in this big thread as not being worth
even the small effort - because the filesystem is able to make good
decisions anyway.

But once you have something like RAID or striping, it's quite common
for the filesystem to issue a FLUSH when only a subset of the target
devices have received WRITEs through the RAID/striping layer since
they last received a FLUSH.
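
A minimal stand-alone model of that optimisation (illustrative only, not
existing block-layer or dm code): keep a per-device "written since last
flush" flag and let the fan-out layer skip the devices whose flag is clear.

#include <stdbool.h>

/* One underlying device ("leg") of a RAID/striped set, modelled minimally. */
struct leg_model {
        bool dirty;     /* a WRITE reached this leg since its last FLUSH */
};

static void leg_note_write(struct leg_model *leg)
{
        leg->dirty = true;
}

/* Called when fanning out a filesystem-issued FLUSH: only the legs that
 * saw writes since their last flush actually need one. */
static bool leg_needs_flush(struct leg_model *leg)
{
        if (!leg->dirty)
                return false;
        leg->dirty = false;
        return true;
}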

-- Jamie

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

"Jun'ichi Nomura" 08-27-2010 01:43 AM

dm: support REQ_FLUSH directly
 
Hi Mike,

(08/27/10 07:50), Mike Snitzer wrote:
>> Special casing is necessary because device-mapper may have to
>> send multiple copies of a REQ_FLUSH request to multiple
>> targets, while a normal request is just sent to a single target.
>
> Yes, request-based DM is meant to have all the same capabilities as
> bio-based DM. So in theory it should support multiple targets but in
> practice it doesn't. DM's multipath target is the only consumer of
> request-based DM and it only ever clones a single flush request
> (num_flush_requests = 1).

This is correct. But,

> So why not remove all of request-based DM's barrier infrastructure and
> simply rely on the revised block layer to sequence the FLUSH+WRITE
> request for request-based DM?
>
> Given that we do not have a request-based DM target that requires
> cloning multiple FLUSH requests, it's unused code that is delaying DM
> support for the new FLUSH+FUA work (NOTE: bio-based DM obviously still
> needs work in this area).

the above-mentioned 'special casing' is not the hard part.
See the attached patch.

The hard part is discerning the error type for a flush failure,
as discussed in the other thread.
And as Kiyoshi wrote, that's an existing problem, so it can
be worked on as a separate issue from the new FLUSH work.

Thanks,
--
Jun'ichi Nomura, NEC Corporation


Cope with new sequencing of flush requests in the block layer.

Request-based dm used to depend on the barrier sequencing in the block layer,
which guarantees that no other requests are in-flight when a flush request
is dispatched. So it reused the md->pending counter for checking completion
of cloned flush requests.

This patch separates out a pending counter for flush requests
as a preparation for the new FLUSH work, where a flush request can be
dispatched while other normal requests are in-flight.

Index: linux-2.6.36-rc2/drivers/md/dm.c
===================================================================
--- linux-2.6.36-rc2.orig/drivers/md/dm.c
+++ linux-2.6.36-rc2/drivers/md/dm.c
@@ -162,6 +162,7 @@ struct mapped_device {

/* A pointer to the currently processing pre/post flush request */
struct request *flush_request;
+ atomic_t flush_pending;

/*
* The current mapping.
@@ -777,10 +778,16 @@ static void store_barrier_error(struct m
* the md may be freed in dm_put() at the end of this function.
* Or do dm_get() before calling this function and dm_put() later.
*/
-static void rq_completed(struct mapped_device *md, int rw, int run_queue)
+static void rq_completed(struct mapped_device *md, int rw, int run_queue, bool is_flush)
{
atomic_dec(&md->pending[rw]);

+ if (is_flush) {
+ atomic_dec(&md->flush_pending);
+ if (!atomic_read(&md->flush_pending))
+ wake_up(&md->wait);
+ }
+
/* nudge anyone waiting on suspend queue */
if (!md_in_flight(md))
wake_up(&md->wait);
@@ -837,7 +844,7 @@ static void dm_end_request(struct reques
} else
blk_end_request_all(rq, error);

- rq_completed(md, rw, run_queue);
+ rq_completed(md, rw, run_queue, is_barrier);
}

static void dm_unprep_request(struct request *rq)
@@ -880,7 +887,7 @@ void dm_requeue_unmapped_request(struct
blk_requeue_request(q, rq);
spin_unlock_irqrestore(q->queue_lock, flags);

- rq_completed(md, rw, 0);
+ rq_completed(md, rw, 0, false);
}
EXPORT_SYMBOL_GPL(dm_requeue_unmapped_request);

@@ -1993,6 +2000,7 @@ static struct mapped_device *alloc_dev(i

atomic_set(&md->pending[0], 0);
atomic_set(&md->pending[1], 0);
+ atomic_set(&md->flush_pending, 0);
init_waitqueue_head(&md->wait);
INIT_WORK(&md->work, dm_wq_work);
INIT_WORK(&md->barrier_work, dm_rq_barrier_work);
@@ -2375,7 +2383,7 @@ void dm_put(struct mapped_device *md)
}
EXPORT_SYMBOL_GPL(dm_put);

-static int dm_wait_for_completion(struct mapped_device *md, int interruptible)
+static int dm_wait_for_completion(struct mapped_device *md, int interruptible, bool for_flush)
{
int r = 0;
DECLARE_WAITQUEUE(wait, current);
@@ -2388,6 +2396,8 @@ static int dm_wait_for_completion(struct
set_current_state(interruptible);

smp_mb();
+ if (for_flush && !atomic_read(&md->flush_pending))
+ break;
if (!md_in_flight(md))
break;

@@ -2408,14 +2418,14 @@ static int dm_wait_for_completion(struct

static void dm_flush(struct mapped_device *md)
{
- dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
+ dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE, false);

bio_init(&md->barrier_bio);
md->barrier_bio.bi_bdev = md->bdev;
md->barrier_bio.bi_rw = WRITE_BARRIER;
__split_and_process_bio(md, &md->barrier_bio);

- dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
+ dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE, false);
}

static void process_barrier(struct mapped_device *md, struct bio *bio)
@@ -2512,11 +2522,12 @@ static int dm_rq_barrier(struct mapped_d
clone = clone_rq(md->flush_request, md, GFP_NOIO);
dm_rq_set_target_request_nr(clone, j);
atomic_inc(&md->pending[rq_data_dir(clone)]);
+ atomic_inc(&md->flush_pending);
map_request(ti, clone, md);
}
}

- dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE);
+ dm_wait_for_completion(md, TASK_UNINTERRUPTIBLE, true);
dm_table_put(map);

return md->barrier_error;
@@ -2705,7 +2716,7 @@ int dm_suspend(struct mapped_device *md,
* We call dm_wait_for_completion to wait for all existing requests
* to finish.
*/
- r = dm_wait_for_completion(md, TASK_INTERRUPTIBLE);
+ r = dm_wait_for_completion(md, TASK_INTERRUPTIBLE, false);

down_write(&md->io_lock);
if (noflush)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Mike Snitzer 08-27-2010 04:08 AM

dm: support REQ_FLUSH directly
 
On Thu, Aug 26 2010 at 9:43pm -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike,
>
> (08/27/10 07:50), Mike Snitzer wrote:
> >> Special casing is necessary because device-mapper may have to
> >> send multiple copies of a REQ_FLUSH request to multiple
> >> targets, while a normal request is just sent to a single target.
> >
> > Yes, request-based DM is meant to have all the same capabilities as
> > bio-based DM. So in theory it should support multiple targets but in
> > practice it doesn't. DM's multipath target is the only consumer of
> > request-based DM and it only ever clones a single flush request
> > (num_flush_requests = 1).
>
> This is correct. But,
>
> > So why not remove all of request-based DM's barrier infrastructure and
> > simply rely on the revised block layer to sequence the FLUSH+WRITE
> > request for request-based DM?
> >
> > Given that we do not have a request-based DM target that requires
> > cloning multiple FLUSH requests, it's unused code that is delaying DM
> > support for the new FLUSH+FUA work (NOTE: bio-based DM obviously still
> > needs work in this area).
>
> the above-mentioned 'special casing' is not the hard part.
> See the attached patch.

Yes, Tejun suggested something like this in one of the threads. Thanks
for implementing it.

But do you agree that the request-based barrier code (added in commit
d0bcb8786) could be reverted given the new FLUSH work?

We no longer need waiting now that ordering isn't a concern. Especially
so given rq-based doesn't support multiple targets. As you know, from
dm_table_set_type:

/*
* Request-based dm supports only tables that have a single target now.
* To support multiple targets, request splitting support is needed,
* and that needs lots of changes in the block-layer.
* (e.g. request completion process for partial completion.)
*/

I think we need to at least benchmark the performance of dm-mpath
without any of this extra, soon to be unnecessary, code.

Maybe my concern is overblown...

> The hard part is discerning the error type for a flush failure,
> as discussed in the other thread.
> And as Kiyoshi wrote, that's an existing problem, so it can
> be worked on as a separate issue from the new FLUSH work.

Right, Mike Christie will be refreshing his patchset that should enable
us to resolve that separate issue.

Thanks,
Mike

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

"Jun'ichi Nomura" 08-27-2010 05:52 AM

dm: support REQ_FLUSH directly
 
Hi Mike,

(08/27/10 13:08), Mike Snitzer wrote:
>> the above-mentioned 'special casing' is not the hard part.
>> See the attached patch.
>
> Yes, Tejun suggested something like this in one of the threads. Thanks
> for implementing it.
>
> But do you agree that the request-based barrier code (added in commit
> d0bcb8786) could be reverted given the new FLUSH work?

No, it's a separate thing.
If we don't need to care about the case where multiple clones
of a flush request are necessary, the special casing of flush
requests can be removed regardless of the new FLUSH work.

> We no longer need waiting now that ordering isn't a concern. Especially

The waiting is not for ordering, but for multiple clones.

> so given rq-based doesn't support multiple targets. As you know, from
> dm_table_set_type:
>
> /*
> * Request-based dm supports only tables that have a single target now.
> * To support multiple targets, request splitting support is needed,
> * and that needs lots of changes in the block-layer.
> * (e.g. request completion process for partial completion.)
> */

This comment is about multiple targets.
The special code for barriers is for a single target whose
num_flush_requests > 1. That's a different thing.

> I think we need to at least benchmark the performance of dm-mpath
> without any of this extra, soon to be unnecessary, code.

If there will be no need to support a request-based target
with num_flush_requests > 1, the special handling of flush
can be removed.

And since there is no such target in the current tree,
I don't object if you remove that part of the code for good reason.

Thanks,
--
Jun'ichi Nomura, NEC Corporation

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

