block: fix flush machinery for stacking drivers with differring flush flags
Tejun Heo <tj@kernel.org> writes:
>> -#define REQ_CLONE_MASK REQ_COMMON_MASK
>> +/*
>> + * Cloned requests are inserted into the elevator via blk_insert_cloned_request.
>> + * Because the flush flags exported by the request-based dm target may in
>> + * theory be different from the flush flags of the underlying request_queue,
>> + * we need to pass along information regarding whether a particular request
>> + * is part of a flush sequence. This is primarily used to complete I/Os early
>> + * that would otherwise not be necessary (such as an empty flush for a request
>> + * queue that does not support flush). In such a case, the end_io path for
>> + * the request would try to account the I/O instead of ignoring it, resulting
>> + * in a null pointer dereference.
>> + */
>> +#define REQ_CLONE_MASK (REQ_COMMON_MASK | REQ_FLUSH_SEQ)
>
> I'm probably missing something, but why do we still need to copy
> REQ_FLUSH_SEQ? Why doesn't the following work?
>
> * dm driver always advertises REQ_FLUSH|FUA like other stacking
> drivers.
>
> * blk-flush for the dm, decomposes flushes to FLUSH + FUA write and
> send it down.
>
> * dm driver clones the requests and send them down to each member
> queue.
>
> * blk-flush on member queue, handles FLUSH as FLUSH and decomposes FUA
> write as necessary.
>
> What am I missing? Why does end_io path still matter when it goes
> through blk-flush on the member device too?
What you're missing is that completing an empty flush goes through I/O
accounting and oopses there, as shown in the stack trace I provided before.
We could avoid passing REQ_FLUSH_SEQ, and then set it when completing an
empty flush, but I thought that was even worse. Or, maybe we could
clear REQ_IO_STAT when completing such requests.
-Jeff
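
The two options discussed above can be compared in a small userspace
sketch. Everything in it is an assumption made for illustration: the
flag bit values are stand-ins rather than the kernel's, and
would_account(), clone_flags() and complete_empty_flush_flags() are
hypothetical helpers that only model the idea that completion-time
accounting is skipped for requests marked REQ_FLUSH_SEQ.

```c
#include <stdbool.h>

/* Stand-in bit values for illustration only; not the kernel's definitions. */
#define REQ_IO_STAT	(1u << 0)
#define REQ_FLUSH_SEQ	(1u << 1)

/*
 * Hypothetical model of the completion-time decision: account the I/O
 * only if stats are enabled and the request is not part of a flush
 * sequence.
 */
static bool would_account(unsigned int cmd_flags)
{
	return (cmd_flags & REQ_IO_STAT) && !(cmd_flags & REQ_FLUSH_SEQ);
}

/* Option A: a clone keeps only the flags allowed by the clone mask. */
static unsigned int clone_flags(unsigned int orig, unsigned int clone_mask)
{
	return orig & clone_mask;
}

/* Option B: drop REQ_IO_STAT when an empty flush is completed early. */
static unsigned int complete_empty_flush_flags(unsigned int cmd_flags)
{
	return cmd_flags & ~REQ_IO_STAT;
}
```

In this model, a clone whose mask includes REQ_FLUSH_SEQ is never
accounted, and neither is a request whose REQ_IO_STAT bit has been
cleared; a clone whose mask drops REQ_FLUSH_SEQ still is.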
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
08-12-2011, 05:01 PM
Jeff Moyer
block: fix flush machinery for stacking drivers with differring flush flags
Shaohua Li <shli@kernel.org> writes:
> 2011/8/10 Jeff Moyer <jmoyer@redhat.com>:
>> @@ -320,6 +319,7 @@ void blk_insert_flush(struct request *rq)
>>  	if ((policy & REQ_FSEQ_DATA) &&
>>  	    !(policy & (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH))) {
>>  		list_add_tail(&rq->queuelist, &q->queue_head);
>> +		blk_run_queue_async(q);
> A minor issue. I can understand this is required for
> blk_insert_cloned_request(), because INSERT_BACK will run the
> queue but INSERT_FLUSH doesn't. But it sounds like we don't need to
> run the queue for normal requests. Either __make_request will run the
> queue (task has plug list) or flush_plug will run the queue. Delaying
> the queue run has its benefits. Can we do the queue run in
> blk_insert_cloned_request() if this is an INSERT_FLUSH?
Well, the only time we need to run the queue is when the request has
data, has REQ_FUA set, and the underlying queue's flush flags contain
only REQ_FUA. In code:
	if (rq->cmd_flags & REQ_FUA && q->flush_flags == REQ_FUA)
		blk_run_queue_async(q);
If that was added to blk_insert_cloned_request, we could get rid of the
blk_run_queue_async in blk_insert_flush. However, I think Tejun will
object due to the layering violation for the same reason he doesn't like
my handling of empty flushes in blk_insert_cloned_request.
Tejun?
Cheers,
Jeff
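
The condition proposed for blk_insert_cloned_request can be exercised
outside the kernel with a small sketch. The flag values below are
stand-ins chosen for illustration, and clone_needs_queue_run() is a
hypothetical helper name; it only mirrors the expression from the mail
and takes the flags as plain integers instead of a request and a queue.

```c
#include <stdbool.h>

/* Stand-in bit values for illustration only; not the kernel's definitions. */
#define REQ_FLUSH	(1u << 0)
#define REQ_FUA		(1u << 1)

/*
 * Hypothetical helper mirroring the check in the mail: run the queue
 * only when the request carries REQ_FUA and the underlying queue's
 * flush flags are exactly REQ_FUA.
 */
static bool clone_needs_queue_run(unsigned int cmd_flags,
				  unsigned int flush_flags)
{
	return (cmd_flags & REQ_FUA) && flush_flags == REQ_FUA;
}
```

With these stand-ins, a FUA request on a FUA-only queue triggers a queue
run, while the same request on a queue advertising REQ_FLUSH | REQ_FUA
does not.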