FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Device-mapper Development

 
 
LinkBack Thread Tools
 
Old 09-19-2011, 06:49 AM
"Jun'ichi Nomura"
 
Default dm mpath: maintain reference count for underlying devices

Hi Mike,

On 09/16/11 22:59, Mike Snitzer wrote:
> When processing a request, DM-mpath's map_io() set the cloned request's
> request_queue to the appropriate underlying device's request_queue
> without getting a reference on that request_queue.
>
> DM-mpath now maintains a reference count on the underlying devices'
> request_queue. This change wasn't motivated by a specific report but
> code, like blk_insert_cloned_request(), will access the request_queue
> with the understanding that the request_queue is valid.

Umm, I think it doesn't make sense.

DM opens underlying devices and it should be sufficient to keep
request_queue from being freed.
If it was not enough, any other openers would have to get the reference
count, too, and that should be done in more generic place.

Thanks,
--
Jun'ichi Nomura, NEC Corporation

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 09-19-2011, 02:34 PM
Mike Snitzer
 
Default dm mpath: maintain reference count for underlying devices

On Mon, Sep 19 2011 at 2:49am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike,
>
> On 09/16/11 22:59, Mike Snitzer wrote:
> > When processing a request, DM-mpath's map_io() set the cloned request's
> > request_queue to the appropriate underlying device's request_queue
> > without getting a reference on that request_queue.
> >
> > DM-mpath now maintains a reference count on the underlying devices'
> > request_queue. This change wasn't motivated by a specific report but
> > code, like blk_insert_cloned_request(), will access the request_queue
> > with the understanding that the request_queue is valid.
>
> Umm, I think it doesn't make sense.
>
> DM opens underlying devices and it should be sufficient to keep
> request_queue from being freed.

I welcome your review but please be more specific in the future.

Sure DM opens the underlying devices:

dm_get_device()
-> open_dev()
-> blkdev_get_by_dev()
-> bdget()
-> blkdev_get()

But DM only gets a reference on the associated block_device.

DM multipath makes use of the request_queue of each paths'
block_device. Having a reference on the block_device isn't the same as
having a reference on the request_queue.

Point is, blk_cleanup_queue() could easily be called by the SCSI
subsystem for a device that is removed -- a request_queue reference is
taken by the underlying driver at blk_alloc_queue_node() time. So SCSI
is free to drop the only reference in blk_cleanup_queue() which frees
the request_queue (unless upper layer driver like mpath also takes a
request_queue reference).

FYI, I got looking at mpath's request_queue references, or lack thereof,
because of this report/patch on LKML from Roland Drier:
https://lkml.org/lkml/2011/7/8/457

here was my follow-up to Roland:
https://lkml.org/lkml/2011/7/11/410

James Bottomley points out that we should always have a reference on the
request_queue (otherwise final put frees the request_queue on us):
https://lkml.org/lkml/2011/7/12/265

> If it was not enough, any other openers would have to get the reference
> count, too, and that should be done in more generic place.

For DM, dm-multipath is the only direct consumer of request_queue(s)
that DM didn't allocate.

We have no intention of adding another request-based target (in fact
there is serious doubt that request-based DM was ever worth it). So I
avoided complicating the DM core (even if only slightly) for rq-based
concerns that are localized to dm-multipath.

Mike

p.s. it should be noted that AFAIK this patch is already part of Oracle
Linux's uek kernel...

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 09-20-2011, 05:29 AM
"Jun'ichi Nomura"
 
Default dm mpath: maintain reference count for underlying devices

Hi Mike,

On 09/19/11 23:34, Mike Snitzer wrote:
> On Mon, Sep 19 2011 at 2:49am -0400,
> Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:
>> DM opens underlying devices and it should be sufficient to keep
>> request_queue from being freed.
>
> I welcome your review but please be more specific in the future.
>
> Sure DM opens the underlying devices:
>
> dm_get_device()
> -> open_dev()
> -> blkdev_get_by_dev()
> -> bdget()
> -> blkdev_get()
>
> But DM only gets a reference on the associated block_device.

Point is the above should be sufficient to keep the queue from freeing.
Otherwise, 'q->_something_' everywhere could cause invalid pointer access
as the queue is freed.


Below are additional details replying to your comments:

>
> DM multipath makes use of the request_queue of each paths'
> block_device. Having a reference on the block_device isn't the same as
> having a reference on the request_queue.

Yes. But it does not necessarily mean we have to raise
a reference count of the request_queue.

>
> Point is, blk_cleanup_queue() could easily be called by the SCSI
> subsystem for a device that is removed -- a request_queue reference is
> taken by the underlying driver at blk_alloc_queue_node() time. So SCSI
> is free to drop the only reference in blk_cleanup_queue() which frees
> the request_queue (unless upper layer driver like mpath also takes a
> request_queue reference).

As for SCSI, it takes another reference count and drops it
in scsi_device_dev_release.
So blk_cleanup_queue is not dropping the last reference.

>
> FYI, I got looking at mpath's request_queue references, or lack thereof,
> because of this report/patch on LKML from Roland Drier:
> https://lkml.org/lkml/2011/7/8/457
>
> here was my follow-up to Roland:
> https://lkml.org/lkml/2011/7/11/410

For that problem, taking a reference count is not a remedy.
The problem occurs because elevator is freed regardless of the
reference count.

The cause of the problem was:
a) SCSI has moved blk_cleanup_queue() to earlier stage
where there still is a opener
b) blk_cleanup_queue() frees elevator after marking
the queue DEAD
c) blk_insert_cloned_request() uses elevator without
checking QUEUE_FLAG_DEAD

Roland's patch was to fix c) by adding QUEUE_FLAG_DEAD check.
However, it's not possible to do it safely because
QUEUE_FLAG_DEAD means we can't even access q->queue_lock.
(See Vivek's comment in the same thread)
And without queue_lock, there's a window for a race.

James's suggestion was to fix b) by not freeing elevator
until blk_release_queue() is called:
https://lkml.org/lkml/2011/8/10/421
But it would hit the same issue that there's no guarantee
of q->queue_lock validity after blk_cleanup_queue().
James's other suggestion was to add a callback mechanism
for driver to free q->queue_lock.

I was trying to fix a) in the following thread:
https://lkml.org/lkml/2011/8/18/103
but haven't gotten response from James so it seems rejected.


>
> James Bottomley points out that we should always have a reference on the
> request_queue (otherwise final put frees the request_queue on us):
> https://lkml.org/lkml/2011/7/12/265

SCSI is taking a reference count of device being used.

>
>> If it was not enough, any other openers would have to get the reference
>> count, too, and that should be done in more generic place.
>
> For DM, dm-multipath is the only direct consumer of request_queue(s)
> that DM didn't allocate.

In DM, there are various users of underlying request_queue's member;
e.g. device_flush_capable(), dm_table_any_congested(), etc.

They would not be safe if request_queue was suddenly freed when
someone accessing 'q->something'.

>
> We have no intention of adding another request-based target (in fact
> there is serious doubt that request-based DM was ever worth it). So I
> avoided complicating the DM core (even if only slightly) for rq-based
> concerns that are localized to dm-multipath.

Where to put the code is about the maintainability.
So I don't mind if that's the maintainers' preference.

But for this specific patch, I don't understand why it is necessary
nor sufficient.

> p.s. it should be noted that AFAIK this patch is already part of Oracle
> Linux's uek kernel...

The patch should be harmless except for the extra complexity and
possible future maintenance burden.

Thanks,
--
Jun'ichi Nomura, NEC Corporation

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 09-20-2011, 07:13 AM
James Bottomley
 
Default dm mpath: maintain reference count for underlying devices

On Tue, 2011-09-20 at 14:29 +0900, Jun'ichi Nomura wrote:
> Hi Mike,
>
> On 09/19/11 23:34, Mike Snitzer wrote:
> > On Mon, Sep 19 2011 at 2:49am -0400,
> > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> >> DM opens underlying devices and it should be sufficient to keep
> >> request_queue from being freed.
> >
> > I welcome your review but please be more specific in the future.
> >
> > Sure DM opens the underlying devices:
> >
> > dm_get_device()
> > -> open_dev()
> > -> blkdev_get_by_dev()
> > -> bdget()
> > -> blkdev_get()
> >
> > But DM only gets a reference on the associated block_device.
>
> Point is the above should be sufficient to keep the queue from freeing.
> Otherwise, 'q->_something_' everywhere could cause invalid pointer access
> as the queue is freed.
>
>
> Below are additional details replying to your comments:
>
> >
> > DM multipath makes use of the request_queue of each paths'
> > block_device. Having a reference on the block_device isn't the same as
> > having a reference on the request_queue.
>
> Yes. But it does not necessarily mean we have to raise
> a reference count of the request_queue.
>
> >
> > Point is, blk_cleanup_queue() could easily be called by the SCSI
> > subsystem for a device that is removed -- a request_queue reference is
> > taken by the underlying driver at blk_alloc_queue_node() time. So SCSI
> > is free to drop the only reference in blk_cleanup_queue() which frees
> > the request_queue (unless upper layer driver like mpath also takes a
> > request_queue reference).
>
> As for SCSI, it takes another reference count and drops it
> in scsi_device_dev_release.
> So blk_cleanup_queue is not dropping the last reference.
>
> >
> > FYI, I got looking at mpath's request_queue references, or lack thereof,
> > because of this report/patch on LKML from Roland Drier:
> > https://lkml.org/lkml/2011/7/8/457
> >
> > here was my follow-up to Roland:
> > https://lkml.org/lkml/2011/7/11/410
>
> For that problem, taking a reference count is not a remedy.
> The problem occurs because elevator is freed regardless of the
> reference count.
>
> The cause of the problem was:
> a) SCSI has moved blk_cleanup_queue() to earlier stage
> where there still is a opener
> b) blk_cleanup_queue() frees elevator after marking
> the queue DEAD
> c) blk_insert_cloned_request() uses elevator without
> checking QUEUE_FLAG_DEAD
>
> Roland's patch was to fix c) by adding QUEUE_FLAG_DEAD check.
> However, it's not possible to do it safely because
> QUEUE_FLAG_DEAD means we can't even access q->queue_lock.
> (See Vivek's comment in the same thread)
> And without queue_lock, there's a window for a race.
>
> James's suggestion was to fix b) by not freeing elevator
> until blk_release_queue() is called:
> https://lkml.org/lkml/2011/8/10/421
> But it would hit the same issue that there's no guarantee
> of q->queue_lock validity after blk_cleanup_queue().
> James's other suggestion was to add a callback mechanism
> for driver to free q->queue_lock.

Actually, there isn't a problem with this patch: Any driver that
expects blk_cleanup_queue to guarantee last use of the lock has to call
it as the last releaser. If it doesn't, it would oops (or would have
oopsed before we started putting the block guards in).

> I was trying to fix a) in the following thread:
> https://lkml.org/lkml/2011/8/18/103
> but haven't gotten response from James so it seems rejected.

I think I said quite a few times that this would reintroduce the oops I
was trying to fix. Alan seems to think that there are sufficient guards
just to move the blk_cleanup_queue(), but I'd prefer to get
blk_cleanup_queue() working properly like the del method it's supposed
to be.

I still think this (with beefed up doc) is the correct fix.

James

---

diff --git a/block/blk-core.c b/block/blk-core.c
index 90e1ffd..f30f9bf 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -350,7 +350,11 @@ EXPORT_SYMBOL(blk_put_queue);
/*
* Note: If a driver supplied the queue lock, it should not zap that lock
* unexpectedly as some queue cleanup components like elevator_exit() and
- * blk_throtl_exit() need queue lock.
+ * blk_throtl_exit() need queue lock. If you're using blk_cleanup_queue() as
+ * the point after which you can zap the supplied lock, you *must* ensure that
+ * the blk_put_queue() in this code is the final put. If it isn't, the
+ * elevator cleanup functions (which take the lock) won't be called and you'll
+ * get an oops when they are
*/
void blk_cleanup_queue(struct request_queue *q)
{
@@ -367,11 +371,6 @@ void blk_cleanup_queue(struct request_queue *q)
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
mutex_unlock(&q->sysfs_lock);

- if (q->elevator)
- elevator_exit(q->elevator);
-
- blk_throtl_exit(q);
-
blk_put_queue(q);
}
EXPORT_SYMBOL(blk_cleanup_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)

blk_sync_queue(q);

+ if (q->elevator)
+ elevator_exit(q->elevator);
+
+ blk_throtl_exit(q);
+
if (rl->rq_pool)
mempool_destroy(rl->rq_pool);



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 09-20-2011, 11:31 AM
"Jun'ichi Nomura"
 
Default dm mpath: maintain reference count for underlying devices

Hi James,

On 09/20/11 16:13, James Bottomley wrote:
>> James's suggestion was to fix b) by not freeing elevator
>> until blk_release_queue() is called:
>> https://lkml.org/lkml/2011/8/10/421
>> But it would hit the same issue that there's no guarantee
>> of q->queue_lock validity after blk_cleanup_queue().
>> James's other suggestion was to add a callback mechanism
>> for driver to free q->queue_lock.
>
> Actually, there isn't a problem with this patch: Any driver that
> expects blk_cleanup_queue to guarantee last use of the lock has to call
> it as the last releaser. If it doesn't, it would oops (or would have
> oopsed before we started putting the block guards in).

'q->sysfs_lock' seems to protect the queue access via sysfs.
Are you sure it never delay the last put?

The separation of blk_cleanup_queue() and blk_release_queue()
was introduced with the following commit:

commit 483f4afc421435b7cfe5e88f74eea0b73a476d75
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat Mar 18 18:34:37 2006 -0500
[PATCH] fix sysfs interaction and lifetime rules handling for queues

I added Al to Cc hoping he knows why elevator_exit() was
placed in blk_cleanup_queue() from the first time.

Other than the above concern, I have no objection to your patch.

>> I was trying to fix a) in the following thread:
>> https://lkml.org/lkml/2011/8/18/103
>> but haven't gotten response from James so it seems rejected.
>
> I think I said quite a few times that this would reintroduce the oops I
> was trying to fix. Alan seems to think that there are sufficient guards
> just to move the blk_cleanup_queue(), but I'd prefer to get
> blk_cleanup_queue() working properly like the del method it's supposed
> to be.

I think you haven't mentioned what oops would be reintroduced.

Thanks,
--
Jun'ichi Nomura, NEC Corporation

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-17-2011, 08:15 PM
Mike Snitzer
 
Default dm mpath: maintain reference count for underlying devices

On Tue, Sep 20 2011 at 1:29am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:

> Hi Mike,
>
> On 09/19/11 23:34, Mike Snitzer wrote:
> > On Mon, Sep 19 2011 at 2:49am -0400,
> > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote:
> >> DM opens underlying devices and it should be sufficient to keep
> >> request_queue from being freed.
> >
> > I welcome your review but please be more specific in the future.
> >
> > Sure DM opens the underlying devices:
> >
> > dm_get_device()
> > -> open_dev()
> > -> blkdev_get_by_dev()
> > -> bdget()
> > -> blkdev_get()
> >
> > But DM only gets a reference on the associated block_device.
>
> Point is the above should be sufficient to keep the queue from freeing.
> Otherwise, 'q->_something_' everywhere could cause invalid pointer access
> as the queue is freed.
>
>
> Below are additional details replying to your comments:
>
> >
> > DM multipath makes use of the request_queue of each paths'
> > block_device. Having a reference on the block_device isn't the same as
> > having a reference on the request_queue.
>
> Yes. But it does not necessarily mean we have to raise
> a reference count of the request_queue.
>
> >
> > Point is, blk_cleanup_queue() could easily be called by the SCSI
> > subsystem for a device that is removed -- a request_queue reference is
> > taken by the underlying driver at blk_alloc_queue_node() time. So SCSI
> > is free to drop the only reference in blk_cleanup_queue() which frees
> > the request_queue (unless upper layer driver like mpath also takes a
> > request_queue reference).
>
> As for SCSI, it takes another reference count and drops it
> in scsi_device_dev_release.
> So blk_cleanup_queue is not dropping the last reference.

FYI, this patch from Tejun should also fix the concern I had relative to
mpath's underlying devices' request_queues:
https://lkml.org/lkml/2011/10/16/148

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 

Thread Tools




All times are GMT. The time now is 12:52 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org