dm mpath: maintain reference count for underlying devices
Hi Mike,
On 09/16/11 22:59, Mike Snitzer wrote: > When processing a request, DM-mpath's map_io() set the cloned request's > request_queue to the appropriate underlying device's request_queue > without getting a reference on that request_queue. > > DM-mpath now maintains a reference count on the underlying devices' > request_queue. This change wasn't motivated by a specific report but > code, like blk_insert_cloned_request(), will access the request_queue > with the understanding that the request_queue is valid. Umm, I think it doesn't make sense. DM opens underlying devices and it should be sufficient to keep request_queue from being freed. If it was not enough, any other openers would have to get the reference count, too, and that should be done in more generic place. Thanks, -- Jun'ichi Nomura, NEC Corporation -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
dm mpath: maintain reference count for underlying devices
On Mon, Sep 19 2011 at 2:49am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > Hi Mike, > > On 09/16/11 22:59, Mike Snitzer wrote: > > When processing a request, DM-mpath's map_io() set the cloned request's > > request_queue to the appropriate underlying device's request_queue > > without getting a reference on that request_queue. > > > > DM-mpath now maintains a reference count on the underlying devices' > > request_queue. This change wasn't motivated by a specific report but > > code, like blk_insert_cloned_request(), will access the request_queue > > with the understanding that the request_queue is valid. > > Umm, I think it doesn't make sense. > > DM opens underlying devices and it should be sufficient to keep > request_queue from being freed. I welcome your review but please be more specific in the future. Sure DM opens the underlying devices: dm_get_device() -> open_dev() -> blkdev_get_by_dev() -> bdget() -> blkdev_get() But DM only gets a reference on the associated block_device. DM multipath makes use of the request_queue of each paths' block_device. Having a reference on the block_device isn't the same as having a reference on the request_queue. Point is, blk_cleanup_queue() could easily be called by the SCSI subsystem for a device that is removed -- a request_queue reference is taken by the underlying driver at blk_alloc_queue_node() time. So SCSI is free to drop the only reference in blk_cleanup_queue() which frees the request_queue (unless upper layer driver like mpath also takes a request_queue reference). FYI, I got looking at mpath's request_queue references, or lack thereof, because of this report/patch on LKML from Roland Drier: https://lkml.org/lkml/2011/7/8/457 here was my follow-up to Roland: https://lkml.org/lkml/2011/7/11/410 James Bottomley points out that we should always have a reference on the request_queue (otherwise final put frees the request_queue on us): https://lkml.org/lkml/2011/7/12/265 > If it was not enough, any other openers would have to get the reference > count, too, and that should be done in more generic place. For DM, dm-multipath is the only direct consumer of request_queue(s) that DM didn't allocate. We have no intention of adding another request-based target (in fact there is serious doubt that request-based DM was ever worth it). So I avoided complicating the DM core (even if only slightly) for rq-based concerns that are localized to dm-multipath. Mike p.s. it should be noted that AFAIK this patch is already part of Oracle Linux's uek kernel... -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
dm mpath: maintain reference count for underlying devices
Hi Mike,
On 09/19/11 23:34, Mike Snitzer wrote: > On Mon, Sep 19 2011 at 2:49am -0400, > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: >> DM opens underlying devices and it should be sufficient to keep >> request_queue from being freed. > > I welcome your review but please be more specific in the future. > > Sure DM opens the underlying devices: > > dm_get_device() > -> open_dev() > -> blkdev_get_by_dev() > -> bdget() > -> blkdev_get() > > But DM only gets a reference on the associated block_device. Point is the above should be sufficient to keep the queue from freeing. Otherwise, 'q->_something_' everywhere could cause invalid pointer access as the queue is freed. Below are additional details replying to your comments: > > DM multipath makes use of the request_queue of each paths' > block_device. Having a reference on the block_device isn't the same as > having a reference on the request_queue. Yes. But it does not necessarily mean we have to raise a reference count of the request_queue. > > Point is, blk_cleanup_queue() could easily be called by the SCSI > subsystem for a device that is removed -- a request_queue reference is > taken by the underlying driver at blk_alloc_queue_node() time. So SCSI > is free to drop the only reference in blk_cleanup_queue() which frees > the request_queue (unless upper layer driver like mpath also takes a > request_queue reference). As for SCSI, it takes another reference count and drops it in scsi_device_dev_release. So blk_cleanup_queue is not dropping the last reference. > > FYI, I got looking at mpath's request_queue references, or lack thereof, > because of this report/patch on LKML from Roland Drier: > https://lkml.org/lkml/2011/7/8/457 > > here was my follow-up to Roland: > https://lkml.org/lkml/2011/7/11/410 For that problem, taking a reference count is not a remedy. The problem occurs because elevator is freed regardless of the reference count. The cause of the problem was: a) SCSI has moved blk_cleanup_queue() to earlier stage where there still is a opener b) blk_cleanup_queue() frees elevator after marking the queue DEAD c) blk_insert_cloned_request() uses elevator without checking QUEUE_FLAG_DEAD Roland's patch was to fix c) by adding QUEUE_FLAG_DEAD check. However, it's not possible to do it safely because QUEUE_FLAG_DEAD means we can't even access q->queue_lock. (See Vivek's comment in the same thread) And without queue_lock, there's a window for a race. James's suggestion was to fix b) by not freeing elevator until blk_release_queue() is called: https://lkml.org/lkml/2011/8/10/421 But it would hit the same issue that there's no guarantee of q->queue_lock validity after blk_cleanup_queue(). James's other suggestion was to add a callback mechanism for driver to free q->queue_lock. I was trying to fix a) in the following thread: https://lkml.org/lkml/2011/8/18/103 but haven't gotten response from James so it seems rejected. > > James Bottomley points out that we should always have a reference on the > request_queue (otherwise final put frees the request_queue on us): > https://lkml.org/lkml/2011/7/12/265 SCSI is taking a reference count of device being used. > >> If it was not enough, any other openers would have to get the reference >> count, too, and that should be done in more generic place. > > For DM, dm-multipath is the only direct consumer of request_queue(s) > that DM didn't allocate. In DM, there are various users of underlying request_queue's member; e.g. device_flush_capable(), dm_table_any_congested(), etc. They would not be safe if request_queue was suddenly freed when someone accessing 'q->something'. > > We have no intention of adding another request-based target (in fact > there is serious doubt that request-based DM was ever worth it). So I > avoided complicating the DM core (even if only slightly) for rq-based > concerns that are localized to dm-multipath. Where to put the code is about the maintainability. So I don't mind if that's the maintainers' preference. But for this specific patch, I don't understand why it is necessary nor sufficient. > p.s. it should be noted that AFAIK this patch is already part of Oracle > Linux's uek kernel... The patch should be harmless except for the extra complexity and possible future maintenance burden. Thanks, -- Jun'ichi Nomura, NEC Corporation -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
dm mpath: maintain reference count for underlying devices
On Tue, 2011-09-20 at 14:29 +0900, Jun'ichi Nomura wrote:
> Hi Mike, > > On 09/19/11 23:34, Mike Snitzer wrote: > > On Mon, Sep 19 2011 at 2:49am -0400, > > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > >> DM opens underlying devices and it should be sufficient to keep > >> request_queue from being freed. > > > > I welcome your review but please be more specific in the future. > > > > Sure DM opens the underlying devices: > > > > dm_get_device() > > -> open_dev() > > -> blkdev_get_by_dev() > > -> bdget() > > -> blkdev_get() > > > > But DM only gets a reference on the associated block_device. > > Point is the above should be sufficient to keep the queue from freeing. > Otherwise, 'q->_something_' everywhere could cause invalid pointer access > as the queue is freed. > > > Below are additional details replying to your comments: > > > > > DM multipath makes use of the request_queue of each paths' > > block_device. Having a reference on the block_device isn't the same as > > having a reference on the request_queue. > > Yes. But it does not necessarily mean we have to raise > a reference count of the request_queue. > > > > > Point is, blk_cleanup_queue() could easily be called by the SCSI > > subsystem for a device that is removed -- a request_queue reference is > > taken by the underlying driver at blk_alloc_queue_node() time. So SCSI > > is free to drop the only reference in blk_cleanup_queue() which frees > > the request_queue (unless upper layer driver like mpath also takes a > > request_queue reference). > > As for SCSI, it takes another reference count and drops it > in scsi_device_dev_release. > So blk_cleanup_queue is not dropping the last reference. > > > > > FYI, I got looking at mpath's request_queue references, or lack thereof, > > because of this report/patch on LKML from Roland Drier: > > https://lkml.org/lkml/2011/7/8/457 > > > > here was my follow-up to Roland: > > https://lkml.org/lkml/2011/7/11/410 > > For that problem, taking a reference count is not a remedy. > The problem occurs because elevator is freed regardless of the > reference count. > > The cause of the problem was: > a) SCSI has moved blk_cleanup_queue() to earlier stage > where there still is a opener > b) blk_cleanup_queue() frees elevator after marking > the queue DEAD > c) blk_insert_cloned_request() uses elevator without > checking QUEUE_FLAG_DEAD > > Roland's patch was to fix c) by adding QUEUE_FLAG_DEAD check. > However, it's not possible to do it safely because > QUEUE_FLAG_DEAD means we can't even access q->queue_lock. > (See Vivek's comment in the same thread) > And without queue_lock, there's a window for a race. > > James's suggestion was to fix b) by not freeing elevator > until blk_release_queue() is called: > https://lkml.org/lkml/2011/8/10/421 > But it would hit the same issue that there's no guarantee > of q->queue_lock validity after blk_cleanup_queue(). > James's other suggestion was to add a callback mechanism > for driver to free q->queue_lock. Actually, there isn't a problem with this patch: Any driver that expects blk_cleanup_queue to guarantee last use of the lock has to call it as the last releaser. If it doesn't, it would oops (or would have oopsed before we started putting the block guards in). > I was trying to fix a) in the following thread: > https://lkml.org/lkml/2011/8/18/103 > but haven't gotten response from James so it seems rejected. I think I said quite a few times that this would reintroduce the oops I was trying to fix. Alan seems to think that there are sufficient guards just to move the blk_cleanup_queue(), but I'd prefer to get blk_cleanup_queue() working properly like the del method it's supposed to be. I still think this (with beefed up doc) is the correct fix. James --- diff --git a/block/blk-core.c b/block/blk-core.c index 90e1ffd..f30f9bf 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -350,7 +350,11 @@ EXPORT_SYMBOL(blk_put_queue); /* * Note: If a driver supplied the queue lock, it should not zap that lock * unexpectedly as some queue cleanup components like elevator_exit() and - * blk_throtl_exit() need queue lock. + * blk_throtl_exit() need queue lock. If you're using blk_cleanup_queue() as + * the point after which you can zap the supplied lock, you *must* ensure that + * the blk_put_queue() in this code is the final put. If it isn't, the + * elevator cleanup functions (which take the lock) won't be called and you'll + * get an oops when they are */ void blk_cleanup_queue(struct request_queue *q) { @@ -367,11 +371,6 @@ void blk_cleanup_queue(struct request_queue *q) queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q); mutex_unlock(&q->sysfs_lock); - if (q->elevator) - elevator_exit(q->elevator); - - blk_throtl_exit(q); - blk_put_queue(q); } EXPORT_SYMBOL(blk_cleanup_queue); diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c index 0ee17b5..a5a756b 100644 --- a/block/blk-sysfs.c +++ b/block/blk-sysfs.c @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj) blk_sync_queue(q); + if (q->elevator) + elevator_exit(q->elevator); + + blk_throtl_exit(q); + if (rl->rq_pool) mempool_destroy(rl->rq_pool); -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
dm mpath: maintain reference count for underlying devices
Hi James,
On 09/20/11 16:13, James Bottomley wrote: >> James's suggestion was to fix b) by not freeing elevator >> until blk_release_queue() is called: >> https://lkml.org/lkml/2011/8/10/421 >> But it would hit the same issue that there's no guarantee >> of q->queue_lock validity after blk_cleanup_queue(). >> James's other suggestion was to add a callback mechanism >> for driver to free q->queue_lock. > > Actually, there isn't a problem with this patch: Any driver that > expects blk_cleanup_queue to guarantee last use of the lock has to call > it as the last releaser. If it doesn't, it would oops (or would have > oopsed before we started putting the block guards in). 'q->sysfs_lock' seems to protect the queue access via sysfs. Are you sure it never delay the last put? The separation of blk_cleanup_queue() and blk_release_queue() was introduced with the following commit: commit 483f4afc421435b7cfe5e88f74eea0b73a476d75 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Sat Mar 18 18:34:37 2006 -0500 [PATCH] fix sysfs interaction and lifetime rules handling for queues I added Al to Cc hoping he knows why elevator_exit() was placed in blk_cleanup_queue() from the first time. Other than the above concern, I have no objection to your patch. >> I was trying to fix a) in the following thread: >> https://lkml.org/lkml/2011/8/18/103 >> but haven't gotten response from James so it seems rejected. > > I think I said quite a few times that this would reintroduce the oops I > was trying to fix. Alan seems to think that there are sufficient guards > just to move the blk_cleanup_queue(), but I'd prefer to get > blk_cleanup_queue() working properly like the del method it's supposed > to be. I think you haven't mentioned what oops would be reintroduced. Thanks, -- Jun'ichi Nomura, NEC Corporation -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
dm mpath: maintain reference count for underlying devices
On Tue, Sep 20 2011 at 1:29am -0400,
Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > Hi Mike, > > On 09/19/11 23:34, Mike Snitzer wrote: > > On Mon, Sep 19 2011 at 2:49am -0400, > > Jun'ichi Nomura <j-nomura@ce.jp.nec.com> wrote: > >> DM opens underlying devices and it should be sufficient to keep > >> request_queue from being freed. > > > > I welcome your review but please be more specific in the future. > > > > Sure DM opens the underlying devices: > > > > dm_get_device() > > -> open_dev() > > -> blkdev_get_by_dev() > > -> bdget() > > -> blkdev_get() > > > > But DM only gets a reference on the associated block_device. > > Point is the above should be sufficient to keep the queue from freeing. > Otherwise, 'q->_something_' everywhere could cause invalid pointer access > as the queue is freed. > > > Below are additional details replying to your comments: > > > > > DM multipath makes use of the request_queue of each paths' > > block_device. Having a reference on the block_device isn't the same as > > having a reference on the request_queue. > > Yes. But it does not necessarily mean we have to raise > a reference count of the request_queue. > > > > > Point is, blk_cleanup_queue() could easily be called by the SCSI > > subsystem for a device that is removed -- a request_queue reference is > > taken by the underlying driver at blk_alloc_queue_node() time. So SCSI > > is free to drop the only reference in blk_cleanup_queue() which frees > > the request_queue (unless upper layer driver like mpath also takes a > > request_queue reference). > > As for SCSI, it takes another reference count and drops it > in scsi_device_dev_release. > So blk_cleanup_queue is not dropping the last reference. FYI, this patch from Tejun should also fix the concern I had relative to mpath's underlying devices' request_queues: https://lkml.org/lkml/2011/10/16/148 -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel |
| All times are GMT. The time now is 02:17 AM. |
VBulletin, Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.