Linux Archive > Redhat > Device-mapper Development

08-10-2011, 07:52 PM
James Bottomley

Oops when SCSI device under multipath is removed

On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
> Hi James,
>
> With the attached shell script, blk_insert_cloned_request will cause
> oops with 3.1-rc1. (This is a regression introduced in 2.6.39)
>
> The regression was introduced by 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b,
> "[SCSI] put stricter guards on queue dead checks".
> Part of the commit has moved scsi_free_queue(), which calls
> blk_cleanup_queue(), to scsi_remove_device(), which is called while
> the device is still open.
> The oops occurs because blk_insert_cloned_request() is called
> after blk_cleanup_queue() is called (which frees elevator
> and turns on QUEUE_FLAG_DEAD).
>
> Two patches have been proposed, but neither of them has been included:
> 1) Add QUEUE_FLAG_DEAD check in blk_insert_cloned_request()
> https://lkml.org/lkml/2011/7/8/457
> 2) SCSI to call blk_cleanup_queue() from device's ->release() callback
> (before 2.6.39, it used to work like this)
> https://lkml.org/lkml/2011/7/2/106

Well, they both have documented objections. I asked why we destroy the
elevator in the del case and didn't get any traction, so let me show the
actual patch which should fix all of these issues.

Is there a good reason for not doing this as a bug fix now?

James

---
diff --git a/block/blk-core.c b/block/blk-core.c
index b627558..cc72b7d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -367,11 +367,6 @@ void blk_cleanup_queue(struct request_queue *q)
 	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
 	mutex_unlock(&q->sysfs_lock);
 
-	if (q->elevator)
-		elevator_exit(q->elevator);
-
-	blk_throtl_exit(q);
-
 	blk_put_queue(q);
 }
 EXPORT_SYMBOL(blk_cleanup_queue);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 0ee17b5..a5a756b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_sync_queue(q);
 
+	if (q->elevator)
+		elevator_exit(q->elevator);
+
+	blk_throtl_exit(q);
+
 	if (rl->rq_pool)
 		mempool_destroy(rl->rq_pool);



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
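The failure mode, and the first proposed guard (a QUEUE_FLAG_DEAD check in blk_insert_cloned_request()), can be modeled in a small userspace sketch. All names, types, and return values here are illustrative stand-ins, not the kernel API:

```c
#include <assert.h>

/* Hypothetical userspace model of the proposed dead-queue guard. */

#define QUEUE_FLAG_DEAD 0x1

struct model_queue {
	unsigned int flags;
	int requests;		/* requests accepted so far */
};

/* Models blk_cleanup_queue() marking the queue dead. */
static void model_cleanup_queue(struct model_queue *q)
{
	q->flags |= QUEUE_FLAG_DEAD;
}

/* Models blk_insert_cloned_request() with the proposed check:
 * reject the request instead of touching a freed elevator. */
static int model_insert_cloned_request(struct model_queue *q)
{
	if (q->flags & QUEUE_FLAG_DEAD)
		return -19;	/* -ENODEV: device is gone */
	q->requests++;
	return 0;
}
```

The point of the guard is only ordering: once cleanup has run, later cloned-request insertions must fail cleanly rather than dereference freed queue internals.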
 
08-11-2011, 12:24 AM
"Jun'ichi Nomura"

Oops when SCSI device under multipath is removed

Hi James,

On 08/11/11 04:52, James Bottomley wrote:
> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
>> Two patches have been proposed, but neither of them has been included:
>> 1) Add QUEUE_FLAG_DEAD check in blk_insert_cloned_request()
>> https://lkml.org/lkml/2011/7/8/457
>> 2) SCSI to call blk_cleanup_queue() from device's ->release() callback
>> (before 2.6.39, it used to work like this)
>> https://lkml.org/lkml/2011/7/2/106
>
> Well, they both have documented objections. I asked why we destroy the
> elevator in the del case and didn't get any traction, so let me show the
> actual patch which should fix all of these issues.
>
> Is there a good reason for not doing this as a bug fix now?
>
> James
>
> ---
> diff --git a/block/blk-core.c b/block/blk-core.c
> index b627558..cc72b7d 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -367,11 +367,6 @@ void blk_cleanup_queue(struct request_queue *q)
>  	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
>  	mutex_unlock(&q->sysfs_lock);
> 
> -	if (q->elevator)
> -		elevator_exit(q->elevator);
> -
> -	blk_throtl_exit(q);
> -
>  	blk_put_queue(q);
>  }
>  EXPORT_SYMBOL(blk_cleanup_queue);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 0ee17b5..a5a756b 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -477,6 +477,11 @@ static void blk_release_queue(struct kobject *kobj)
> 
>  	blk_sync_queue(q);
> 
> +	if (q->elevator)
> +		elevator_exit(q->elevator);
> +
> +	blk_throtl_exit(q);
> +
>  	if (rl->rq_pool)
>  		mempool_destroy(rl->rq_pool);

I think it doesn't work, because elevator_exit() and
blk_throtl_exit() take q->queue_lock, which may be freed
by the LLD after blk_cleanup_queue(), before blk_release_queue().

Vivek's comment here describes it in detail:
https://lkml.org/lkml/2011/7/12/279

Vivek Goyal wrote:
> I thought a driver could either rely on spin lock provided by request
> queue or override that by providing its own spinlock.
>
> blk_init_allocated_queue_node()
> 	/* Override internal queue lock with supplied lock pointer */
> 	if (lock)
> 		q->queue_lock = lock;
> So if driver calls blk_cleanup_queue() and drops its reference on queue, then
> it should be free to release any memory it has allocated for spinlock.
> So though queue is around there are no guarantees that q->queue_lock is
> still around. That memory might have been freed by driver and reused.
>
> I see many drivers are providing their own locks. Some samples from
> drivers/block.
>
> ./virtio_blk.c: q = vblk->disk->queue = blk_init_queue(do_virtblk_request, &vblk->lock);
> ./xd.c: xd_queue = blk_init_queue(do_xd_request, &xd_lock);
> ./cpqarray.c: q = blk_init_queue(do_ida_request, &hba[i]->lock);
> ./sx8.c: q = blk_init_queue(carm_rq_fn, &host->lock);
> ./sx8.c: q = blk_init_queue(carm_oob_rq_fn, &host->lock);
> ./floppy.c: disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);
> ./viodasd.c: q = blk_init_queue(do_viodasd_request, &d->q_lock);
> ./cciss.c: disk->queue = blk_init_queue(do_cciss_request, &h->lock);
> ./hd.c: hd_queue = blk_init_queue(do_hd_request, &hd_lock);
> ./DAC960.c: RequestQueue = blk_init_queue(DAC960_RequestFunction,&Controller->queue_lock);
> ./z2ram.c: z2_queue = blk_init_queue(do_z2_request, &z2ram_lock);
> ./amiflop.c: disk->queue = blk_init_queue(do_fd_request, &amiflop_lock);
> ./xen-blkfront.c: rq = blk_init_queue(do_blkif_request, &blkif_io_lock);
> ./paride/pd.c: pd_queue = blk_init_queue(do_pd_request, &pd_lock);
> ./paride/pf.c: pf_queue = blk_init_queue(do_pf_request, &pf_spin_lock);
> ./paride/pcd.c: pcd_queue = blk_init_queue(do_pcd_request, &pcd_lock);
> ./mg_disk.c: host->breq = blk_init_queue(mg_request_poll, &host->lock);
> ./mg_disk.c: host->breq = blk_init_queue(mg_request, &host->lock);
> ./rbd.c: q = blk_init_queue(rbd_rq_fn, &rbd_dev->lock);
> ./sunvdc.c: q = blk_init_queue(do_vdc_request, &port->vio.lock);
> ./swim.c: swd->queue = blk_init_queue(do_fd_request, &swd->lock);
> ./xsysace.c: ace->queue = blk_init_queue(ace_request, &ace->lock);
> ./osdblk.c: q = blk_init_queue(osdblk_rq_fn, &osdev->lock);
> ./ps3disk.c: queue = blk_init_queue(ps3disk_request, &priv->lock);
> ./swim3.c: swim3_queue = blk_init_queue(do_fd_request, &swim3_lock);
> ./ub.c: if ((q = blk_init_queue(ub_request_fn, sc->lock)) == NULL)
> ./nbd.c: disk->queue = blk_init_queue(do_nbd_request, &nbd_lock);

# SCSI doesn't hit this problem because it uses
# queue_lock embedded in struct request_queue.
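Vivek's point above is that a driver-supplied q->queue_lock has the driver's lifetime, not the queue's. A minimal userspace sketch of the resulting use-after-free window; all names are purely illustrative, not the kernel API:

```c
#include <assert.h>

/* Userspace model: the queue stores a pointer to a lock the driver
 * owns, mirroring blk_init_queue(fn, &driver->lock). */

struct driver_lock { int alive; };

struct model_queue {
	struct driver_lock *queue_lock;	/* may point into driver memory */
};

/* Models the driver freeing its state after blk_cleanup_queue(). */
static void driver_teardown(struct driver_lock *lock)
{
	lock->alive = 0;	/* stand-in for kfree(): lock memory is gone */
}

/* Models elevator_exit()/blk_throtl_exit() taking q->queue_lock at
 * final release; returns 0 only if the lock is still valid. */
static int model_release_queue(struct model_queue *q)
{
	return q->queue_lock->alive ? 0 : -1;	/* -1: use-after-free */
}
```

This is why deferring elevator_exit()/blk_throtl_exit() to blk_release_queue() is unsafe when the lock's memory belongs to the driver.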


Thanks,
--
Jun'ichi Nomura, NEC Corporation

 
08-11-2011, 03:01 AM
"Jun'ichi Nomura"

Oops when SCSI device under multipath is removed

Hi James,

On 08/11/11 09:24, Jun'ichi Nomura wrote:
> On 08/11/11 04:52, James Bottomley wrote:
>> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
>>> 2) SCSI to call blk_cleanup_queue() from device's ->release() callback
>>> (before 2.6.39, it used to work like this)
>>> https://lkml.org/lkml/2011/7/2/106
>>
>> Well, they both have documented objections. I asked why we destroy the
>> elevator in the del case and didn't get any traction, so let me show the
>> actual patch which should fix all of these issues.
>>
>> Is there a good reason for not doing this as a bug fix now?
...
> I think it doesn't work, because elevator_exit() and
> blk_throtl_exit() take q->queue_lock, which may be freed
> by the LLD after blk_cleanup_queue(), before blk_release_queue().

If the reason you moved scsi_free_queue into scsi_remove_device
is marking the queue dead, how about the following patch?
Do you think it's acceptable?

Jun'ichi Nomura, NEC Corporation


Add blk_kill_queue() for drivers which want to mark the queue dead early.

blk_cleanup_queue() is an interface for an LLD to notify the block layer
that the LLD no longer needs the queue.
Since q->queue_lock may point to a structure in the LLD that is freed
after blk_cleanup_queue(), blk_cleanup_queue() frees subordinate
structures such as the elevator, which use q->queue_lock, to avoid an
invalid reference.

On the other hand, an LLD like SCSI just wants to mark the queue dead
earlier, in the teardown phase.

So this patch factors out the early part of blk_cleanup_queue into
blk_kill_queue for such drivers.

--- linux-3.1-rc1/include/linux/blkdev.h.orig 2011-08-11 11:19:52.585280223 +0900
+++ linux-3.1-rc1/include/linux/blkdev.h 2011-08-11 11:20:09.482279763 +0900
@@ -804,6 +804,7 @@ extern struct request_queue *blk_init_al
 extern struct request_queue *blk_init_queue(request_fn_proc *, spinlock_t *);
 extern struct request_queue *blk_init_allocated_queue(struct request_queue *,
 							request_fn_proc *, spinlock_t *);
+extern void blk_kill_queue(struct request_queue *);
 extern void blk_cleanup_queue(struct request_queue *);
 extern void blk_queue_make_request(struct request_queue *, make_request_fn *);
 extern void blk_queue_bounce_limit(struct request_queue *, u64);
--- linux-3.1-rc1/block/blk-core.c.orig 2011-08-10 09:46:06.014043123 +0900
+++ linux-3.1-rc1/block/blk-core.c 2011-08-11 11:19:34.551280697 +0900
@@ -347,6 +347,17 @@ void blk_put_queue(struct request_queue
 }
 EXPORT_SYMBOL(blk_put_queue);
 
+void blk_kill_queue(struct request_queue *q)
+{
+	blk_sync_queue(q);
+
+	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
+	mutex_lock(&q->sysfs_lock);
+	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
+	mutex_unlock(&q->sysfs_lock);
+}
+EXPORT_SYMBOL(blk_kill_queue);
+
 /*
  * Note: If a driver supplied the queue lock, it should not zap that lock
  * unexpectedly as some queue cleanup components like elevator_exit() and
@@ -360,12 +371,7 @@ void blk_cleanup_queue(struct request_qu
 	 * are done before moving on. Going into this function, we should
 	 * not have processes doing IO to this device.
 	 */
-	blk_sync_queue(q);
-
-	del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
-	mutex_lock(&q->sysfs_lock);
-	queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
-	mutex_unlock(&q->sysfs_lock);
+	blk_kill_queue(q);
 
 	if (q->elevator)
 		elevator_exit(q->elevator);
--- linux-3.1-rc1/drivers/scsi/scsi_sysfs.c.orig 2011-08-09 18:48:13.676485115 +0900
+++ linux-3.1-rc1/drivers/scsi/scsi_sysfs.c 2011-08-11 11:21:07.923277456 +0900
@@ -322,6 +322,7 @@ static void scsi_device_dev_release_user
 		kfree(evt);
 	}
 
+	scsi_free_queue(sdev->request_queue);
 	blk_put_queue(sdev->request_queue);
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
@@ -937,7 +938,7 @@ void __scsi_remove_device(struct scsi_de
 	sdev->request_queue->queuedata = NULL;
 
 	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
+	blk_kill_queue(sdev->request_queue);
 	put_device(dev);
 }


 
08-11-2011, 02:33 PM
James Bottomley

Oops when SCSI device under multipath is removed

On Thu, 2011-08-11 at 12:01 +0900, Jun'ichi Nomura wrote:
> Hi James,
>
> On 08/11/11 09:24, Jun'ichi Nomura wrote:
> > On 08/11/11 04:52, James Bottomley wrote:
> >> On Wed, 2011-08-10 at 13:29 +0900, Jun'ichi Nomura wrote:
> >>> 2) SCSI to call blk_cleanup_queue() from device's ->release() callback
> >>> (before 2.6.39, it used to work like this)
> >>> https://lkml.org/lkml/2011/7/2/106
> >>
> >> Well, they both have documented objections. I asked why we destroy the
> >> elevator in the del case and didn't get any traction, so let me show the
> >> actual patch which should fix all of these issues.
> >>
> >> Is there a good reason for not doing this as a bug fix now?
> ...
> > I think it doesn't work, because elevator_exit() and
> > blk_throtl_exit() take q->queue_lock, which may be freed
> > by the LLD after blk_cleanup_queue(), before blk_release_queue().
>
> If the reason you moved scsi_free_queue into scsi_remove_device
> is marking the queue dead, how about the following patch?
> Do you think it's acceptable?

Well, it's just hiding the problem. The essential problem is that only
block has the correctly refcounted knowledge to know the last release of
the queue reference. Until that time, the holder of the reference can
use the queue regardless of whether blk_cleanup_queue() has been called.
This is the race you complain about since use of the queue involves the
lock which should be guarded by QUEUE_DEAD checks.

This is essentially unfixable with function calls. The only way to fix
it is to have a callback model for freeing the external lock.
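The callback model James describes might look roughly like this as a userspace sketch (all names invented for illustration): the driver registers a release callback, and block invokes it only on the final put, so the driver never frees the lock while a queue reference remains:

```c
#include <assert.h>

/* Userspace model of a lock-release callback owned by block. */

struct model_queue;
typedef void (*lock_release_fn)(struct model_queue *q);

struct model_queue {
	int refcount;
	int lock_alive;			/* stand-in for driver-owned lock memory */
	lock_release_fn release_lock;	/* supplied by the driver */
};

/* The driver frees its lock here, and only here. */
static void driver_free_lock(struct model_queue *q)
{
	q->lock_alive = 0;
}

/* Models the final blk_put_queue(): block, which holds the refcount,
 * decides when it is safe to run the driver's release callback. */
static void queue_put(struct model_queue *q)
{
	if (--q->refcount == 0 && q->release_lock)
		q->release_lock(q);
}
```

The essential property is that the free of the external lock is driven by block's refcount, not by a function the driver calls at an arbitrary point in teardown.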

 
08-11-2011, 02:59 PM
Alan Stern

Oops when SCSI device under multipath is removed

On Thu, 11 Aug 2011, James Bottomley wrote:

> > If the reason you moved scsi_free_queue into scsi_remove_device
> > is marking the queue dead, how about the following patch?
> > Do you think it's acceptable?
>
> Well, it's just hiding the problem. The essential problem is that only
> block has the correctly refcounted knowledge to know the last release of
> the queue reference. Until that time, the holder of the reference can
> use the queue regardless of whether blk_cleanup_queue() has been called.
> This is the race you complain about since use of the queue involves the
> lock which should be guarded by QUEUE_DEAD checks.
>
> This is essentially unfixable with function calls. The only way to fix
> it is to have a callback model for freeing the external lock.

Assuming the queue is associated with a device, the queue could take a
reference to the device, dropping that reference when the queue is
freed. Then the lock could safely be freed at the same time as the
device.
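Alan's suggestion can be sketched as a refcount model (illustrative userspace names, not the kernel API): the queue pins the device, so the driver-owned lock is freed only when the last queue reference is gone:

```c
#include <assert.h>

/* Userspace model: the queue takes a reference on the device, and the
 * device's release frees the externally supplied lock. */

struct model_device {
	int refcount;
	int lock_alive;	/* stand-in for the driver-supplied lock */
};

static void device_get(struct model_device *dev) { dev->refcount++; }

static void device_put(struct model_device *dev)
{
	if (--dev->refcount == 0)
		dev->lock_alive = 0;	/* device release frees the lock */
}

struct model_queue {
	struct model_device *dev;
};

/* Queue creation takes a device reference... */
static void queue_init(struct model_queue *q, struct model_device *dev)
{
	q->dev = dev;
	device_get(dev);
}

/* ...and the final queue release drops it. */
static void queue_release(struct model_queue *q)
{
	device_put(q->dev);
}
```

Under this scheme the lock can never be freed while the queue exists, which is exactly the lifetime subordination James questions in the next message.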

Alan Stern

 
08-11-2011, 03:05 PM
James Bottomley

Oops when SCSI device under multipath is removed

On Thu, 2011-08-11 at 10:59 -0400, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
>
> > > If the reason you moved scsi_free_queue into scsi_remove_device
> > > is marking the queue dead, how about the following patch?
> > > Do you think it's acceptable?
> >
> > Well, it's just hiding the problem. The essential problem is that only
> > block has the correctly refcounted knowledge to know the last release of
> > the queue reference. Until that time, the holder of the reference can
> > use the queue regardless of whether blk_cleanup_queue() has been called.
> > This is the race you complain about since use of the queue involves the
> > lock which should be guarded by QUEUE_DEAD checks.
> >
> > This is essentially unfixable with function calls. The only way to fix
> > it is to have a callback model for freeing the external lock.
>
> Assuming the queue is associated with a device, the queue could take a
> reference to the device, dropping that reference when the queue is
> freed. Then the lock could safely be freed at the same time as the
> device.

If that assumption is correct, there's no point refcounting the queue at
all because its use is entirely subordinated to the lifecycle of the
associated device. Plus all the wittering about my previous patch is
pointless, because blk_cleanup_queue() has to do the final put of the
queue in the lock free path (otherwise the assumption is violated).

However, much as I'd like to accept this rosy view, the original oops
that started all of this in 2.6.38 was that someone caught something
holding a reference to a SCSI queue after the device release function
had been called.

James


 
08-11-2011, 03:16 PM
Alan Stern

Oops when SCSI device under multipath is removed

On Thu, 11 Aug 2011, James Bottomley wrote:

> > > Well, it's just hiding the problem. The essential problem is that only
> > > block has the correctly refcounted knowledge to know the last release of
> > > the queue reference. Until that time, the holder of the reference can
> > > use the queue regardless of whether blk_cleanup_queue() has been called.
> > > This is the race you complain about since use of the queue involves the
> > > lock which should be guarded by QUEUE_DEAD checks.
> > >
> > > This is essentially unfixable with function calls. The only way to fix
> > > it is to have a callback model for freeing the external lock.
> >
> > Assuming the queue is associated with a device, the queue could take a
> > reference to the device, dropping that reference when the queue is
> > freed. Then the lock could safely be freed at the same time as the
> > device.
>
> If that assumption is correct, there's no point refcounting the queue at
> all because its use is entirely subordinated to the lifecycle of the
> associated device.

That's true. Why wasn't it done that way originally? Are there queues
that aren't associated with devices?

> Plus all the wittering about my previous patch is
> pointless, because blk_cleanup_queue() has to do the final put of the
> queue in the lock free path (otherwise the assumption is violated).
>
> However, much as I'd like to accept this rosy view, the original oops
> that started all of this in 2.6.38 was that someone caught something
> holding a reference to a SCSI queue after the device release function
> had been called.

Not according to your commit log. You wrote that the reference was
taken after scsi_remove_device() had been called -- but the device
release function is scsi_device_dev_release_usercontext().

Alan Stern

 
08-16-2011, 11:26 AM
"Jun'ichi Nomura"

Oops when SCSI device under multipath is removed

Hi,

On 08/12/11 00:16, Alan Stern wrote:
> On Thu, 11 Aug 2011, James Bottomley wrote:
>> However, much as I'd like to accept this rosy view, the original oops
>> that started all of this in 2.6.38 was that someone caught something
>> holding a reference to a SCSI queue after the device release function
>> had been called.
>
> Not according to your commit log. You wrote that the reference was
> taken after scsi_remove_device() had been called -- but the device
> release function is scsi_device_dev_release_usercontext().

The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
("[SCSI] put stricter guards on queue dead checks") does not
explain the move of scsi_free_queue().

But according to the discussion below, it seems
the move was motivated to solve the following self-deadlock:
https://lkml.org/lkml/2011/4/12/9

[in the context of kblockd_workqueue]
blk_delay_work
__blk_run_queue
scsi_request_fn
put_device
(puts final sdev refcount)
scsi_device_dev_release
execute_in_process_context(scsi_device_dev_release _usercontext)
[execute immediately because it's in process context]
scsi_device_dev_release_usercontext
scsi_free_queue
blk_cleanup_queue
blk_sync_queue
(wait for blk_delay_work to complete...)

James, is my understanding correct?

If so, isn't it possible to move the scsi_free_queue back to
the original place and solve the deadlock instead by
avoiding the wait in the same context?

@@ -338,8 +339,8 @@ static void scsi_device_dev_release_user
 static void scsi_device_dev_release(struct device *dev)
 {
 	struct scsi_device *sdp = to_scsi_device(dev);
-	execute_in_process_context(scsi_device_dev_release_usercontext,
-				   &sdp->ew);
+	INIT_WORK(&sdp->ew.work, scsi_device_dev_release_usercontext);
+	schedule_work(&sdp->ew.work);
 }
 
 static struct class sdev_class = {
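The difference between running the release inline (execute_in_process_context() called from the workqueue itself) and deferring it (schedule_work()) can be modeled roughly like this; the types and the little state machine are invented purely for illustration:

```c
#include <assert.h>

/* Userspace model of the self-deadlock: release work that synchronously
 * flushes the queue it is running on can never finish, while deferring
 * the release lets the current work item complete first. */

enum { IDLE, RUNNING };

struct model_wq {
	int state;	/* is a work item executing right now? */
	int pending;	/* deferred release work waiting to run */
};

/* Models blk_sync_queue() called from inside the same workqueue:
 * waiting for yourself can never complete. */
static int sync_queue(struct model_wq *wq)
{
	return wq->state == RUNNING ? -1 : 0;	/* -1: would deadlock */
}

/* Models execute_in_process_context(): runs the release inline. */
static int release_inline(struct model_wq *wq)
{
	return sync_queue(wq);
}

/* Models schedule_work(): queue the release instead of running it,
 * let the current work item complete, then sync safely. */
static int release_deferred(struct model_wq *wq)
{
	wq->pending = 1;	/* release is queued, not run */
	wq->state = IDLE;	/* current work item finishes */
	wq->pending = 0;	/* deferred release now runs... */
	return sync_queue(wq);	/* ...and can wait safely */
}
```

This is the property Jun'ichi's hunk above relies on: schedule_work() guarantees the release runs outside the context that blk_sync_queue() has to wait for.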

Thanks,
--
Jun'ichi Nomura, NEC Corporation

 
08-18-2011, 09:11 AM
"Jun'ichi Nomura"

Oops when SCSI device under multipath is removed

Hi James,

On 08/16/11 20:26, Jun'ichi Nomura wrote:
> The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> ("[SCSI] put stricter guards on queue dead checks") does not
> explain the move of scsi_free_queue().
>
> But according to the discussion below, it seems
> the move was motivated to solve the following self-deadlock:
> https://lkml.org/lkml/2011/4/12/9
>
> [in the context of kblockd_workqueue]
> blk_delay_work
>   __blk_run_queue
>     scsi_request_fn
>       put_device
>       (puts final sdev refcount)
>         scsi_device_dev_release
>           execute_in_process_context(scsi_device_dev_release_usercontext)
>           [executes immediately because it's in process context]
>             scsi_device_dev_release_usercontext
>               scsi_free_queue
>                 blk_cleanup_queue
>                   blk_sync_queue
>                   (waits for blk_delay_work to complete...)
>
> James, is my understanding correct?
>
> If so, isn't it possible to move the scsi_free_queue back to
> the original place and solve the deadlock instead by
> avoiding the wait in the same context?

Actually, Tejun has posted a patch to replace
execute_in_process_context() with queue_work()
and is asking for your review:

[PATCH RESEND] scsi: don't use execute_in_process_context()
https://lkml.org/lkml/2011/4/30/87

Do you think you can take the patch and revert the move
of scsi_free_queue()?

Thanks,
--
Jun'ichi Nomura, NEC Corporation

 
08-31-2011, 07:50 PM
Thadeu Lima de Souza Cascardo

Oops when SCSI device under multipath is removed

On Thu, Aug 18, 2011 at 06:11:19PM +0900, Jun'ichi Nomura wrote:
> Hi James,
>
> On 08/16/11 20:26, Jun'ichi Nomura wrote:
> > The commit log of 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
> > ("[SCSI] put stricter guards on queue dead checks") does not
> > explain the move of scsi_free_queue().
> >
> > But according to the discussion below, it seems
> > the move was motivated to solve the following self-deadlock:
> > https://lkml.org/lkml/2011/4/12/9
> >
> > [in the context of kblockd_workqueue]
> > blk_delay_work
> >   __blk_run_queue
> >     scsi_request_fn
> >       put_device
> >       (puts final sdev refcount)
> >         scsi_device_dev_release
> >           execute_in_process_context(scsi_device_dev_release_usercontext)
> >           [executes immediately because it's in process context]
> >             scsi_device_dev_release_usercontext
> >               scsi_free_queue
> >                 blk_cleanup_queue
> >                   blk_sync_queue
> >                   (waits for blk_delay_work to complete...)
> >
> > James, is my understanding correct?
> >
> > If so, isn't it possible to move the scsi_free_queue back to
> > the original place and solve the deadlock instead by
> > avoiding the wait in the same context?
>
> Actually, Tejun has posted a patch to replace
> execute_in_process_context() with queue_work()
> and is asking for your review:
>
> [PATCH RESEND] scsi: don't use execute_in_process_context()
> https://lkml.org/lkml/2011/4/30/87
>
> Do you think you can take the patch and revert the move
> of scsi_free_queue()?
>
> Thanks,
> --
> Jun'ichi Nomura, NEC Corporation
> --

I've tested your suggestion (reverting the move of scsi_free_queue)
and it works like a charm. I did not get any oops after that. I tested
with a multipath setup on top of two iscsi targets. Using dd after
logging out of one of the iscsi targets would trigger the oops.
With this patch, it can no longer be triggered.

Best regards,
Cascardo.

--

diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
index e0bd3f7..a6eb6f1 100644
--- a/drivers/scsi/scsi_sysfs.c
+++ b/drivers/scsi/scsi_sysfs.c
@@ -322,7 +322,11 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)
 		kfree(evt);
 	}
 
+	/* Freeing the queue signals to block that we're done */
+	scsi_free_queue(sdev->request_queue);
+
 	blk_put_queue(sdev->request_queue);
+
 	/* NULL queue means the device can't be used */
 	sdev->request_queue = NULL;
 
@@ -936,8 +940,6 @@ void __scsi_remove_device(struct scsi_device *sdev)
 	/* cause the request function to reject all I/O requests */
 	sdev->request_queue->queuedata = NULL;
 
-	/* Freeing the queue signals to block that we're done */
-	scsi_free_queue(sdev->request_queue);
 	put_device(dev);
 }

---

 
