09-05-2012, 03:57 AM
Dave Chinner

block: Avoid deadlocks with bio allocation by stacking drivers

On Tue, Sep 04, 2012 at 11:26:33AM -0700, Tejun Heo wrote:
> Hello,
>
> On Tue, Sep 04, 2012 at 09:54:23AM -0400, Vivek Goyal wrote:
> > > Given that we are working around stack depth issues in the
> > > filesystems already in several places, and now it seems like there's
> > > a reason to work around it in the block layers as well, shouldn't we
> > > simply increase the default stack size rather than introduce
> > > complexity and performance regressions to try and work around not
> > > having enough stack?
> >
> > Dave,
> >
> > In this particular instance, we really don't have any bug reports of
> > stack overflowing. Just discussing what will happen if we make
> > generic_make_request() recursive again.
>
> I think there was one and that's why we added the bio_list thing.

There was more than one - it was regular enough to be considered a
feature... :/
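
For reference, the "bio_list thing" is the recursion-to-iteration
conversion in generic_make_request(). A simplified sketch of that era's
code (sanity checks and throttling omitted):

void generic_make_request(struct bio *bio)
{
	struct bio_list bio_list_on_stack;

	/*
	 * If this task is already inside generic_make_request(), just queue
	 * the bio; the outermost invocation will submit it. Recursive
	 * submissions from stacking drivers thus become iteration, capping
	 * stack usage at one ->make_request_fn() frame.
	 */
	if (current->bio_list) {
		bio_list_add(current->bio_list, bio);
		return;
	}

	bio_list_init(&bio_list_on_stack);
	current->bio_list = &bio_list_on_stack;
	do {
		struct request_queue *q = bdev_get_queue(bio->bi_bdev);

		q->make_request_fn(q, bio);

		/* Breadth-first: pick up whatever the driver re-submitted. */
		bio = bio_list_pop(current->bio_list);
	} while (bio);
	current->bio_list = NULL;
}

The flip side, discussed throughout this thread, is that a bio submitted
from under generic_make_request() is only queued; it is not processed
(and so cannot complete and be freed) until the caller unwinds back to
this loop.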

> > > I mean, we can deal with it like the ia32 4k stack issue was dealt
> > > with (i.e. ignore those stupid XFS people, that's an XFS bug), or
> > > we can face the reality that storage stacks have become so complex
> > > that 8k is no longer a big enough stack for a modern system....
> >
> > So first question will be, what's the right stack size? If we make
> > generic_make_request() recursive, then at some storage stack depth we will
> > overflow stack anyway (if we have created too deep a stack). Hence
> > keeping current logic kind of makes sense as in theory we can support
> > arbitrary depth of storage stack.
>
> But, yeah, this can't be solved by enlarging the stack size. The
> upper limit is unbound.

Sure, but the recursion issue is isolated to the block layer.

If we can still submit IO directly through the block layer without
pushing it off to a work queue, then the overall stack usage problem
still exists. But if the block layer always pushes the IO off into
another workqueue to avoid stack overflows, then the context
switches are going to cause significant performance regressions for
high IOPS workloads. I don't really like either situation.

So while you are discussing stack issues, think a little about the
bigger picture outside of the immediate issue at hand - a better
solution for everyone might pop up....

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
09-05-2012, 04:37 AM
Tejun Heo

block: Avoid deadlocks with bio allocation by stacking drivers

Hello, Dave.

On Wed, Sep 05, 2012 at 01:57:59PM +1000, Dave Chinner wrote:
> > But, yeah, this can't be solved by enlarging the stack size. The
> > upper limit is unbound.
>
> Sure, but the recursion issue is isolated to the block layer.
>
> If we can still submit IO directly through the block layer without
> pushing it off to a work queue, then the overall stack usage problem
> still exists. But if the block layer always pushes the IO off into
> another workqueue to avoid stack overflows, then the context
> switches are going to cause significant performance regressions for
> high IOPS workloads. I don't really like either situation.

Kent's proposed solution doesn't do that. The rescuer work item is
used iff the mempool allocation fails without __GFP_WAIT. IOW, we're
already under severe memory pressure and stalls are expected all around
the kernel (somehow this sounds festive...). It doesn't alter the
breadth-first walk of bio decomposition and shouldn't degrade
performance in any noticeable way.
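
Condensed from the patch later in this thread, the allocation path has
this shape (only the bio_pool half is shown; the posted code uses a
retry label rather than a second mempool_alloc() call):

	/*
	 * Only when running under generic_make_request() (current->bio_list
	 * is active and non-empty) is __GFP_WAIT dropped, and only when that
	 * non-waiting attempt fails is the rescuer involved; otherwise this
	 * is the same single mempool_alloc() as before the patch.
	 */
	if (current->bio_list && !bio_list_empty(current->bio_list))
		gfp_mask &= ~__GFP_WAIT;

	p = mempool_alloc(bs->bio_pool, gfp_mask);
	if (!p && gfp_mask != saved_gfp) {
		punt_bios_to_rescuer(bs);
		gfp_mask = saved_gfp;
		p = mempool_alloc(bs->bio_pool, gfp_mask);
	}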

> So while you are discussing stack issues, think a little about the
> bigger picture outside of the immediate issue at hand - a better
> solution for everyone might pop up....

It's probably because I haven't been bitten much by stack overflows,
but I'd like to keep thinking that stack overflows are extremely
unusual and the ones causing them are the bad ones. Thank you very
much.

--
tejun

 
09-08-2012, 07:36 PM
Tejun Heo

block: Avoid deadlocks with bio allocation by stacking drivers

(Restoring cc list from the previous discussion. Please retain the cc
list of the people who discussed in the previous postings.)

On Fri, Sep 07, 2012 at 03:12:53PM -0700, Kent Overstreet wrote:
> But this is tricky and not a generic solution. This patch solves it for
> all users by inverting the previously described technique. We allocate a
> rescuer workqueue for each bio_set, and then in the allocation code if
> there are bios on current->bio_list we would be blocking, we punt them
> to the rescuer workqueue to be submitted.

It would be great if this explanation could be expanded a bit. Why does
it make the deadlock condition go away? What are the restrictions -
e.g. using other mempools for additional per-bio data structures
wouldn't work, right?
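
For reference, the front_pad alternative that avoids a second mempool
works by reserving the driver's per-bio state in the same allocation as
the bio itself. A minimal sketch of the pattern (struct and function
names here are hypothetical):

#include <linux/bio.h>
#include <linux/stddef.h>

/*
 * Per-bio driver state; the bio must be the last member, because the
 * bio_set places front_pad bytes immediately in front of the bio (and
 * any inline bvecs after it).
 */
struct my_clone_info {
	void		*driver_data;
	struct bio	bio;
};

static struct bio_set *my_bio_set;

static int my_driver_init(void)
{
	/* Reserve a few bios, each with room for my_clone_info in front. */
	my_bio_set = bioset_create(4, offsetof(struct my_clone_info, bio));
	return my_bio_set ? 0 : -ENOMEM;
}

static struct bio *my_alloc_clone(gfp_t gfp)
{
	struct bio *bio = bio_alloc_bioset(gfp, 0, my_bio_set);
	struct my_clone_info *ci;

	if (!bio)
		return NULL;

	/* No second mempool: the per-bio state came with the bio. */
	ci = container_of(bio, struct my_clone_info, bio);
	ci->driver_data = NULL;
	return bio;
}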

> +static void punt_bios_to_rescuer(struct bio_set *bs)
> +{
> + struct bio_list punt, nopunt;
> + struct bio *bio;
> +
> + bio_list_init(&punt);
> + bio_list_init(&nopunt);
> +
> + while ((bio = bio_list_pop(current->bio_list)))
> + bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> +
> + *current->bio_list = nopunt;

Why this is necessary needs explanation, and it's done in a rather
unusual way. I suppose the weirdness is from a bio_list API
restriction?

> + spin_lock(&bs->rescue_lock);
> + bio_list_merge(&bs->rescue_list, &punt);
> + spin_unlock(&bs->rescue_lock);
> +
> + queue_work(bs->rescue_workqueue, &bs->rescue_work);
> +}
> +
> /**
> * bio_alloc_bioset - allocate a bio for I/O
> * @gfp_mask: the GFP_ mask given to the slab allocator
> @@ -317,6 +354,7 @@ EXPORT_SYMBOL(bio_reset);
> */
> struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> {
> + gfp_t saved_gfp = gfp_mask;
> unsigned front_pad;
> unsigned inline_vecs;
> unsigned long idx = BIO_POOL_NONE;
> @@ -334,13 +372,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> front_pad = 0;
> inline_vecs = nr_iovecs;
> } else {
> + /*
> + * generic_make_request() converts recursion to iteration; this
> + * means if we're running beneath it, any bios we allocate and
> + * submit will not be submitted (and thus freed) until after we
> + * return.
> + *
> + * This exposes us to a potential deadlock if we allocate
> + * multiple bios from the same bio_set() while running
> + * underneath generic_make_request(). If we were to allocate
> + * multiple bios (say a stacking block driver that was splitting
> + * bios), we would deadlock if we exhausted the mempool's
> + * reserve.
> + *
> + * We solve this, and guarantee forward progress, with a rescuer
> + * workqueue per bio_set. If we go to allocate and there are
> + * bios on current->bio_list, we first try the allocation
> + * without __GFP_WAIT; if that fails, we punt those bios we
> + * would be blocking to the rescuer workqueue before we retry
> + * with the original gfp_flags.
> + */
> +
> + if (current->bio_list && !bio_list_empty(current->bio_list))
> + gfp_mask &= ~__GFP_WAIT;
> +retry:
> p = mempool_alloc(bs->bio_pool, gfp_mask);
> front_pad = bs->front_pad;
> inline_vecs = BIO_INLINE_VECS;
> }

Wouldn't the following be better?

p = mempool_alloc(bs->bi_pool, gfp_mask);
if (unlikely(!p) && gfp_mask != saved_gfp) {
	punt_bios_to_rescuer(bs);
	p = mempool_alloc(bs->bi_pool, saved_gfp);
}

I really hope the usage restriction (don't mix with mempool) for
bioset is clearly documented somewhere appropriate.

Thanks.

--
tejun

 
09-10-2012, 12:28 AM
Kent Overstreet

block: Avoid deadlocks with bio allocation by stacking drivers

On Sat, Sep 08, 2012 at 12:36:41PM -0700, Tejun Heo wrote:
> (Restoring cc list from the previous discussion. Please retain the cc
> list of the people who discussed in the previous postings.)
>
> On Fri, Sep 07, 2012 at 03:12:53PM -0700, Kent Overstreet wrote:
> > But this is tricky and not a generic solution. This patch solves it for
> > all users by inverting the previously described technique. We allocate a
> > rescuer workqueue for each bio_set, and then in the allocation code if
> > there are bios on current->bio_list we would be blocking, we punt them
> > to the rescuer workqueue to be submitted.
>
> It would be great if this explanation can be expanded a bit. Why does
> it make the deadlock condition go away? What are the restrictions -
> e.g. using other mempools for additional per-bio data structure
> wouldn't work, right?

Ok, I'll do that. New patch below.

> > +static void punt_bios_to_rescuer(struct bio_set *bs)
> > +{
> > + struct bio_list punt, nopunt;
> > + struct bio *bio;
> > +
> > + bio_list_init(&punt);
> > + bio_list_init(&nopunt);
> > +
> > + while ((bio = bio_list_pop(current->bio_list)))
> > + bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> > +
> > + *current->bio_list = nopunt;
>
> Why this is necessary needs explanation and it's done in rather
> unusual way. I suppose the weirdness is from bio_list API
> restriction?

It's because bio_lists are singly linked, so deleting an entry from the
middle of the list would be a real pain - just much cleaner/simpler to
do it this way.
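
For comparison, removing one bio from the middle of a singly linked
bio_list needs something like the following (a hypothetical helper, not
in the kernel): you have to track the predecessor and patch up both the
links and the list's tail pointer.

static void bio_list_del(struct bio_list *bl, struct bio *victim)
{
	struct bio *prev = NULL, *bio;

	for (bio = bl->head; bio; prev = bio, bio = bio->bi_next) {
		if (bio != victim)
			continue;

		if (prev)
			prev->bi_next = bio->bi_next;
		else
			bl->head = bio->bi_next;
		if (bl->tail == bio)
			bl->tail = prev;

		bio->bi_next = NULL;
		return;
	}
}

Draining the list with bio_list_pop() and rebuilding it, as the patch
does, avoids all of that bookkeeping in a handful of lines.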

> > + spin_lock(&bs->rescue_lock);
> > + bio_list_merge(&bs->rescue_list, &punt);
> > + spin_unlock(&bs->rescue_lock);
> > +
> > + queue_work(bs->rescue_workqueue, &bs->rescue_work);
> > +}
> > +
> > /**
> > * bio_alloc_bioset - allocate a bio for I/O
> > * @gfp_mask: the GFP_ mask given to the slab allocator
> > @@ -317,6 +354,7 @@ EXPORT_SYMBOL(bio_reset);
> > */
> > struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > {
> > + gfp_t saved_gfp = gfp_mask;
> > unsigned front_pad;
> > unsigned inline_vecs;
> > unsigned long idx = BIO_POOL_NONE;
> > @@ -334,13 +372,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
> > front_pad = 0;
> > inline_vecs = nr_iovecs;
> > } else {
> > + /*
> > + * generic_make_request() converts recursion to iteration; this
> > + * means if we're running beneath it, any bios we allocate and
> > + * submit will not be submitted (and thus freed) until after we
> > + * return.
> > + *
> > + * This exposes us to a potential deadlock if we allocate
> > + * multiple bios from the same bio_set() while running
> > + * underneath generic_make_request(). If we were to allocate
> > + * multiple bios (say a stacking block driver that was splitting
> > + * bios), we would deadlock if we exhausted the mempool's
> > + * reserve.
> > + *
> > + * We solve this, and guarantee forward progress, with a rescuer
> > + * workqueue per bio_set. If we go to allocate and there are
> > + * bios on current->bio_list, we first try the allocation
> > + * without __GFP_WAIT; if that fails, we punt those bios we
> > + * would be blocking to the rescuer workqueue before we retry
> > + * with the original gfp_flags.
> > + */
> > +
> > + if (current->bio_list && !bio_list_empty(current->bio_list))
> > + gfp_mask &= ~__GFP_WAIT;
> > +retry:
> > p = mempool_alloc(bs->bio_pool, gfp_mask);
> > front_pad = bs->front_pad;
> > inline_vecs = BIO_INLINE_VECS;
> > }
>
> Wouldn't the following be better?
>
> p = mempool_alloc(bs->bi_pool, gfp_mask);
> if (unlikely(!p) && gfp_mask != saved_gfp) {
> punt_bios_to_rescuer(bs);
> p = mempool_alloc(bs->bi_pool, saved_gfp);
> }

That'd require duplicating the error handling in two different places -
once for the initial allocation, once for the bvec allocation. And I
really hate writing code that does

alloc_something()
if (fail) {
	alloc_something_again()
}

it just screams ugly to me.

> I really hope the usage restriction (don't mix with mempool) for
> bioset is clearly documented somewhere appropriate.

Good point, I'm adding it to bio_alloc_bioset's documentation.


commit 4edb21e0b749fc098c72edcb4f9abdeca6fc62cd
Author: Kent Overstreet <koverstreet@google.com>
Date: Sun Sep 9 17:23:29 2012 -0700

block: Avoid deadlocks with bio allocation by stacking drivers

Previously, if we ever try to allocate more than once from the same bio
set while running under generic_make_request() (i.e. a stacking block
driver), we risk deadlock.

This is because of the code in generic_make_request() that converts
recursion to iteration; any bios we submit won't actually be submitted
(so they can complete and eventually be freed) until after we return -
this means if we allocate a second bio, we're blocking the first one
from ever being freed.

Thus if enough threads call into a stacking block driver at the same
time with bios that need multiple splits, and the bio_set's reserve gets
used up, we deadlock.

This can be worked around in the driver code - we could check if we're
running under generic_make_request(), then mask out __GFP_WAIT when we
go to allocate a bio, and if the allocation fails punt to workqueue and
retry the allocation.

But this is tricky and not a generic solution. This patch solves it for
all users by inverting the previously described technique. We allocate a
rescuer workqueue for each bio_set, and then in the allocation code if
there are bios on current->bio_list we would be blocking, we punt them
to the rescuer workqueue to be submitted.

This guarantees forward progress for bio allocations under
generic_make_request() provided each bio is submitted before allocating
the next, and provided the bios are freed after they complete.

Note that this doesn't do anything for allocation from other mempools.
Instead of allocating per bio data structures from a mempool, code
should use bio_set's front_pad.

Tested it by forcing the rescue codepath to be taken (by disabling the
first GFP_NOWAIT attempt), and then ran it with bcache (which does a lot
of arbitrary bio splitting) and verified that the rescuer was being
invoked.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>

diff --git a/fs/bio.c b/fs/bio.c
index 13e9567..fa07e1f 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -295,6 +295,53 @@ void bio_reset(struct bio *bio)
}
EXPORT_SYMBOL(bio_reset);

+static void bio_alloc_rescue(struct work_struct *work)
+{
+ struct bio_set *bs = container_of(work, struct bio_set, rescue_work);
+ struct bio *bio;
+
+ while (1) {
+ spin_lock(&bs->rescue_lock);
+ bio = bio_list_pop(&bs->rescue_list);
+ spin_unlock(&bs->rescue_lock);
+
+ if (!bio)
+ break;
+
+ generic_make_request(bio);
+ }
+}
+
+static void punt_bios_to_rescuer(struct bio_set *bs)
+{
+ struct bio_list punt, nopunt;
+ struct bio *bio;
+
+ /*
+ * Don't want to punt all bios on current->bio_list; if there was a bio
+ * on there for a stacking driver higher up in the stack, processing it
+ * could require allocating bios from this bio_set, and we don't want to
+ * do that from our own rescuer.
+ *
+ * Since bio lists are singly linked, pop them all instead of trying to
+ * remove from the middle of the list:
+ */
+
+ bio_list_init(&punt);
+ bio_list_init(&nopunt);
+
+ while ((bio = bio_list_pop(current->bio_list)))
+ bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
+
+ *current->bio_list = nopunt;
+
+ spin_lock(&bs->rescue_lock);
+ bio_list_merge(&bs->rescue_list, &punt);
+ spin_unlock(&bs->rescue_lock);
+
+ queue_work(bs->rescue_workqueue, &bs->rescue_work);
+}
+
/**
* bio_alloc_bioset - allocate a bio for I/O
* @gfp_mask: the GFP_ mask given to the slab allocator
@@ -312,11 +359,27 @@ EXPORT_SYMBOL(bio_reset);
* previously allocated bio for IO before attempting to allocate a new one.
* Failure to do so can cause deadlocks under memory pressure.
*
+ * Note that when running under generic_make_request() (i.e. any block
+ * driver), bios are not submitted until after you return - see the code in
+ * generic_make_request() that converts recursion into iteration, to prevent
+ * stack overflows.
+ *
+ * This would normally mean allocating multiple bios under
+ * generic_make_request() would be susceptible to deadlocks, but we have
+ * deadlock avoidance code that resubmits any blocked bios from a rescuer
+ * thread.
+ *
+ * However, we do not guarantee forward progress for allocations from other
+ * mempools. Doing multiple allocations from the same mempool under
+ * generic_make_request() should be avoided - instead, use bio_set's front_pad
+ * for per bio allocations.
+ *
* RETURNS:
* Pointer to new bio on success, NULL on failure.
*/
struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
{
+ gfp_t saved_gfp = gfp_mask;
unsigned front_pad;
unsigned inline_vecs;
unsigned long idx = BIO_POOL_NONE;
@@ -334,13 +397,37 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)
front_pad = 0;
inline_vecs = nr_iovecs;
} else {
+ /*
+ * generic_make_request() converts recursion to iteration; this
+ * means if we're running beneath it, any bios we allocate and
+ * submit will not be submitted (and thus freed) until after we
+ * return.
+ *
+ * This exposes us to a potential deadlock if we allocate
+ * multiple bios from the same bio_set() while running
+ * underneath generic_make_request(). If we were to allocate
+ * multiple bios (say a stacking block driver that was splitting
+ * bios), we would deadlock if we exhausted the mempool's
+ * reserve.
+ *
+ * We solve this, and guarantee forward progress, with a rescuer
+ * workqueue per bio_set. If we go to allocate and there are
+ * bios on current->bio_list, we first try the allocation
+ * without __GFP_WAIT; if that fails, we punt those bios we
+ * would be blocking to the rescuer workqueue before we retry
+ * with the original gfp_flags.
+ */
+
+ if (current->bio_list && !bio_list_empty(current->bio_list))
+ gfp_mask &= ~__GFP_WAIT;
+retry:
p = mempool_alloc(bs->bio_pool, gfp_mask);
front_pad = bs->front_pad;
inline_vecs = BIO_INLINE_VECS;
}

if (unlikely(!p))
- return NULL;
+ goto err;

bio = p + front_pad;
bio_init(bio);
@@ -361,6 +448,13 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, int nr_iovecs, struct bio_set *bs)

err_free:
mempool_free(p, bs->bio_pool);
+err:
+ if (gfp_mask != saved_gfp) {
+ punt_bios_to_rescuer(bs);
+ gfp_mask = saved_gfp;
+ goto retry;
+ }
+
return NULL;
}
EXPORT_SYMBOL(bio_alloc_bioset);
@@ -1570,6 +1664,9 @@ static void biovec_free_pools(struct bio_set *bs)

void bioset_free(struct bio_set *bs)
{
+ if (bs->rescue_workqueue)
+ destroy_workqueue(bs->rescue_workqueue);
+
if (bs->bio_pool)
mempool_destroy(bs->bio_pool);

@@ -1605,6 +1702,10 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)

bs->front_pad = front_pad;

+ spin_lock_init(&bs->rescue_lock);
+ bio_list_init(&bs->rescue_list);
+ INIT_WORK(&bs->rescue_work, bio_alloc_rescue);
+
bs->bio_slab = bio_find_or_create_slab(front_pad + back_pad);
if (!bs->bio_slab) {
kfree(bs);
@@ -1615,9 +1716,14 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
if (!bs->bio_pool)
goto bad;

- if (!biovec_create_pools(bs, pool_size))
- return bs;
+ if (biovec_create_pools(bs, pool_size))
+ goto bad;
+
+ bs->rescue_workqueue = alloc_workqueue("bioset", WQ_MEM_RECLAIM, 0);
+ if (!bs->rescue_workqueue)
+ goto bad;

+ return bs;
bad:
bioset_free(bs);
return NULL;
diff --git a/include/linux/bio.h b/include/linux/bio.h
index a2bfe3a..c32ea0d 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -491,6 +491,15 @@ struct bio_set {
mempool_t *bio_integrity_pool;
#endif
mempool_t *bvec_pool;
+
+ /*
+ * Deadlock avoidance for stacking block drivers: see comments in
+ * bio_alloc_bioset() for details
+ */
+ spinlock_t rescue_lock;
+ struct bio_list rescue_list;
+ struct work_struct rescue_work;
+ struct workqueue_struct *rescue_workqueue;
};

struct biovec_slab {

 
09-10-2012, 03:25 PM
Vivek Goyal

block: Avoid deadlocks with bio allocation by stacking drivers

On Sun, Sep 09, 2012 at 05:28:10PM -0700, Kent Overstreet wrote:

[..]
> > > +retry:
> > > p = mempool_alloc(bs->bio_pool, gfp_mask);
> > > front_pad = bs->front_pad;
> > > inline_vecs = BIO_INLINE_VECS;
> > > }
> >
> > Wouldn't the following be better?
> >
> > p = mempool_alloc(bs->bi_pool, gfp_mask);
> > if (unlikely(!p) && gfp_mask != saved_gfp) {
> > punt_bios_to_rescuer(bs);
> > p = mempool_alloc(bs->bi_pool, saved_gfp);
> > }
>
> That'd require duplicating the error handling in two different places -
> once for the initial allocation, once for the bvec allocation. And I
> really hate that writing code that does
>
> alloc_something()
> if (fail) {
> alloc_something_again()
> }
>
> it just screams ugly to me.

Personally I kind of like Tejun's suggestion. Yes, it means that you
will have to do it twice (once for the bio and once for the bvecs). But
at the same time, if the bvec allocation fails, then you don't have to
free the already-allocated bio and retry the bio allocation again. The
current code is doing that, and I am not sure why we should free the
allocated bio and retry again in case of bvec allocation failure.

Secondly, the current code is doing the following:

if (unlikely(!bvl))
	goto err_free;

err_free:
	mempool_free(p, bs->bio_pool);
err:
	retry allocation

I'm not sure, but err_free kind of gives the impression of a "free up
whatever was allocated and return NULL" path, when instead we end up
retrying the allocation.

Implementing Tejun's idea at least keeps the error and retry code
separate. I am not too particular about it - just a thought. Though I do
want to know why we free up the already-allocated bio and retry in case
of bvec allocation failure.
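
Roughly, the structure being argued for would look like this (adapted
from the patch earlier in the thread - variable names follow that code,
but this is a sketch of the control flow, not anything that has been
posted): each allocation gets its own punt-and-retry, so a bvec failure
never frees and re-allocates the bio.

	p = mempool_alloc(bs->bio_pool, gfp_mask);
	if (!p && gfp_mask != saved_gfp) {
		/* Punt this bio_set's blocked bios so it's safe to block. */
		punt_bios_to_rescuer(bs);
		gfp_mask = saved_gfp;
		p = mempool_alloc(bs->bio_pool, gfp_mask);
	}
	if (unlikely(!p))
		return NULL;

	bio = p + front_pad;
	bio_init(bio);

	if (nr_iovecs > inline_vecs) {
		bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
		if (!bvl && gfp_mask != saved_gfp) {
			/* Same fallback, without throwing away the bio. */
			punt_bios_to_rescuer(bs);
			gfp_mask = saved_gfp;
			bvl = bvec_alloc_bs(gfp_mask, nr_iovecs, &idx, bs);
		}
		if (unlikely(!bvl)) {
			mempool_free(p, bs->bio_pool);
			return NULL;
		}
	}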

Thanks
Vivek

 
09-10-2012, 05:22 PM
Tejun Heo

block: Avoid deadlocks with bio allocation by stacking drivers

Hello, Kent.

On Sun, Sep 09, 2012 at 05:28:10PM -0700, Kent Overstreet wrote:
> > > + while ((bio = bio_list_pop(current->bio_list)))
> > > + bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> > > +
> > > + *current->bio_list = nopunt;
> >
> > Why this is necessary needs explanation and it's done in rather
> > unusual way. I suppose the weirdness is from bio_list API
> > restriction?
>
> It's because bio_lists are singly linked, so deleting an entry from the
> middle of the list would be a real pain - just much cleaner/simpler to
> do it this way.

Yeah, I wonder how beneficial that singly linked list is. Eh well...

> > Wouldn't the following be better?
> >
> > p = mempool_alloc(bs->bi_pool, gfp_mask);
> > if (unlikely(!p) && gfp_mask != saved_gfp) {
> > punt_bios_to_rescuer(bs);
> > p = mempool_alloc(bs->bi_pool, saved_gfp);
> > }
>
> That'd require duplicating the error handling in two different places -
> once for the initial allocation, once for the bvec allocation. And I
> really hate that writing code that does
>
> alloc_something()
> if (fail) {
> alloc_something_again()
> }
>
> it just screams ugly to me.

I don't know. That at least represents what's going on and goto'ing
back and forth is hardly pretty. Sometimes the code gets much uglier
/ unwieldy and we have to live with gotos. Here, that doesn't seem to
be the case.

> +static void punt_bios_to_rescuer(struct bio_set *bs)
> +{
> + struct bio_list punt, nopunt;
> + struct bio *bio;
> +
> + /*
> + * Don't want to punt all bios on current->bio_list; if there was a bio
> + * on there for a stacking driver higher up in the stack, processing it
> + * could require allocating bios from this bio_set, and we don't want to
> + * do that from our own rescuer.

Hmmm... isn't it more like we "must" process only the bios which are
from this bio_set to have any kind of forward-progress guarantee? The
above sounds like it's just something undesirable.

Thanks.

--
tejun

 
09-10-2012, 08:24 PM
Kent Overstreet

block: Avoid deadlocks with bio allocation by stacking drivers

On Mon, Sep 10, 2012 at 10:22:10AM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Sun, Sep 09, 2012 at 05:28:10PM -0700, Kent Overstreet wrote:
> > > > + while ((bio = bio_list_pop(current->bio_list)))
> > > > + bio_list_add(bio->bi_pool == bs ? &punt : &nopunt, bio);
> > > > +
> > > > + *current->bio_list = nopunt;
> > >
> > > Why this is necessary needs explanation and it's done in rather
> > > unusual way. I suppose the weirdness is from bio_list API
> > > restriction?
> >
> > It's because bio_lists are singly linked, so deleting an entry from the
> > middle of the list would be a real pain - just much cleaner/simpler to
> > do it this way.
>
> Yeah, I wonder how beneficial that singly linked list is. Eh well...

Well, this is the first time I can think of that it's come up, and IMO
this is no less clean a way of writing it... just a bit unusual in C;
it feels more functional than imperative to me.

> > > Wouldn't the following be better?
> > >
> > > p = mempool_alloc(bs->bi_pool, gfp_mask);
> > > if (unlikely(!p) && gfp_mask != saved_gfp) {
> > > punt_bios_to_rescuer(bs);
> > > p = mempool_alloc(bs->bi_pool, saved_gfp);
> > > }
> >
> > That'd require duplicating the error handling in two different places -
> > once for the initial allocation, once for the bvec allocation. And I
> > really hate that writing code that does
> >
> > alloc_something()
> > if (fail) {
> > alloc_something_again()
> > }
> >
> > it just screams ugly to me.
>
> I don't know. That at least represents what's going on and goto'ing
> back and forth is hardly pretty. Sometimes the code gets much uglier
> / unwieldy and we have to live with gotos. Here, that doesn't seem to
> be the case.

I think this is really more personal preference than anything, but:

Setting gfp_mask = saved_gfp after calling punt_bios_to_rescuer() is
really the correct thing to do, and makes the code clearer IMO: once
we've run punt_bios_to_rescuer() we don't need to mask out __GFP_WAIT
(not until the next time a bio is submitted, really).

This matters a bit for the bvl allocation too: if we call
punt_bios_to_rescuer() for the bio allocation, there's no point doing it
again.

So to be rigorously correct, your way would have to be

p = mempool_alloc(bs->bio_pool, gfp_mask);
if (!p && gfp_mask != saved_gfp) {
	punt_bios_to_rescuer(bs);
	gfp_mask = saved_gfp;
	p = mempool_alloc(bs->bio_pool, gfp_mask);
}

And at that point, why duplicate that line of code? It doesn't matter that
much, but IMO a goto retry better labels what's actually going on (it's
something that's not uncommon in the kernel and if I see a retry label
in a function I pretty immediately have an idea of what's going on).

So we could do

retry:
p = mempool_alloc(bs->bio_pool, gfp_mask);
if (!p && gfp_mask != saved_gfp) {
	punt_bios_to_rescuer(bs);
	gfp_mask = saved_gfp;
	goto retry;
}

(side note: not that it really matters here, but gcc will inline the
bvec_alloc_bs() call if it's not duplicated; I've never seen it
consolidate duplicated code and /then/ inline based off that)

This does have the advantage that we're not freeing and reallocating the
bio like Vivek pointed out, but I'm not a huge fan of having the
punting/retry logic in the main code path.

I don't care that much though. I'd prefer not to have the actual
allocations duplicated, but it's starting to feel like bikeshedding to
me.

> > +static void punt_bios_to_rescuer(struct bio_set *bs)
> > +{
> > + struct bio_list punt, nopunt;
> > + struct bio *bio;
> > +
> > + /*
> > + * Don't want to punt all bios on current->bio_list; if there was a bio
> > + * on there for a stacking driver higher up in the stack, processing it
> > + * could require allocating bios from this bio_set, and we don't want to
> > + * do that from our own rescuer.
>
> Hmmm... isn't it more like we "must" process only the bios which are
> from this bio_set to have any kind of forward-progress guarantee? The
> above sounds like it's just something undesirable.

Yeah, that'd be better, I'll change it.

 
09-10-2012, 08:40 PM
Tejun Heo

block: Avoid deadlocks with bio allocation by stacking drivers

Hello, Kent.

On Mon, Sep 10, 2012 at 01:24:35PM -0700, Kent Overstreet wrote:
> And at that point, why duplicate that line of code? It doesn't matter that
> much, but IMO a goto retry better labels what's actually going on (it's
> something that's not uncommon in the kernel and if I see a retry label
> in a function I pretty immediately have an idea of what's going on).
>
> So we could do
>
> retry:
> p = mempool_alloc(bs->bio_pool, gfp_mask);
> if (!p && gfp_mask != saved_gfp) {
> punt_bios_to_rescuer(bs);
> gfp_mask = saved_gfp;
> goto retry;
> }

Yes, we do retry loops if that makes the code simpler. Doing that to
save one extra alloc call, I don't think so.

Thanks.

--
tejun

 
09-10-2012, 09:33 PM
Kent Overstreet

block: Avoid deadlocks with bio allocation by stacking drivers

On Mon, Sep 10, 2012 at 01:40:10PM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Mon, Sep 10, 2012 at 01:24:35PM -0700, Kent Overstreet wrote:
> > And at that point, why duplicate that line of code? It doesn't matter that
> > much, but IMO a goto retry better labels what's actually going on (it's
> > something that's not uncommon in the kernel and if I see a retry label
> > in a function I pretty immediately have an idea of what's going on).
> >
> > So we could do
> >
> > retry:
> > p = mempool_alloc(bs->bio_pool, gfp_mask);
> > if (!p && gfp_mask != saved_gfp) {
> > punt_bios_to_rescuer(bs);
> > gfp_mask = saved_gfp;
> > goto retry;
> > }
>
> Yes, we do retry loops if that makes the code simpler. Doing that to
> save one extra alloc call, I don't think so.

"Simpler" isn't really an objective thing though. To me the goto version
is more obvious/idiomatic.

Eh. I'll do it your way, but consider this a formal objection.

 
09-10-2012, 09:37 PM
Tejun Heo

block: Avoid deadlocks with bio allocation by stacking drivers

On Mon, Sep 10, 2012 at 02:33:49PM -0700, Kent Overstreet wrote:
> "Simpler" isn't really an objective thing though. To me the goto version
> is more obvious/idiomatic.
>
> Eh. I'll do it your way, but consider this a formal objection

Thanks.

--
tejun

 
