Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Device-mapper Development (http://www.linux-archive.org/device-mapper-development/)
-   -   block: Introduce new bio_split() (http://www.linux-archive.org/device-mapper-development/687052-block-introduce-new-bio_split.html)

Kent Overstreet 07-24-2012 08:11 PM

block: Introduce new bio_split()
 
The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
fs/bio.c |   99 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 99 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 5d02aa5..a15e121 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
EXPORT_SYMBOL(bio_pair_split);

/**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundary; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * If bio_split() is running under generic_make_request(), it's not safe to
+ * allocate more than one bio from the same bio set. Therefore, if it is running
+ * under generic_make_request() it masks out __GFP_WAIT when doing the
+ * allocation. The caller must check for failure if there's any possibility of
+ * it being called from under generic_make_request(); it is then the caller's
+ * responsibility to retry from a safe context (by e.g. punting to workqueue).
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	/*
+	 * If we're being called from underneath generic_make_request() and we
+	 * already allocated any bios from this bio set, we risk deadlock if we
+	 * use the mempool. So instead, we possibly fail and let the caller punt
+	 * to workqueue or somesuch and retry in a safe context.
+	 */
+	if (current->bio_list)
+		gfp &= ~__GFP_WAIT;
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset += nbytes;
+			bv->bv_len -= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev		= bio->bi_bdev;
+	ret->bi_sector		= bio->bi_sector;
+	ret->bi_size		= sectors << 9;
+	ret->bi_rw		= bio->bi_rw;
+	ret->bi_vcnt		= vcnt;
+	ret->bi_max_vecs	= vcnt;
+	ret->bi_end_io		= bio->bi_end_io;
+	ret->bi_private		= bio->bi_private;
+
+	bio->bi_sector	+= sectors;
+	bio->bi_size	-= sectors << 9;
+	bio->bi_idx	 = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
* bio_sector_offset - Find hardware sector offset in bio
* @bio: bio to inspect
* @index: bio_vec index
--
1.7.7.3

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Boaz Harrosh 07-25-2012 11:55 AM

block: Introduce new bio_split()
 
On 07/24/2012 11:11 PM, Kent Overstreet wrote:

> The new bio_split() can split arbitrary bios - it's not restricted to
> single page bios, like the old bio_split() (previously renamed to
> bio_pair_split()). It also has different semantics - it doesn't allocate
> a struct bio_pair, leaving it up to the caller to handle completions.
>
> Signed-off-by: Kent Overstreet <koverstreet@google.com>
> ---
> fs/bio.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++
> 1 files changed, 99 insertions(+), 0 deletions(-)
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 5d02aa5..a15e121 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
> EXPORT_SYMBOL(bio_pair_split);
>
> /**
> + * bio_split - split a bio
> + * @bio: bio to split
> + * @sectors: number of sectors to split from the front of @bio
> + * @gfp: gfp mask
> + * @bs: bio set to allocate from
> + *
> + * Allocates and returns a new bio which represents @sectors from the start of
> + * @bio, and updates @bio to represent the remaining sectors.
> + *
> + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> + * unchanged.
> + *
> + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> + * freed before the split.
> + *
> + * If bio_split() is running under generic_make_request(), it's not safe to
> + * allocate more than one bio from the same bio set. Therefore, if it is running
> + * under generic_make_request() it masks out __GFP_WAIT when doing the
> + * allocation. The caller must check for failure if there's any possibility of
> + * it being called from under generic_make_request(); it is then the caller's
> + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> + */
> +struct bio *bio_split(struct bio *bio, int sectors,
> + gfp_t gfp, struct bio_set *bs)
> +{
> + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> + struct bio_vec *bv;
> + struct bio *ret = NULL;
> +
> + BUG_ON(sectors <= 0);
> +
> + /*
> + * If we're being called from underneath generic_make_request() and we
> + * already allocated any bios from this bio set, we risk deadlock if we
> + * use the mempool. So instead, we possibly fail and let the caller punt
> + * to workqueue or somesuch and retry in a safe context.
> + */
> + if (current->bio_list)
> + gfp &= ~__GFP_WAIT;


NACK!

If, as you said above in the comment,

    if there's any possibility of it being called from under generic_make_request();
    it is then the caller's responsibility to ...

then all the comment needs to say is:

    ... caller's responsibility to not set __GFP_WAIT in gfp.

And drop this check here. It is up to the caller to decide. If the caller wants,
he can do "if (current->bio_list)" on his own.

This is a general-purpose utility; you might not know its context.
For example, the above will break with osdblk.

Thanks
Boaz

> +
> + if (sectors >= bio_sectors(bio))
> + return bio;
> +
> + trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> + bio->bi_sector + sectors);
> +
> + bio_for_each_segment(bv, bio, idx) {
> + vcnt = idx - bio->bi_idx;
> +
> + if (!nbytes) {
> + ret = bio_alloc_bioset(gfp, 0, bs);
> + if (!ret)
> + return NULL;
> +
> + ret->bi_io_vec = bio_iovec(bio);
> + ret->bi_flags |= 1 << BIO_CLONED;
> + break;
> + } else if (nbytes < bv->bv_len) {
> + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> + if (!ret)
> + return NULL;
> +
> + memcpy(ret->bi_io_vec, bio_iovec(bio),
> + sizeof(struct bio_vec) * vcnt);
> +
> + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> + bv->bv_offset += nbytes;
> + bv->bv_len -= nbytes;
> + break;
> + }
> +
> + nbytes -= bv->bv_len;
> + }
> +
> + ret->bi_bdev = bio->bi_bdev;
> + ret->bi_sector = bio->bi_sector;
> + ret->bi_size = sectors << 9;
> + ret->bi_rw = bio->bi_rw;
> + ret->bi_vcnt = vcnt;
> + ret->bi_max_vecs = vcnt;
> + ret->bi_end_io = bio->bi_end_io;
> + ret->bi_private = bio->bi_private;
> +
> + bio->bi_sector += sectors;
> + bio->bi_size -= sectors << 9;
> + bio->bi_idx = idx;
> +
> + if (bio_integrity(bio)) {
> + bio_integrity_clone(ret, bio, gfp, bs);
> + bio_integrity_trim(ret, 0, bio_sectors(ret));
> + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(bio_split);
> +
> +/**
> * bio_sector_offset - Find hardware sector offset in bio
> * @bio: bio to inspect
> * @index: bio_vec index



Kent Overstreet 07-25-2012 11:26 PM

block: Introduce new bio_split()
 
On Wed, Jul 25, 2012 at 02:55:40PM +0300, Boaz Harrosh wrote:
> On 07/24/2012 11:11 PM, Kent Overstreet wrote:
>
> > The new bio_split() can split arbitrary bios - it's not restricted to
> > single page bios, like the old bio_split() (previously renamed to
> > bio_pair_split()). It also has different semantics - it doesn't allocate
> > a struct bio_pair, leaving it up to the caller to handle completions.
> >
> > Signed-off-by: Kent Overstreet <koverstreet@google.com>
> > ---
> > fs/bio.c | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++ ++++++++++++
> > 1 files changed, 99 insertions(+), 0 deletions(-)
> >
> > diff --git a/fs/bio.c b/fs/bio.c
> > index 5d02aa5..a15e121 100644
> > --- a/fs/bio.c
> > +++ b/fs/bio.c
> > @@ -1539,6 +1539,105 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
> > EXPORT_SYMBOL(bio_pair_split);
> >
> > /**
> > + * bio_split - split a bio
> > + * @bio: bio to split
> > + * @sectors: number of sectors to split from the front of @bio
> > + * @gfp: gfp mask
> > + * @bs: bio set to allocate from
> > + *
> > + * Allocates and returns a new bio which represents @sectors from the start of
> > + * @bio, and updates @bio to represent the remaining sectors.
> > + *
> > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > + * unchanged.
> > + *
> > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > + * freed before the split.
> > + *
> > + * If bio_split() is running under generic_make_request(), it's not safe to
> > + * allocate more than one bio from the same bio set. Therefore, if it is running
> > + * under generic_make_request() it masks out __GFP_WAIT when doing the
> > + * allocation. The caller must check for failure if there's any possibility of
> > + * it being called from under generic_make_request(); it is then the caller's
> > + * responsibility to retry from a safe context (by e.g. punting to workqueue).
> > + */
> > +struct bio *bio_split(struct bio *bio, int sectors,
> > + gfp_t gfp, struct bio_set *bs)
> > +{
> > + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > + struct bio_vec *bv;
> > + struct bio *ret = NULL;
> > +
> > + BUG_ON(sectors <= 0);
> > +
> > + /*
> > + * If we're being called from underneath generic_make_request() and we
> > + * already allocated any bios from this bio set, we risk deadlock if we
> > + * use the mempool. So instead, we possibly fail and let the caller punt
> > + * to workqueue or somesuch and retry in a safe context.
> > + */
> > + if (current->bio_list)
> > + gfp &= ~__GFP_WAIT;
>
>
> NACK!
>
> If as you said above in the comment:
> if there's any possibility of it being called from under generic_make_request();
> it is then the caller's responsibility to ...
>
> So all the comment needs to say is:
> ... caller's responsibility to not set __GFP_WAIT at gfp.
>
> And drop this here. It is up to the caller to decide. If the caller wants he can do
> "if (current->bio_list)" by his own.
>
> This is a general purpose utility you might not know it's context.
> for example with osdblk above will break.

Well I'm highly highly skeptical that using __GFP_WAIT under
generic_make_request() is ever a sane thing to do - it could certainly
be safe in specific circumstances, but it's just such a fragile thing to
rely on, you have to _never_ use the same bio pool more than once. And
even then I bet there's other subtle ways it could break.

But you're not the first to complain about it, and your point about
existing code is compelling.

commit ea124f899af29887e24d07497442066572012e5b
Author: Kent Overstreet <koverstreet@google.com>
Date: Wed Jul 25 16:25:10 2012 -0700

block: Introduce new bio_split()

The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.

diff --git a/fs/bio.c b/fs/bio.c
index 0470376..312e5de 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1537,6 +1537,102 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
EXPORT_SYMBOL(bio_pair_split);

/**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundry; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * BIG FAT WARNING:
+ *
+ * If you're calling this from under generic_make_request() (i.e.
+ * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
+ * workqueue if the allocation fails. Otherwise, your code will probably
+ * deadlock.
+ *
+ * You can't allocate more than once from the same bio pool without submitting
+ * the previous allocations (so they'll eventually complete and deallocate
+ * themselves), but if you're under generic_make_request() those previous
+ * allocations won't submit until you return . And if you have to split bios,
+ * you should expect that some bios will require multiple splits.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset += nbytes;
+			bv->bv_len -= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev		= bio->bi_bdev;
+	ret->bi_sector		= bio->bi_sector;
+	ret->bi_size		= sectors << 9;
+	ret->bi_rw		= bio->bi_rw;
+	ret->bi_vcnt		= vcnt;
+	ret->bi_max_vecs	= vcnt;
+	ret->bi_end_io		= bio->bi_end_io;
+	ret->bi_private		= bio->bi_private;
+
+	bio->bi_sector	+= sectors;
+	bio->bi_size	-= sectors << 9;
+	bio->bi_idx	 = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
* bio_sector_offset - Find hardware sector offset in bio
* @bio: bio to inspect
* @index: bio_vec index


Kent Overstreet 08-06-2012 10:08 PM

block: Introduce new bio_split()
 
The new bio_split() can split arbitrary bios - it's not restricted to
single page bios, like the old bio_split() (previously renamed to
bio_pair_split()). It also has different semantics - it doesn't allocate
a struct bio_pair, leaving it up to the caller to handle completions.

v5: Take out current->bio_list check and make it the caller's
responsibility, per Boaz

Signed-off-by: Kent Overstreet <koverstreet@google.com>
---
fs/bio.c            |   96 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/bio.h |    3 ++
2 files changed, 99 insertions(+), 0 deletions(-)

diff --git a/fs/bio.c b/fs/bio.c
index 0470376..312e5de 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1537,6 +1537,102 @@ struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors)
EXPORT_SYMBOL(bio_pair_split);

/**
+ * bio_split - split a bio
+ * @bio: bio to split
+ * @sectors: number of sectors to split from the front of @bio
+ * @gfp: gfp mask
+ * @bs: bio set to allocate from
+ *
+ * Allocates and returns a new bio which represents @sectors from the start of
+ * @bio, and updates @bio to represent the remaining sectors.
+ *
+ * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
+ * unchanged.
+ *
+ * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
+ * bvec boundary; it is the caller's responsibility to ensure that @bio is not
+ * freed before the split.
+ *
+ * BIG FAT WARNING:
+ *
+ * If you're calling this from under generic_make_request() (i.e.
+ * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
+ * workqueue if the allocation fails. Otherwise, your code will probably
+ * deadlock.
+ *
+ * You can't allocate more than once from the same bio pool without submitting
+ * the previous allocations (so they'll eventually complete and deallocate
+ * themselves), but if you're under generic_make_request() those previous
+ * allocations won't submit until you return . And if you have to split bios,
+ * you should expect that some bios will require multiple splits.
+ */
+struct bio *bio_split(struct bio *bio, int sectors,
+		      gfp_t gfp, struct bio_set *bs)
+{
+	unsigned idx, vcnt = 0, nbytes = sectors << 9;
+	struct bio_vec *bv;
+	struct bio *ret = NULL;
+
+	BUG_ON(sectors <= 0);
+
+	if (sectors >= bio_sectors(bio))
+		return bio;
+
+	trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
+			  bio->bi_sector + sectors);
+
+	bio_for_each_segment(bv, bio, idx) {
+		vcnt = idx - bio->bi_idx;
+
+		if (!nbytes) {
+			ret = bio_alloc_bioset(gfp, 0, bs);
+			if (!ret)
+				return NULL;
+
+			ret->bi_io_vec = bio_iovec(bio);
+			ret->bi_flags |= 1 << BIO_CLONED;
+			break;
+		} else if (nbytes < bv->bv_len) {
+			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
+			if (!ret)
+				return NULL;
+
+			memcpy(ret->bi_io_vec, bio_iovec(bio),
+			       sizeof(struct bio_vec) * vcnt);
+
+			ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
+			bv->bv_offset += nbytes;
+			bv->bv_len -= nbytes;
+			break;
+		}
+
+		nbytes -= bv->bv_len;
+	}
+
+	ret->bi_bdev		= bio->bi_bdev;
+	ret->bi_sector		= bio->bi_sector;
+	ret->bi_size		= sectors << 9;
+	ret->bi_rw		= bio->bi_rw;
+	ret->bi_vcnt		= vcnt;
+	ret->bi_max_vecs	= vcnt;
+	ret->bi_end_io		= bio->bi_end_io;
+	ret->bi_private		= bio->bi_private;
+
+	bio->bi_sector	+= sectors;
+	bio->bi_size	-= sectors << 9;
+	bio->bi_idx	 = idx;
+
+	if (bio_integrity(bio)) {
+		bio_integrity_clone(ret, bio, gfp, bs);
+		bio_integrity_trim(ret, 0, bio_sectors(ret));
+		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(bio_split);
+
+/**
* bio_sector_offset - Find hardware sector offset in bio
* @bio: bio to inspect
* @index: bio_vec index
diff --git a/include/linux/bio.h b/include/linux/bio.h
index fdcc8dc..2d06262 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -201,6 +201,9 @@ struct bio_pair {
atomic_t cnt;
int error;
};
+
+extern struct bio *bio_split(struct bio *bio, int sectors,
+			     gfp_t gfp, struct bio_set *bs);
extern struct bio_pair *bio_pair_split(struct bio *bi, int first_sectors);
extern void bio_pair_release(struct bio_pair *dbio);

--
1.7.7.3


Tejun Heo 08-08-2012 10:58 PM

block: Introduce new bio_split()
 
Hello,

On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> /**
> + * bio_split - split a bio
> + * @bio: bio to split
> + * @sectors: number of sectors to split from the front of @bio
> + * @gfp: gfp mask
> + * @bs: bio set to allocate from
> + *
> + * Allocates and returns a new bio which represents @sectors from the start of
> + * @bio, and updates @bio to represent the remaining sectors.
> + *
> + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> + * unchanged.

Umm.... I don't know. This is rather confusing. The function may
return new or old bios? What's the rationale behind it? Return
ERR_PTR(-EINVAL) instead?

> + *
> + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> + * freed before the split.

This is somewhat error-prone. Given how splits are used now, this
might not be a big issue but it isn't difficult to imagine how this
could go subtly wrong. More on this.

> + *
> + * BIG FAT WARNING:
> + *
> + * If you're calling this from under generic_make_request() (i.e.
> + * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
> + * workqueue if the allocation fails. Otherwise, your code will probably
> + * deadlock.

If the condition is detectable, WARN_ON_ONCE() please.

> + * You can't allocate more than once from the same bio pool without submitting
> + * the previous allocations (so they'll eventually complete and deallocate
> + * themselves), but if you're under generic_make_request() those previous
> + * allocations won't submit until you return . And if you have to split bios,
^
extra space
> + * you should expect that some bios will require multiple splits.
> + */
> +struct bio *bio_split(struct bio *bio, int sectors,
> + gfp_t gfp, struct bio_set *bs)
> +{
> + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> + struct bio_vec *bv;
> + struct bio *ret = NULL;
> +
> + BUG_ON(sectors <= 0);
> +
> + if (sectors >= bio_sectors(bio))
> + return bio;
> +
> + trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> + bio->bi_sector + sectors);
> +
> + bio_for_each_segment(bv, bio, idx) {
> + vcnt = idx - bio->bi_idx;
> +
> + if (!nbytes) {
> + ret = bio_alloc_bioset(gfp, 0, bs);
> + if (!ret)
> + return NULL;
> +
> + ret->bi_io_vec = bio_iovec(bio);
> + ret->bi_flags |= 1 << BIO_CLONED;
> + break;
> + } else if (nbytes < bv->bv_len) {
> + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> + if (!ret)
> + return NULL;
> +
> + memcpy(ret->bi_io_vec, bio_iovec(bio),
> + sizeof(struct bio_vec) * vcnt);
> +
> + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> + bv->bv_offset += nbytes;
> + bv->bv_len -= nbytes;
> + break;
> + }

Ummm... ISTR reviewing this code and getting confused by bio_alloc
inside bio_for_each_segment() loop and commenting something about
that. Yeah, this one.

http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/15790/focus=370

So, I actually have reviewed this but didn't get any response and
majority of the issues I raised aren't addressed and you sent the
patch to me again? What the hell, Kent?

> +
> + nbytes -= bv->bv_len;
> + }
> +
> + ret->bi_bdev = bio->bi_bdev;
> + ret->bi_sector = bio->bi_sector;
> + ret->bi_size = sectors << 9;
> + ret->bi_rw = bio->bi_rw;
> + ret->bi_vcnt = vcnt;
> + ret->bi_max_vecs = vcnt;
> + ret->bi_end_io = bio->bi_end_io;

Is this safe? Why isn't this chaining completion of split bio to the
original one?

Thanks.

--
tejun


Tejun Heo 08-08-2012 11:05 PM

block: Introduce new bio_split()
 
One more thing.

On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> + if (bio_integrity(bio)) {
> + bio_integrity_clone(ret, bio, gfp, bs);
> + bio_integrity_trim(ret, 0, bio_sectors(ret));
> + bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));

Is this equivalent to bio_integrity_split() performance-wise?

Thanks.

--
tejun


Kent Overstreet 08-09-2012 01:19 AM

block: Introduce new bio_split()
 
On Wed, Aug 08, 2012 at 03:58:39PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > /**
> > + * bio_split - split a bio
> > + * @bio: bio to split
> > + * @sectors: number of sectors to split from the front of @bio
> > + * @gfp: gfp mask
> > + * @bs: bio set to allocate from
> > + *
> > + * Allocates and returns a new bio which represents @sectors from the start of
> > + * @bio, and updates @bio to represent the remaining sectors.
> > + *
> > + * If bio_sectors(@bio) was less than or equal to @sectors, returns @bio
> > + * unchanged.
>
> Umm.... I don't know. This is rather confusing. The function may
> return new or old bios? What's the rationale behind it? Return
> ERR_PTR(-EINVAL) instead?

Returning the old bio would be semantically equivalent to returning an
error, but IME when you're actually using it it does make sense and
leads to slightly cleaner code.

The reason is that when you're splitting, sectors is typically just the
maximum number of sectors you can handle here - you calculate the device
limit, or the number of sectors you can read from this location, or
whatever.

So the code ends up looking like:

while (1) {
	split = bio_split(bio, sectors, gfp, bs);

	/* do some stuff to split and submit it */

	if (split == bio)
		break;	/* that was the last split */
}

If bio_split returned an error, it'd make the code more convoluted -
you'd have to do work on either the split or the original bio, and then
repeat the same check later when it's time to break out of the loop.


>
> > + *
> > + * The newly allocated bio will point to @bio's bi_io_vec, if the split was on a
> > + * bvec boundry; it is the caller's responsibility to ensure that @bio is not
> > + * freed before the split.
>
> This is somewhat error-prone. Given how splits are used now, this
> might not be a big issue but it isn't difficult to imagine how this
> could go subtly wrong. More on this.

I can't find anything else in your emails on the subject...

So, I do agree, but there is a rationale:

Due to the way bio completions have to be chained, I'm not convinced
it's much of an issue in practice; if you're processing a bio by
splitting it, you can't complete it until all the splits have
completed, so you have to have a hook there.

In order for this to lead to a bug, you'd have to be cloning the
original bio (i.e. you can't be splitting a bio that someone else owns
and passed you, because that won't be freed until after you complete it)
and then you have to fail to put/free that clone in your hook, where
you're going to have other state to free too.

Cloning a bio and then not explicitly freeing it ought to be fairly
obviously wrong, IMO.

I think there's a more positive reason to do it this way long term, too.
I'm working towards getting rid of arbitrary restrictions in the block
layer, and forcing bio_split() to allocate the bvec introduces one;
allocating a bio with more than BIO_MAX_VECS will fail, and that _is_
the kind of tricky restriction that's likely to trip callers up (it's
certainly happened to me, I think multiple times).

Currently this is still an issue if the split isn't aligned on a bvec
boundary, but that's also fixable - by making the bvec immutable, which
would have a lot of other benefits too.

Making bio vecs immutable would also solve the original problem, because
cloning a bio would no longer clone the bvec as well - so the bvec the
split points to would _always_ be owned by something higher up that
couldn't free it until after the split completes.

>
> > + *
> > + * BIG FAT WARNING:
> > + *
> > + * If you're calling this from under generic_make_request() (i.e.
> > + * current->bio_list != NULL), you should mask out __GFP_WAIT and punt to
> > + * workqueue if the allocation fails. Otherwise, your code will probably
> > + * deadlock.
>
> If the condition is detectable, WARN_ON_ONCE() please.

Ok, I like that.

>
> > + * You can't allocate more than once from the same bio pool without submitting
> > + * the previous allocations (so they'll eventually complete and deallocate
> > + * themselves), but if you're under generic_make_request() those previous
> > + * allocations won't submit until you return . And if you have to split bios,
> ^
> extra space
> > + * you should expect that some bios will require multiple splits.
> > + */
> > +struct bio *bio_split(struct bio *bio, int sectors,
> > + gfp_t gfp, struct bio_set *bs)
> > +{
> > + unsigned idx, vcnt = 0, nbytes = sectors << 9;
> > + struct bio_vec *bv;
> > + struct bio *ret = NULL;
> > +
> > + BUG_ON(sectors <= 0);
> > +
> > + if (sectors >= bio_sectors(bio))
> > + return bio;
> > +
> > + trace_block_split(bdev_get_queue(bio->bi_bdev), bio,
> > + bio->bi_sector + sectors);
> > +
> > + bio_for_each_segment(bv, bio, idx) {
> > + vcnt = idx - bio->bi_idx;
> > +
> > + if (!nbytes) {
> > + ret = bio_alloc_bioset(gfp, 0, bs);
> > + if (!ret)
> > + return NULL;
> > +
> > + ret->bi_io_vec = bio_iovec(bio);
> > + ret->bi_flags |= 1 << BIO_CLONED;
> > + break;
> > + } else if (nbytes < bv->bv_len) {
> > + ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> > + if (!ret)
> > + return NULL;
> > +
> > + memcpy(ret->bi_io_vec, bio_iovec(bio),
> > + sizeof(struct bio_vec) * vcnt);
> > +
> > + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
> > + bv->bv_offset += nbytes;
> > + bv->bv_len -= nbytes;
> > + break;
> > + }
>
> Ummm... ISTR reviewing this code and getting confused by bio_alloc
> inside bio_for_each_segment() loop and commenting something about
> that. Yeah, this one.
>
> http://thread.gmane.org/gmane.linux.kernel.device-mapper.devel/15790/focus=370
>
> So, I actually have reviewed this but didn't get any response and
> majority of the issues I raised aren't addressed and you sent the
> patch to me again? What the hell, Kent?

Argh. I apologize, I knew I'd missed something. Cutting and pasting the
stuff I haven't already responded to/fixed:

>> + ret->bi_io_vec[vcnt - 1].bv_len = nbytes;
>> + bv->bv_offset += nbytes;
>> + bv->bv_len -= nbytes;
>
> Please don't indent assignments.

Ok, unindented those.

>
>> + break;
>> + }
>> +
>> + nbytes -= bv->bv_len;
>> + }
>
> I find the code a bit confusing. Wouldn't it be better to structure
> it as
>
> bio_for_each_segment() {
> find splitting point;
> }
>
> Do all of the splitting.

Definitely, except I don't see how to sanely do it that way with the
different cases for splitting on a bvec boundary and not.

I would like to get rid of that conditional eventually, but by making
bvecs immutable.

>
>> + ret->bi_bdev = bio->bi_bdev;
>> + ret->bi_sector = bio->bi_sector;
>> + ret->bi_size = sectors << 9;
>> + ret->bi_rw = bio->bi_rw;
>> + ret->bi_vcnt = vcnt;
>> + ret->bi_max_vecs = vcnt;
>> + ret->bi_end_io = bio->bi_end_io;
>> + ret->bi_private = bio->bi_private;
>>
>> - bio_endio(master, bp->error);
>> - mempool_free(bp, bp->bio2.bi_private);
>> + bio->bi_sector += sectors;
>> + bio->bi_size -= sectors << 9;
>> + bio->bi_idx = idx;
>
> I personally would prefer not having indentations here either.

These I'd prefer to keep - it is a dozen assignments in a row, I
_really_ find the indented version more readable.

> So, before, split wouldn't override orig->bi_private. Now, it does so
> while the bio is in flight. I don't know. If the only benefit of
> temporary override is avoiding have a separate end_io call, I think
> not doing so is better. Also, behavior changes as subtle as this
> *must* be noted in the patch description.

Already said more about this below, but to elaborate a bit - there are
situations where the caller really wouldn't want the completions
chained (i.e., if the splits are going to different devices or are
otherwise going to have different error handling, the caller really
needs to supply its own endio function(s)).

The old behaviour is still available (certainly there are cases where it
_is_ what you want) - it's just been decoupled a bit.

>
> > +
> > +		nbytes -= bv->bv_len;
> > +	}
> > +
> > +	ret->bi_bdev		= bio->bi_bdev;
> > +	ret->bi_sector		= bio->bi_sector;
> > +	ret->bi_size		= sectors << 9;
> > +	ret->bi_rw		= bio->bi_rw;
> > +	ret->bi_vcnt		= vcnt;
> > +	ret->bi_max_vecs	= vcnt;
> > +	ret->bi_end_io		= bio->bi_end_io;
>
> Is this safe? Why isn't this chaining completion of split bio to the
> original one?

Outside the scope of this function - if you want the completions
chained, you'd use bio_pair_split().

With this bio_split() it's perfectly reasonable to split a bio an
arbitrary number of times, and if that's what you're doing it's much
cleaner (and more efficient) to just use a refcount instead of chaining
the completions a bunch of times.

So if that's what the caller is doing, this will do exactly what they
want - if the caller wants to chain the completions, the caller can
still do that (as bio_pair_split() does in the next patch).
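The refcount pattern described above can be sketched in userspace miniature. Everything below is a hypothetical stand-in, not the actual patch: the struct and `piece_endio()` are invented names, and the kernel version would use `atomic_t` with `atomic_dec_and_test()` rather than a bare counter.

```c
/* Miniature of the refcount scheme: split a bio N ways, set a shared
 * "remaining" count to N, and complete the original only when the
 * last piece finishes.  Hypothetical mock types, for illustration. */
#include <assert.h>

struct mock_bio {
	int remaining;	/* pieces still in flight */
	int completed;	/* set once the whole original is done */
};

static void piece_endio(struct mock_bio *orig)
{
	/* not atomic here; the real thing needs atomic_dec_and_test() */
	if (--orig->remaining == 0)
		orig->completed = 1;
}
```

With three pieces in flight the original completes exactly once, after the third `piece_endio()`, with no chain of N stacked endio/bi_private overrides.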

>
> Thanks.
>
> --
> tejun

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Kent Overstreet 08-09-2012 01:39 AM

block: Introduce new bio_split()
 
On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> One more thing.
>
> On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > > +	if (bio_integrity(bio)) {
> > > +		bio_integrity_clone(ret, bio, gfp, bs);
> > > +		bio_integrity_trim(ret, 0, bio_sectors(ret));
> > > +		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
>
> Is this equivalent to bio_integrity_split() performance-wise?

Strictly speaking, no. But it has the advantage of being drastically
simpler - and the old one only worked for single page bios, so I
would've had to come up with something new from scratch, and as
confusing as the integrity stuff is, I wouldn't trust the result.

I'm skeptical that it's going to matter in practice given how much
iteration is done elsewhere in the course of processing a bio and given
that this stuff isn't used with high end SSDs...


Tejun Heo 08-09-2012 06:44 AM

block: Introduce new bio_split()
 
Hello, Kent.

On Wed, Aug 08, 2012 at 06:19:28PM -0700, Kent Overstreet wrote:
> If bio_split returned an error, it'd make the code more convoluted -
> you'd have to do work on either the split or the original bio, and then
> repeat the same check later when it's time to break out of the loop.

I really dislike this reasoning. The interface might streamline
certain use cases a bit, but at the cost of confusing semantics. It's
called bio_split(), which takes a bio and returns a bio. Any sane
person would expect it to return the split bio iff the original bio is
successfully split, and NULL or an ERR_PTR() value if splitting didn't
happen for whatever reason. Please don't try to make code look
elegant by twisting obvious things. Saving three lines is never
worthwhile if it costs obviousness.
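For reference, the caller-side loop that Kent's "return @bio unchanged" convention streamlines can be modelled in miniature - here as a hypothetical userspace sketch that splits a plain byte count rather than a real bio (`submit()` and `submit_split()` are invented names, not kernel API):

```c
/* Peel fixed-size chunks off the front and submit each one; the
 * final iteration submits whatever remains.  Returns the number of
 * pieces submitted.  Purely illustrative. */
#include <assert.h>

static int total_submitted;

static void submit(int bytes)
{
	total_submitted += bytes;
}

static int submit_split(int size, int chunk)
{
	int pieces = 0;

	while (size > chunk) {
		submit(chunk);
		size -= chunk;
		pieces++;
	}
	submit(size);
	return pieces + 1;
}
```

Under the NULL/ERR_PTR() convention argued for above, the split call inside such a loop would instead return NULL once nothing needs splitting, and the caller would submit the original explicitly - one extra check per loop, but unsurprising semantics.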

This reminds me of, unfortunately, kobject_get(), which returns the
parameter it was passed in verbatim. The rationale was that it makes
code like the following *look* more elegant.

my_kobj = kobject_get(&blahblah.kobj);

Some people liked how things like the above looked. Unfortunately, it
unnecessarily made people wonder whether the return value could be
different from the parameter (can it ever fail?) and led to cases
where people misunderstood the semantics and assumed that kobject
somehow handles zero-refcnt racing and that kobject_get() would
magically return NULL if the refcnt reaches zero. I hope we don't have
those bugs anymore, but code built on top of kobject unfortunately
propagated this stupidity multiple layers upward, so I'm not sure.

So, please don't do things like this.

> > This is somewhat error-prone. Given how splits are used now, this
> > might not be a big issue but it isn't difficult to imagine how this
> > could go subtly wrong. More on this.
>
> I can't find anything else in your emails on the subject...

Oh, it was the chaining thing. If you bump a ref on the original bio
and let the split one put it on completion, the caller doesn't have to
worry about this.
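That chaining scheme might look like the following mock-up, where `mock_get()`/`mock_put()` stand in for `bio_get()`/`bio_put()`. Everything here is an illustrative assumption rather than code from the actual patch:

```c
/* The split takes a reference on the original bio; the split's
 * completion propagates any error and drops that reference, so the
 * caller never has to manage completions itself.  Mock types. */
#include <assert.h>

struct mock_bio {
	int ref;
	int error;	/* first error reported by any piece */
	int done;	/* set when the last reference is dropped */
};

static void mock_get(struct mock_bio *b)
{
	b->ref++;
}

static void mock_put(struct mock_bio *b)
{
	if (--b->ref == 0)
		b->done = 1;	/* last put completes the original */
}

static void split_endio(struct mock_bio *orig, int error)
{
	if (error && !orig->error)
		orig->error = error;	/* "first error wins" is a policy choice */
	mock_put(orig);
}
```

The submitter holds its own reference, so the original cannot complete before the submitter is finished with it, even if every split piece has already ended.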

> > > +	bio_for_each_segment(bv, bio, idx) {
> > > +		vcnt = idx - bio->bi_idx;
> > > +
> > > +		if (!nbytes) {
> > > +			ret = bio_alloc_bioset(gfp, 0, bs);
> > > +			if (!ret)
> > > +				return NULL;
> > > +
> > > +			ret->bi_io_vec = bio_iovec(bio);
> > > +			ret->bi_flags |= 1 << BIO_CLONED;
> > > +			break;
> > > +		} else if (nbytes < bv->bv_len) {
> > > +			ret = bio_alloc_bioset(gfp, ++vcnt, bs);
> > > +			if (!ret)
> > > +				return NULL;
> > > +
> > > +			memcpy(ret->bi_io_vec, bio_iovec(bio),
> > > +			       sizeof(struct bio_vec) * vcnt);
> > > +
> > > +			ret->bi_io_vec[vcnt - 1].bv_len	= nbytes;
> > > +			bv->bv_offset			+= nbytes;
> > > +			bv->bv_len			-= nbytes;
> > > +			break;
> > > +		}
...
> > I find the code a bit confusing. Wouldn't it be better to structure
> > it as
> >
> > bio_for_each_segment() {
> > 	find splitting point;
> > }
> >
> > Do all of the splitting.
>
> Definitely, except I don't see how to sanely do it that way with the
> different cases for splitting on a bvec boundary and not.

I really don't buy that. Just break out on the common condition and
differentiate the two cases afterwards.
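One way to read that suggestion, as a compressed userspace sketch - `struct mock_bvec` and `find_split()` are hypothetical miniatures of the kernel's bvec iteration, not the patch itself:

```c
/* Phase one: a loop that only locates the split point.  Phase two
 * ("do all of the splitting") then differentiates the boundary and
 * mid-bvec cases in a single place after the loop. */
#include <assert.h>

struct mock_bvec {
	unsigned bv_len;
};

/* Walk the vector until `nbytes` is exhausted; report the index of
 * the bvec the split lands in and the offset into it.  An offset of
 * zero means the split falls exactly on a bvec boundary. */
static void find_split(const struct mock_bvec *vecs, int nvecs,
		       unsigned nbytes, int *idx, unsigned *off)
{
	int i;

	for (i = 0; i < nvecs && nbytes >= vecs[i].bv_len; i++)
		nbytes -= vecs[i].bv_len;

	*idx = i;
	*off = nbytes;

	/* After this returns: if (*off == 0) the new bio can share the
	 * bvec array up to *idx; otherwise the boundary bvec must be
	 * duplicated and trimmed.  One branch, after the loop. */
}
```
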

> > So, before, split wouldn't override orig->bi_private. Now, it does so
> > while the bio is in flight. I don't know. If the only benefit of
> temporary override is avoiding having a separate end_io call, I think
> > not doing so is better. Also, behavior changes as subtle as this
> > *must* be noted in the patch description.
>
> Already said more about this below, but to elaborate a bit - there are
> situations where the caller really wouldn't want the completions
> chained (i.e., if the splits are going to different devices or are
> otherwise going to have different error handling, the caller really
> needs to supply its own endio function(s)).

Then let it take a @chain parameter and clear the endio if @chain is
%false (or introduce bio_split_chain()). Copying the endio of the
original bio is useless and dangerous.

> Outside the scope of this function - if you want the completions
> chained, you'd use bio_pair_split().
>
> With this bio_split() it's perfectly reasonable to split a bio an
> arbitrary number of times, and if that's what you're doing it's much
> cleaner (and more efficient) to just use a refcount instead of chaining
> the completions a bunch of times.
>
> So if that's what the caller is doing, this will do exactly what they
> want - if the caller wants to chain the completions, the caller can
> still do that (as bio_pair_split() does in the next patch).

bio_pair doesn't really make much sense after this change. Originally
it made sense as the container holding everything necessary for the
clone; now it's just a proxy to keep the old interface, and it doesn't
even succeed at that - the interface changes. Let's just roll it into
the default bio_clone() interface.

Thanks.

--
tejun


Tejun Heo 08-09-2012 07:22 AM

block: Introduce new bio_split()
 
On Wed, Aug 08, 2012 at 06:39:23PM -0700, Kent Overstreet wrote:
> On Wed, Aug 08, 2012 at 04:05:32PM -0700, Tejun Heo wrote:
> > One more thing.
> >
> > On Mon, Aug 06, 2012 at 03:08:37PM -0700, Kent Overstreet wrote:
> > > +	if (bio_integrity(bio)) {
> > > +		bio_integrity_clone(ret, bio, gfp, bs);
> > > +		bio_integrity_trim(ret, 0, bio_sectors(ret));
> > > +		bio_integrity_trim(bio, bio_sectors(ret), bio_sectors(bio));
> >
> > Is this equivalent to bio_integrity_split() performance-wise?
>
> Strictly speaking, no. But it has the advantage of being drastically
> simpler - and the old one only worked for single page bios so I
> would've had to come up with something new from scratch, and as
> confusing as the integrity stuff is I wouldn't trust the result.

There's already bio_integrity_split() and you're actively dropping it.

> I'm skeptical that it's going to matter in practice given how much
> iteration is done elsewhere in the course of processing a bio and given
> that this stuff isn't used with high end SSDs...

If you think actively dropping it is justified, please state the
change and the justification clearly. You're burying an active change
in two separate patches without even mentioning it or cc'ing the
people who care about bio-integrity (Martin K. Petersen). Ummm....
this is simply unacceptable and makes me a lot more suspicious about
the patchset. Please be explicit about the changes you make.
Peer review breaks down unless such trust can be maintained.

Thanks.

--
tejun


