FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Device-mapper Development

 
 
LinkBack Thread Tools
 
Old 08-07-2012, 03:17 AM
Muthu Kumar
 
Default block: Only clone bio vecs that are in use

Hi,

On Tue, Jul 24, 2012 at 1:11 PM, Kent Overstreet <koverstreet@google.com> wrote:
> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use.
>
> Signed-off-by: Kent Overstreet <koverstreet@google.com>

<snip>

>
> diff --git a/drivers/md/dm.c b/drivers/md/dm.c
> index 3f3c26e..193fb19 100644
> --- a/drivers/md/dm.c
> +++ b/drivers/md/dm.c
> @@ -1057,11 +1057,10 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
> {
> struct bio *clone;
>
> - clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
> + clone = bio_alloc_bioset(GFP_NOIO, bv_count, bs);

Number of io_vecs allocated in clone is different. Please see my comment below.

> __bio_clone(clone, bio);
> clone->bi_sector = sector;
> - clone->bi_idx = idx;
> - clone->bi_vcnt = idx + bv_count;
> + clone->bi_vcnt = bv_count;
> clone->bi_size = to_bytes(len);
> clone->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> diff --git a/fs/bio.c b/fs/bio.c
> index 7a0801d..ec6a357 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -451,8 +451,9 @@ EXPORT_SYMBOL(bio_phys_segments);
> */
> void __bio_clone(struct bio *bio, struct bio *bio_src)
> {
> - memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
> - bio_src->bi_max_vecs * sizeof(struct bio_vec));
> + memcpy(bio->bi_io_vec,
> + bio_iovec(bio_src),
> + bio_segments(bio_src) * sizeof(struct bio_vec));
>


You are changing the meaning of __bio_clone() here. In old code, the
number of io_vecs, bi_idx, bi_vcnt are preserved. But in this modified
code, you are mapping bio_src's bi_iovec[bi_idx] to bio_dests
bi_iovec[0] and also restricting the number of allocated io_vecs of
the clone. It may be useful for cases were we would like a identical
copy of the original bio (may not be in current code base, but this
implementation is definitely not what one would expect from the name
"clone").

May be, call this new implementation some thing else (and use it for bcache)?

Regards,
Muthu



> /*
> * most users will be overriding ->bi_bdev with a new target,
> @@ -461,10 +462,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> bio->bi_sector = bio_src->bi_sector;
> bio->bi_bdev = bio_src->bi_bdev;
> bio->bi_flags |= 1 << BIO_CLONED;
> + bio->bi_flags &= ~(1 << BIO_SEG_VALID);
> bio->bi_rw = bio_src->bi_rw;
> - bio->bi_vcnt = bio_src->bi_vcnt;
> + bio->bi_vcnt = bio_segments(bio_src);
> bio->bi_size = bio_src->bi_size;
> - bio->bi_idx = bio_src->bi_idx;
> }
> EXPORT_SYMBOL(__bio_clone);
>
> @@ -479,7 +480,7 @@ EXPORT_SYMBOL(__bio_clone);
> struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
> struct bio_set *bs)
> {
> - struct bio *b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
> + struct bio *b = bio_alloc_bioset(gfp_mask, bio_segments(bio), bs);
>
> if (!b)
> return NULL;
> @@ -509,7 +510,7 @@ EXPORT_SYMBOL(bio_clone);
>
> struct bio *bio_clone_kmalloc(struct bio *bio, gfp_t gfp_mask)
> {
> - struct bio *b = bio_kmalloc(gfp_mask, bio->bi_max_vecs);
> + struct bio *b = bio_kmalloc(gfp_mask, bio_segments(bio));
>
> if (!b)
> return NULL;
> --
> 1.7.7.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-08-2012, 11:30 PM
Tejun Heo
 
Default block: Only clone bio vecs that are in use

Hello,

On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> @@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> bio->bi_sector = bio_src->bi_sector;
> bio->bi_bdev = bio_src->bi_bdev;
> bio->bi_flags |= 1 << BIO_CLONED;
> + bio->bi_flags &= ~(1 << BIO_SEG_VALID);

This isn't obvious at all. Why no explanation anywhere? Also it
would be nice to update comments of the updated functions so that it's
clear that only partial cloning happens.

Thanks.

--
tejun

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-09-2012, 03:06 AM
Kent Overstreet
 
Default block: Only clone bio vecs that are in use

On Wed, Aug 08, 2012 at 04:30:07PM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> > @@ -459,10 +460,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> > bio->bi_sector = bio_src->bi_sector;
> > bio->bi_bdev = bio_src->bi_bdev;
> > bio->bi_flags |= 1 << BIO_CLONED;
> > + bio->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> This isn't obvious at all. Why no explanation anywhere? Also it
> would be nice to update comments of the updated functions so that it's
> clear that only partial cloning happens.

Because it's not obvious to me, either - I had to grep around through a
fair amount of code to figure out the semantics of BIO_SEG_VALID and I
doubt I have it 100%. I'm also pretty sure it's not used consistently in
the existing code...

If it means what I think it means, it should be obvious - nr_segs isn't
valid because the number of pages in the iovec changed (though we didn't
bother to copy it over anyways. So it's not necessary if nr_segs = 0
means nr_segs isn't valid, but bleh).

Anyways. yeah. BIO_SEG_VALID should be documented, and if it was I think
this code would be fine.

I will update the comment for the partial cloning thing:

/**
* __bio_clone - clone a bio
* @bio: destination bio
* @bio_src: bio to clone
*
* Clone a &bio. Caller will own the returned bio, but not
* the actual data it points to. Reference count of returned
* bio will be one.
*
* We don't clone the entire bvec, just the part from bi_idx to b_vcnt
* (i.e. what the bio currently points to, so the new bio is still
* equivalent to the old bio).
*/
void __bio_clone(struct bio *bio, struct bio *bio_src)
{
memcpy(bio->bi_io_vec,
bio_iovec(bio_src),
bio_segments(bio_src) * sizeof(struct bio_vec));

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-09-2012, 05:37 PM
Tejun Heo
 
Default block: Only clone bio vecs that are in use

Hello,

On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use.

How was this tested?

Thanks.

--
tejun

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-13-2012, 09:46 PM
Kent Overstreet
 
Default block: Only clone bio vecs that are in use

On Thu, Aug 09, 2012 at 10:37:00AM -0700, Tejun Heo wrote:
> Hello,
>
> On Mon, Aug 06, 2012 at 03:08:41PM -0700, Kent Overstreet wrote:
> > bcache creates large bios internally, and then splits them according to
> > the device requirements before it sends them down. If a lower level
> > device tries to clone the bio, and the original bio had more than
> > BIO_MAX_PAGES, the clone will fail unecessarily.
> >
> > We can fix this by only cloning the bio vecs that are actually in use.
>
> How was this tested?

This code has been in the bcache tree for months, and I added it to fix
a real bug (think I saw it with bcache on top of dm) - and since then
it's been tested in probably just about all the relevant configurations,
certainly both dm and md.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-22-2012, 09:10 PM
Tejun Heo
 
Default block: Only clone bio vecs that are in use

Hello, Kent.

On Wed, Aug 22, 2012 at 10:04:10AM -0700, Kent Overstreet wrote:
> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use.

I'm pretty sure I sound like a broken record by now, but

* How was this tested?

* What are the implications and possible dangers?

> @@ -463,10 +468,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> bio->bi_sector = bio_src->bi_sector;
> bio->bi_bdev = bio_src->bi_bdev;
> bio->bi_flags |= 1 << BIO_CLONED;
> + bio->bi_flags &= ~(1 << BIO_SEG_VALID);

For the n'th time, explain please.

Thanks.

--
tejun

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-24-2012, 07:05 AM
Kent Overstreet
 
Default block: Only clone bio vecs that are in use

On Wed, Aug 22, 2012 at 02:10:45PM -0700, Tejun Heo wrote:
> Hello, Kent.
>
> On Wed, Aug 22, 2012 at 10:04:10AM -0700, Kent Overstreet wrote:
> > bcache creates large bios internally, and then splits them according to
> > the device requirements before it sends them down. If a lower level
> > device tries to clone the bio, and the original bio had more than
> > BIO_MAX_PAGES, the clone will fail unecessarily.
> >
> > We can fix this by only cloning the bio vecs that are actually in use.
>
> I'm pretty sure I sound like a broken record by now, but
>
> * How was this tested?
>
> * What are the implications and possible dangers?

I've said all that on list, but I gather what you really wanted was to
have it all in the patch description. Will do.

Though actually, this patch by itself is pretty innocuous. It's my bio
splitting code that actually depends on the fields of struct bio
(bi_idx) being used consistently.

Anyways, tell me if the description below is better.

> > @@ -463,10 +468,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> > bio->bi_sector = bio_src->bi_sector;
> > bio->bi_bdev = bio_src->bi_bdev;
> > bio->bi_flags |= 1 << BIO_CLONED;
> > + bio->bi_flags &= ~(1 << BIO_SEG_VALID);
>
> For the n'th time, explain please.

Argh, I could've sworn I dropped that part.


commit 0edda563aef9432b45f0c6a50f52590b92594560
Author: Kent Overstreet <koverstreet@google.com>
Date: Thu Aug 23 23:26:38 2012 -0700

block: Only clone bio vecs that are in use

bcache creates large bios internally, and then splits them according to
the device requirements before it sends them down. If a lower level
device tries to clone the bio, and the original bio had more than
BIO_MAX_PAGES, the clone will fail unecessarily.

We can fix this by only cloning the bio vecs that are actually in use -
as for as the block layer is concerned the new bio is still equivalent
to the old bio.

This code should in general be safe as long as all the block layer code
uses bi_idx, bi_vcnt consistently; since bios are cloned by code that
doesn't own the original bio there's little room for issues caused by
code playing games with the original bio's bi_io_vec. One perhaps
imagine code depending the clone and original bio's io vecs lining up a
certain way, but auditing and testing haven't turned up anything.

Testing: This code has been in the bcache tree for quite awhile, and has
been tested with various md layers and dm targets (including strange
things like multipath).

Signed-off-by: Kent Overstreet <koverstreet@google.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Alasdair Kergon <agk@redhat.com>
CC: Sage Weil <sage@inktank.com>

diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 63e5852..5c3457f 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -734,7 +734,7 @@ static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
}

while (old_chain && (total < len)) {
- tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+ tmp = bio_kmalloc(gfpmask, bio_segments(old_chain));
if (!tmp)
goto err_out;

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index a3c38b9..3aeb108 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1080,11 +1080,10 @@ static struct bio *clone_bio(struct bio *bio, sector_t sector,
{
struct bio *clone;

- clone = bio_alloc_bioset(GFP_NOIO, bio->bi_max_vecs, bs);
+ clone = bio_alloc_bioset(GFP_NOIO, bv_count, bs);
__bio_clone(clone, bio);
clone->bi_sector = sector;
- clone->bi_idx = idx;
- clone->bi_vcnt = idx + bv_count;
+ clone->bi_vcnt = bv_count;
clone->bi_size = to_bytes(len);
clone->bi_flags &= ~(1 << BIO_SEG_VALID);

diff --git a/fs/bio.c b/fs/bio.c
index 895e2a6..cd80710 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -467,11 +467,16 @@ EXPORT_SYMBOL(bio_phys_segments);
* Clone a &bio. Caller will own the returned bio, but not
* the actual data it points to. Reference count of returned
* bio will be one.
+ *
+ * We don't clone the entire bvec, just the part from bi_idx to b_vcnt
+ * (i.e. what the bio currently points to, so the new bio is still
+ * equivalent to the old bio).
*/
void __bio_clone(struct bio *bio, struct bio *bio_src)
{
- memcpy(bio->bi_io_vec, bio_src->bi_io_vec,
- bio_src->bi_max_vecs * sizeof(struct bio_vec));
+ memcpy(bio->bi_io_vec,
+ bio_iovec(bio_src),
+ bio_segments(bio_src) * sizeof(struct bio_vec));

/*
* most users will be overriding ->bi_bdev with a new target,
@@ -481,9 +486,8 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
bio->bi_bdev = bio_src->bi_bdev;
bio->bi_flags |= 1 << BIO_CLONED;
bio->bi_rw = bio_src->bi_rw;
- bio->bi_vcnt = bio_src->bi_vcnt;
+ bio->bi_vcnt = bio_segments(bio_src);
bio->bi_size = bio_src->bi_size;
- bio->bi_idx = bio_src->bi_idx;
}
EXPORT_SYMBOL(__bio_clone);

@@ -500,7 +504,7 @@ struct bio *bio_clone_bioset(struct bio *bio, gfp_t gfp_mask,
{
struct bio *b;

- b = bio_alloc_bioset(gfp_mask, bio->bi_max_vecs, bs);
+ b = bio_alloc_bioset(gfp_mask, bio_segments(bio), bs);
if (!b)
return NULL;


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 08-24-2012, 08:42 PM
Tejun Heo
 
Default block: Only clone bio vecs that are in use

Hello, Kent.

On Fri, Aug 24, 2012 at 12:05:08AM -0700, Kent Overstreet wrote:
> > I'm pretty sure I sound like a broken record by now, but
> >
> > * How was this tested?
> >
> > * What are the implications and possible dangers?
>
> I've said all that on list, but I gather what you really wanted was to
> have it all in the patch description. Will do.

Yeap.

> > > @@ -463,10 +468,10 @@ void __bio_clone(struct bio *bio, struct bio *bio_src)
> > > bio->bi_sector = bio_src->bi_sector;
> > > bio->bi_bdev = bio_src->bi_bdev;
> > > bio->bi_flags |= 1 << BIO_CLONED;
> > > + bio->bi_flags &= ~(1 << BIO_SEG_VALID);
> >
> > For the n'th time, explain please.
>
> Argh, I could've sworn I dropped that part.

Can we drop it tho? If we're changing bvecs, we probably should be
clearing SEG_VALID on both bios.

> commit 0edda563aef9432b45f0c6a50f52590b92594560
> Author: Kent Overstreet <koverstreet@google.com>
> Date: Thu Aug 23 23:26:38 2012 -0700
>
> block: Only clone bio vecs that are in use
>
> bcache creates large bios internally, and then splits them according to
> the device requirements before it sends them down. If a lower level
> device tries to clone the bio, and the original bio had more than
> BIO_MAX_PAGES, the clone will fail unecessarily.
>
> We can fix this by only cloning the bio vecs that are actually in use -
> as for as the block layer is concerned the new bio is still equivalent
> to the old bio.
>
> This code should in general be safe as long as all the block layer code
> uses bi_idx, bi_vcnt consistently; since bios are cloned by code that
> doesn't own the original bio there's little room for issues caused by
> code playing games with the original bio's bi_io_vec. One perhaps
> imagine code depending the clone and original bio's io vecs lining up a
> certain way, but auditing and testing haven't turned up anything.
>
> Testing: This code has been in the bcache tree for quite awhile, and has
> been tested with various md layers and dm targets (including strange
> things like multipath).

Yeap, looks much better to me.

Thanks.

--
tejun

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 

Thread Tools




All times are GMT. The time now is 03:16 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org