mirrored device with thousand of mappingtableentries
Hi Martin,
On Sun, Mar 06 2011 at 9:59pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
>
> Zdenek> My finding seems to show that BIP-256 slabtop segment grow by
> Zdenek> ~73KB per each device (while dm-io is ab out ~26KB)
>
> Ok, I see it now that I tried with a bunch of DM devices.
>
> DM allocates a bioset per volume. And since each bioset has an integrity
> mempool you'll end up with a bunch of memory locked down. It seems like
> a lot but it's actually the same amount as we reserve for the data path
> (bio-0 + biovec-256).
>
> Since a bioset is not necessarily tied to a single block device we can't
> automatically decide whether to allocate the integrity pool or not. In
> the DM case, however, we just set up the integrity profile so the
> information is available.
>
> Can you please try the following patch? This will change things so we
> only attach an integrity pool to the bioset if the logical volume is
> integrity-capable.
Thanks for sorting this out. Can you post a patch that has a proper
header with your Signed-off-by? Also including Zdenek's Tested-by, and
my:
Acked-by: Mike Snitzer <snitzer@redhat.com>
Thanks,
Mike
p.s. one small nit:
> diff --git a/fs/bio.c b/fs/bio.c
> index 4bd454f..6e4a381 100644
> --- a/fs/bio.c
> +++ b/fs/bio.c
> @@ -1603,9 +1603,10 @@ void bioset_free(struct bio_set *bs)
> EXPORT_SYMBOL(bioset_free);
>
> /**
> - * bioset_create - Create a bio_set
> + * bioset_create_flags - Create a bio_set
> * @pool_size: Number of bio and bio_vecs to cache in the mempool
> * @front_pad: Number of bytes to allocate in front of the returned bio
> + * @flags: Flags that affect memory allocation
> *
> * Description:
> * Set up a bio_set to be used with @bio_alloc_bioset. Allows the caller
> @@ -1615,7 +1616,8 @@ EXPORT_SYMBOL(bioset_free);
> * Note that the bio must be embedded at the END of that structure always,
> * or things will break badly.
> */
> -struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
> +struct bio_set *bioset_create_flags(unsigned int pool_size,
> + unsigned int front_pad, unsigned int flags)
> {
> unsigned int back_pad = BIO_INLINE_VECS * sizeof(struct bio_vec);
> struct bio_set *bs;
> @@ -1636,7 +1638,8 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
> if (!bs->bio_pool)
> goto bad;
>
> - if (bioset_integrity_create(bs, pool_size))
> + if ((flags & BIOSET_NO_INTEGRITY) == 0 &&
> + bioset_integrity_create(bs, pool_size))
> goto bad;
>
> if (!biovec_create_pools(bs, pool_size))
We'd generally see: if (!(flags & BIOSET_NO_INTEGRITY) && ...)
but then the double negative is maybe a bit more challenging to deal
with (for a human). In the end, not a big deal either way
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
03-07-2011, 07:10 PM
Mike Snitzer
mirrored device with thousand of mappingtableentries
On Sun, Mar 06 2011 at 9:59pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:
> >>>>> "Zdenek" == Zdenek Kabelac <zkabelac@redhat.com> writes:
>
> Zdenek> My finding seems to show that BIP-256 slabtop segment grow by
> Zdenek> ~73KB per each device (while dm-io is ab out ~26KB)
>
> Ok, I see it now that I tried with a bunch of DM devices.
>
> DM allocates a bioset per volume. And since each bioset has an integrity
> mempool you'll end up with a bunch of memory locked down. It seems like
> a lot but it's actually the same amount as we reserve for the data path
> (bio-0 + biovec-256).
>
> Since a bioset is not necessarily tied to a single block device we can't
> automatically decide whether to allocate the integrity pool or not. In
> the DM case, however, we just set up the integrity profile so the
> information is available.
>
> Can you please try the following patch? This will change things so we
> only attach an integrity pool to the bioset if the logical volume is
> integrity-capable.
Hey Martin,
I just took the opportunity to review DM's blk_integrity code a bit more
closely -- with an eye towards stacking devices. I found an issue that
I think we need to fix that has to do with a DM device's limits being
established during do_resume() and not during table_load().
Unfortunately, a DM device's blk_integrity gets preallocated during
table_load(). dm_table_prealloc_integrity()'s call to
blk_integrity_register() establishes the blk_integrity's block_size.
But a DM device's queue_limits aren't stacked until a DM device is
resumed -- via dm_calculate_queue_limits().
For some background please see the patch header of this commit:
http://git.kernel.org/linus/754c5fc7ebb417
The final blk_integrity for the DM device isn't fully established until
do_resume()'s eventual call to dm_table_set_integrity() -- by passing a
template to blk_integrity_register(). dm_table_set_integrity() does
validate the 'block_size' of each DM devices' blk_integrity to make sure
they all match. So the code would catch the inconsistency should it
arise.
All I'm saying is: it's possible for a table_load() to not have the
awareness that a newly added device's queue_limits will cause the DM
device's final queue_limits to be increased (say a 4K device was
added to dm_device2, and dm_device2 is now being added to another
dm_device1).
So it seems we need to establish bi->sector_size during the final stage
of blk_integrity_register(), e.g. when a template is passed. Not sure
if you'd agree with that change in general but it'll work for DM because
the queue_limits are established before dm_table_set_integrity() is set.
Maybe revalidate/change the 'block_size' during the final stage in case
it changed?
Thanks,
Mike
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
03-07-2011, 07:22 PM
"Martin K. Petersen"
mirrored device with thousand of mappingtableentries
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> So it seems we need to establish bi->sector_size during the final
Mike> stage of blk_integrity_register(), e.g. when a template is passed.
Mike> Not sure if you'd agree with that change in general but it'll work
Mike> for DM because the queue_limits are established before
Mike> dm_table_set_integrity() is set.
Let me contemplate a bit since I already have some changes brewing in
this area (T10 now permits protection information to be placed at
non-logical block size intervals).
--
Martin K. Petersen Oracle Linux Engineering
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
03-08-2011, 05:51 AM
"Martin K. Petersen"
mirrored device with thousand of mappingtableentries
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> Thanks for sorting this out. Can you post a patch that has a
Mike> proper header with your Signed-off-by? Also including Zdenek's
Mike> Tested-by, and my:
I contemplated a bit and decided I liked the following approach
better. What do you think?
block: Require subsystems to explicitly allocate bio_set integrity mempool
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
diff --git a/fs/bio.c b/fs/bio.c
index 4bd454f..06aebf9 100644
--- a/fs/bio.c
+++ b/fs/bio.c
@@ -1636,9 +1636,6 @@ struct bio_set *bioset_create(unsigned int pool_size, unsigned int front_pad)
if (!bs->bio_pool)
goto bad;
- if (bioset_integrity_create(bs, pool_size))
- goto bad;
-
if (!biovec_create_pools(bs, pool_size))
return bs;
@@ -1684,6 +1681,9 @@ static int __init init_bio(void)
if (!fs_bio_set)
panic("bio: can't allocate bios
");
+ if (bioset_integrity_create(fs_bio_set, BIO_POOL_SIZE))
+ panic("bio: can't create integrity pool
");
+
bio_split_pool = mempool_create_kmalloc_pool(BIO_SPLIT_ENTRIES,
sizeof(struct bio_pair));
if (!bio_split_pool)
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
03-08-2011, 04:13 PM
Mike Snitzer
mirrored device with thousand of mappingtableentries
On Tue, Mar 08 2011 at 1:51am -0500,
Martin K. Petersen <mkp@mkp.net> wrote:
> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
>
> Mike> Thanks for sorting this out. Can you post a patch that has a
> Mike> proper header with your Signed-off-by? Also including Zdenek's
> Mike> Tested-by, and my:
>
> I contemplated a bit and decided I liked the following approach
> better. What do you think?
>
>
> block: Require subsystems to explicitly allocate bio_set integrity mempool
>
> MD and DM create a new bio_set for every metadevice. Each bio_set has an
> integrity mempool attached regardless of whether the metadevice is
> capable of passing integrity metadata. This is a waste of memory.
>
> Instead we defer the allocation decision to MD and DM since we know at
> metadevice creation time whether integrity passthrough is needed or not.
>
> Automatic integrity mempool allocation can then be removed from
> bioset_create() and we make an explicit integrity allocation for the
> fs_bio_set.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
In general this approach looks fine too. Doesn't address the concern I
had yesterday about the blk_integrity block_size relative to DM (load vs
resume) but I'd imagine you'd like to sort that out in a separate patch.
But I found a few DM issues below. Provided those are sorted out:
Acked-by: Mike Snitzer <snitzer@redhat.com>
(I'll defer to Jens and Neil for the block and MD bits)
Don't you need to bioset_free(pools->bs) if bioset_integrity_create()
fails? So you'd need a new 'free_bioset_and_out' label. Also seems
like maybe you had some extra whitespace on the goto?
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
03-10-2011, 03:11 PM
"Martin K. Petersen"
mirrored device with thousand of mappingtableentries
>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
Mike> In general this approach looks fine too. Doesn't address the
Mike> concern I had yesterday about the blk_integrity block_size
Mike> relative to DM (load vs resume) but I'd imagine you'd like to sort
Mike> that out in a separate patch.
Yeah, that'll come as part of my DIX1.1 patch set.
I believe I've addressed all your issues below.
block: Require subsystems to explicitly allocate bio_set integrity mempool
MD and DM create a new bio_set for every metadevice. Each bio_set has an
integrity mempool attached regardless of whether the metadevice is
capable of passing integrity metadata. This is a waste of memory.
Instead we defer the allocation decision to MD and DM since we know at
metadevice creation time whether integrity passthrough is needed or not.
Automatic integrity mempool allocation can then be removed from
bioset_create() and we make an explicit integrity allocation for the
fs_bio_set.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snizer@redhat.com>