Old 02-22-2012, 01:14 PM
Joe Thornber
 
thinp zeroing

* Requirements

There are two distinct requirements for zeroing applicable to the
thin-provisioning target:

- Avoid data leaks (DATA_LEAK)

Several consumers of thin devices may share the same pool, eg, two
different vm guests provisioned from it. We must ensure that no
data from one thin device appears in newly provisioned areas of
another.

- Provide guarantees about the presence/absence of sensitive data (ERASE)

eg, when decommissioning a guest vm, the host (running the pool)
wishes to guarantee that no data from that guest remains on the
data device.

* Implementing DATA_LEAK

Currently the DATA_LEAK requirement is enforced by zeroing every
newly provisioned thin device block. The zeroing is elided when
the write io that triggers the provisioning completely covers the
block in question. It can also be turned off at the pool level if
data leaks are not a concern (eg, a desktop system). This is
already upstream.
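
To make the elision rule concrete, here is a minimal sketch in Python. It is not the kernel code; the function name, the byte-offset parameters and the zero_new_blocks flag are all made up for illustration.

```python
def needs_zeroing(block_size: int, write_offset: int, write_len: int,
                  zero_new_blocks: bool = True) -> bool:
    """Decide whether a newly provisioned block must be zeroed first.

    write_offset is the byte offset of the triggering write within the
    block. zero_new_blocks is a hypothetical stand-in for the pool-level
    switch that turns zeroing off.
    """
    if write_offset < 0 or write_len < 0 or write_offset + write_len > block_size:
        raise ValueError("write must lie within the block")
    if not zero_new_blocks:
        # Zeroing disabled for the whole pool: data leaks are not a concern.
        return False
    # A write that covers the whole block overwrites any stale data itself.
    covers_whole_block = write_offset == 0 and write_len == block_size
    return not covers_whole_block
```

A 4k write into a fresh 64k block still needs the zeroing; a full 64k write does not.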

* Implementing ERASE

** Erase on deallocation

The ERASE requirement is more difficult. Zeroing data blocks when
they are deallocated (ie. their ref count drops to zero after
deleting the device) sounds like a good approach, but this
introduces certain difficulties, mainly:

- To retain our crash recovery properties, the zeroing cannot occur
until after the next commit.

- Extra on-disk metadata would need to be stored to keep track of
the blocks that need zeroing.

- A commit would trigger a storm of io. Currently the cost of a
copy-on-write exception is paid immediately by the io that
triggers it; building up delayed work like this makes it very
hard to give performance estimates.

** Erasing from userland.

Zeroing all unshared blocks when deleting a thin device will create
a lot of io; I'd much rather this was managed by userland. I
really don't want message ioctls that take many minutes to
complete.

The 'held_root' facility allows userland to read a snapshot of the
metadata in a live pool. [Note: this is a snapshot of the
metadata, not a snapshot of data]. The following will implement
ERASE in userland.

- Deactivate all thin volumes that you wish to erase.

Failure to deactivate would mean the mappings for the thins could
be out of date by the time userland reads them. There is no
mechanism for enforcing this at the device mapper level; but
userland can easily do this (eg, lvm2 already has a comprehensive
locking scheme that will handle this). It should also be pointed
out that if you try and erase a volume while you're still using it,
you are an idiot.

- Grab a 'held_root'

- Read the mappings for all of the thins you wish to erase.

- Work out which data blocks are used exclusively by this subset of
thins.

- Write zeroes across these blocks.

- Send a thin-delete message to the pool for each thin.
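
The "work out which data blocks are used exclusively" step above could look roughly like the following Python sketch. It assumes userland has already turned the metadata snapshot into a dict of thin id -> data block numbers for every thin in the pool (not only the ones being erased, otherwise sharing cannot be judged). That input format and the function name are assumptions, not an existing tool.

```python
from collections import Counter

def exclusive_blocks(mappings, to_erase):
    """Return the data blocks referenced only by thins in to_erase.

    mappings: dict of thin id -> iterable of data block numbers, covering
              every thin in the pool (hypothetical input format).
    to_erase: set of thin ids that are about to be deleted.
    """
    total = Counter()    # how many thins reference each block
    inside = Counter()   # how many of those thins are being erased
    for thin_id, blocks in mappings.items():
        for block in set(blocks):
            total[block] += 1
            if thin_id in to_erase:
                inside[block] += 1
    # A block is safe to zero only if every thin referencing it is going away.
    return sorted(b for b, n in inside.items() if n == total[b])
```

Blocks shared with a surviving thin are left alone; they only become candidates once the last thin referencing them is erased.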

** Crash recovery

If we crash during the copy-on-write or provision operation the
recovery process needs to zero those new, but not committed,
blocks. This requires the introduction of an 'erase log' to the
metadata format. This log would need to be committed *before* the
copy/overwrite operation could proceed.

I've implemented such an erase log [see patch], to get an idea of
the performance overhead. Testing in ideal conditions (ie. large
writes that are triggering many provision/copy operations so costs
can be amortised), we see a 25% slowdown in throughput of
provision/copy operations. Better than I feared.

We can improve the performance significantly by observing that it's
harmless to do too much zeroing of unprovisioned blocks on
recovery. This suggests a scheme similar to a mirror log, where we
mark regions of the data volume that we have pending provision/copy
operations. When we recover we just zero *all* unallocated blocks
in these marked regions. This will result in fewer commits, since
newly allocated blocks will commonly come from the same region and
so avoid the need for a commit. [TODO: get a proof of concept
patch together].
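
As a rough illustration of the region idea (not the proof of concept mentioned above), here is a small Python model. The class, its method names and the commit counter are invented for the sketch; clearing regions once their blocks are committed is left out.

```python
class RegionEraseLog:
    """Toy model of a region-based erase log (all names are hypothetical)."""

    def __init__(self, blocks_per_region):
        self.blocks_per_region = blocks_per_region
        self.dirty_regions = set()
        self.commits = 0  # log commits forced before a provision/copy

    def before_provision(self, block):
        """Mark the region of a block that is about to be provisioned."""
        region = block // self.blocks_per_region
        if region not in self.dirty_regions:
            # First pending operation in this region: the log has to be
            # committed before the copy/overwrite may proceed.
            self.dirty_regions.add(region)
            self.commits += 1
        # Otherwise the region is already marked and no commit is needed.

    def blocks_to_zero_on_recovery(self, allocated, nr_blocks):
        """List every unallocated block inside a marked region."""
        result = []
        for region in sorted(self.dirty_regions):
            start = region * self.blocks_per_region
            end = min(start + self.blocks_per_region, nr_blocks)
            result.extend(b for b in range(start, end) if b not in allocated)
        return result
```

Provisioning blocks 0, 1, 2 and 5 with 4 blocks per region costs two log commits rather than four, at the price of zeroing a few blocks on recovery that never held data.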

** Discards

DISCARDs *must* result in data being zeroed. Some devices set the
discard_zeroes_data flag. This is not good enough; you cannot use
this flag as a guarantee that the data no longer exists on the
disk. So real zeroing must occur. I suggest we write a separate
target that zeroes data just before discarding it, and stack it
under the thin-pool. The performance impact of this will be
significant, to the point that we may wish to turn discard off
within the fs and do periodic tidy-ups instead.

** Avoid redundant copying

The calculation that says whether a block is shared or not (and
thus liable to suffer a copy-on-write exception) is an
approximation. It sometimes says something is shared when it
isn't, which causes us a problem wrt ERASE. To avoid leaving
orphaned copies of data, we must either tighten up the sharing
detection [patch in the works], or zero the old block (via discard).

** Summary of work items [0/6]

Too much for the Linux 3.4 timeframe.

- [ ] Change the shared block detection [1 day, worth doing anyway]

- [ ] Bitmap based erase log [1 week]

- [ ] Recovery tool that zeroes unallocated blocks in dirty regions [1 week]

- [ ] Implement the discard-really-zeroes target [1 month]

- [ ] Write thin_erase userland tool [1 week]

- [ ] Update lvm2 tools [3 months]

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 02-22-2012, 03:30 PM
Spelic
 
thinp zeroing

On 02/22/12 15:14, Joe Thornber wrote:

> * Requirements
>
> There are two distinct requirements for zeroing applicable to the
> thin-provisioning target:
>
> - Avoid data leaks (DATA_LEAK)
>
> * Implementing DATA_LEAK
>
> * Implementing ERASE
>
> ** Erase on deallocation
>
> ** Erasing from userland.
>
> ** Crash recovery
>
> ** Discards

Hello,
thanks for all your hard work on thinp.

I was thinking: why don't you implement a bitmap that takes care of
emulating the discard functionality?


This would take care of all your issues above, and also be great for a
lot of use cases even outside thinp (*).


Every read would first hit the bitmap; if the bitmap says that the
region has been discarded, thinp would return zeroes to the requestor.


When a discard comes, you first set the bits in the
discard-emulation-bitmap, and then also pass the discard to the layers
below. Passing the discard below has no user-visible effects (because
discard is already implemented in thinp); however, it is still
advantageous to pass it to the lower layers because there might be SSDs
below thinp which can benefit from the discard.


I suggest a bitmap with one bit per 4 kbytes. If a discard comes that
is not 4K aligned (that would be a mistake of the layers above, at least
a "performance" mistake), you set only the bits for the 4K chunks which
are completely covered by the discard. You are then left with at most
two misaligned edges, one at the beginning and one at the end of the
discard region, and for those you will need to write zeroes to the
layers below. So in the worst case you need to set a few bits and then
perform two small writes of zeroes, but in most cases you just set a few
bits.
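
The splitting of a discard into whole bits plus misaligned edges can be sketched in Python as follows. This is only an illustration of the arithmetic; the function name and the byte-based interface are assumptions.

```python
def split_discard(offset, length, granularity=4096):
    """Split a byte-range discard into bitmap bits and misaligned edges.

    Returns (bits, edges): bits lists the indexes of chunks completely
    covered by the discard, edges lists (offset, length) byte ranges
    that must be overwritten with zeroes instead.
    """
    end = offset + length
    first_full = -(-offset // granularity)   # first fully covered chunk (ceiling)
    last_full = end // granularity           # one past the last fully covered chunk
    if first_full >= last_full:
        # No chunk is completely covered: the whole range has to be zeroed.
        return [], ([(offset, length)] if length else [])
    edges = []
    head_end = first_full * granularity
    if offset < head_end:
        edges.append((offset, head_end - offset))
    tail_start = last_full * granularity
    if tail_start < end:
        edges.append((tail_start, end - tail_start))
    return list(range(first_full, last_full)), edges
```

An aligned discard yields only bits; a misaligned one yields at most two edges, matching the worst case described above.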


(*) Remember that most MD Raid levels do not pass discards below, so we
-raid users- cannot really see zeroes where discard has been triggered.
That's a problem when we want to back up a virtual machine disk image (DM
volume) from the outside: non-zeroes don't compress well; it's as if we
back up deleted files every time.


 
Old 02-22-2012, 05:04 PM
Zdenek Kabelac
 
thinp zeroing

On 22.2.2012 17:30, Spelic wrote:
> On 02/22/12 15:14, Joe Thornber wrote:
>> * Requirements
>>
>> There are two distinct requirements for zeroing applicable to the
>> thin-provisioning target:
>>
>> - Avoid data leaks (DATA_LEAK)
>>
>> * Implementing DATA_LEAK
>>
>>
>> * Implementing ERASE
>>
>> ** Erase on deallocation
>>
>>
>> ** Erasing from userland.
>>
>> ** Crash recovery
>>
>> ** Discards
>>
>>
>
> Hello
> thanks for all your hard work regarding thinp
>
> I was thinking: why don't you implement a bitmap that takes care of emulating
> the discard functionality?
>
> This would take care of all your issues above, and also be great for a lot of
> use cases even outside thinp (*).
>
> Every read would first hit the bitmap; if the bitmap says that the region has
> been discarded, thinp would return zeroes to the requestor.
>
> When a discard comes, you first set the bits in the discard-emulation-bitmap,
> and then also pass the discard to layers below. Passing the discard below has
> no user-visible effects (because discard is already implemented in thinp)
> however it is still advantageous to pass it to lower layers because there
> might be SSDs below thinp which can benefit from the discard.
>
> I suggest a bitmap of 4kbytes / bit, and then if a discard comes that is not
> 4K aligned (that would be a mistake of the above layers, at least a
> "performance" mistake), you set the bitmaps only for the bits which are
> completely covered by the discard, and then you are left with at most two
> misaligned edges one at the beginning and one at the end of the discard
> region, and for those you will need to write zeroes to the layers below. So in
> the worst case you need to set a few bits and then perform two small writes of
> zeroes, but in most cases you just set a few bits.
>
> (*) remember that most MD Raid levels do not pass discards below, so we -raid
> users- cannot really see zeroes where discard has been triggered. That's a
> problem when we want to backup a virtual machine disk image (DM volume) from
> the outside: non-zeroes don't compress well; it's like we backup deleted files
> everytime.
>

For backups there will be a much better solution, one that can get the
list of provisioned blocks for a device (or, in the case of a snapshot,
the diffs).

IMHO bitmaps are expensive, as you may observe with certain extX operations.

Zdenek


 
Old 02-23-2012, 01:49 AM
Mike Snitzer
 
thinp zeroing

Nice write-up. It is concerning that we have to go to such lengths but
I don't see a way around it without limiting who can consume thinp.

On Wed, Feb 22 2012 at 9:14am -0500,
Joe Thornber <thornber@redhat.com> wrote:

> ** Discards
>
> DISCARDs *must* result in data being zeroed. Some devices set the
> discard_zeroes_data flag. This is not good enough; you cannot use
> this flag as a guarantee that the data no longer exists on the
> disk. So real zeroing must occur. I suggest we write a separate
> target that zeroes data just before discarding it, and stack it
> under the thin-pool. The performance impact of this will be
> significant; to the point that we may wish to turn discard within
> the fs off; instead doing periodic tidy-ups.

...

> ** Summary of work items [0/5]
>
> - [ ] Implement the discard-really-zeroes target [1 month]

I don't think it'll take a month. Probably a focused week to 2 weeks.

I can develop this target before jumping in to the HSM target (unless
you'd rather I start in on HSM asap).

 
Old 02-23-2012, 02:43 PM
Joe Thornber
 
thinp zeroing

On Wed, Feb 22, 2012 at 05:30:04PM +0100, Spelic wrote:
> On 02/22/12 15:14, Joe Thornber wrote:
> >* Requirements
> >
> > There are two distinct requirements for zeroing applicable to the
> > thin-provisioning target:
> >
> > - Avoid data leaks (DATA_LEAK)
> >
> >* Implementing DATA_LEAK
> >
> >
> >* Implementing ERASE
> >
> >** Erase on deallocation
> >
> >
> >** Erasing from userland.
> >
> >** Crash recovery
> >
> >** Discards
> >
> >
>
> Hello
> thanks for all your hard work regarding thinp
>
> I was thinking: why don't you implement a bitmap that takes care of
> emulating the discard functionality?
>
> This would take care of all your issues above, and also be great for
> a lot of use cases even outside thinp (*).
>
> Every read would first hit the bitmap; if the bitmap says that the
> region has been discarded, thinp would return zeroes to the
> requestor.

Already done; the first thing a discard bio does is remove the
mappings from the btree. It's then (optionally) handed down to the
underlying device.
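
A toy Python model of that behaviour, with a dict standing in for the btree. The class and its methods are invented for illustration and are not the dm-thin code.

```python
class ThinDevice:
    """Toy thin device: a dict stands in for the mapping btree."""

    def __init__(self, block_size):
        self.block_size = block_size
        self.mappings = {}  # virtual block -> data (mapping plus data block)

    def write(self, vblock, data):
        if len(data) != self.block_size:
            raise ValueError("sketch only handles whole-block writes")
        self.mappings[vblock] = bytes(data)

    def discard(self, vblock):
        # Dropping the mapping is enough for later reads to see zeroes.
        # On a real pool the old data block may still hold the data.
        self.mappings.pop(vblock, None)

    def read(self, vblock):
        # Unmapped blocks read back as zeroes.
        return self.mappings.get(vblock, bytes(self.block_size))
```

Reads after a discard see zeroes, but nothing here says the data is gone from the disk.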

> I suggest a bitmap of 4kbytes / bit, and then if a discard comes
> that is not 4K aligned (that would be a mistake of the above layers,
> at least a "performance" mistake), you set the bitmaps only for the
> bits which are completely covered by the discard, and then you are
> left with at most two misaligned edges one at the beginning and one
> at the end of the discard region, and for those you will need to
> write zeroes to the layers below. So in the worst case you need to
> set a few bits and then perform two small writes of zeroes, but in
> most cases you just set a few bits.

Things like SSDs that set the discard_zeroes_data flag are only saying
that they'll return zeroes if you read from this area. This is
different from promising the data has been overwritten with zeroes on
the disk. Hence the need in the ERASE case for real writes across the
discarded area.

- Joe

 
Old 02-23-2012, 02:47 PM
Joe Thornber
 
thinp zeroing

On Wed, Feb 22, 2012 at 09:49:18PM -0500, Mike Snitzer wrote:
> Nice write-up. It is concerning that we have to go to such lengths but
> I don't see a way around it without limiting who can consume thinp.
>
> On Wed, Feb 22 2012 at 9:14am -0500,
> Joe Thornber <thornber@redhat.com> wrote:
>
> > ** Discards
> >
> > DISCARDs *must* result in data being zeroed. Some devices set the
> > discard_zeroes_data flag. This is not good enough; you cannot use
> > this flag as a guarantee that the data no longer exists on the
> > disk. So real zeroing must occur. I suggest we write a separate
> > target that zeroes data just before discarding it, and stack it
> > under the thin-pool. The performance impact of this will be
> > significant; to the point that we may wish to turn discard within
> > the fs off; instead doing periodic tidy-ups.
>
> ...
>
> > ** Summary of work items [0/5]
> >
> > - [ ] Implement the discard-really-zeroes target [1 month]
>
> I don't think it'll take a month. Probably a focused week to 2 weeks.

By the time you include getting it through agk I think a month is
highly optimistic.

> I can develop this target before jumping in to the HSM target (unless
> you'd rather I start in on HSM asap).

HSM is the priority, please. ERASE can wait until later. Plus, given
the development effort and performance impact, I think there are
alternatives we should consider (such as using dm-crypt on each thin,
and throwing away the keys when you delete it).

- Joe
