06-11-2008, 01:57 PM
Adam Hamsik

NetBSD libdevmapper port

Hi,

*** Please keep me on CC when replying from the dm-devel list; I'm not
currently subscribed there. ***


My name is Adam Hamsik and this summer I'm working on the GSoC project
"Write and improve LVM driver". Because GPL software can't be included
in the NetBSD kernel, I have written a dm-like driver for NetBSD from
scratch and ported libdevmapper and the lvm2 tools to NetBSD. There are
some small differences between Linux and NetBSD, such as:

a) NetBSD doesn't use the proc filesystem the way Linux does.

b) NetBSD has 2 types of disk device: raw and block. The raw device is
a character device that gives users direct access to the device
(without the buffer cache).


After the first implementation of the dm driver, my mentor suggested
rewriting/cleaning up the dm-ioctl interface to be more NetBSD-like. I
have implemented a new interface based on the proplib library [1], which
is based on work done by Apple [2]. I have added 3 files to libdevmapper:

include/netbsd/netbsd-dm.h -> file shared between the kernel driver and
libdevmapper.

lib/ioctl/libdm_netbsd.c -> file with external functions for converting
native NetBSD proplist dictionaries to the libdevmapper dm_ioctl
structure.

lib/ioctl/libdm-nbsd-iface.c -> a copied and modified libdm-iface.c. I
found that the NetBSD-specific changes were too numerous to #ifdef
easily, so I added a new NetBSD interface file.

I have created a patch against the latest release of libdevmapper. It
would be great if we could get this patch committed to the libdevmapper
main repo. My patch is currently not ready to commit and needs major
cleanup, but I thought it would be good to let the dm developers know
about my effort and show them my work.


My patch is located here [3]. I have uploaded a patch against lvm2tools,
too [4], but it is a patch against 2.02.28. Because there were quite
massive changes to lvm2tools in the latest releases, I will port
lvm2tools again and merge my changes with the latest lvm release.

There is also my BSD-licensed device-mapper driver, which is located
here [5].


Any suggestions or comments are welcome.


[1] http://netbsd.gw.com/cgi-bin/man-cgi/man?proplib+3+NetBSD-current
[2] http://developer.apple.com/documentation/Darwin/Reference/ManPages/man5/plist.5.html
[3] http://www.netbsd.org/~haad/libdevmapper_netbsd.diff
[4] http://www.netbsd.org/~haad/lvm2_netbsd.diff
[5] http://www.netbsd.org/~haad/dm20080610.tar.bz2

Regards

Adam.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
06-13-2008, 10:46 AM
Alasdair G Kergon

NetBSD libdevmapper port

On Wed, Jun 11, 2008 at 03:57:49PM +0200, Adam Hamsik wrote:
> *** Please keep me on CC when you replying from dm-devel list I'm not
> currently subscribed there. ***

> I have created patch against latest release of libdevmapper, it would
> be great if we will be able to manage
> commiting of this patch to libdevmapper main repo. My patch is
> currently not ready to commit, it needs major cleanup, but I thought
> that it would be good to let dm developers know about my effort and
> show my work to them.

We'd be happy to include such patches in the userspace tools and
libraries.

Just ensure that all the differences are abstracted suitably so
that 'configure' can deal with them without breaking Linux :-)

Note that we're part way through the process of merging the
device-mapper and lvm2 source trees: when this is complete
releases will consist of a single tarball with a single
'configure.in' (and an option just to build the former
device-mapper components).

Rather than sending one patch at the end, it's probably better to send
smaller patches as you go along, as and when logically-coherent sets of
changes are ready. Also, try to avoid copying and pasting code into new
files where very little is changing - create internal library functions
or include files and share them instead, refactoring existing code
as required.

Alasdair
--
agk@redhat.com

 
06-13-2008, 04:51 PM
Mikulas Patocka

NetBSD libdevmapper port

Hi

If you are rewriting it --- have you somehow thought about avoiding
suspend?


A big source of problems in Linux is that when you suspend a device, you
can make only a limited set of calls. Basically, you must avoid anything
that could possibly wait for I/O or allocate memory, or you might end
up waiting for the suspended device and deadlock. And I still know of
about 3 places in the kernel where this error is possible.


Linux LVM does something like: suspend the old table, write to disk with
direct I/O, resume with the new table. I'd suggest that you invent some
method to batch these operations into a single syscall, or you will run
into several years of deadlock problems on NetBSD. Basically, on Linux,
we have to:

* preallocate a stack and heap, so that the running LVM process won't
cause a page fault (the question is how to do this portably on all
NetBSD architectures)
* mlock the process
* make sure that we don't open files or write anything to the terminal
while suspended
* make sure that the ioctl syscall doesn't allocate anything (currently
false; it won't deadlock but it could randomly fail)
* make sure that the O_DIRECT write syscall doesn't allocate anything or
wait for other I/O (currently false; there is a deadlock possibility)

... etc.
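
One way to picture the batching suggestion is a single request that carries all three steps, so the caller never runs in userspace while the device is suspended. The following is only a userspace model under that assumption: `struct mock_dev`, `struct dm_batch` and `dm_batch_run` are hypothetical names for illustration, not part of device-mapper or of the NetBSD driver.

```c
#include <stddef.h>
#include <string.h>

#define META_MAX 64

/* Hypothetical mock of a mapped device; not a real kernel structure. */
struct mock_dev {
    int  suspended;            /* 1 while I/O is held off */
    int  table_id;             /* identifies the active table */
    char metadata[META_MAX];   /* stands in for on-disk metadata */
};

/* Hypothetical batch request: everything needed is supplied up front. */
struct dm_batch {
    int         new_table_id;  /* table to activate on resume */
    const char *metadata;      /* metadata to write while suspended */
    size_t      metadata_len;  /* number of bytes in metadata */
};

/*
 * Run suspend -> metadata write -> resume as one call.
 * Arguments are validated before suspending, so the suspended
 * section cannot fail and needs no allocation.
 * Returns 0 on success, -1 if the request is rejected.
 */
int dm_batch_run(struct mock_dev *dev, const struct dm_batch *b)
{
    if (dev == NULL || b == NULL || b->metadata == NULL)
        return -1;
    if (b->metadata_len > META_MAX)
        return -1;

    dev->suspended = 1;                       /* suspend old table */
    memset(dev->metadata, 0, META_MAX);
    memcpy(dev->metadata, b->metadata, b->metadata_len);
    dev->table_id = b->new_table_id;          /* switch to new table */
    dev->suspended = 0;                       /* resume */
    return 0;
}
```

The point of the design is that every check that can fail happens before the suspend, and control only returns to the caller after the resume.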


If you port lvm2 as it is, you'll have to audit (and maybe rewrite)
many parts of the NetBSD kernel so that they don't wait for I/O. If you
do it badly, you'll get deadlocks.


This suspend thing was a big misdesign and if you are writing it from
scratch, try to avoid it.


Mikulas


 
06-13-2008, 05:34 PM
Alasdair G Kergon

NetBSD libdevmapper port

On Fri, Jun 13, 2008 at 12:51:48PM -0400, Mikulas Patocka wrote:
> This suspend thing was a big misdesign and if you are writing it from
> scratch, try to avoid it.

Despite how things might seem, we gained some substantial
simplifications by doing things this way and I reject the term
'misdesign':-) If writing this from scratch under similar time
constraints, we would still handle this a similar way.

But this suspend/resume mechanism was never intended to last as long as
it has - it was meant to be a stepping stone to a more-sophisticated
transaction mechanism that we have still not found time to develop,
mostly because this existing mechanism is "good enough" most of the
time.

Alasdair
--
agk@redhat.com

 
06-13-2008, 10:12 PM
Mikulas Patocka

NetBSD libdevmapper port

On Fri, 13 Jun 2008, Alasdair G Kergon wrote:

> On Fri, Jun 13, 2008 at 12:51:48PM -0400, Mikulas Patocka wrote:
> > This suspend thing was a big misdesign and if you are writing it from
> > scratch, try to avoid it.
>
> Despite how things might seem, we gained some substantial
> simplifications by doing things this way and I reject the term
> 'misdesign':-) If writing this from scratch under similar time
> constraints, we would still handle this a similar way.


If you sum up the time that you spent looking for the deadlocks (for
example the one where writing to a raw device while suspended updated
the inode mtime field and the kernel tried to write that inode to the
suspended root device), thinking about how to do proper memory and stack
preallocation and locking, and working out how to allocate the structure
for ioctl arguments (that trick with PF_MEMALLOC, still not bug-free,
but at least it works in typical scenarios), plus the time still needed
to fix things that are still broken (the GFP_KERNEL allocation in the
direct I/O path, the PF_MEMALLOC allocation in ioctl parameter copying),
and compare this with the hypothetical time it would take to write an
ioctl call that performs a few operations in a batch and never returns
to userspace with a suspended device, which one do you think would win?
I think the second solution would be written faster and with fewer bugs.


BTW, is there any need to update on-disk metadata while suspended? What
would happen if we first updated the metadata and then did
suspend+resume in just one syscall? For linear or snapshot targets there
should be no problem; I'm interested in whether there's a dm target
where this could produce a race condition.


Mikulas


 
06-14-2008, 09:58 AM
Brett Lymn

NetBSD libdevmapper port

Hi - I am mentoring Adam in this project

On Fri, Jun 13, 2008 at 12:51:48PM -0400, Mikulas Patocka wrote:
>
> If you are rewriting it --- have you somehow thought about avoiding
> suspend?
>

I assume you are talking about handling a suspend of a device
underlying the LVM. At the moment, NetBSD does not have the facility
to suspend a device in this manner, so I don't think we will have this
issue (yet).

>
> A Linux LVM does something like: suspend old table, write to disk with
> direct i/o, resume new table. I'd suggest that you invent some method how
> to batch these operations into single syscall

I think this should be doable with the API that Adam is using - in
NetBSD there is a thing called proplib which is a method of passing a
very limited form of XML into the kernel - Adam is using proplib to
pass the lvm parameters into the kernel.

> --- the question --- how to do it portably on all
> NetBSD architectures?),

Actually, most of the operations you have listed are, from memory,
machine-independent code. Very little of the kernel is actually
architecture-specific, and we work hard to keep it that way.

>
> --- if you port lvm2 as it is, you'll have to audit (and maybe rewrite)
> many parts of NetBSD kernel for not waiting for I/O. If you do it badly,
> you'll get deadlocks.
>

The head of the bleeding edge NetBSD code (netbsd-current) is having a
lot of work done on it to make the kernel re-entrant and
multi-threaded. This may be a bit of a bonus in terms of what you are
saying because there should be locks that need to be held to perform
the i/o - it may be a case of just making the acquiring of these locks
non-blocking (or it may not be an issue at all). We shall need to
wait and see on that one.

Thanks for the input.

--
Brett Lymn
 
06-16-2008, 02:24 AM
Mikulas Patocka

NetBSD libdevmapper port

On Sat, 14 Jun 2008, Brett Lymn wrote:

> Hi - I am mentoring Adam in this project
>
> On Fri, Jun 13, 2008 at 12:51:48PM -0400, Mikulas Patocka wrote:
> > If you are rewriting it --- have you somehow thought about avoiding
> > suspend?
>
> I assume you are talking about handling a suspend of a device
> underlying the LVM. At the moment, NetBSD does not have the facility
> to suspend a device in this manner so I dont think we will have this
> issue (yet).


That's good, so you can think twice and code once.


> > A Linux LVM does something like: suspend old table, write to disk with
> > direct i/o, resume new table. I'd suggest that you invent some method how
> > to batch these operations into single syscall
>
> I think this should be doable with the API that Adam is using - in
> NetBSD there is a thing called proplib which is a method of passing a
> very limited form of XML into the kernel - Adam is using proplib to
> pass the lvm parameters into the kernel.
>
> > --- the question --- how to do it portably on all
> > NetBSD architectures?),
>
> Actually, most of the operations you have listed are, from memory,
> machine independent code - there is very little of the kernel that
> actually is architecture specific we work hard to keep it that way.


See the function _allocate_memory in LVM2/lib/mm/memlock.c.

It attempts to prepare memory for locking, so that there will be no more
page faults while some device is suspended. It could fail:


* if you have a different heap algorithm that allocates the
temp_malloc_mem block in a separate chunk
* if you have an architecture with separate stacks for stack data and for
register windows (I don't know if such exists) - then alloca(_size_stack)
will preallocate just one stack, not both
* if a running process can take some additional faults specific to a given
architecture that allocate memory (for example a fault on an FPU
instruction allocating FPU context; I don't know precisely how you have
it implemented).


These are very brittle requirements, and if someone forgets about them,
LVM will be prone to deadlock. The requirements also span a lot of
LVM-independent code, and it's hard to make all libc/kernel engineers
think about LVM-specific peculiarities.
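
The general idea of preparing memory before locking can be sketched roughly as follows. This is not the LVM2 code: `touch_stack`, `prefault_memory` and `lock_process` are hypothetical names, and the reserve size is arbitrary. Only `mlockall` is a real POSIX call here.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define STACK_RESERVE (64 * 1024)   /* arbitrary example size */

/* Touch a block of stack so its pages are mapped before locking. */
static void touch_stack(void)
{
    volatile char buf[STACK_RESERVE];
    size_t i;

    for (i = 0; i < sizeof(buf); i += 1024)
        buf[i] = 0;
}

/*
 * Hypothetical helper: fault in stack and heap pages up front.
 * Returns a zeroed heap block of heap_bytes that the caller keeps as
 * its reserve (and later frees), or NULL if the allocation failed.
 */
void *prefault_memory(size_t heap_bytes)
{
    char *heap = malloc(heap_bytes);

    if (heap == NULL)
        return NULL;
    memset(heap, 0, heap_bytes);   /* touch every heap page */
    touch_stack();
    return heap;
}

/*
 * Pin current and future pages in RAM. Returns 0 on success, -1 on
 * failure (for example, insufficient privilege or a low memlock limit).
 */
int lock_process(void)
{
    return mlockall(MCL_CURRENT | MCL_FUTURE);
}
```

A caller would run `prefault_memory` and then `lock_process` before suspending anything. Each failure case listed above is a way for this kind of preparation to miss memory that the process later touches.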



> > --- if you port lvm2 as it is, you'll have to audit (and maybe rewrite)
> > many parts of NetBSD kernel for not waiting for I/O. If you do it badly,
> > you'll get deadlocks.
>
> The head of the bleeding edge NetBSD code (netbsd-current) is having a
> lot of work done on it to make the kernel re-entrant and
> multi-threaded. This may be a bit of a bonus in terms of what you are
> saying because there should be locks that need to be held to perform
> the i/o - it may be a case of just making the acquiring of these locks
> non-blocking (or it may not be an issue at all). We shall need to
> wait and see on that one.


Any non-atomic memory allocation can wait for I/O. So if you allocate some
kernel memory, for example in that tiny XML parser, this could trigger
dirty page writeout, the writeout may be directed to the suspended device,
and you are locked up. (Atomic allocations may fail at any time, for
example if too many packets come from the network and exhaust the atomic
reserve, so they are not a perfect solution either.)


If you solve this problem in some general way (for example by batching
suspend and ioctls into one call), I'd support backporting that solution
to Linux.


Mikulas


 
06-16-2008, 10:03 PM
Adam Hamsik

NetBSD libdevmapper port


On Friday, Jun 13 2008, at 6:51 PM, Mikulas Patocka wrote:

> Hi
>
> If you are rewriting it --- have you somehow thought about avoiding
> suspend?


Device suspend/resume is unsupported for now. I was not sure what
these commands should really do.

Because of GPL/BSD licensing issues I haven't looked at the Linux code.




> A big source of problems in Linux is, that when you suspend a
> device, you can do only a limited set of calls --- basically, you
> must avoid anything that could possibly wait for I/O or allocate
> memory --- or you might end up waiting for the suspended device and
> deadlock. And I know still about 3 places in kernel that have this
> error possibility.


OK, but why would I want to suspend a device? To avoid I/O operations
while I'm changing the device table, or what problem is device
suspend/resume meant to solve? If that is the problem, I can use one
mutex shared between the dm_dev_suspend/resume_ioctl calls and
dmstrategy (the routine that does I/O), so I can avoid I/O while the
device is suspended.




> A Linux LVM does something like: suspend old table, write to disk
> with direct i/o, resume new table. I'd suggest that you invent some
> method how to batch these operations into single syscall --- or you
> run into a several years of deadlock problems on NetBSD ---
> basically, on Linux, we have to preallocate a stack and heap (so
> that running LVM process won't cause a page fault --- the question
> --- how to do it portably on all NetBSD architectures?), mlock the
> process, make sure that we don't open files or write anything to
> terminal while suspended, make sure that the ioctl syscall doesn't
> allocate anything (currently false, it won't deadlock but it could
> randomly fail), make sure that O_DIRECT write syscall doesn't
> allocate anything or wait for other I/O (currently it false, there
> is a deadlock possibility) ... etc.
>
> --- if you port lvm2 as it is, you'll have to audit (and maybe
> rewrite) many parts of NetBSD kernel for not waiting for I/O. If you
> do it badly, you'll get deadlocks.


I definitely want to avoid such massive changes to our kernel.




> This suspend thing was a big misdesign and if you are writing it
> from scratch, try to avoid it.


Regards

Adam.

 
06-16-2008, 11:26 PM
Mikulas Patocka

NetBSD libdevmapper port

On Tue, 17 Jun 2008, Adam Hamsik wrote:

> On Jun,Friday 13 2008, at 6:51 PM, Mikulas Patocka wrote:
>
> > Hi
> >
> > If you are rewriting it --- have you somehow thought about avoiding
> > suspend?
>
> device suspend/resume are unsupported for now. I was not sure what these
> commands should really do.
>
> Because GPL/BSD licensing issues I haven't look at linux code.
>
> > A big source of problems in Linux is, that when you suspend a device, you
> > can do only a limited set of calls --- basically, you must avoid anything
> > that could possibly wait for I/O or allocate memory --- or you might end up
> > waiting for the suspended device and deadlock. And I know still about 3
> > places in kernel that have this error possibility.
>
> ok but why I should want to do something like suspending of device ? to
> avoid IO operations when I'm changing device table


Yes, exactly that. For example, if you are about to move a logical volume
somewhere else, lvm replaces it with a dm-raid1 target that does the
copying. If old I/Os for the old linear target were still in flight
while dm-raid1 copies the data, there would be data corruption.


It also updates on-disk metadata while the device is suspended. I'm not
sure whether the metadata really needs to be updated at this point.
Maybe in some cases it does.


> or what problem are device suspend/resume want to solve. If this
> is problem I can use one mutex
> shared between dm_dev_suspend/resume_ioctl call and dmstrategy(this routine
> does IO) so I can avoid IO's when device is suspended.


The problem is that while you hold this mutex, you must avoid any memory
allocation (because it may submit write I/O and wait for it). And that is
very hard to do if you return to userspace.
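
To make the constraint concrete, here is a minimal userspace model of a suspended device that holds back I/O in a preallocated, fixed-size queue, so that nothing is allocated while the device is suspended. It only illustrates the idea under stated assumptions: `struct dm_model`, `model_suspend`, `model_strategy` and `model_resume` are hypothetical names, not NetBSD kernel interfaces, and a real driver would also need locking around these fields.

```c
#include <stddef.h>

#define DEFER_MAX 8   /* fixed-size queue: no allocation while suspended */

/* Hypothetical userspace model of a mapped device. */
struct dm_model {
    int    suspended;            /* set by suspend, cleared by resume */
    int    deferred[DEFER_MAX];  /* request ids held back while suspended */
    size_t ndeferred;            /* number of held requests */
    size_t completed;            /* requests passed to the lower device */
};

void model_suspend(struct dm_model *m)
{
    m->suspended = 1;
}

/*
 * Stand-in for the strategy routine. Returns 1 if the request was
 * issued, 0 if it was deferred, and -1 if the preallocated queue is
 * full (a real driver would have to make the caller wait here).
 */
int model_strategy(struct dm_model *m, int request_id)
{
    if (!m->suspended) {
        m->completed++;
        return 1;
    }
    if (m->ndeferred == DEFER_MAX)
        return -1;
    m->deferred[m->ndeferred++] = request_id;
    return 0;
}

/* Resume: clear the flag and issue everything that was held back. */
size_t model_resume(struct dm_model *m)
{
    size_t flushed = m->ndeferred;

    m->suspended = 0;
    m->completed += flushed;
    m->ndeferred = 0;
    return flushed;
}
```

The model shows only the driver side. It does not address the harder part described above, which is everything else the process and kernel might do between suspend and resume.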


Mikulas
