Linux Archive

Philip Ashmore 02-07-2011 12:22 AM

Ideas for object-based git-like storage on Linux
 
On 06/02/11 23:40, Roger Leigh wrote:

> Hi folks,
>
> There are lots of Debian people out there using git, and some of them
> have expressed interest over the years in having the ability to use
> git as a filesystem in its own right (#477942 is an example of one
> in a package I maintain).
>
> I've finally got down to it and written all my thoughts on the topic
> down in a mostly-organised form, which you can find at
>
> http://www.codelibre.net/~rleigh/hashlink.pdf
>
> This paper looks at the concept of object-based storage, and the
> creation of "hashlinks", essentially symlinks which use hashes
> rather than pathnames to refer to a file. It is currently a complete
> draft, which could probably use a little more editing.
>
> Any thoughts or comments welcome; I'm just putting it out there
> because I have no time to actually implement this at the moment,
> but it's an interesting topic, and one which could potentially
> revolutionise the way we use filesystems if done properly. I
> started writing to organise my thinking on the matter, and in doing
> so I think I've arrived at a robust, basically implementable design
> that would work very efficiently.
>
> [For the curious, I thought I'd forgo XeLaTeX and Inkscape, and
> write this in troff (-ms) and xfig/PIC. It's not too shabby for
> a nearly 40-year-old system, though I am not half as proficient in
> it as I am with LaTeX.]
>
>
> Regards,
> Roger

You could take a look at treedb (http://sourceforge.net/projects/treedb/),
which already implements an object data store.

It doesn't have transactions or a security model yet, but it does support 64-bit memory maps.
It's stable but not yet mature, and has plenty of tests/demos in C and C++ to get you started.

I'm now working on v3c-schema, which will allow you to specify the object schema directly instead of
through C structure definitions plus code.
This will, for example, make speed versus resource usage analysis easier.

Comments + suggestions welcome.

Philip

Joey Hess 02-07-2011 01:23 AM

Roger Leigh wrote:
> There are lots of Debian people out there using git, and some of them
> have expressed interest over the years in having the ability to use
> git as a filesystem in its own right (#477942 is an example of one
> in a package I maintain).
>
> I've finally got down to it and written all my thoughts on the topic
> down in a mostly-organised form, which you can find at
>
> http://www.codelibre.net/~rleigh/hashlink.pdf
>
> This paper looks at the concept of object-based storage, and the
> creation of "hashlinks", essentially symlinks which use hashes
> rather than pathnames to refer to a file.

You may be interested in my git-annex program, which implements just
such a thing, although in user space, not kernel space.
http://git-annex.branchable.com/

joey@gnu:~/lib/sound/misc>ls -l dj_mooch_-_misc_-_01_-_dirty_as_mud.mp3
lrwxrwxrwx 1 joey joey 113 Nov 9 14:09 dj_mooch_-_misc_-_01_-_dirty_as_mud.mp3 -> ../.git/annex/objects/SHA1:717566db4265b4b3a986ba84e797df56c25923be/SHA1:717566db4265b4b3a986ba84e797df56c25923be
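
To make the layout in this listing concrete, here is a minimal Python sketch of how such a hashlink could be created. It is not git-annex's implementation: the helper names sha1_key and annex_add are hypothetical, and the SHA1:<hexdigest> key and objects/<key>/<key> layout are assumptions modelled only on the symlink target shown above.

```python
import hashlib
import os
from pathlib import Path


def sha1_key(path: Path) -> str:
    """Return a content key of the form SHA1:<hexdigest> (format assumed from the listing)."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so large files are never read into memory at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return "SHA1:" + h.hexdigest()


def annex_add(path: Path, objects_dir: Path) -> str:
    """Hypothetical helper: move `path` into the store and leave a hashlink behind.

    The object is stored at objects_dir/<key>/<key>, mirroring the
    ../.git/annex/objects/KEY/KEY symlink target in the listing above.
    Assumes `path` and `objects_dir` are on the same filesystem.
    """
    key = sha1_key(path)
    obj = objects_dir / key / key
    obj.parent.mkdir(parents=True, exist_ok=True)
    if obj.exists():
        # Identical content is already stored, so this copy is redundant.
        path.unlink()
    else:
        os.replace(path, obj)
    # A relative link stays valid if the whole repository is moved.
    os.symlink(os.path.relpath(obj, start=path.parent), path)
    return key
```

Because the link target names the content rather than a location, adding two files with identical bytes leaves both pointing at a single stored object.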

Note that once you have a hashlink, you know that if another
filesystem elsewhere has the same hashlink, the content can be retrieved
from there. So deduplication and general content management can
be done across filesystems. That is the basis of much of the good stuff
git-annex can do. For example:

joey@gnu:~/lib/big/new>ls -l debian-6.0.0-amd64-i386-netinst.iso
lrwxrwxrwx 1 joey joey 145 Feb 6 14:10 debian-6.0.0-amd64-i386-netinst.iso -> ../.git/annex/objects/WORM:1296961374:432142336:debian-6.0.0-amd64-i386-netinst.iso/WORM:1296961374:432142336:debian-6.0.0-amd64-i386-netinst.iso
joey@gnu:~/lib/big/new>git log --pretty=oneline debian-6.0.0-amd64-i386-netinst.iso
97dc2f3a5b5aec7671c7f282b8ea42a89f90b44a pre passport-sync
joey@gnu:~/lib/big/new>cat debian-6.0.0-amd64-i386-netinst.iso
cat: debian-6.0.0-amd64-i386-netinst.iso: No such file or directory
joey@gnu:~/lib/big/new>git annex get debian-6.0.0-amd64-i386-netinst.iso
get debian-6.0.0-amd64-i386-netinst.iso (copying from turtle...)
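
The second listing uses a WORM key rather than a hash. From the link target, its fields appear to be a timestamp, a size in bytes and the original filename; that reading is an assumption, not taken from git-annex documentation. The sketch below builds a key of that shape and then applies the point made above: if another repository advertises the same key, the content can be fetched from there. worm_key, find_source and the mapping of remote names to key sets are all hypothetical.

```python
from pathlib import Path
from typing import Mapping, Optional, Set


def worm_key(path: Path) -> str:
    """Hypothetical WORM-style key: WORM:<mtime>:<size>:<filename>.

    No file content is hashed, so the key is cheap to compute, but it only
    identifies content reliably if files are never modified after being added.
    """
    st = path.stat()
    return f"WORM:{int(st.st_mtime)}:{st.st_size}:{path.name}"


def find_source(key: str, remotes: Mapping[str, Set[str]]) -> Optional[str]:
    """Return the first remote (in sorted name order) that advertises `key`, or None."""
    for name in sorted(remotes):
        if key in remotes[name]:
            return name
    return None
```

With a remote named turtle (as in the git annex get output above) advertising the key, find_source returns "turtle". The trade-off is that a WORM key avoids hashing a large ISO image, but unlike a SHA1 key it cannot detect two different files that happen to share a timestamp, size and name.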


Also, Josh Triplett has some ideas about integrating git with dpkg for
fully versioned systems. I'm not sure if he's ready to make them
public, but you may find talking with him interesting.

--
see shy jo

Jens Peter Secher 02-07-2011 09:53 PM

On 2011-02-07 00:40, Roger Leigh wrote:
> http://www.codelibre.net/~rleigh/hashlink.pdf

Interesting! It seems that it would also be beneficial to
log-structured file systems [1].

I spotted a couple of typos:

Page 4:
"Files may be therefore be replaced" => "Files may therefore be replaced"

Page 8:
"that is to say it it written once" => "that is to say it is written once"
"storage is be used" => "storage is used"

Cheers,
--
Jens Peter Secher, GPG fingerprint 0EE5978AFE63E8A1.

A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


--
To UNSUBSCRIBE, email to debian-devel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4D5077FA.6090506@debian.org

Roger Leigh 02-07-2011 10:51 PM

On Mon, Feb 07, 2011 at 11:53:46PM +0100, Jens Peter Secher wrote:
> On 2011-02-07 00:40, Roger Leigh wrote:
> > http://www.codelibre.net/~rleigh/hashlink.pdf
>
> Interesting! It seems that it would also be beneficial to
> log-structured file systems [1].
>
> I spotted a couple of typos:
>
> Page 4:
> "Files may be therefore be replaced" => "Files may therefore be replaced"
>
> Page 8:
> "that is to say it it written once" => "that is to say it is written once"
> "storage is be used" => "storage is used"

Many thanks for catching those; I've now fixed them.

BTW, I didn't see a reference to [1], but I'd be very interested to
look at it!


Thanks,
Roger

--
.'`. Roger Leigh
: :' : Debian GNU/Linux http://people.debian.org/~rleigh/
`. `' Printing on GNU/Linux? http://gutenprint.sourceforge.net/
`- GPG Public Key: 0x25BFB848 Please GPG sign your mail.

Jens Peter Secher 02-08-2011 03:16 PM

On 2011-2-8 0:51, Roger Leigh wrote:
> On Mon, Feb 07, 2011 at 11:53:46PM +0100, Jens Peter Secher wrote:
>> Interesting! It seems that it would also be beneficial to
>> log-structured file systems [1].

[1] http://en.wikipedia.org/wiki/Log-structured_file_system

Cheers,
/JP

Archive: http://lists.debian.org/4D516C49.1030905@debian.org

Roger Leigh 02-08-2011 10:59 PM

On Sun, Feb 06, 2011 at 10:23:00PM -0400, Joey Hess wrote:
> Roger Leigh wrote:
> > There are lots of Debian people out there using git, and some of them
> > have expressed interest over the years in having the ability to use
> > git as a filesystem in its own right (#477942 is an example of one
> > in a package I maintain).
> >
> > I've finally got down to it and written all my thoughts on the topic
> > down in a mostly-organised form, which you can find at
> >
> > http://www.codelibre.net/~rleigh/hashlink.pdf
> >
> > This paper looks at the concept of object-based storage, and the
> > creation of "hashlinks", essentially symlinks which use hashes
> > rather than pathnames to refer to a file.
>
> You may be interested in my git-annex program, which implements just
> such a thing, although in user space, not kernel space.
> http://git-annex.branchable.com/

I've been meaning to give git-annex a whirl for a while, but I'm
afraid I've lacked the time to get intimately acquainted with it.
From what I understand, in terms of what a single user could do with
it, it's looking pretty equivalent, the major difference being that
it's entirely in userspace. It's definitely on my TODO list.

I wanted to see whether it was possible to make a multi-user store and
provide a lightweight kernel interface to access it. It might well
be possible to use git-annex as the storage backend for an initial
implementation!

Following the suggestion to look at log-structured filesystems, I
took a look at things like Plan 9's Venti. It's good stuff; my main
objection to most of them is that they are append-only, with no provision
for GC of data that is no longer referenced. I would consider that a
requirement for a general-purpose store with rapid turnover of data,
especially if you're going to store working copies as well as things
of "commit quality"; even things you commit can be temporary when
rebasing, etc.
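
The garbage collection described above can be sketched as a mark-and-sweep pass: mark every object that some symlink in the working tree still points at, then delete everything else in the store. This minimal Python sketch assumes a hypothetical layout in which each object lives at objects/<key>/<key> and hashlinks are ordinary symlinks to those paths; the function names mark and sweep are illustrative. It scans a single tree only, whereas a real multi-user store would also have to count references from other users' trees and from history.

```python
import os
import shutil
from pathlib import Path
from typing import Set


def mark(tree: Path, objects_dir: Path) -> Set[Path]:
    """Mark phase: collect every stored object that a symlink under `tree` points at."""
    objects_dir = objects_dir.resolve()
    live: Set[Path] = set()
    for dirpath, dirnames, filenames in os.walk(tree):
        # Prune the object store itself; only links outside it count as references.
        dirnames[:] = [n for n in dirnames
                       if (Path(dirpath) / n).resolve() != objects_dir]
        for name in filenames:
            link = Path(dirpath) / name
            if link.is_symlink():
                target = link.resolve()
                if objects_dir in target.parents:
                    live.add(target)
    return live


def sweep(objects_dir: Path, live: Set[Path]) -> int:
    """Sweep phase: delete each objects_dir/<key>/ directory whose object was not marked.

    Assumes every entry in objects_dir is a <key>/ directory holding <key>/<key>.
    """
    removed = 0
    for keydir in list(objects_dir.iterdir()):
        obj = (keydir / keydir.name).resolve()
        if obj not in live:
            shutil.rmtree(keydir)
            removed += 1
    return removed
```

Calling sweep(objects_dir, mark(tree, objects_dir)) removes unreferenced objects. A real store would presumably need to hold a lock across both phases, since a hashlink created between the mark and the sweep would otherwise lose its object.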


Regards,
Roger

Goswin von Brederlow 02-12-2011 07:45 PM

Roger Leigh <rleigh@codelibre.net> writes:

> On Sun, Feb 06, 2011 at 10:23:00PM -0400, Joey Hess wrote:
>> Roger Leigh wrote:
>> > There are lots of Debian people out there using git, and some of them
>> > have expressed interest over the years in having the ability to use
>> > git as a filesystem in its own right (#477942 is an example of one
>> > in a package I maintain).
>> >
>> > I've finally got down to it and written all my thoughts on the topic
>> > down in a mostly-organised form, which you can find at
>> >
>> > http://www.codelibre.net/~rleigh/hashlink.pdf
>> >
>> > This paper looks at the concept of object-based storage, and the
>> > creation of "hashlinks", essentially symlinks which use hashes
>> > rather than pathnames to refer to a file.
>>
>> You may be interested in my git-annex program, which implements just
>> such a thing, although in user space, not kernel space.
>> http://git-annex.branchable.com/
>
> I've been meaning to give git-annex a whirl for a while, but I'm
> afraid I've lacked the time to get intimately acquainted with it.
> From what I understand, in terms of what a single user could do with
> it, it's looking pretty equivalent, the major difference being that
> it's entirely in userspace. It's definitely on my TODO list.
>
> I wanted to see whether it was possible to make a multi-user store and
> provide a lightweight kernel interface to access it. It might well
> be possible to use git-annex as the storage backend for an initial
> implementation!

All you need is a little bit of FUSE code. There really is no need to
invent a new kernel interface for this, and something like this is best
kept in userspace to keep the complexity manageable.

> Following the suggestion to look at log-structured filesystems, I
> took a look at things like Plan 9's Venti. It's good stuff; my main
> objection to most of them is that they are append-only, with no provision
> for GC of data that is no longer referenced. I would consider that a
> requirement for a general-purpose store with rapid turnover of data,
> especially if you're going to store working copies as well as things
> of "commit quality"; even things you commit can be temporary when
> rebasing, etc.
>
>
> Regards,
> Roger

Regards,
Goswin


Archive: http://lists.debian.org/87mxm16kyq.fsf@frosties.localnet

