Ideas for object-based git-like storage on Linux
On 06/02/11 23:40, Roger Leigh wrote:
> Hi folks,
>
> There are lots of Debian people out there using git, and some of them
> have expressed interest over the years in having the ability to use
> git as a filesystem in its own right (#477942 is an example of one
> in a package I maintain).
>
> I've finally got down to it and written all my thoughts on the topic
> down in a mostly-organised form, which you can find at
>
> http://www.codelibre.net/~rleigh/hashlink.pdf
>
> This paper looks at the concept of object-based storage, and the
> creation of "hashlinks", essentially symlinks which use hashes
> rather than pathnames to refer to a file. Currently a complete draft,
> which could probably use a little more editing.
>
> Any thoughts or comments welcome; I'm just putting it out there
> because I have no time to actually implement this at the moment, but
> it's an interesting topic, and one which could potentially
> revolutionise the way we use filesystems if done properly. I started
> writing to organise my thinking on the matter, and I think that
> through that I've actually got a basically implementable robust
> design that would actually work very efficiently.
>
> [For the curious, I thought I'd forego XeLaTeX and inkscape, and
> write this in troff (-ms) and xfig/PIC. It's not too shabby for a
> nearly 40 year old system, though I am not half as proficient in it
> as I am with LaTeX.]
>
> Regards,
> Roger

You could take a look at http://sourceforge.net/projects/treedb/ which implements an object data store right now. It doesn't have transactions or a security model yet but it does support 64 bit memory maps. It's stable but not mature yet, and has plenty of tests/demos to get you started, in C and C++.

I'm working on v3c-schema now which will allow you to specify the object schema directly instead of through C structure definitions + code. This will for example allow you to do speed versus resource usage analysis more easily.

Comments + suggestions welcome.

Philip
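As background to the "git-like" naming the thread keeps returning to: git itself names a file's content (a blob) by the SHA-1 of a short header followed by the bytes. The sketch below is in Python, which nobody in the thread uses, and the `ObjectStore` class is a hypothetical in-memory stand-in. It is not the design from the hashlink paper and not treedb's API; it only illustrates why identical content ends up stored once.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the object id git assigns to a blob with this content."""
    header = b"blob %d\0" % len(data)  # git hashes "blob <size>\0" + content
    return hashlib.sha1(header + data).hexdigest()


class ObjectStore:
    """Hypothetical in-memory object store keyed by content hash."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        key = git_blob_id(data)
        # Same content gives the same key, so it is only stored once.
        self._objects.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def __len__(self):
        return len(self._objects)
```

Putting the same bytes twice returns the same key and leaves a single object in the store, which is the deduplication property a hashlink would rely on.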
Roger Leigh wrote:
> There are lots of Debian people out there using git, and some of them
> have expressed interest over the years in having the ability to use
> git as a filesystem in its own right (#477942 is an example of one
> in a package I maintain).
>
> I've finally got down to it and written all my thoughts on the topic
> down in a mostly-organised form, which you can find at
>
> http://www.codelibre.net/~rleigh/hashlink.pdf
>
> This paper looks at the concept of object-based storage, and the
> creation of "hashlinks", essentially symlinks which use hashes
> rather than pathnames to refer to a file.

You may be interested in my git-annex program, which implements just such a thing, although in user space, not kernel space.
http://git-annex.branchable.com/

joey@gnu:~/lib/sound/misc>ls -l dj_mooch_-_misc_-_01_-_dirty_as_mud.mp3
lrwxrwxrwx 1 joey joey 113 Nov 9 14:09 dj_mooch_-_misc_-_01_-_dirty_as_mud.mp3 -> ../.git/annex/objects/SHA1:717566db4265b4b3a986ba84e797df56c25923be/SHA1:717566db4265b4b3a986ba84e797df56c25923be

Note that once you have a hashlink, you can know that if another filesystem elsewhere has the same hashlink, the content can be retrieved from there. So the deduplication and general content management can be done cross-filesystem. That is a basis of much of the good stuff git-annex can do.

For example:

joey@gnu:~/lib/big/new>ls -l debian-6.0.0-amd64-i386-netinst.iso
lrwxrwxrwx 1 joey joey 145 Feb 6 14:10 debian-6.0.0-amd64-i386-netinst.iso -> ../.git/annex/objects/WORM:1296961374:432142336:debian-6.0.0-amd64-i386-netinst.iso/WORM:1296961374:432142336:debian-6.0.0-amd64-i386-netinst.iso
joey@gnu:~/lib/big/new>git log --pretty=oneline debian-6.0.0-amd64-i386-netinst.iso
97dc2f3a5b5aec7671c7f282b8ea42a89f90b44a pre passport-sync
joey@gnu:~/lib/big/new>cat debian-6.0.0-amd64-i386-netinst.iso
cat: debian-6.0.0-amd64-i386-netinst.iso: No such file or directory
joey@gnu:~/lib/big/new>git annex get debian-6.0.0-amd64-i386-netinst.iso
get debian-6.0.0-amd64-i386-netinst.iso (copying from turtle...)

Also, Josh Triplett has some ideas about integrating git with dpkg for fully versioned systems. I'm not sure if he's ready to make them public, but you may find talking with him interesting.

-- 
see shy jo
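The symlink-to-hash idea shown in the listings above can be imitated in a few lines of userspace code. This is only a sketch and not git-annex's implementation: the functions `key_for`, `add` and `fetch`, the flat `objects` directory and the `SHA1:<hex>` file naming are simplifications made up for illustration. It shows a link that dangles until the content with the same key is copied in from another store.

```python
import hashlib
import os
import shutil


def key_for(path):
    """Return a 'SHA1:<hex>' key naming the file's content."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return "SHA1:" + digest.hexdigest()


def add(store, path):
    """Move the file's content into the store and leave a symlink behind."""
    key = key_for(path)
    obj = os.path.join(store, key)
    os.makedirs(store, exist_ok=True)
    if os.path.exists(obj):
        os.remove(path)          # identical content is already stored
    else:
        shutil.move(path, obj)
    link_dir = os.path.dirname(os.path.abspath(path))
    os.symlink(os.path.relpath(obj, link_dir), path)
    return key


def fetch(path, store, other_store):
    """Satisfy a dangling link by copying its object from another store."""
    key = os.path.basename(os.readlink(path))   # the link target names the content
    dest = os.path.join(store, key)
    if not os.path.exists(dest):
        os.makedirs(store, exist_ok=True)
        shutil.copy(os.path.join(other_store, key), dest)
    return key
```

Because the key is derived from the content alone, any store holding an object with that key can serve it; the path the file happens to have in either tree does not matter.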
On 2011-02-07 00:40, Roger Leigh wrote:
> http://www.codelibre.net/~rleigh/hashlink.pdf

Interesting! It seems that it would also be beneficial to log-structured file systems [1].

I spotted a couple of typos:

Page 4:
"Files may be therefore be replaced" => "Files may therefore be replaced"

Page 8:
"that is to say it it written once" => "that is to say it is written once"
"storage is be used" => "storage is used"

Cheers,
-- 
Jens Peter Secher, GPG fingerprint 0EE5978AFE63E8A1.
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
On Mon, Feb 07, 2011 at 11:53:46PM +0100, Jens Peter Secher wrote:
> On 2011-02-07 00:40, Roger Leigh wrote:
> > http://www.codelibre.net/~rleigh/hashlink.pdf
>
> Interesting! It seems that it would also be beneficial to
> log-structured file systems [1].
>
> I spotted a couple of typos:
>
> Page 4:
> "Files may be therefore be replaced" => "Files may therefore be replaced"
>
> Page 8:
> "that is to say it it written once" => "that is to say it is written once"
> "storage is be used" => "storage is used"

Many thanks for catching those; I've now fixed them.

BTW, I didn't see a reference to [1], but I'd be very interested to look at it!

Thanks,
Roger

-- 
  .'`.   Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
On 2011-2-8 0:51 , Roger Leigh wrote:
> On Mon, Feb 07, 2011 at 11:53:46PM +0100, Jens Peter Secher wrote:
> > Interesting! It seems that it would also be beneficial to
> > log-structured file systems [1].

[1] http://en.wikipedia.org/wiki/Log-structured_file_system

Cheers,
/JP
-- 
Jens Peter Secher, GPG fingerprint 0EE5978AFE63E8A1.
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?
On Sun, Feb 06, 2011 at 10:23:00PM -0400, Joey Hess wrote:
> Roger Leigh wrote:
> > There are lots of Debian people out there using git, and some of them
> > have expressed interest over the years in having the ability to use
> > git as a filesystem in its own right (#477942 is an example of one
> > in a package I maintain).
> >
> > I've finally got down to it and written all my thoughts on the topic
> > down in a mostly-organised form, which you can find at
> >
> > http://www.codelibre.net/~rleigh/hashlink.pdf
> >
> > This paper looks at the concept of object-based storage, and the
> > creation of "hashlinks", essentially symlinks which use hashes
> > rather than pathnames to refer to a file.
>
> You may be interested in my git-annex program, which implements just
> such a thing, although in user space, not kernel space.
> http://git-annex.branchable.com/

I've been meaning to give git-annex a whirl for a while, but I'm afraid I've lacked the time to get intimately acquainted with it. From what I understand, in terms of what a single user could do with it, it's looking pretty equivalent, the major difference being that it's entirely in userspace. It's definitely on my TODO list.

I wanted to look at whether it was possible to make a multi-user store and provide a lightweight kernel interface to access it. It might well be possible to use git-annex as the storage backend for an initial implementation!

Following the suggestion to look at log-structured filesystems, I took a look at things like Plan9's Venti. It's good stuff; my main objection to most being that they are append-only, with no provision for GC of no longer referenced data. I would consider that a requirement for a general purpose store with rapid turnover of data, especially if you're going to store working copies as well as things of "commit quality", and even things you commit can be temporary for rebasing etc.

Regards,
Roger

-- 
  .'`.   Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   Printing on GNU/Linux?       http://gutenprint.sourceforge.net/
   `-    GPG Public Key: 0x25BFB848   Please GPG sign your mail.
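The garbage-collection requirement raised above (dropping data that nothing references any more) is commonly met with mark and sweep over the reference graph. What follows is a small Python sketch under invented assumptions: objects are modelled as a dict mapping a key to the keys it refers to, and the names (`reachable`, `collect`, `commit1`, `tree1`, and so on) are hypothetical. It is not taken from the hashlink paper, Venti, or git.

```python
def reachable(objects, roots):
    """Mark phase: return every key reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        key = stack.pop()
        if key in seen or key not in objects:
            continue
        seen.add(key)
        stack.extend(objects[key])  # follow references to child objects
    return seen


def collect(objects, roots):
    """Sweep phase: delete unreachable objects and return their keys."""
    live = reachable(objects, roots)
    dead = sorted(key for key in objects if key not in live)
    for key in dead:
        del objects[key]
    return dead


if __name__ == "__main__":
    # A rebased-away commit (commit1) and its tree become garbage,
    # but blobA survives because the current tree still refers to it.
    objects = {
        "commit1": ["tree1"],
        "tree1": ["blobA", "blobC"],
        "commit2": ["tree2"],
        "tree2": ["blobA", "blobB"],
        "blobA": [],
        "blobB": [],
        "blobC": [],
    }
    print(collect(objects, roots={"commit2"}))
```

An append-only store has no sweep phase at all, which is the objection being made; with rapid turnover of working copies and rebased commits, the unreachable set would only grow.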
Roger Leigh <rleigh@codelibre.net> writes:
> On Sun, Feb 06, 2011 at 10:23:00PM -0400, Joey Hess wrote:
>> Roger Leigh wrote:
>> > There are lots of Debian people out there using git, and some of them
>> > have expressed interest over the years in having the ability to use
>> > git as a filesystem in its own right (#477942 is an example of one
>> > in a package I maintain).
>> >
>> > I've finally got down to it and written all my thoughts on the topic
>> > down in a mostly-organised form, which you can find at
>> >
>> > http://www.codelibre.net/~rleigh/hashlink.pdf
>> >
>> > This paper looks at the concept of object-based storage, and the
>> > creation of "hashlinks", essentially symlinks which use hashes
>> > rather than pathnames to refer to a file.
>>
>> You may be interested in my git-annex program, which implements just
>> such a thing, although in user space, not kernel space.
>> http://git-annex.branchable.com/
>
> I've been meaning to give git-annex a whirl for a while, but I'm
> afraid I've lacked the time to get intimately acquainted with it.
> From what I understand, in terms of what a single user could do with
> it, it's looking pretty equivalent, the major difference being that
> it's entirely in userspace. It's definitely on my TODO list.
>
> I wanted to look at whether it was possible to make a multi-user store
> and provide a lightweight kernel interface to access it. It might well
> be possible to use git-annex as the storage backend for an initial
> implementation!

All you need is a little bit of FUSE code. There really is no need to invent a new kernel interface for this, and something like this is best kept in userspace to keep the complexity manageable.

> Following the suggestion to look at log-structured filesystems, I
> took a look at things like Plan9's Venti. It's good stuff; my main
> objection to most being that they are append-only, with no provision
> for GC of no longer referenced data. I would consider that a
> requirement for a general purpose store with rapid turnover of data,
> especially if you're going to store working copies as well as things
> of "commit quality", and even things you commit can be temporary for
> rebasing etc.
>
>
> Regards,
> Roger

Kind regards,
Goswin