Theodore Tso 09-01-2008 06:37 PM

dynamic inode allocation
 
On Mon, Sep 01, 2008 at 01:18:31PM -0400, Mag Gam wrote:
> This may be a newbie question, but how come other filesystems such as
> ReiserFS and Veritas' VxFS dynamically allocate inodes, while with
> filesystems such as ext2/ext3 and JFS we need to allocate them when
> creating the filesystem? Is there a performance or maintenance gain
> from preallocating?

Having a static inode table is definitely much simpler than a dynamic
inode table, and that's why ext2 originally used a static inode
allocation system. Ext2 drew much of its initial design inspiration
from the BSD Fast Filesystem, and it (along with most traditional Unix
filesystems) used a static inode table.
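
A minimal sketch of why a static table is simple: finding inode number N is pure arithmetic on values fixed when the filesystem is created. The field names and example numbers below are illustrative assumptions, loosely modelled on the ext2 block-group layout, not a reading of the real on-disk structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layout:
    """Hypothetical, simplified view of values fixed at mkfs time."""
    block_size: int               # bytes per block
    inode_size: int               # bytes per on-disk inode
    inodes_per_group: int         # inodes in each block group
    inode_table_start: List[int]  # first inode-table block of each group


def locate_inode(layout: Layout, ino: int) -> Tuple[int, int, int]:
    """Return (group, block, byte offset in block) for a 1-based inode number."""
    if ino < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(ino - 1, layout.inodes_per_group)
    byte_offset = index * layout.inode_size
    block = layout.inode_table_start[group] + byte_offset // layout.block_size
    return group, block, byte_offset % layout.block_size


# Example figures are assumptions chosen for easy arithmetic.
layout = Layout(block_size=4096, inode_size=128,
                inodes_per_group=8192, inode_table_start=[5, 32773])
print(locate_inode(layout, 12))
print(locate_inode(layout, 8193))
```

No tree has to be walked: as long as the per-group table locations are known, every inode can be reached directly.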

One of the advantages of having a static inode table is that you can
always reliably find it. With a dynamic inode table, it can be much
more difficult to find in the face of filesystem corruption, whether
caused by hardware or software failure. For example, with Reiserfs,
the inodes are stored in a B-tree. If the root node, or a relatively
high-level node of the B-tree, is lost, the only way to recover all of
the inodes is to look at each block and try to determine whether it
"looks" like part of the filesystem B-tree. This is what reiserfs's
fsck program will do if the filesystem is sufficiently damaged.
Unfortunately, this means that if you store a reiserfs filesystem
image (for example, for use by vmware, qemu, kvm, or xen) in a
reiserfs filesystem, and the containing filesystem gets damaged, the
recovery procedure will take every single block that looks like it
could have been part of a Reiserfs B-tree and stitch them together
into a new B-tree. Blocks belonging to the stored filesystem images
get treated as if they were part of the containing filesystem, and
the result is not pretty.
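
The failure mode can be illustrated with a toy scavenger. This is only a sketch under stated assumptions: the 4-byte `MAGIC` marker, the 16-byte block size, and the helper names are all hypothetical and have nothing to do with the real reiserfs on-disk format or the real fsck code. It shows only that a "does this block look like a tree node?" scan cannot tell host nodes from nodes inside a stored image.

```python
MAGIC = b"NODE"   # hypothetical "this is a tree node" marker
BLOCK = 16        # toy block size in bytes


def make_node(payload: bytes) -> bytes:
    """Build a block that looks like a tree node."""
    return (MAGIC + payload).ljust(BLOCK, b"\0")


def make_data(payload: bytes) -> bytes:
    """Build an ordinary file-data block."""
    return payload.ljust(BLOCK, b"\0")


def scavenge(disk: bytes) -> list:
    """Return the number of every block that looks like a tree node."""
    return [offset // BLOCK
            for offset in range(0, len(disk), BLOCK)
            if disk[offset:offset + len(MAGIC)] == MAGIC]


# A guest filesystem image, stored as plain file data on the host.
inner_image = make_node(b"inner-a") + make_data(b"guest data")

# Host disk: two real host nodes, then the image, then host file data.
outer_disk = (make_node(b"outer-a") + make_node(b"outer-b")
              + inner_image + make_data(b"host data"))

found = scavenge(outer_disk)
print(found)  # block 2 belongs to the guest image, yet it is collected too
```

Block 2 is part of the guest image, but the scan has no way to know that, so a rebuild based on this list would graft it into the host's tree.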

These problems can be solved (although they were not for Reiserfs),
but it means a lot more complexity.

- Ted

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

"Mag Gam" 09-01-2008 08:29 PM

dynamic inode allocation
 
Ted,

Thanks for the explanation and for dumbing it down for me :-)

So, if a ReiserFS filesystem is damaged, it naturally runs an fsck,
and the fsck basically recreates the B-tree by scanning from the start
to the end of the filesystem?

Theodore Tso 09-01-2008 08:39 PM

dynamic inode allocation
 
On Mon, Sep 01, 2008 at 04:29:06PM -0400, Mag Gam wrote:
>
> So, if a ReiserFS filesystem is damaged, it naturally runs an fsck,
> and the fsck basically recreates the B-tree by scanning from the start
> to the end of the filesystem?

If the filesystem is sufficiently damaged such that portions of the
B-tree can't be found, then yes. Otherwise, the data would be totally
lost. As you can imagine, scanning every single block on the disk to
see if it looks like filesystem metadata is quite slow, so naturally
reiserfs's fsck will avoid doing it if at all possible. But if the
root or top-level nodes of the B-tree are damaged, it doesn't have
much choice.
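
For a rough sense of "quite slow", here is a back-of-the-envelope estimate of the time needed just to read every block once. The device size and read throughput are assumed example figures, not measurements, and the estimate ignores the cost of analysing blocks and rebuilding the tree.

```python
def full_scan_seconds(device_bytes: int, read_bytes_per_second: int) -> float:
    """Lower bound on scan time: one sequential read of the whole device."""
    return device_bytes / read_bytes_per_second


# Assumed example: a 1 TiB device read at a sustained 100 MiB/s.
seconds = full_scan_seconds(1 * 2**40, 100 * 2**20)
hours = seconds / 3600
print(f"{seconds:.0f} s, about {hours:.1f} hours")
```

Under these assumptions the read alone takes close to three hours, which is why a checker would prefer to walk an intact tree whenever it can.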

- Ted

"Mag Gam" 09-01-2008 09:16 PM

dynamic inode allocation
 

But if that's the last-resort, worst-case scenario, why don't they do
the full scan? Sure, it's going to take a long time if it's a big
filesystem (there should be no changes since it would be unmounted),
but it's better than not having any data at all...

Theodore Tso 09-01-2008 09:23 PM

dynamic inode allocation
 
On Mon, Sep 01, 2008 at 05:16:01PM -0400, Mag Gam wrote:
> > If the filesystem is sufficiently damaged such that portions of the
> > B-tree can't be found, then yes. Otherwise, the data would be totally
> > lost. As you can imagine, scanning every single block on the disk to
> > see if it looks like filesystem metadata is quite slow, so naturally
> > reiserfs's fsck will avoid doing it if at all possible. But if the
> > root or top-level nodes of the B-tree are damaged, it doesn't have
> > much choice.
> >
>
> But if that's the last-resort, worst-case scenario, why don't they do
> the full scan? Sure, it's going to take a long time if it's a big
> filesystem (there should be no changes since it would be unmounted),
> but it's better than not having any data at all...

As I said, in the worst case, it will do a full scan. But (a) it
takes a long time, and (b) if the filesystem has any files that
contain images of reiserfs filesystems, it will be totally scrambled.
So it makes sense that the reiserfs fsck would try to avoid this if it
can (i.e., if the B-tree is only mildly corrupted).

With that said, this is really getting out of scope for this mailing
list. And I am not an expert on reiserfs's filesystem checker,
although I have had people confirm to me that you can indeed lose
really big if your reiserfs filesystem contains files that are images
of other reiserfs filesystems, for things like virtualization. This
problem is apparently solved in reiser4; it is NOT solved in reiserfs
(i.e., version 3). As far as I am concerned, that's ample reason not
to use reiserfs, but obviously I'm biased. :-)

- Ted


"Mag Gam" 09-01-2008 09:47 PM

dynamic inode allocation
 
Thanks!

This has cured my curiosity (for now...)



All times are GMT.