Linux Archive (http://www.linux-archive.org/)
EXT3 Users (http://www.linux-archive.org/ext3-users/)
botched RAID, now e2fsck or what? (http://www.linux-archive.org/ext3-users/292276-botched-raid-now-e2fsck-what.html)

Lucian Șandor 12-08-2009 03:48 PM

botched RAID, now e2fsck or what?
 
Hi all,

Somehow I managed to mess with a RAID array containing an ext3 partition.

An aside, in case it matters: I physically disconnected a drive while
the array was online. Next thing, I lost track of the right order of
the drives in the array. While trying to re-create it, I overwrote the
RAID superblocks. Luckily, the array was a degraded RAID5, so whenever
I re-created it, it didn't go into sync; thus everything besides the
RAID superblocks is preserved (or so I think).

Now I am trying to re-create the array in the proper order. It is
taking me countless attempts, through hundreds of permutations. I am
doing it programmatically, but I don't think I have the right tool.
After creating the array and mounting it with
mount -t ext3 -n -r /dev/md2 /media/olddepot
I issue:
e2fsck -n -f /media/olddepot
However, I have cycled through all the permutations without apparent
success; in every combination it simply refused to check, saying
something about a "short read" and, of course, about an invalid file
system.
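
(For reference, the re-creation step of each attempt looks roughly like
this; the device names, chunk size and metadata version below are only
placeholders and have to match whatever the original array used, and
"missing" stands in for the failed member so nothing gets resynced:)

mdadm --stop /dev/md2
# re-create with one candidate ordering; --assume-clean is an extra
# guard against any rebuild touching the data
mdadm --create /dev/md2 --assume-clean --level=5 --raid-devices=6 \
      --chunk=64 --metadata=0.90 \
      /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf missing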

Does anybody know a better tool to check whether the mounted partition
is a slightly damaged ext3 file system? I am thinking about dumping
ext3 superblocks, but I don't know how that works.

Thanks.

(I am on the latest openSUSE, 11.2, with the latest mdadm available.)

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Christian Kujau 12-09-2009 02:43 AM

botched RAID, now e2fsck or what?
 
On Tue, 8 Dec 2009 at 11:48, Lucian Șandor wrote:
> Now, after creating the array and mounting it with
> mount -t ext3 -n -r /dev/md2 /media/olddepot
> I issue an:
> e2fsck -n -f /media/olddepot

Huh? Normally you'd want to run fsck against the block device:

$ umount /media/olddepot
$ fsck.ext3 -nvf /dev/md2

If this still does not succeed, you could try specifying a different
superblock (-b). But the important thing will be to get your RAID in
the right order; otherwise fsck could do more harm than good.
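
For example, something along these lines (still read-only thanks to -n;
32768 is the usual location of the first backup superblock on a
filesystem with 4k blocks, so adjust it, and -B, if your block size
differs):

$ e2fsck -n -f -b 32768 -B 4096 /dev/md2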

Christian.
--
BOFH excuse #25:

Decreasing electron flux


Eric Sandeen 12-09-2009 04:09 AM

botched RAID, now e2fsck or what?
 
Lucian Șandor wrote:
> Hi all,
>
> Somehow I managed to mess with a RAID array containing an ext3 partition.
>
> Parenthesis, if it matters: I disconnected physically a drive while
> the array was online. Next thing, I lost the right order of the drives
> in the array. While trying to re-create it, I overwrote the raid
> superblocks. Luckily, the array was RAID5 degraded, so whenever I
> re-created it, it didn't go into sync; thus, everything besides the
> RAID superblocks is preserved (or so I think).
>
> Now, I am trying to re-create the array in the proper order. It takes
> me countless attempts, through hundreds of permutations. I am doing it
> programatically, but I don't think I have the right tool.
> Now, after creating the array and mounting it with
> mount -t ext3 -n -r /dev/md2 /media/olddepot
> I issue an:
> e2fsck -n -f /media/olddepot
> However, I cycled through all the permutations without apparent
> success. I.e., in all combinations it just refused to check it, saying
> something about "short read" and, of course, about invalid file
> systems.

As Christian pointed out, use the device not the mountpoint for the fsck arg:

[tmp]$ mkdir dir
[tmp]$ e2fsck -fn dir/
e2fsck 1.41.4 (27-Jan-2009)
e2fsck: Attempt to read block from filesystem resulted in short read while trying to open dir/
Could this be a zero-length partition?


:)

-Eric


Lucian Șandor 12-10-2009 12:50 AM

botched RAID, now e2fsck or what?
 
Hi,

Thanks both for replies. Things are moving now, since I started using
e2fsck -n -f -v /dev/md0

However, no combination seems useful. Sometimes I get:
"e2fsck: Bad magic number in super-block while trying to open /dev/md0"
Other times I get:
"Superblock has an invalid journal (inode 8)."
Other times I get:
"e2fsck: Illegal inode number while checking ext3 journal for /dev/md2."
None of these appears in only one permutation, so none is indicative
of the correctness of the permutation.

I also ran dumpe2fs /dev/md2, but I don't know how to make it more
useful than it is now. Right now it finds superblocks in a series of
permutations, so again, it is not of much help.
Question 1: Is there a way to make dumpe2fs or another command
estimate the number of files in what appears to be an ext3 partition?
(I would then go with the permutation that finds the largest number of
files.)
Question 2: If I were to strike it lucky and find the right combination,
would dumpe2fs give me a very, very long list of superblocks? Do the
superblocks extend far into the partition, or do they always stop
early (thus showing the same number each time my RAID starts with the
right drive)?

Question 3: Is there any other tool that would search for files in the
remains of an ext3 partition, and, this way, validate or invalidate
the permutations I try?

Thanks,
Lucian Sandor




Christian Kujau 12-10-2009 05:09 AM

botched RAID, now e2fsck or what?
 
On Wed, 9 Dec 2009 at 20:50, Lucian Șandor wrote:
> However, no combination seems useful. Sometimes I get:
> "e2fsck: Bad magic number in super-block while trying to open /dev/md0"

Did you try specifying a different superblock? If you can remember how the
filesystem was initially created, you can use:

$ mkfs.ext3 -n /dev/md0 (MIND THE -n SWITCH!)

to get a list of the backup superblocks, which you can then use with fsck.
Don't forget to man mkfs.ext3 :-)

> Question 1: Is there a way to make dumpe2fs or another command
> estimate the number of files in what appears to be an ext3 partition?

I can only think of:

$ dumpe2fs -h /dev/loop0 | egrep 'Inode count|Free inodes'

The difference between both values should be the used inodes, i.e.
files/directories on the filesystem.
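
Or, to do the subtraction in one go (untested, and it assumes the usual
"Label: value" layout of dumpe2fs -h):

$ dumpe2fs -h /dev/md2 2>/dev/null |
    awk -F: '/^Inode count/{t=$2} /^Free inodes/{f=$2} END{print t-f, "inodes in use"}'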

> Question: if I were to struck lucky and find the right combination,
> would dumpe2fs give me a very-very long list of superblocks?

The superblock count depends on how the fs was initially created. I could
imagine that the list is longer for a real filesystem, as "garbage"
won't have any superblocks at all.

> superblocks extend far into the partition, or do they always stop

Superblocks are usually spread all over the device.

> Question 3: Is there any other tool that would search for files in the
> remains of an ext3 partition, and, this way, validate or invalidate
> the permutations I try?

Have a look at:
http://ext4.wiki.kernel.org/index.php/Undeletion

Christian.
--
BOFH excuse #208:

Your mail is being routed through Germany ... and they're censoring us.


Andreas Dilger 12-10-2009 05:54 AM

botched RAID, now e2fsck or what?
 
On 2009-12-09, at 18:50, Lucian Șandor wrote:

> However, no combination seems useful. Sometimes I get:
> "e2fsck: Bad magic number in super-block while trying to open /dev/md0"
> Other times I get:
> "Superblock has an invalid journal (inode 8)."
> Other times I get:
> "e2fsck: Illegal inode number while checking ext3 journal for /dev/md2."
> None of these appears in only one permutation, so none is indicative
> for the corectness of the permutation.


You need to know a bit about your RAID layout and the structure of
ext*. One thing that is VERY important is whether your new MD config
has the same chunk size as it did initially. It will be impossible to
recover your config if you don't have the same chunk size.


Also, if you haven't disabled RAID resync then it may well be that
changing the RAID layout has caused a resync that has permanently
corrupted your data.


That said, I will assume the primary ext3 superblock will reside on
the first disk in the RAID set, since it is located at an offset of
1kB from the start of the device.


You should build and run the "findsuper" tool that is in the e2fsprogs
source tree. It will scan the raw disk devices and locate the ext3
superblocks. Each superblock contains the group number in which it is
stored, so you can find the first RAID disk by looking for the one
that has superblock 0 at offset 1kB from the start of the disk.


There may be other copies of the superblock #0 stored in the journal
file, but those should be ignored.


The backup superblocks have a non-zero group number, and "findsuper"
prints the offset at which that superblock should be located from the
start of the LUN. Depending on whether you have a non-power-of-two
number of disks in your RAID set, you may find the superblock copies
on different disks, and you can do some math to determine which order
the disks should be in by computing the relative offset of the
superblock within the RAID set.
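
For example, a rough loop over the raw member disks might be (the
device names below are only placeholders):

# findsuper lives in the misc/ subdirectory of the e2fsprogs source
# tree and is not built by default (roughly: ./configure && make, then
# "make findsuper" in misc/ -- the exact steps depend on the version)
for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
    echo "== $d =="
    # the member that reports group 0 at byte offset 1024 is the first
    # disk of the set; matches are printed as they are found, so the
    # scan can be interrupted once the first few lines appear
    ./findsuper "$d" | head -5
done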



The other thing that can help order the disks (depending on the RAID
chunksize and the total number of groups in the filesystem,
proportional to the filesystem size) is the group descriptor table.
It is located immediately after the superblocks, and contains a very
regular list of block numbers for the block and inode bitmaps, and the
inode table in each group.


Using "od -Ax -tx4" on a regular ext3 filesystem you can see the group
descriptor table starting at offset 0x1000, and the block numbers
basically just "count" up. This may in fact be the easiest way to
order the disks, if the group descriptor table is large enough to
cover all of the disks:


# od -Ax -tx4 /dev/hda1 | more
:
:
001000 0000012c 0000012d 0000012e 02430000
001010 000001f2 00000000 00000000 00000000
001020 0000812c 0000812d 0000812e 2e422b21
001030 0000000d 00000000 00000000 00000000
001040 00010000 00010001 00010002 27630074
001050 000000b8 00000000 00000000 00000000
001060 0001812c 0001812d 0001812e 27a70b8a
001070 00000231 00000000 00000000 00000000
001080 00020000 00020001 00020002 2cc10000
001090 00000008 00000000 00000000 00000000
0010a0 0002812c 0002812d 0002812e 25660134
0010b0 00000255 00000000 00000000 00000000
0010c0 00030000 00030001 00030002 17a50003
0010d0 000001c6 00000000 00000000 00000000
0010e0 0003812c 0003812d 0003812e 27a70000
0010f0 00000048 00000000 00000000 00000000
001100 00040000 00040001 00040002 2f8b0000

See nearly regular incrementing sequence every 0x20 bytes:

0000012c, 0000812c, 00010000, 0001812c, 00020000, 0002812c, 00030000,
0003812c



Each group descriptor block (4kB = 0x1000) covers 16GB of filesystem
space, so 64 blocks per 1TB of filesystem size. If your RAID chunk
size is not too large, and the filesystem IS large, you will be able
to fully order your disks in the RAID set. You can also verify the
RAID chunk size by determining how many blocks of consecutive group
descriptors are present before there is a "jump" where the group
descriptor blocks were written to other disks before returning to the
current disk. Remember that one of the disks in the set will also
need to store parity, so there will be some number of "garbage" blocks
before the proper data resumes.
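
As a rough illustration (again, the device names are placeholders),
dumping a small window at offset 0x1000 of each raw member disk and
comparing the block numbers seen there should already hint at the
order:

for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
    echo "== $d =="
    # 512 bytes starting at member offset 0x1000: if the filesystem is
    # large enough, this lands somewhere in the group descriptor table
    # on every data disk, so the values show which part of the
    # "counting" sequence that disk holds; a member showing garbage
    # here is the one holding parity for this stripe
    od -Ax -tx4 -j 4096 -N 512 "$d"
done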



> I also ran dumpe2fs /dev/md2, but I don't know how to make it more
> useful than it is now. Right now it finds supernodes in a series of
> permutations, so again, it is not of much help.


I would also make sure that you can get the correct ordering and MD
chunk size before doing ANY kind of modification to the disks. It
would only take a single mistake (e.g. RAID parity rebuild while not
in the right order) to totally corrupt the filesystem.



> Question 1: Is there a way to make dumpe2fs or another command
> estimate the number of files in what appears to be an ext3 partition?
> (I would then go by the permutation which fonds the largest number of
> files.)
> Question: if I were to struck lucky and find the right combination,
> would dumpe2fs give me a very-very long list of superblocks? Do the
> superblocks extend far into the partition, or do they always stop
> early (thus showing the same number each time my RAID starts with the
> right drive)?
>
> Question 3: Is there any other tool that would search for files in the
> remains of an ext3 partition, and, this way, validate or invalidate
> the permutations I try?
>
> Thanks,
> Lucian Sandor





Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Theodore Ts'o 12-10-2009 12:47 PM

botched RAID, now e2fsck or what?
 
On Tue, Dec 08, 2009 at 11:48:18AM -0500, Lucian Șandor wrote:
>
> Now, I am trying to re-create the array in the proper order. It takes
> me countless attempts, through hundreds of permutations. I am doing it
> programatically, but I don't think I have the right tool.

Something that may help is to use the findsuper program, in the
e2fsprogs sources; it's not built by default, but you can build it by
hand. Each of the backup superblocks has a group number in one of the
fields, if it was created with a relatively modern mke2fs, so you can
use it to get information like this:

byte_offset byte_start byte_end fs_blocks blksz grp last_mount_time sb_uuid label
1024 0 95999229952 23437312 4096 0 Thu Dec 10 00:24:39 2009 fd5210bd
134217728 0 95999229952 23437312 4096 1 Wed Dec 31 19:00:00 1969 fd5210bd
402653184 0 95999229952 23437312 4096 3 Wed Dec 31 19:00:00 1969 fd5210bd
671088640 0 95999229952 23437312 4096 5 Wed Dec 31 19:00:00 1969 fd5210bd


The group number information should help you determine the order of the
disks in the raid array.
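
For what it's worth, with a 4k block size and the default 32768 blocks
per group (which is what the output above shows), each group covers
134217728 bytes, so on a correctly assembled array every backup
superblock should sit exactly where its group number predicts. A quick
(untested) check along those lines:

findsuper /dev/md2 | awk '$6 > 0 && $1 != $6 * 134217728 \
    {print "group", $6, "found at unexpected offset", $1}'

No output means the offsets are at least self-consistent; stray copies
of superblocks sitting in the journal may still get flagged and can be
ignored.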

Good luck!

- Ted


Lucian Șandor 12-10-2009 07:30 PM

botched RAID, now e2fsck or what?
 
Thank you all for your kind replies.

One extra thought and question: would it help that I have a copy of a
large file that is also on the array? Could I search for a part of the
file on the individual drives, or at least on the permuted arrays?
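
I was thinking of something crude along these lines (just a sketch; the
search string and device names are placeholders, and grepping whole
drives will obviously be slow):

# pick a short, distinctive string that occurs in the known copy of
# the file (short enough that it is unlikely to straddle a chunk
# boundary on the array)
for d in /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg; do
    echo "== $d =="
    # -a: treat the raw device as text, -b: print the byte offset,
    # -o: print only the match rather than a huge binary "line"
    grep -abo 'some distinctive string from the file' "$d" | head -3
done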

I tried findsuper, but it keeps finding the same backup superblocks no
matter how I switch the order of the disks, except for the first one.
It is possible, I think, that the superblocks all fall on the same
disk. That is only a general impression, so I am running it thoroughly
on a smaller array to make sure. Another issue with this approach is
that it takes a lot of time: I have a 4.5 TB array with 720
permutations to try. That sounds more like a job for a few years.
2009/12/10 <tytso@mit.edu>:
> Something that may help is to use the findsuper program, in the
> e2fsprogs sources; it's not built by default, but you can build it by
> hand.
> The group number information should help you determine the order of the
> disks in the raid array.

Same issue if I use the used-inode count: the permutations yield the
same numbers over and over again. I think dumpe2fs -h doesn't go into
the actual drive, but only reads the descriptions at the beginning,
and these fall on the same drive...
2009/12/10 Christian Kujau <lists@nerdbynature.de>:
> On Wed, 9 Dec 2009 at 20:50, Lucian Șandor wrote:
>> Question 1: Is there a way to make dumpe2fs or another command
>> estimate the number of files in what appears to be an ext3 partition?
>
> I can only think of:
> $ dumpe2fs -h /dev/loop0 | egrep 'Inode count|Free inodes'
> The difference between both values should be the used inodes, i.e.
> files/directories on the filesystem.


2009/12/10 Andreas Dilger <adilger@sun.com>:
> On 2009-12-09, at 18:50, Lucian Șandor wrote:
>>
>> However, no combination seems useful. Sometimes I get:
>> "e2fsck: Bad magic number in super-block while trying to open /dev/md0"
>> Other times I get:
>> "Superblock has an invalid journal (inode 8)."
>> Other times I get:
>> "e2fsck: Illegal inode number while checking ext3 journal for /dev/md2."
>> None of these appears in only one permutation, so none is indicative
>> for the corectness of the permutation.
>
> You need to know a bit about your RAID layout and the structure of ext*.
> One thing that is VERY important is whether your new MD config has the same
> chunk size as it did initially. It will be impossible to recover your
> config if you don't have the same chunk size.
>
> Also, if you haven't disabled RAID resync then it may well be that changing
> the RAID layout has caused a resync that has permanently corrupted your
> data.

I have the chunk size for one of the arrays. I thought that mdadm
would automatically use the same values it used when it first created
the arrays, but guess what, it did not. Now I have another headache
for the other array.
The arrays were degraded at the time of the whole mess, and I have
always re-created them as degraded. I wonder how long I can keep
pulling off this feat, after being so messy in the first place.

> That said, I will assume the primary ext3 superblock will reside on the
> first disk in the RAID set, since it is located at an offset of 1kB from the
> start of the device.
>
> You should build and run the "findsuper" tool that is in the e2fsprogs
> source tree. It will scan the raw disk devices and locate the ext3
> superblocks. Each superblock contains the group number in which it is
> stored, so you can find the first RAID disk by looking for the one that has
> superblock 0 at offset 1kB from the start of the disk.
>
> There may be other copies of the superblock #0 stored in the journal file,
> but those should be ignored.
>
> The backup superblocks have a non-zero group number, and "findsuper" prints
> the offset at which that superblock should be located from the start of the
> LUN. Depending on whether you have a non-power-of-two number of disks in
> your RAID set, you may find the superblock copies on different disks, and
> you can do some math to determine which order the disks should be in by
> computing the relative offset of the superblck within the RAID set.
>
>
> The other thing that can help order the disks (depending on the RAID
> chunksize and the total number of groups in the filesystem, proportional to
> the filesystem size) is the group descriptor table. It is located
> immediately after the superblocks, and contains a very regular list of block
> numbers for the block and inode bitmaps, and the inode table in each group.
>
> Using "od -Ax -tx4" on a regular ext3 filesystem you can see the group
> descriptor table starting at offset 0x1000, and the block numbers basically
> just "count" up. This may in fact be the easiest way to order the disks, if
> the group descriptor table is large enough to cover all of the disks:
>
> # od -Ax -tx4 /dev/hda1 | more
> :
> :
> 001000 0000012c 0000012d 0000012e 02430000
> 001010 000001f2 00000000 00000000 00000000
> 001020 0000812c 0000812d 0000812e 2e422b21
> 001030 0000000d 00000000 00000000 00000000
> 001040 00010000 00010001 00010002 27630074
> 001050 000000b8 00000000 00000000 00000000
> 001060 0001812c 0001812d 0001812e 27a70b8a
> 001070 00000231 00000000 00000000 00000000
> 001080 00020000 00020001 00020002 2cc10000
> 001090 00000008 00000000 00000000 00000000
> 0010a0 0002812c 0002812d 0002812e 25660134
> 0010b0 00000255 00000000 00000000 00000000
> 0010c0 00030000 00030001 00030002 17a50003
> 0010d0 000001c6 00000000 00000000 00000000
> 0010e0 0003812c 0003812d 0003812e 27a70000
> 0010f0 00000048 00000000 00000000 00000000
> 001100 00040000 00040001 00040002 2f8b0000
>
> See nearly regular incrementing sequence every 0x20 bytes:
>
> 0000012c, 0000812c, 00010000, 0001812c, 00020000, 0002812c, 00030000,
> 0003812c
>
>
> Each group descriptor block (4kB = 0x1000) covers 16GB of filesystem space,
> so 64 blocks per 1TB of filesystem size. If your RAID chunk size is not
> too large, and the filesystem IS large, you will be able to fully order your
> disks in the RAID set. You can also verify the RAID chunk size by
> determining how many blocks of consecutive group descriptors are present
> before there is a "jump" where the group descriptor blocks were written to
> other disks before returning to the current disk. Remember that one of the
> disks in the set will also need to store parity, so there will be some
> number of "garbage" blocks before the proper data resumes.
>

This seems a great idea. The 4.5 TB array is huge (should have a 1100
kB table), and likely its group descriptor table extends on all
partitions. I already found the pattern, but the job requires
programming, since it would be troubling to read megs of data over the
hundreds of permutations. I will try coding it, but I hope that
somebody else wrote it before. Isn't there any utility that will take
a group descriptor table and verify its integrity without modifying
it?



Andreas Dilger 12-10-2009 07:41 PM

botched RAID, now e2fsck or what?
 
On 2009-12-10, at 13:30, Lucian Șandor wrote:


> This seems a great idea. The 4.5 TB array is huge (should have a 1100
> kB table), and likely its group descriptor table extends on all
> partitions. I already found the pattern, but the job requires
> programming, since it would be troubling to read megs of data over the
> hundreds of permutations. I will try coding it, but I hope that
> somebody else wrote it before. Isn't there any utility that will take
> a group descriptor table and verify its integrity without modifying
> it?


I think you are going about this incorrectly... Run the "od" command
on the raw component drives (e.g. /dev/sda, /dev/sdb, /dev/sdc, etc),
not on the assembled MD RAID array (e.g. NOT /dev/md0).


The data blocks on the raw devices will be correct, with every 1/N
chunks of space being used for parity information (so will look like
garbage). That won't prevent you from seeing the data in the group
descriptor table and allowing you to see the order in which the disks
are supposed to be AND the chunk size.


Since the group descriptor table is only a few kB from the start of
the disk (I'm assuming you used whole-disk devices for the MD array,
instead of DOS partitions) you can just use "od ... | less" and your
eyes to see what is there. No programming needed.


Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.



Lucian Șandor 12-11-2009 06:33 PM

botched RAID, now e2fsck or what?
 
Hi,

Thanks for your idea. It worked great as a first step. One other
thing: immediately after the first table, there is a second one. Using
both tables, I was able to tell the parity position. For me, with 6
drives, the tables fell into an annoying pattern of complementation,
such that four of them always give 0000 0000 0000 and the other two
drives hold identical chunks.

I am still no better off, because I don't know how to assemble it.
Should I create it as 1 2 3 4 5 P, or maybe as P 1 2 3 4 5? But that
is something I might find by trying a few combinations and looking at
the way the beginning of /dev/md0 is assembled.

One issue is that no matter how I mix them, I have an extra drive that
I need to keep out. (The array was degraded for a few days before the
drive mix-up, and the failing drive is in the computer, now mixed up
with the others.) I can try assembling the array with any of the six
drives as missing, but I don't see a difference in the beginning of
/dev/md0, that part having been written back when the array was still
running, and I get the same errors from e2fsck (complaining about an
invalid journal). findsuper finds the same superblocks, and e2fsck
finds the same inodes :(

There should be a way of telling whether one of the 6 remaining
permutations makes a better combination. As I said, I even have copies
of files that are also on the array. Any other thoughts?

Best,
Lucian Sandor




