I think growing my RAID array after replacing all the
drives with bigger ones has somehow hosed the array.
The system is Etch with a stock 2.6.18 kernel and
mdadm v. 2.5.6, running on an Athlon 1700 box.
The array is 6 disk (5 active, one spare) RAID 5
that has been humming along quite nicely for
a few months now. However, I decided to replace
all the drives with larger ones.
The RAID reassembled fine at each boot as the drives
were replaced one by one. After the last drive was
partitioned and added to the array, I issued the
command
"mdadm -G /dev/md/0 -z max"
to grow the array to the maximum space available
on the smallest drive. That appeared to work just
fine at the time, but booting today the array
refused to assemble with the following error:
md: hdg1 has invalid sb, not importing!
md: md_import_device returned -22
I tried to force assembly but only two of the remaining
4 active drives appeared to be fault free. dmesg gives
md: kicking non-fresh hde1 from array!
md: unbind<hde1>
md: export_rdev(hde1)
md: kicking non-fresh hdi1 from array!
md: unbind<hdi1>
md: export_rdev(hdi1)
I also noticed that "mdadm -X <drive>" shows
the pre-grow device size for 2 of the devices
and some discrepancies between the "Events" and
"Events Cleared" counts.
One last thing I found curious---from dmesg:
EXT3-fs error (device hdg1): ext3_check_descriptors: Block
bitmap for group 0 not in group (block 2040936682)!
EXT3-fs: group descriptors corrupted!
There is no ext3 directly on hdg1. LVM sits between the
array and the filesystem, so the above message seems suspect.
I hope someone will be able to help me with this. I feel
like the info above is pertinent, but I don't know where to
go from here.
Thanks
--
whollygoat@letterboxes.org
--
http://www.fastmail.fm - Does exactly what it says on the tin
--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
01-05-2009, 01:13 PM
Justin Piszcz
RAID5 (mdadm) array hosed after grow operation
cc linux-raid
On Mon, 5 Jan 2009, whollygoat@letterboxes.org wrote:
[snip]
01-05-2009, 09:17 PM
Neil Brown
RAID5 (mdadm) array hosed after grow operation
On Monday January 5, jpiszcz@lucidpixels.com wrote:
> cc linux-raid
>
> On Mon, 5 Jan 2009, whollygoat@letterboxes.org wrote:
>
> > [snip]
> >
> > md: hdg1 has invalid sb, not importing!
> > md: md_import_device returned -22
> >
> > I tried to force assembly but only two of the remaining
> > 4 active drives appeared to be fault free. dmesg gives
> >
> > md: kicking non-fresh hde1 from array!
> > md: unbind<hde1>
> > md: export_rdev(hde1)
> > md: kicking non-fresh hdi1 from array!
> > md: unbind<hdi1>
> > md: export_rdev(hdi1)
Please report
mdadm --examine /dev/whatever
for every device that you think should be a part of the array.
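A minimal sketch of gathering that report in one pass. The device list is an assumption built only from the names mentioned in this thread (hde1, hdg1, hdi1, hdo1); add every other partition that should belong to the array. With DRY_RUN=1, the default here, it only prints the commands it would run.

```shell
#!/bin/sh
# Collect superblock and bitmap reports for every suspected array member.
# DEVICES is an assumption: only the names mentioned in this thread.
DEVICES="${DEVICES:-/dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdo1}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = run them (as root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

examine_all() {
    for dev in $DEVICES; do
        run mdadm --examine "$dev"          # md superblock (same as -E)
        run mdadm --examine-bitmap "$dev"   # write-intent bitmap (same as -X)
    done
}

examine_all
```

Both commands only read from the devices, so running them with DRY_RUN=0 should not change anything on disk.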
> >
> > I also noticed that "mdadm -X <drive>" shows
> > the pre-grow device size for 2 of the devices
> > and some discrepancies between event and event cleared
> > counts.
You cannot grow an array with an active bitmap... or at least you
shouldn't be able to. Maybe 2.6.18 didn't enforce that. Maybe that
is what caused the problem - not sure.
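If the bitmap is what got in the way, one commonly described sequence is to drop the write-intent bitmap, resize, and add the bitmap back once any resync has finished. The sketch below only prints that plan. /dev/md0 is a placeholder for the array device, and nothing in this thread confirms how 2.6.18 with mdadm 2.5.6 behaves here.

```shell
#!/bin/sh
# Print a grow-by-size plan with the bitmap taken out first.
# Nothing is executed; the lines are meant to be reviewed and run by hand.
MD="${MD:-/dev/md0}"   # placeholder array device, an assumption

grow_to_max() {
    echo "mdadm --grow $MD --bitmap=none"       # remove the write-intent bitmap
    echo "mdadm --grow $MD --size=max"          # long form of "-G ... -z max"
    echo "cat /proc/mdstat"                     # repeat until no resync is shown
    echo "mdadm --grow $MD --bitmap=internal"   # put an internal bitmap back
}

grow_to_max
```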
> >
> > One last thing I found curious---from dmesg:
> >
> > EXT3-fs error (device hdg1): ext3_check_descriptors: Block
> > bitmap for group 0 not in group (block 2040936682)!
> > EXT3-fs: group descriptors corrupted!
> >
> > There is no ext3 directly on hdg1. LVM sits between the
> > array and the filesystem, so the above message seems suspect.
Seems like something got confused during boot and the wrong device got
mounted. That is bad.
NeilBrown
01-06-2009, 07:45 AM
RAID5 (mdadm) array hosed after grow operation
On Tue, 6 Jan 2009 09:17:46 +1100, "Neil Brown" <neilb@suse.de> said:
> On Monday January 5, jpiszcz@lucidpixels.com wrote:
> > cc linux-raid
> >
> > On Mon, 5 Jan 2009, whollygoat@letterboxes.org wrote:
> >
> > > [snip]
> > >
> > > md: hdg1 has invalid sb, not importing!
> > > md: md_import_device returned -22
> > >
> > > I tried to force assembly but only two of the remaining
> > > 4 active drives appeared to be fault free. dmesg gives
> > >
> > > md: kicking non-fresh hde1 from array!
> > > md: unbind<hde1>
> > > md: export_rdev(hde1)
> > > md: kicking non-fresh hdi1 from array!
> > > md: unbind<hdi1>
> > > md: export_rdev(hdi1)
>
> Please report
> mdadm --examine /dev/whatever
> for every device that you think should be a part of the array.
As I copied and pasted the requested info below, I noticed
that "Device Size" and "Used Size" all make sense, whereas
with the -X option "Sync Size" reflects the size of the
swapped-out drives, "39078016 (37.27 GiB 40.02 GB)",
for hdg1 and hdo1.
Also, when booting today, I managed to read a boot message
that I noticed but couldn't decipher yesterday: "incorrect
meta data area header checksum". It appeared for hdo and hdg,
and for at least one, I think two, other drives whose names
I still wasn't fast enough to catch.
Also, with regard to your comment below, what do you mean by
"active bitmap"? It seems to me I couldn't do anything with
the array until it was activated.
Hmm, I just noticed something else that seems weird. There seem
to be 10 and 11 placeholders (3 drives each) in the "Array Slot"
field below, which is respectively 4 and 5 more places than there
are drives.
Thanks for your help.
------------- begin output --------------
fly:~# mdadm -E /dev/hde1
/dev/hde1:
Magic : a92b4efc
Version : 01
Feature Map : 0x1
Array UUID : 6d57c75c:01b1b110:524cdc82:f2fc9c68
Name : fly:FlyFileServ (local to host fly)
Creation Time : Mon Aug 4 00:59:16 2008
Raid Level : raid5
Raid Devices : 5
-------------- end output ---------------
>
> > >
> > > I also noticed that "mdadm -X <drive>" shows
> > > the pre-grow device size for 2 of the devices
> > > and some discrepancies between event and event cleared
> > > counts.
>
> You cannot grow an array with an active bitmap... or at least you
> shouldn't be able to. Maybe 2.6.18 didn't enforce that. Maybe that
> is what caused the problem - not sure.
--
whollygoat@letterboxes.org
01-08-2009, 03:19 AM
RAID5 (mdadm) array hosed after grow operation
On Tue, 6 Jan 2009 09:17:46 +1100, "Neil Brown" <neilb@suse.de> said:
> On Monday January 5, jpiszcz@lucidpixels.com wrote:
> > cc linux-raid
> >
> > On Mon, 5 Jan 2009, whollygoat@letterboxes.org wrote:
> >
> > >
[snip]
> > > The RAID reassembled fine at each boot as the drives
> > > were replaced one by one. After the last drive was
> > > partitioned and added to the array, I issued the
> > > command
> > >
> > > "mdadm -G /dev/md/0 -z max"
> > >
[snip]
>
> You cannot grow an array with an active bitmap... or at least you
> shouldn't be able to. Maybe 2.6.18 didn't enforce that. Maybe that
> is what caused the problem - not sure.
>
I've decided to swap the smaller drives back in and start the upgrade
process over again. Seems that might be the fastest way to fix the
problem.
How should I have done the grow operation if not as above? The only
thing I see in man mdadm is the "-S" switch, which seems to disassemble
the array. Maybe this is because I've only tried it on the degraded
array this problem has left me with. At any rate, after
mdadm -S /dev/md/0
running
mdadm -D /dev/md/0
gave me an error to the effect that the array didn't exist or
couldn't be found.
Or do I need to add "--bitmap=none" to remove the bitmap
when running the above grow command?
Hope you can help,
Thanks
goat
--
whollygoat@letterboxes.org
01-08-2009, 09:12 AM
Alex Samad
RAID5 (mdadm) array hosed after grow operation
On Wed, Jan 07, 2009 at 08:19:05PM -0800, whollygoat@letterboxes.org wrote:
>
> On Tue, 6 Jan 2009 09:17:46 +1100, "Neil Brown" <neilb@suse.de> said:
[snip]
> How should I have done the grow operation if not as above? The only
> thing I see in man mdadm is the "-S" switch which seems to disassemble
> the array. Maybe this is because I've only tried it on the degraded
> array this problem has left with. At any rate, after
>
> mdadm -S /dev/md/0
>
[snip]
>
> Hope you can help,
Hi
I have grown RAID5 arrays both by number of disks and by disk size. I
have only ever used --grow, and never the -z option.
I would copy the data from the small drives to the large drives again
(if you can have all the drives installed at one time, that might be
better), increase the partition size, and then run --grow on the array.
I have done this going from 250G -> 500G -> 750G -> 1T. When I have
done it, though, I fail one drive, add the new drive, expand the
partition size, and re-add it to the array. Once I have done all the
drives, I run the grow.
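The one-drive-at-a-time procedure described above can be sketched roughly as follows. It only prints the commands. The array name, the partition names, and the step of waiting for each rebuild before touching the next drive are assumptions for illustration, not a record of what was actually run; the final step uses the --size=max form from the opening post.

```shell
#!/bin/sh
# Print the steps for swapping array members for larger drives, one at a time.
MD="${MD:-/dev/md0}"   # placeholder array device, an assumption

# $1 = old member partition, $2 = partition on the new, larger drive
replace_member() {
    old="$1"; new="$2"
    echo "mdadm $MD --fail $old"
    echo "mdadm $MD --remove $old"
    echo "# swap the disk, create a larger partition $new, then:"
    echo "mdadm $MD --add $new"
    echo "# wait for the rebuild to finish (cat /proc/mdstat) before the next drive"
}

replace_member /dev/hde1 /dev/hde1
replace_member /dev/hdg1 /dev/hdg1
echo "# ...repeat for every remaining member, then:"
echo "mdadm --grow $MD --size=max"
```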
01-15-2009, 02:45 AM
RAID5 (mdadm) array hosed after grow operation
On Tue, 13 Jan 2009 15:07:37 +1100, "Alex Samad" <alex@samad.com.au>
said:
> On Mon, Jan 12, 2009 at 07:46:08PM -0800, whollygoat@letterboxes.org
> wrote:
> >
> > On Fri, 09 Jan 2009 10:45:56 +0000, "John Robinson"
> > <john.robinson@anonymous.org.uk> said:
> > > On 09/01/2009 02:41, whollygoat@letterboxes.org wrote:
>
> [snip]
>
> >
> > But this has all become moot anyway. When I put the original, smaller
> > drives back in, hoping to do the grow op over again, I was faced with a
> > similar problem assembling the array, so I'm guessing the problem was
> > caused by something other than the grow. I put the larger drives in,
> > zeroed them, and am in the process of recreating the array and
> > file systems to be populated from backups.
>
> I just fell into the same boat: 3 drives in a 10-drive raid6 died at the
> same time on me, and I was unable to recreate the raid6, so back to the
> backup machine.
>
> To answer your question about the smaller disks, there is an option with
> create that says the drives are okay and not to prep them:
>
> --assume-clean
>
> so you can recreate the array without overwriting stuff.
I wonder if that would have helped with the larger drives. Too late now.
The smaller drives shouldn't have been bad. All I did to them was fail
them one by one and replace them with the larger ones. I was only
trying to assemble them, not recreate anything. Thanks anyway. The docs
left some doubt in my mind about --assume-clean. I'll keep that in
mind for the future.
cheers,
wg
--
whollygoat@letterboxes.org
01-15-2009, 04:44 PM
Alex Samad
RAID5 (mdadm) array hosed after grow operation
On Wed, Jan 14, 2009 at 07:45:24PM -0800, whollygoat@letterboxes.org wrote:
> On Tue, 13 Jan 2009 15:07:37 +1100, "Alex Samad" <alex@samad.com.au>
> said:
> > On Mon, Jan 12, 2009 at 07:46:08PM -0800, whollygoat@letterboxes.org
> > wrote:
> > >
[snip]
>
> I wonder if that would have helped with the larger drives. Too late
> The smaller drives shouldn't have been bad. All I did to them was fail
Once you fail them, they are marked as failed. Plus, if you did them one
by one, they would have different event counts and be out of sync.
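One way to see the mismatch described here is to compare the Events counter that "mdadm --examine" prints for each member. A small sketch: the helper names events_of and compare_events are made up for illustration, the parsing assumes a line of the form "Events : N" in the report, and reading real devices needs root.

```shell
#!/bin/sh
# Extract the event counter from "mdadm --examine" output read on stdin.
events_of() {
    awk -F' : ' '/^ *Events :/ { print $2; exit }'
}

# Print "<device> <events>" for each device given as an argument.
# Members whose counter lags behind the others are the ones md
# reports as non-fresh.
compare_events() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" 2>/dev/null | events_of)"
    done
}

# Usage (as root): compare_events /dev/hde1 /dev/hdg1 /dev/hdi1 /dev/hdo1
```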
> them one by one and replace them with the larger ones. I was only
> trying to assemble them not recreate anything. Thanks anyway. The docs
> left some doubt in my mind about --assume-clean. I'll keep that in
> mind for the future.
01-19-2009, 05:22 AM
RAID5 (mdadm) array hosed after grow operation
On Fri, 16 Jan 2009 04:44:11 +1100, "Alex Samad" <alex@samad.com.au>
said:
> On Wed, Jan 14, 2009 at 07:45:24PM -0800, whollygoat@letterboxes.org
> wrote:
> >
> > I wonder if that would have helped with the larger drives. Too late
> > The smaller drives shouldn't have been bad. All I did to them was fail
>
> once you fail them they are marked as failed, plus if you did them one
> by one they would have different event id and be out of sync
Hmm, I've already recreated the array with the larger drives and
restored the filesystem from backups, but I think I will try your
--assume-clean on the smaller drives to see if I can recover them.
Could be a useful little thing to have some practice with.
Thanks for the info. Bit by bit it's helping me to understand how md
works (and how to interpret the man page).
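For that kind of practice run, a hedged sketch of what a re-create command might look like. Re-creating over existing members with --assume-clean can only preserve data if the level, device count, device order, chunk size and metadata version all match the original array, so the helper below just prints a candidate command for review. Only "raid5" and five RAID devices come from the --examine output earlier in the thread; the device names, their order and the metadata version are placeholders, and a non-default chunk size would need a matching --chunk as well.

```shell
#!/bin/sh
# Print a candidate re-create command; nothing is executed.
MD="${MD:-/dev/md0}"   # placeholder array device, an assumption

# $1 = metadata version of the old array (for example 1.0, 1.1 or 1.2)
# remaining arguments = member devices in their ORIGINAL slot order
recreate_cmd() {
    metadata="$1"; shift
    echo "mdadm --create $MD --assume-clean --level=5 --raid-devices=$# --metadata=$metadata $*"
}

# Device names, order and metadata version are placeholders, not from the thread.
recreate_cmd 1.0 /dev/hdA1 /dev/hdB1 /dev/hdC1 /dev/hdD1 /dev/hdE1
```

Checking the result read-only (for instance with a read-only fsck or mount) before writing anything to the array seems prudent.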
Cheers
--
whollygoat@letterboxes.org