Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Gentoo User (http://www.linux-archive.org/gentoo-user/)
-   -   recovering RAID from an old server (http://www.linux-archive.org/gentoo-user/328680-recovering-raid-old-server.html)

Iain Buchanan 02-19-2010 11:15 AM

recovering RAID from an old server
 
Hi all,

I'm trying to recover some data from an old Snap Server 4200 (c. 2003)
belonging to a local charity. It has four 80 GB IDE drives, and runs some
sort of Linux kernel with Snap's own applications on top.

It won't boot to the Snap OS (Guardian OS 3.1.079 - quite an old one;
major versions 4 and 5 have since succeeded it) but it does boot to a
"recovery" console with a simple web page showing some details. From my
google searches the OS resides on the disks (perhaps just the first
one?) but I don't know where this recovery console is coming from.

I've managed to put Gentoo 2008.0_beta2 minimal (because I happened to
have the iso) on a USB key and made it bootable. It boots and fdisk -l
shows me the four drives and some partitions. (Ubuntu wouldn't even
boot ;)

I don't have the original CDs with the OS recovery on them, nor can I
download it (upgraded versions are $600+). I can't even find any *ahem*
backup versions online in the usual channels.

OK, so the question: how can I recover the RAID data? It's RAID5
(probably) with 4 disks. Can I just run some up-to-date RAID tools and
mount the drives, or do I have to get exactly the same kernel and setup?
I don't have much experience with RAID. (It's software RAID - no card,
just two IDE channels with master and slave.)

Once I've recovered the data I don't really care what goes on it - there
are some great free NAS OSes, but it's mounting the RAID partition that
I'm not sure about.

Can I randomly mount partitions read-only or will this screw things up
further?

thanks for any suggestions,
--
Iain Buchanan <iaindb at netspace dot net dot au>

"Don't fear the pen. When in doubt, draw a pretty picture."
--Baker's Third Law of Design.

Stroller 02-19-2010 01:44 PM

recovering RAID from an old server
 
On 19 Feb 2010, at 12:15, Iain Buchanan wrote:

> ...
> Can I randomly mount partitions read-only or will this screw things up
> further?


If this is unsafe I will have ketchup & mustard on my baseball cap.

Stroller.

Iain Buchanan 02-20-2010 03:31 AM

recovering RAID from an old server
 
On Fri, 2010-02-19 at 14:44 +0000, Stroller wrote:
> On 19 Feb 2010, at 12:15, Iain Buchanan wrote:
> > ...
> > Can I randomly mount partitions read-only or will this screw things up
> > further?
>
> If this is unsafe I will have ketchup & mustard on my baseball cap.

er... could you translate that? How about "dead horse on my baggy
green"?

Should I be able to mount them automatically and let the SW RAID module
sort it out or do I have to know how they're tied together beforehand?

The message from the kernel is:

Linux version 2.4.19-snap (root@BuildSys) (gcc version egcs-2.91.66
19990314/Linux (egcs-1.1.2 release)) #1 Tue Jul 13 20:24:35 PDT 2004

and later there's output from "md" which is (I assume) the linux
software raid module (this is a grep, so there are other messages in
between):

md: linear personality registered as nr 1
md: raid0 personality registered as nr 2
md: raid1 personality registered as nr 3
md: raid5 personality registered as nr 4
md: spare personality registered as nr 8
md: md driver 0.91.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
md: bind<hdg2,1>
md: bind<hde2,2>
md: bind<hda2,3>
md: hda2's event counter: 0000039d
md: hde2's event counter: 0000039d
md: hdg2's event counter: 0000039d
md: md100: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md100: max total readahead window set to 124k
md100: 1 data-disks, max readahead per data-disk: 124k
raid1: md100, not all disks are operational -- trying to recover array
raid1: raid set md100 active with 3 out of 4 mirrors
md: updating md100 RAID superblock on device
md: hda2 [events: 0000039e]<6>(write) hda2's sb offset: 546112
md: recovery thread got woken up ...
md: looking for a shared spare drive
md100: no spare disk to reconstruct array! -- continuing in degraded
mode
md: recovery thread finished ...
md: hde2 [events: 0000039e]<6>(write) hde2's sb offset: 546112
md: hdg2 [events: 0000039e]<6>(write) hdg2's sb offset: 546112
md: bind<hdg5,1>
md: bind<hde5,2>
md: bind<hda5,3>
md: hda5's event counter: 000003a4
md: hde5's event counter: 000003a4
md: hdg5's event counter: 000003a4
md: md101: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md101: max total readahead window set to 124k
md101: 1 data-disks, max readahead per data-disk: 124k
raid1: md101, not all disks are operational -- trying to recover array
raid1: raid set md101 active with 3 out of 4 mirrors
md: updating md101 RAID superblock on device
md: hda5 [events: 000003a5]<6>(write) hda5's sb offset: 273024
md: recovery thread got woken up ...
md: looking for a shared spare drive
md101: no spare disk to reconstruct array! -- continuing in degraded
mode
md: looking for a shared spare drive
md100: no spare disk to reconstruct array! -- continuing in degraded
mode
md: recovery thread finished ...
md: hde5 [events: 000003a5]<6>(write) hde5's sb offset: 273024
md: hdg5 [events: 000003a5]<6>(write) hdg5's sb offset: 273024
XFS mounting filesystem md(9,100)
Ending clean XFS mount for filesystem: md(9,100)

The partitions look like:
9 100 546112 md100
9 101 273024 md101
34 0 78150744 hdg
34 1 16041 hdg1
34 2 546210 hdg2
34 3 1 hdg3
34 4 76656636 hdg4
34 5 273104 hdg5
34 6 273104 hdg6
33 0 78150744 hde
33 1 16041 hde1
33 2 546210 hde2
33 3 1 hde3
33 4 76656636 hde4
33 5 273104 hde5
33 6 273104 hde6
22 0 78150744 hdc
22 1 16041 hdc1
22 2 546210 hdc2
22 3 1 hdc3
22 4 76656636 hdc4
22 5 273104 hdc5
22 6 273104 hdc6
3 0 78150744 hda
3 1 16041 hda1
3 2 546210 hda2
3 3 1 hda3
3 4 76656636 hda4
3 5 273104 hda5
3 6 273104 hda6

many thanks!
--
Iain Buchanan <iaindb at netspace dot net dot au>

By golly, I'm beginning to think Linux really *is* the best thing since
sliced bread.
-- Vance Petree, Virginia Power

Iain Buchanan 02-20-2010 05:29 AM

recovering RAID from an old server
 
On Sat, 2010-02-20 at 14:01 +0930, Iain Buchanan wrote:
> On Fri, 2010-02-19 at 14:44 +0000, Stroller wrote:
> > On 19 Feb 2010, at 12:15, Iain Buchanan wrote:
> > > ...
> > > Can I randomly mount partitions read-only or will this screw things up
> > > further?

OK, I've randomly mounted partitions, and now I'm stuck because I don't
know what the original /etc/raidtab was. /proc/mdstat just says:

Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

which looks like nothing is used in any RAID set. Autodetect seems not
to be working, perhaps because the partition ID wasn't set to 0xFD (253,
Linux raid autodetect). Each drive has identical partitions:
Device Boot Start End Blocks Id System
/dev/hda1 * 1 2 16041+ 83 Linux
/dev/hda2 3 70 546210 83 Linux
/dev/hda3 71 138 546210 5 Extended
/dev/hda4 139 9682 76656636 83 Linux
/dev/hda5 71 104 273104+ 83 Linux
/dev/hda6 105 138 273104+ 83 Linux

and /dev/hd[aceg]1 is "/boot" on each one.

For all the other /dev/hd[aceg][2-6], mount says:
mount: unknown filesystem type 'linux_raid_member'
Obviously this is the RAID, but how do I get at it?
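For what it's worth, newer md tools can usually assemble old 0.90-superblock arrays without the original /etc/raidtab. A sketch only, with assumptions flagged in the comments: mdadm is assumed to be on the live system, and the member list is a guess from the fdisk output.

```shell
# Sketch, not gospel: the big *4 partitions are presumably the RAID5
# data array. hdc4 is left out on the assumption that hdc is the
# failing disk - a 4-disk RAID5 still starts degraded with 3 members.

ARRAY=/dev/md0
MEMBERS="/dev/hda4 /dev/hde4 /dev/hdg4"

# Only touch real hardware if the member devices actually exist:
if [ -b /dev/hda4 ]; then
    mdadm --examine --scan                        # show arrays mdadm can see
    mdadm --assemble --readonly "$ARRAY" $MEMBERS # assemble without writing
    mount -o ro "$ARRAY" /mnt/snap                # read-only mount on top
fi
```

The --readonly flag matters here: it stops mdadm from kicking off a resync or updating event counters on the members.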

All the "/boot"s mount OK and are readable, with some kernel files and
such; however /dev/hdc1 gives some errors:

hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=585, sector=575
hdc: possibly failed opcode: 0x25
end_request: I/O error, dev hdc, sector 575
__ratelimit: 22 callbacks suppressed
Buffer I/O error on device hdc1, logical block 528
Buffer I/O error on device hdc1, logical block 529
Buffer I/O error on device hdc1, logical block 530
Buffer I/O error on device hdc1, logical block 531
Buffer I/O error on device hdc1, logical block 532
Buffer I/O error on device hdc1, logical block 533
Buffer I/O error on device hdc1, logical block 534
Buffer I/O error on device hdc1, logical block 535
Buffer I/O error on device hdc1, logical block 536
Buffer I/O error on device hdc1, logical block 537

so it looks like there are some problems with hdc. Are there any disk
hardware testing tools on the Gentoo minimal live CD?

thanks,
--
Iain Buchanan <iaindb at netspace dot net dot au>

It's simply unbelievable how much energy and creativity people have
invested into creating contradictory, bogus and stupid licenses...
--- Sven Rudolph about licences in debian/non-free.

Mick 02-20-2010 07:41 AM

recovering RAID from an old server
 
On Saturday 20 February 2010 06:29:03 Iain Buchanan wrote:

> so it looks like there's some problems with hdc. Are there any disk
> hardware testing tools on the gentoo minimal live cd?

If you want to check the disk, use sys-apps/smartmontools, but this may
be filesystem corruption - which could of course have been caused by the
hardware failing.
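Gentoo minimal media generally ships smartctl from smartmontools; a sketch of the usual checks, assuming the suspect drive really is /dev/hdc:

```shell
DISK=/dev/hdc   # the drive throwing UncorrectableError (an assumption)

# Guard so this is a no-op on a machine without that drive:
if [ -b "$DISK" ]; then
    smartctl -H "$DISK"           # overall health verdict
    smartctl -l error "$DISK"     # the drive's own error log
    smartctl -t long "$DISK"      # kick off a full-surface self-test
    smartctl -l selftest "$DISK"  # check the result once it finishes
fi
```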

--
Regards,
Mick

Francesco Talamona 02-20-2010 08:46 AM

recovering RAID from an old server
 
> Should I be able to mount them automatically and let the SW RAID
> module sort it out or do I have to know how they're tied together
> beforehand?

> md: looking for a shared spare drive
> md100: no spare disk to reconstruct array! -- continuing in degraded
> mode
> md: recovery thread finished ...
> md: hde5 [events: 000003a5]<6>(write) hde5's sb offset: 273024
> md: hdg5 [events: 000003a5]<6>(write) hdg5's sb offset: 273024
> XFS mounting filesystem md(9,100)
> Ending clean XFS mount for filesystem: md(9,100)
>
> The partitions look like:
> 9 100 546112 md100
> 9 101 273024 md101

It seems it has correctly mounted its partition... Can't you find it?

I have the feeling that you are messing it up. If I understand it
correctly, the server has a hardware RAID controller, which has to be
managed via its drivers.

Software RAID tools aren't suitable for mounting this setup correctly;
I would mount random partitions for testing purposes only, on a spare
machine.

The wiser thing to do is find an old live CD supporting the RAID card in
that Snap (PERC or whatever it is) and assemble the array in degraded
mode for data recovery.

Another thing that can come in very useful: we once had a similar
problem, and ended up borrowing an identical disc from another running
server to put the array back online. We recovered our data, then
restored the other server's array.

HTH
Francesco

--
Linux Version 2.6.32-gentoo-r5, Compiled #2 SMP PREEMPT Wed Feb 17
20:30:02 CET 2010
Two 1GHz AMD Athlon 64 Processors, 4GB RAM, 4021.84 Bogomips Total
aemaeth

Stroller 02-20-2010 12:39 PM

recovering RAID from an old server
 
On 20 Feb 2010, at 04:31, Iain Buchanan wrote:

> On Fri, 2010-02-19 at 14:44 +0000, Stroller wrote:
> > On 19 Feb 2010, at 12:15, Iain Buchanan wrote:
> > > ...
> > > Can I randomly mount partitions read-only or will this screw things up
> > > further?
> >
> > If this is unsafe I will have ketchup & mustard on my baseball cap.
>
> er... could you translate that? How about "dead horse on my baggy
> green"?


http://idioms.thefreedictionary.com/I'll+eat+my+hat

I just don't see how you can break anything *as long as* you don't let
the system write anything to the disks. How can read-only be unsafe?


One might be paranoid enough to clone images of the drive before
proceeding, however.
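A sketch of what that cloning might look like, assuming a destination with roughly 80 GB free per drive; GNU ddrescue is an assumption here (plain dd works too, it just handles the read errors on hdc less gracefully):

```shell
SRC=/dev/hdc     # the suspect drive
DST=/mnt/backup  # hypothetical target directory with enough free space

if [ -b "$SRC" ] && [ -d "$DST" ]; then
    # GNU ddrescue skips bad sectors on the first pass, retries them
    # later, and records what it couldn't read in the map file:
    ddrescue -d "$SRC" "$DST/hdc.img" "$DST/hdc.map"

    # For the healthy drives plain dd is fine; conv=noerror,sync pads
    # any unreadable block with zeros instead of aborting:
    dd if=/dev/hda of="$DST/hda.img" bs=64k conv=noerror,sync
fi
```

Once imaged, any further experiments can be done against the image files (e.g. via loop devices) instead of the failing hardware.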


My one concern is over how you know this system uses software RAID.
You know that EIDE hardware RAID was available, right? Though I'm sure
it would rarely have been built into the motherboard.


Stroller.

Iain Buchanan 02-20-2010 12:51 PM

recovering RAID from an old server
 
On Sat, 2010-02-20 at 10:46 +0100, Francesco Talamona wrote:
> > Should I be able to mount them automatically and let the SW RAID
> > module sort it out or do I have to know how they're tied together
> > beforehand?
>
> > md: looking for a shared spare drive
> > md100: no spare disk to reconstruct array! -- continuing in degraded
> > mode
> > md: recovery thread finished ...
> > md: hde5 [events: 000003a5]<6>(write) hde5's sb offset: 273024
> > md: hdg5 [events: 000003a5]<6>(write) hdg5's sb offset: 273024
> > XFS mounting filesystem md(9,100)
> > Ending clean XFS mount for filesystem: md(9,100)
> >
> > The partitions look like:
> > 9 100 546112 md100
> > 9 101 273024 md101
>
> It seems it has correctly mounted its partition... Can't you find it?

This is with the server recovery console, which is basically just a web
page. No shell access. There's not much I can do to get at md100 and
md101 (is this what software RAID devices usually appear as?)

> I have the feeling that you are messing it up. If I understand it
> correctly the server has an hardware RAID controller, that has to be
> managed via its drivers.

I think it's software RAID. There is no RAID controller AFAICT. All 4
drives are visible to the BIOS as Primary and Secondary Master and
Slaves.

> Another thing can come very useful: we once had a similar problem, we
> ended up borrowing one identical disc from another running server to put
> the array back online, we recovered our data, then restored the other
> server's array.

That's a possibility given what I can find on Google; however, these
servers are few and far between, so I'd have to find someone willing to
send their drive to me (or vice versa), or send me the OS - which
overlandstorage doesn't like!

thanks,
--
Iain Buchanan <iaindb at netspace dot net dot au>

Come quickly, I am tasting stars!
-- Dom Perignon, upon discovering champagne.

Iain Buchanan 02-20-2010 12:59 PM

recovering RAID from an old server
 
On Sat, 2010-02-20 at 13:39 +0000, Stroller wrote:
> On 20 Feb 2010, at 04:31, Iain Buchanan wrote:
>
> > On Fri, 2010-02-19 at 14:44 +0000, Stroller wrote:
> >> On 19 Feb 2010, at 12:15, Iain Buchanan wrote:
> >>> ...
> >>> Can I randomly mount partitions read-only or will this screw
> >>> things up
> >>> further?
> >>
> >> If this is unsafe I will have ketchup & mustard on my baseball cap.
> >
> > er... could you translate that? How about "dead horse on my baggy
> > green"?
>
> http://idioms.thefreedictionary.com/I'll+eat+my+hat

yeah, I got that; I was just picking on your use of ketchup & baseball.
Over here it's tomato sauce (dead horse) and cricket (baggy greens) :)
Most of my jokes need explaining %-)

> I just don't see how you can break anything *as long as* you don't let
> the system write anything to the disks. How can read-only be unsafe?

Perhaps something to do with the superblock or "last mount time" or
something? I don't know! I know that mounting a drive while a system
is hibernated, even ro, will kill kittens.
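The concern is legitimate for journalled filesystems: a plain ro mount of XFS (which the Snap's boot log shows it uses) can still replay the log, i.e. write to the disk. A sketch of a genuinely read-only XFS mount, assuming the array has already been assembled; /dev/md0 is a hypothetical name:

```shell
DEV=/dev/md0          # hypothetical - whatever the array assembled as
OPTS=ro,norecovery    # "ro" alone can still replay the XFS log;
                      # "norecovery" forbids any log replay at all

# Only attempt this on the machine where the array actually exists:
if [ -b "$DEV" ]; then
    mount -t xfs -o "$OPTS" "$DEV" /mnt/snap
fi
```

The trade-off: with norecovery an uncleanly-unmounted filesystem may show slightly stale data, but nothing gets written to the disks.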

> One might be paranoid enough to clone images of the drive before
> proceeding, however.

I don't have enough spare...

> My one concern is over how you know this system uses software RAID.
> You know that EIDE hardware RAID was available, right? I'm sure this
> would rarely be available built-in to the motherboard.

well, there appears to be no RAID controller, unless it's onboard; but
as I mentioned to Francesco, the BIOS can see all the drives, and so
can Gentoo minimal...

I've since found that the OS is in flash RAM, and only the help files
are on disk, so maybe I have bigger problems if I can't boot :( I hope
to get a copy of Guardian OS somehow...

thanks,
--
Iain Buchanan <iaindb at netspace dot net dot au>

"Go ahead, bake my quiche"
-- Magrat instructs the castle cook
(Terry Pratchett, Lords and Ladies)

Francesco Talamona 02-20-2010 02:08 PM

recovering RAID from an old server
 
On Saturday 20 February 2010, Iain Buchanan wrote:
> On Sat, 2010-02-20 at 10:46 +0100, Francesco Talamona wrote:
> > > Should I be able to mount them automatically and let the SW RAID
> > > module sort it out or do I have to know how they're tied
> > > together beforehand?
> > >
> > > md: looking for a shared spare drive
> > > md100: no spare disk to reconstruct array! -- continuing in
> > > degraded mode
> > > md: recovery thread finished ...
> > > md: hde5 [events: 000003a5]<6>(write) hde5's sb offset: 273024
> > > md: hdg5 [events: 000003a5]<6>(write) hdg5's sb offset: 273024
> > > XFS mounting filesystem md(9,100)
> > > Ending clean XFS mount for filesystem: md(9,100)
> > >
> > > The partitions look like:
> > > 9 100 546112 md100
> > > 9 101 273024 md101
> >
> > It seems it has correctly mounted its partition... Can't you find
> > it?
>
> This is with the server recovery console, which is basically just a
> web page. No shell access. There's not much I can do to get at
> md100 and md101 (is this what software RAID devices usually appear
> as?)
>
> > I have the feeling that you are messing it up. If I understand it
> > correctly the server has an hardware RAID controller, that has to
> > be managed via its drivers.
>
> I think it's software RAID. There is no RAID controller AFAICT. All
> 4 drives are visible to the BIOS as Primary and Secondary Master and
> Slaves.

This isn't proof: many hardware RAIDs are proprietary software
solutions pretending to be hardware. Without the driver, Linux can't
see the logical volume and instead shows all the physical drives.

You should do some research on that server's hardware... Aren't Snaps
equipped with a PERC controller?

> > Another thing can come very useful: we once had a similar problem,
> > we ended up borrowing one identical disc from another running
> > server to put the array back online, we recovered our data, then
> > restored the other server's array.
>
> That's a possibility given what I can find on Google, however these
> are few and far between, so I'd have to find someone willing to send
> their drive to me (or vice versa) or send me the OS, which
> overlandstorage doesn't like!

What happens if you physically remove the drive marked as bad?

You could image it for backup, then low-level format it, then put it
back in place as if it were brand new. Or add a similar disk for the
controller to treat as a spare (given that it is looking for a spare
disk in the first instance).

Most controllers have automated procedures to manage failures, disk
swaps and so on.

For this reason you can't be sure that the inspection operations you
are doing are read-only - unless the drives are attached to another
machine with a trusted OS that does nothing on its own.

The ideas given above could cost you all of your data; be very careful
and patient.

Good luck.
Francesco

--
Linux Version 2.6.32-gentoo-r5, Compiled #2 SMP PREEMPT Wed Feb 17
20:30:02 CET 2010
Two 2.9GHz AMD Athlon 64 Processors, 4GB RAM, 11659 Bogomips Total
aemaeth

