Old 02-28-2012, 11:27 PM
Kahlil Hodgson
 
Software RAID1 with CentOS-6.2

Hello,

Having a problem with software RAID that is driving me crazy.

Here are the details:

1. CentOS 6.2 x86_64 install from the minimal iso (via pxeboot).
2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
with a pair of 1TB Western Digital SATA3 Drives.
3. Drives are plugged into the SATA3 ports on the mainboard (both drives and
cables say they can do 6Gb/s).
4. During the install I set up software RAID1 for the two drives with two RAID
partitions:
md0 - 500M for /boot
md1 - "the rest" for a physical volume
5. Set up LVM on md1 in the standard slash, swap, home layout (roughly
sketched in commands just below)
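
By hand, that would look something like the following; the VG/LV names and
sizes are illustrative, not necessarily what anaconda uses:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda2 /dev/sdb2
pvcreate /dev/md1
vgcreate vg_system /dev/md1
lvcreate -L 50G -n lv_root vg_system        # slash
lvcreate -L 4G  -n lv_swap vg_system        # swap
lvcreate -l 100%FREE -n lv_home vg_system   # home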

Install goes fine (really fast, actually) and I reboot into CentOS 6.2. Next I
run 'yum update', add a few minor packages and perform some basic
configuration.

Now I start to get I/O errors printed on the console. Run 'mdadm -D
/dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
faulty.
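
A quick way to see which member has dropped, using the standard md tools:

cat /proc/mdstat          # a degraded mirror shows [U_] instead of [UU]
mdadm --detail /dev/md1   # lists each member's state: active, faulty, removed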

Okay, fair enough, I've got at least one bad drive. I boot the system from a
live USB and run the short and long SMART tests on both drives. No problems
are reported, but I know that can be misleading, so I'm going to have to gather
some evidence before I try to return these drives. I run badblocks in
destructive mode on both drives as follows:

badblocks -w -b 4096 -c 98304 -s /dev/sda
badblocks -w -b 4096 -c 98304 -s /dev/sdb
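
For reference, the flags used there:

# -w        destructive write-mode test: write patterns, read them back
# -b 4096   4 KiB blocks instead of the 1 KiB default
# -c 98304  test 98304 blocks (384 MiB) at a time, for speed
# -s        show progress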

Come back the next day and see that no errors are reported. Er, that's odd. I
check the SMART data in case the badblocks activity has triggered something.
Nope. Maybe I screwed up the install somehow?
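
For reference, the SMART runs above would have been something like this,
via smartmontools (and likewise for /dev/sdb):

smartctl -t short /dev/sda   # quick (~2 minute) self-test
smartctl -t long /dev/sda    # full surface scan; takes hours on a 1TB drive
smartctl -a /dev/sda         # dump the self-test log and raw attributes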

So I start again and repeat the install process very carefully. This time I
check the RAID arrays straight after boot.

mdadm -D /dev/md0 - all is fine.
mdadm -D /dev/md1 - the two drives are resyncing.

Okay, that is odd. The RAID1 array was created at the start of the install
process, before any software was installed. Surely it should be in sync
already? Googled a bit and found a post where someone else had seen the same
thing happen. The advice was to just wait until the drives sync so the 'blocks
match exactly', but I'm not really happy with that explanation. At this rate
it's going to take a whole day to do a single minimal install, and I'm sure I
would have heard others complaining about the process.
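
Resync progress (and the kernel's estimated finish time) can be watched
with something like:

watch -n 10 cat /proc/mdstat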

Anyway, I leave the system to sync for the rest of the day. When I get back to
it I see the same (similar) I/O errors on the console and mdadm shows the RAID
array is degraded, /dev/sdb2 has been marked as faulty. This time I notice
that the I/O errors all refer to /dev/sda. Have to reboot because the fs is
now read-only. When the system comes back up, it's trying to resync the drive
again. Eh?
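
For completeness: once md marks a member faulty it stays out until it is
removed and re-added, and the re-add starts another full resync, e.g.:

mdadm /dev/md1 --remove /dev/sdb2   # drop the faulty member
mdadm /dev/md1 --add /dev/sdb2      # re-add it; a full resync then begins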

Any ideas what is going on here? If it's bad drives, I really need some
confirmation independent of the software RAID failing. I thought SMART or
badblocks would give me that. Perhaps it has nothing to do with the drives.
Could a problem with the mainboard or the memory cause this issue? Is it a
SATA3 issue? Should I try it on the 3Gb/s channels, since there's probably
little speed difference with non-SSDs?
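
One way to separate media errors from link or controller trouble is to
look for libata complaints in the kernel log, e.g.:

dmesg | grep -iE 'ata[0-9]|exception|hard resetting link'

and a bootable memtest86+ run would cover the memory question.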

Cheers,

Kal


 
Old 02-28-2012, 11:43 PM
Keith Keller
 
Software RAID1 with CentOS-6.2

On 2012-02-29, Kahlil Hodgson <kahlil.hodgson@dealmax.com.au> wrote:
>
> 2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
> with a pair of 1TB Western Digital SATA3 Drives.

One thing you can try is to download WD's drive tester and throw it at
your drives. It seems unlikely to find anything, but you never know.
The tester is available on the UBCD bootable CD image (which has lots of
other handy tools).

Which model drives do you have? I've found a lot of variability between
their WDxxEARS and RE drives.

> Okay, that is odd. The RAID1 array was created at the start of the install
> process, before any software was installed. Surely it should be in sync
> already? Googled a bit and found a post where someone else had seen the same thing
> happen. The advice was to just wait until the drives sync so the 'blocks
> match exactly', but I'm not really happy with that explanation.

Supposedly, at least with RAID[456], the array is completely usable when
it's resyncing after an initial creation. In practice, I found that
writing significant amounts of data to that array killed resync
performance, so I just let the resync finish before doing any heavy
lifting on the array.
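
The resync rate can also be throttled (or raised) via the md sysctls;
values are in KB/s per device, e.g.:

sysctl dev.raid.speed_limit_min              # default 1000
sysctl -w dev.raid.speed_limit_max=50000     # cap resync so the box stays usable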

> Anyway, I leave the system to sync for the rest of the day. When I get back to
> it I see the same (similar) I/O errors on the console and mdadm shows the RAID
> array is degraded, /dev/sdb2 has been marked as faulty. This time I notice
> that the I/O errors all refer to /dev/sda. Have to reboot because the fs is
> now read-only. When the system comes back up, it's trying to resync the drive
> again. Eh?

This sounds a little odd. You're having I/O errors on sda, but sdb2 has
been kicked out of the RAID? Do you have any other errors in
/var/log/messages that relate to sdb, and/or the errors right around
when the md devices failed?
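
Something like this should pull out the relevant lines:

grep -E 'sd[ab]|md[01]|ata[0-9]' /var/log/messages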


--keith

--
kkeller-usenet@wombat.san-francisco.ca.us


 
Old 02-28-2012, 11:48 PM
Scott Silva
 
Software RAID1 with CentOS-6.2

On 2/28/2012 4:27 PM Kahlil Hodgson spake the following:
> 2. Reasonably good PC hardware (i.e. not budget, but not server grade either)
> with a pair of 1TB Western Digital SATA3 Drives.
[snip]
> Is it a SATA3
> issue? Should I try it on the 3Gb/s channels since there's probably little
> speed difference with non-SSDs?

First thing... Are they green drives? Green drives power down randomly and can
cause these types of errors... Also, maybe 6Gb/s SATA isn't fully supported
by Linux on that board... Try the 3Gb/s channels
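
For what it's worth, the green-drive head-parking behaviour usually shows
up as a runaway counter in SMART attribute 193, e.g.:

smartctl -A /dev/sda | grep -i load_cycle   # Load_Cycle_Count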



 
Old 02-29-2012, 12:08 AM
Kahlil Hodgson
 
Software RAID1 with CentOS-6.2

Hi Scott,

On Tue, 2012-02-28 at 16:48 -0800, Scott Silva wrote:
> First thing... Are they green drives? Green drives power down randomly and can
> cause these types of errors...

These are 'Black' drives.

> Also, maybe 6Gb/s SATA isn't fully supported
> by Linux on that board... Try the 3Gb/s channels

Yer, I was thinking that might be the case. I'll give that a go.

Thanks,

Kal


 
Old 02-29-2012, 12:24 AM
Kahlil Hodgson
 
Software RAID1 with CentOS-6.2

Hi Keith,

On Tue, 2012-02-28 at 16:43 -0800, Keith Keller wrote:
> One thing you can try is to download WD's drive tester and throw it at
> your drives. It seems unlikely to find anything, but you never know.
> The tester is available on the UBCD bootable CD image (which has lots of
> other handy tools).

Ah cool. I'll give that a go :-)

> Which model drives do you have? I've found a lot of variability between
> their WDxxEARS and RE drives.

These are WD1002FAEX drives (1TB, SATA3, 7200rpm, 64MB cache).

> Supposedly, at least with RAID[456], the array is completely usable when
> it's resyncing after an initial creation. In practice, I found that
> writing significant amounts of data to that array killed resync
> performance, so I just let the resync finish before doing any heavy
> lifting on the array.

Yeah. That was my understanding. Thanks for the confirmation :-)

> > Anyway, I leave the system to sync for the rest of the day. When I get back to
> > it I see the same (similar) I/O errors on the console and mdadm shows the RAID
> > array is degraded, /dev/sdb2 has been marked as faulty. This time I notice
> > that the I/O errors all refer to /dev/sda. Have to reboot because the fs is
> > now read-only. When the system comes back up, it's trying to resync the drive
> > again. Eh?
>
> This sounds a little odd. You're having I/O errors on sda, but sdb2 has
> been kicked out of the RAID? Do you have any other errors in
> /var/log/messages that relate to sdb, and/or the errors right around
> when the md devices failed?

Having a little trouble getting at the log files. When it fails the fs
goes read-only and I can't run any programs (less, tail, ...) except
'cat' against the log file or dmesg output (I get I/O errors). On
reboot there's nothing in the log files, presumably because they could
not be written to. May have to set up remote logging to get
at this (PITA).
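
If it comes to that, remote logging on CentOS 6 is a small rsyslog change.
Assuming a reachable log host at, say, 192.168.1.10:

# on the failing box, append to /etc/rsyslog.conf (@ is UDP, @@ is TCP):
*.* @192.168.1.10:514

# on the log host, enable the UDP listener in its /etc/rsyslog.conf:
$ModLoad imudp
$UDPServerRun 514

Then 'service rsyslog restart' on both ends.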

Thanks for the suggestions :-)

Kal

--
Kahlil (Kal) Hodgson GPG: C9A02289
Head of Technology (m) +61 (0) 4 2573 0382
DealMax Pty Ltd (w) +61 (0) 3 9008 5281

Suite 1005
401 Docklands Drive
Docklands VIC 3008 Australia

"All parts should go together without forcing. You must remember that
the parts you are reassembling were disassembled by you. Therefore,
if you can't get them together again, there must be a reason. By all
means, do not use a hammer." -- IBM maintenance manual, 1925




 
Old 02-29-2012, 12:30 AM
"Luke S. Crawford"
 
Software RAID1 with CentOS-6.2

On Wed, Feb 29, 2012 at 11:27:53AM +1100, Kahlil Hodgson wrote:
> Now I start to get I/O errors printed on the console. Run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.

What I/O errors?


> So I start again and repeat the install process very carefully. This time I
> check the raid array straight after boot.
>
> mdadm -D /dev/md0 - all is fine.
> mdadm -D /dev/md1 - the two drives are resyncing.
>
> Okay, that is odd. The RAID1 array was created at the start of the install
> process, before any software was installed. Surely it should be in sync
> already? Googled a bit and found a post where someone else had seen the same thing
> happen. The advice was to just wait until the drives sync so the 'blocks
> match exactly', but I'm not really happy with that explanation. At this rate
> it's going to take a whole day to do a single minimal install, and I'm sure I
> would have heard others complaining about the process.

Yeah, it's normal for a raid1 to 'sync' when you first create it.
The odd part is the I/O errors.

> Any ideas what is going on here? If it's bad drives, I really need some
> confirmation independent of the software RAID failing. I thought SMART or
> badblocks would give me that. Perhaps it has nothing to do with the drives. Could a
> problem with the mainboard or the memory cause this issue? Is it a SATA3
> issue? Should I try it on the 3Gb/s channels since there's probably little
> speed difference with non-SSDs?

Look up the drive errors.

Oh, and in my experience, both WD and Seagate won't complain if you
err on the side of 'when in doubt, return the drive' - that's what I
do.

But yeah, usually SMART will report something... at least a high reallocated
sector count or something.


 
Old 02-29-2012, 12:57 AM
Kahlil Hodgson
 
Software RAID1 with CentOS-6.2

On Tue, 2012-02-28 at 20:30 -0500, Luke S. Crawford wrote:
> On Wed, Feb 29, 2012 at 11:27:53AM +1100, Kahlil Hodgson wrote:
> > Now I start to get I/O errors printed on the console. Run 'mdadm -D
> > /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> > faulty.
>
> What I/O errors?

Good point. Okay, copied manually from the console:

end_request: I/O error, dev sda, sector 8690896
Buffer I/O error on device dm-0, logical block 1081344
JBD2: I/O error detected when updating journal superblock for dm-0-8
end_request: I/O error, dev sda, sector 1026056
etc.

I gather the device mapper and journal errors are caused by the preceding
low-level errors.
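
To map dm-0 back to an actual LV, something like:

dmsetup info -c                  # dm minor numbers against device-mapper names
lvs -o lv_name,vg_name,devices   # which PVs (hence which md array) each LV uses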

> Oh, and in my experience, both WD and Seagate won't complain if you
> err on the side of 'when in doubt, return the drive' - that's what I
> do.

Yeah, was hoping to avoid the delay though. It's already sucked two
days of my time, so I might just have to bite the bullet :-(

Cheers!

Kal




 
Old 02-29-2012, 12:59 AM
Ellen Shull
 
Software RAID1 with CentOS-6.2

On Tue, Feb 28, 2012 at 5:27 PM, Kahlil Hodgson
<kahlil.hodgson@dealmax.com.au> wrote:
> Now I start to get I/O errors printed on the console. Run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.

I had a problem like this once. In a heterogeneous array of 80 GB
PATA drives (it was a while ago), the one WD drive kept dropping out
like this. WD's diagnostic tool showed a problem, so I RMA'ed the
drive... only to discover the replacement did the same thing on that
system, but checked out just fine on a different system. It turned out
to be a combination of a power supply with less-than-stellar
regulation (go Enermax...) and a WD drive that was particularly
sensitive to it; nothing else in the system seemed to be affected.
Replacing the power supply finally eliminated the issue.

--ln
 
Old 02-29-2012, 01:18 AM
Emmett Culley
 
Software RAID1 with CentOS-6.2

On 02/28/2012 04:27 PM, Kahlil Hodgson wrote:
> Having a problem with software RAID that is driving me crazy.
[snip]
> Now I start to get I/O errors printed on the console. Run 'mdadm -D
> /dev/md1' and see the array is degraded and /dev/sdb2 has been marked as
> faulty.
[snip]

I just had a very similar problem with a RAID10 array of four new 1TB drives. It turned out to be the SATA cable.

I first tried a new drive and even replaced the five-disk hot-plug carrier. It was always the same logical drive (/dev/sdb). I then tried an additional SATA adapter card. That cinched it, as the only thing common to all of the above was the SATA cable.

All has been well for a week now.

I should have tried replacing the cable first :-)
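
For what it's worth, a marginal SATA cable usually leaves a fingerprint in
SMART attribute 199, so it can be checked before swapping parts:

smartctl -A /dev/sdb | grep -i crc   # UDMA_CRC_Error_Count; growing = link errors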

Emmett

 
Old 02-29-2012, 01:29 AM
John R Pierce
 
Software RAID1 with CentOS-6.2

On 02/28/12 5:57 PM, Kahlil Hodgson wrote:
> end_request: I/O error, dev sda, sector 8690896
> Buffer I/O error on device dm-0, logical block 1081344
> JBD2: I/O error detected when updating journal superblock for dm-0-8
> end_request: I/O error, dev sda, sector 1026056

There's no more info on those I/O errors in dmesg or anywhere else?

Sounds like /dev/sda may be a bad drive. It happens.

--
john r pierce N 37, W 122
santa cruz ca mid-left coast
