11-11-2010, 01:58 AM
Gilbert Sebenste
 
Question about a hard drive error

Hey everyone,

I just got one of these today:

Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code =
0x08000000
Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
Nov 10 16:07:54 stormy kernel: Add. Sense: Unrecovered read error
Nov 10 16:07:54 stormy kernel:
Nov 10 16:07:54 stormy kernel: Info fld=0x0
Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector
3896150669
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)

My question is this: I have RAID 0 set up, but don't really understand
it well. This is how my disks are set up:

Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
1886608544 296733484 1492495120 17% /
/dev/sda1 101086 19877 75990 21% /boot
tmpfs 1684312 1204416 479896 72% /dev/shm


Which one is having the trouble? Any ideas so I can swap it out?

*******************************************************************************
Gilbert Sebenste ********
(My opinions only!) ******
Staff Meteorologist, Northern Illinois University ****
E-mail: sebenste@weather.admin.niu.edu ***
web: http://weather.admin.niu.edu **
*******************************************************************************
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
11-11-2010, 02:31 AM
John R Pierce
 
Question about a hard drive error

On 11/10/10 6:58 PM, Gilbert Sebenste wrote:
> Hey everyone,
>
> I just got one of these today:
>
> Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code =
> 0x08000000
> Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
> Nov 10 16:07:54 stormy kernel: Add. Sense: Unrecovered read error
> Nov 10 16:07:54 stormy kernel:
> Nov 10 16:07:54 stormy kernel: Info fld=0x0
> Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector
> 3896150669

See where it says "dev sda"? That's physical drive zero, which has a
read error on that sector.
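If you have the smartmontools package installed, you can ask the drive
itself about the error (a sketch; output varies by drive and controller):

    smartctl -a /dev/sda        # SMART health, attributes, and the drive's error log
    smartctl -t long /dev/sda   # kick off an extended offline self-test

A rising Reallocated_Sector_Ct or Current_Pending_Sector count there is a
good sign the drive should be replaced.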


> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)
>
> My question is this: I have RAID 0 set up, but don't really understand
> it well. This is how my disks are set up:
>
> Filesystem 1K-blocks Used Available Use% Mounted on
> /dev/mapper/VolGroup00-LogVol00
> 1886608544 296733484 1492495120 17% /
> /dev/sda1 101086 19877 75990 21% /boot
> tmpfs 1684312 1204416 479896 72% /dev/shm
>

That is not how your disks are set up; that's how your FILE SYSTEMS are set up.

That /dev/mapper entry is an LVM logical volume. You can display the
physical volumes behind an LVM volume group with the command 'pvs'.
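For example (the names and output below are illustrative; yours will differ):

    pvs   # physical volumes behind each volume group
    vgs   # volume group summary
    lvs   # logical volumes carved out of each volume group

On a layout like yours, pvs would likely show a partition such as /dev/sda2
backing VolGroup00, which tells you which physical disk the root filesystem
and swap actually live on.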




> Which one is having the trouble? Any ideas so I can swap it out?


RAID 0 is not suitable for reliability. If any one drive in the RAID 0
fails (or is removed), the whole volume has failed and will become unusable.


 
11-15-2010, 05:41 PM
Gilbert Sebenste
 
Question about a hard drive error

On Wed, 10 Nov 2010, John R Pierce wrote:

> On 11/10/10 6:58 PM, Gilbert Sebenste wrote:
>> Hey everyone,
>>
>> I just got one of these today:
>>
>> Nov 10 16:07:54 stormy kernel: sd 0:0:0:0: SCSI error: return code =
>> 0x08000000
>> Nov 10 16:07:54 stormy kernel: sda: Current: sense key: Medium Error
>> Nov 10 16:07:54 stormy kernel: Add. Sense: Unrecovered read error
>> Nov 10 16:07:54 stormy kernel:
>> Nov 10 16:07:54 stormy kernel: Info fld=0x0
>> Nov 10 16:07:54 stormy kernel: end_request: I/O error, dev sda, sector
>> 3896150669
>
> See where it says "dev sda"? That's physical drive zero, which has a
> read error on that sector.
>
>
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743752)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743760)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743768)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743776)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743784)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743792)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743800)
>> Nov 10 16:07:54 stormy kernel: Read-error on swap-device (253:1:743808)
>>
>> My question is this: I have RAID 0 set up, but don't really understand
>> it well. This is how my disks are set up:
>>
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> /dev/mapper/VolGroup00-LogVol00
>> 1886608544 296733484 1492495120 17% /
>> /dev/sda1 101086 19877 75990 21% /boot
>> tmpfs 1684312 1204416 479896 72% /dev/shm
>>
>
> That is not how your disks are set up; that's how your FILE SYSTEMS are set up.

Correct, apologies for the incorrect wording.

> That /dev/mapper entry is an LVM logical volume. You can display the physical
> volumes behind an LVM volume group with the command 'pvs'.

Thank you! That was helpful.

>> Which one is having the trouble? Any ideas so I can swap it out?
>
> RAID 0 is not suitable for reliability. If any one drive in the RAID 0 fails
> (or is removed), the whole volume has failed and will become unusable.

Thanks John, I appreciate it! Both are being replaced after a nearby 55
kV power line shorted to ground and blew a manhole cover 50' into the air,
damaging a lot of equipment over here, even gear on UPSes. Nobody was
hurt, thank goodness. But I'll be looking into RAID 5 in the future.

*******************************************************************************
Gilbert Sebenste ********
(My opinions only!) ******
*******************************************************************************
 
11-16-2010, 03:31 PM
Benjamin Franz
 
Question about a hard drive error

On 11/15/2010 10:41 AM, Gilbert Sebenste wrote:
> Thanks John, I appreciate it! Both are being replaced after a nearby 55
> kV power line shorted to ground and blew a manhole cover 50' into the air,
> damaging a lot of equipment over here, even gear on UPSes. Nobody was
> hurt, thank goodness. But I'll be looking into RAID 5 in the future.

In these days of multi-terabyte drives you should be looking at RAID 6
instead. The chances of a 'double failure' during degraded
operation/resync are too high to ignore.
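If you're on Linux software RAID, that's a one-liner with mdadm (device
names here are purely illustrative):

    mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]

RAID 6 keeps two drives' worth of parity, so the array survives any two
simultaneous drive failures.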

--
Benjamin Franz
 
11-16-2010, 03:42 PM
Alan Hodgson
 
Question about a hard drive error

On November 16, 2010 08:31:05 am Benjamin Franz wrote:
> On 11/15/2010 10:41 AM, Gilbert Sebenste wrote:
> > Thanks John, I appreciate it! Both are being replaced after a nearby 55
> > kV power line shorted to ground and blew a manhole cover 50' into the
> > air, damaging a lot of equipment over here, even gear on UPSes. Nobody
> > was hurt, thank goodness. But I'll be looking into RAID 5 in the
> > future.
>
> In these days of multi-terabyte drives you should be looking at RAID 6
> instead. The chances of a 'double failure' during degraded
> operation/resync are too high to ignore.

Like almost 100%...
 
11-16-2010, 04:25 PM
John R Pierce
 
Question about a hard drive error

On 11/16/10 8:31 AM, Benjamin Franz wrote:
> In these days of multi-terabyte drives you should be looking at RAID 6
> instead. The chances of a 'double failure' during degraded
> operation/resync are too high to ignore.


In these days of cheap drives, I use RAID 10 almost exclusively, and if it's
at all mission-critical, I like to have 1-2 hot spares. If I were
deploying a new server and its workload was at all database-centric,
I'd want to use 2.5" SAS rather than 3.5" SATA drives.

With RAID 10, the rebuild time is how long it takes to copy the one
drive. If you have 6 drives in a RAID 10 and one fails, leaving 5, and
another fails, there's only a 1 in 5 chance of that other failure being
the mirror of the dead drive. If you have a hot spare, the
rebuild starts immediately, reducing the window for that dreaded double
failure to a minimum.
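With mdadm, a 6-drive RAID 10 plus a hot spare looks something like this
(seven illustrative drives):

    mdadm --create /dev/md0 --level=10 --raid-devices=6 --spare-devices=1 /dev/sd[b-h]

The spare sits idle until a member fails, at which point md starts
rebuilding onto it automatically.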


 
11-16-2010, 05:41 PM
Benjamin Franz
 
Question about a hard drive error

On 11/16/2010 09:25 AM, John R Pierce wrote:
>
> In these days of cheap drives, I use RAID 10 almost exclusively, and if it's
> at all mission-critical, I like to have 1-2 hot spares. If I were
> deploying a new server and its workload was at all database-centric,
> I'd want to use 2.5" SAS rather than 3.5" SATA drives.
>
> With RAID 10, the rebuild time is how long it takes to copy the one
> drive. If you have 6 drives in a RAID 10 and one fails, leaving 5, and
> another fails, there's only a 1 in 5 chance of that other failure being
> the mirror of the dead drive. If you have a hot spare, the
> rebuild starts immediately, reducing the window for that dreaded double
> failure to a minimum.
>

Oh, I agree - and when price is no object, or if write performance is
the bottleneck, or if you need huge numbers of drives, I love RAID 10.
You can take it to crazy levels of redundancy + performance by going to
RAID 0 layered over multiple three-way RAID 1 arrays. Why have multiple
hot spares when you can go for N>2 RAID 1+0 instead and get a hefty
performance boost on reads almost for free, at even higher reliability?
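As a rough mdadm sketch of that layout (nine illustrative drives: three
3-way mirrors, striped together):

    # three 3-way RAID 1 mirrors
    mdadm --create /dev/md1 --level=1 --raid-devices=3 /dev/sd[bcd]
    mdadm --create /dev/md2 --level=1 --raid-devices=3 /dev/sd[efg]
    mdadm --create /dev/md3 --level=1 --raid-devices=3 /dev/sd[hij]
    # RAID 0 striped across the mirrors
    mdadm --create /dev/md0 --level=0 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3

Each stripe member can lose two of its three drives before data is at
risk, and reads can be served from any copy in a mirror.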

--
Benjamin Franz

 
11-16-2010, 05:47 PM
John R Pierce
 
Question about a hard drive error

On 11/16/10 10:41 AM, Benjamin Franz wrote:
> Oh, I agree - and when price is no object, or if write performance is

The price spread isn't that big of a deal.

A 6-drive RAID 6 gives you 4x space, while a 6-drive RAID 10 gives you
3x. Not that big of a deal.

An 8-drive RAID 6 gives you 6x space, while an 8-drive RAID 10 gives you
4x space. Not much bigger of a gap.
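(In general, with N drives of capacity S, RAID 6 yields (N - 2) x S of
usable space and two-way RAID 10 yields (N / 2) x S, so the gap grows by
one drive's worth for every two drives you add.)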

RAID sets really shouldn't be much bigger than about 8 drives
anyway. Rebuild times for a 12-drive RAID 6 would be astronomical.


 
11-16-2010, 06:07 PM
Benjamin Franz
 
Question about a hard drive error

On 11/16/2010 10:47 AM, John R Pierce wrote:
>
> RAID sets really shouldn't be much bigger than about 8 drives
> anyway. Rebuild times for a 12-drive RAID 6 would be astronomical.
>

You are OK up to here. Rebuild time for replacing a failed drive
scales with drive size, not RAID set size, regardless of whether it is
RAID 1, 5, 6, or 10. It remains roughly the amount of time it takes to
completely write one drive at full speed (unless you run out of
bus bandwidth - but that takes a lot of drives).

However, system availability/performance is much better for RAID 10 than
for the others during a rebuild, because the rebuild work is isolated to
only the involved spindles.
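Either way, on Linux software RAID you can watch the rebuild and its
estimated completion time directly:

    cat /proc/mdstat              # per-array resync/rebuild progress and ETA
    watch -n 60 cat /proc/mdstat  # refresh the view every minute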

--
Benjamin Franz

 
