Linux Archive > Gentoo > Gentoo User

03-18-2010, 08:45 PM
Carlos Hendson

Intermittent software RAID failures

Hello,

I've got a Dell Inspiron 1720 laptop with dual 2.5" hard drives set up
using software RAID1. I've had this computer for about a year and a half
and all's been working well.

I've experienced intermittent software RAID errors like those found in
the "softraid-fail.txt" attachment.

Initially I suspected a kernel bug because it started around the same
time I'd upgraded the kernel (around the 2.6.30 upgrade) but subsequent
kernel upgrades haven't improved the situation.

I've run smartctl --all and badblocks on both disks, but nothing is
reported as faulty.

I don't understand what is causing RAID to report these faults and would
like some ideas as to how I can further diagnose the problem.

Thanks in advance,
Carlos

Attachment: softraid-fail.txt
Feb 28 15:14:16 pheonix kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Feb 28 15:14:16 pheonix kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Feb 28 15:14:16 pheonix kernel: ata3: SError: { PHYRdyChg }
Feb 28 15:14:16 pheonix kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Feb 28 15:14:16 pheonix kernel: res 40/00:0c:97:74:25/00:00:0c:00:00/40 Emask 0x10 (ATA bus error)
Feb 28 15:14:16 pheonix kernel: ata3.00: status: { DRDY }
Feb 28 15:14:16 pheonix kernel: ata3: hard resetting link
Feb 28 15:14:19 pheonix kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Feb 28 15:14:19 pheonix kernel: ata3.00: configured for UDMA/133
Feb 28 15:14:19 pheonix kernel: ata3: EH complete
Feb 28 15:14:19 pheonix kernel: end_request: I/O error, dev sdb, sector 178062452
Feb 28 15:14:19 pheonix kernel: raid1: Disk failure on sdb1, disabling device.
Feb 28 15:14:19 pheonix kernel: raid1: Operation continuing on 1 devices.
Feb 28 15:14:19 pheonix kernel: md: recovery of RAID array md0
Feb 28 15:14:19 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 28 15:14:19 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 28 15:14:19 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Feb 28 15:14:19 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Feb 28 15:14:19 pheonix kernel: md: md0: recovery done.
Feb 28 15:14:19 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:19 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:19 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Feb 28 15:14:19 pheonix kernel: disk 1, wo:1, o:0, dev:sdb1
Feb 28 15:14:19 pheonix kernel: md: recovery of RAID array md0
Feb 28 15:14:19 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Feb 28 15:14:19 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Feb 28 15:14:19 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Feb 28 15:14:19 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Feb 28 15:14:19 pheonix kernel: md: md0: recovery done.
Feb 28 15:14:20 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:20 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:20 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Feb 28 15:14:20 pheonix kernel: disk 1, wo:1, o:0, dev:sdb1
Feb 28 15:14:20 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:20 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:20 pheonix kernel: disk 0, wo:0, o:1, dev:sda8
Feb 28 15:14:20 pheonix kernel: disk 1, wo:1, o:0, dev:sdb1
Feb 28 15:14:20 pheonix kernel: RAID1 conf printout:
Feb 28 15:14:20 pheonix kernel: --- wd:1 rd:2
Feb 28 15:14:20 pheonix kernel: disk 0, wo:0, o:1, dev:sda8


Mar 12 19:38:06 pheonix kernel: ata1.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Mar 12 19:38:06 pheonix kernel: ata1.00: irq_stat 0x00400000, PHY RDY changed
Mar 12 19:38:06 pheonix kernel: ata1: SError: { PHYRdyChg }
Mar 12 19:38:06 pheonix kernel: ata1.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 12 19:38:06 pheonix kernel: res 40/00:24:b6:fa:df/00:00:17:00:00/40 Emask 0x10 (ATA bus error)
Mar 12 19:38:06 pheonix kernel: ata1.00: status: { DRDY }
Mar 12 19:38:06 pheonix kernel: ata1: hard resetting link
Mar 12 19:38:09 pheonix kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 12 19:38:09 pheonix kernel: ata1.00: configured for UDMA/133
Mar 12 19:38:09 pheonix kernel: ata1: EH complete
Mar 12 19:38:09 pheonix kernel: end_request: I/O error, dev sda, sector 305244964
Mar 12 19:38:09 pheonix kernel: raid1: Disk failure on sda8, disabling device.
Mar 12 19:38:09 pheonix kernel: raid1: Operation continuing on 1 devices.
Mar 12 19:38:09 pheonix kernel: md: recovery of RAID array md0
Mar 12 19:38:09 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 12 19:38:09 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 12 19:38:09 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Mar 12 19:38:09 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Mar 12 19:38:09 pheonix kernel: md: md0: recovery done.
Mar 12 19:38:09 pheonix kernel: RAID1 conf printout:
Mar 12 19:38:09 pheonix kernel: --- wd:1 rd:2
Mar 12 19:38:09 pheonix kernel: disk 0, wo:1, o:0, dev:sda8
Mar 12 19:38:09 pheonix kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 12 19:38:09 pheonix kernel: md: recovery of RAID array md0
Mar 12 19:38:09 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 12 19:38:09 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 12 19:38:09 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Mar 12 19:38:09 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Mar 12 19:38:09 pheonix kernel: md: md0: recovery done.
Mar 12 19:38:09 pheonix kernel: RAID1 conf printout:
Mar 12 19:38:09 pheonix kernel: --- wd:1 rd:2
Mar 12 19:38:09 pheonix kernel: disk 0, wo:1, o:0, dev:sda8
Mar 12 19:38:09 pheonix kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 12 19:38:09 pheonix kernel: md: recovery of RAID array md0
Mar 12 19:38:09 pheonix kernel: md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Mar 12 19:38:09 pheonix kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Mar 12 19:38:09 pheonix kernel: md: using 128k window, over a total of 178024192 blocks.
Mar 12 19:38:09 pheonix kernel: md: resuming recovery of md0 from checkpoint.
Mar 12 19:38:09 pheonix kernel: md: md0: recovery done.
Mar 12 19:38:10 pheonix kernel: RAID1 conf printout:
Mar 12 19:38:10 pheonix kernel: --- wd:1 rd:2
Mar 12 19:38:10 pheonix kernel: disk 0, wo:1, o:0, dev:sda8
Mar 12 19:38:10 pheonix kernel: disk 1, wo:0, o:1, dev:sdb1
Mar 12 19:38:10 pheonix kernel: md: recovery of RAID array md0


Mar 18 21:57:33 pheonix kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x10000 action 0xe frozen
Mar 18 21:57:33 pheonix kernel: ata3.00: irq_stat 0x00400000, PHY RDY changed
Mar 18 21:57:33 pheonix kernel: ata3: SError: { PHYRdyChg }
Mar 18 21:57:33 pheonix kernel: ata3.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Mar 18 21:57:33 pheonix kernel: res 40/00:24:bf:1c:1f/00:00:00:00:00/40 Emask 0x10 (ATA bus error)
Mar 18 21:57:33 pheonix kernel: ata3.00: status: { DRDY }
Mar 18 21:57:33 pheonix kernel: ata3: hard resetting link
Mar 18 21:57:37 pheonix kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Mar 18 21:57:37 pheonix kernel: ata3.00: configured for UDMA/133
Mar 18 21:57:37 pheonix kernel: ata3: EH complete
Mar 18 21:57:37 pheonix kernel: end_request: I/O error, dev sdb, sector 178116972
 
03-18-2010, 08:58 PM
Mark Knecht

Intermittent software RAID failures

On Thu, Mar 18, 2010 at 2:45 PM, Carlos Hendson <skyclan@gmx.net> wrote:
> Hello,
>
> I've got a Dell Inspiron 1720 laptop with dual 2.5" hard drives set up
> using software RAID1. I've had this computer for about a year and a half
> and all's been working well.
>
> I've experienced intermittent software RAID errors like those found in
> the "softraid-fail.txt" attachment.
>
> Initially I suspected a kernel bug because it started around the same
> time I'd upgraded the kernel (around the 2.6.30 upgrade) but subsequent
> kernel upgrades haven't improved the situation.
>
> I've run smartctl --all and badblocks on both disks, but nothing is
> reported as faulty.
>
> I don't understand what is causing RAID to report these faults and would
> like some ideas as to how I can further diagnose the problem.
>
> Thanks in advance,
> Carlos
>

Kernel upgrades might not tell you much. Kernel downgrades might.

- Mark
 
03-18-2010, 09:45 PM
Keith Dart

Intermittent software RAID failures

=== On Thu, 03/18, Carlos Hendson wrote: ===
> I've experienced intermittent software RAID errors like those found in
> the "softraid-fail.txt" attachment.

===

That's most likely your disk starting to fail.



-- Keith Dart

--

-- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Keith Dart <keith@dartworks.biz>
public key: ID: 19017044
<http://www.dartworks.biz/>
=====================================================================
 
03-19-2010, 07:11 AM
Carlos

Intermittent software RAID failures

Keith Dart wrote:
> === On Thu, 03/18, Carlos Hendson wrote: ===
>
>> I've experienced intermittent software RAID errors like those found in
>> the "softraid-fail.txt" attachment.
>
> ===
>
> That's most likely your disk starting to fail.

How would I go about categorically proving such a thing? What are the
right tools for the job? I found it strange that both /dev/sda8 and
/dev/sdb1 have reported similar problems.


No disk errors are reported when using non-RAID partitions which reside
on the same physical disk. This is why I'm not 100% convinced it's a
disk failure.


Regards,
Carlos
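
As one possible starting point for the question above, the array state can be read from `/proc/mdstat` and recorded each time an incident occurs. The snippet below is a rough Python sketch under stated assumptions: `SAMPLE_MDSTAT` is a hypothetical example (it reuses the device names from this thread but is not output from the poster's machine), `parse_mdstat` is an illustrative helper, and the patterns cover only the simple `mdN : active raid1 ...` layout.

```python
import re

# Hypothetical /proc/mdstat content, for illustration only.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda8[0]
      178024192 blocks [2/1] [U_]

unused devices: <none>
"""

# "md0 : active raid1 <members...>" (simple layout only, an assumption).
ARRAY = re.compile(r"^(md\d+) : (\w+) (\w+) (.*)$")
# "[2/1] [U_]": devices expected / devices working, then one flag per slot.
STATUS = re.compile(r"\[(\d+)/(\d+)\] \[([U_]+)\]")
MEMBER = re.compile(r"(\w+)\[")


def parse_mdstat(text):
    """Return {array_name: info} with a 'degraded' flag per array."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = ARRAY.match(line)
        if header:
            name, state, level, members = header.groups()
            parts = members.split()
            current = {
                "state": state,
                "level": level,
                "members": parts,
                # Members marked (F) have been flagged faulty by md.
                "failed": [MEMBER.match(p).group(1)
                           for p in parts if p.endswith("(F)")],
                "expected": None,
                "working": None,
                "degraded": None,
            }
            arrays[name] = current
            continue
        status = STATUS.search(line)
        if status and current is not None:
            expected, working = int(status.group(1)), int(status.group(2))
            current["expected"] = expected
            current["working"] = working
            # "_" marks a slot without a working device.
            current["degraded"] = working < expected or "_" in status.group(3)
    return arrays


if __name__ == "__main__":
    for name, info in parse_mdstat(SAMPLE_MDSTAT).items():
        print(name, info["level"], "degraded" if info["degraded"] else "ok",
              "failed:", ",".join(info["failed"]) or "none")
```

On a live system the same function could be fed `open("/proc/mdstat").read()`. It only reports what md currently believes about the array; it does not establish whether the underlying cause is the disk, the SATA link, or software.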
 
03-19-2010, 01:33 PM
Paul Hartman

Intermittent software RAID failures

On Thu, Mar 18, 2010 at 4:45 PM, Carlos Hendson <skyclan@gmx.net> wrote:
> Hello,
>
> I've got a Dell Inspiron 1720 laptop with dual 2.5" hard drives set up
> using software RAID1. I've had this computer for about a year and a half
> and all's been working well.
>
> I've experienced intermittent software RAID errors like those found in
> the "softraid-fail.txt" attachment.
>
> Initially I suspected a kernel bug because it started around the same
> time I'd upgraded the kernel (around the 2.6.30 upgrade) but subsequent
> kernel upgrades haven't improved the situation.
>
> I've run smartctl --all and badblocks on both disks, but nothing is
> reported as faulty.
>
> I don't understand what is causing RAID to report these faults and would
> like some ideas as to how I can further diagnose the problem.

I remember reading something recently (within the last year?) about
smartmontools causing disks to go offline unnecessarily in some
situations, I think due to a bug in the SMART tools. I don't know whether
it's a certain version of smartmontools or a combination of that and
other things. Maybe you can try upgrading or downgrading smartmontools,
or temporarily disabling smartd, to see if it stops the disks from being
taken offline.
 
03-19-2010, 01:37 PM
Volker Armin Hemmann

Intermittent software RAID failures

On Friday 19 March 2010, Carlos wrote:
> Keith Dart wrote:
> > === On Thu, 03/18, Carlos Hendson wrote: ===
> >
> >> I've experienced intermittent software RAID errors like those found in
> >> the "softraid-fail.txt" attachment.
> >
> > ===
> >
> > That's most likely your disk starting to fail.
>
> How would I go about categorically proving such a thing? What are the
> right tools for the job? I found it strange that both /dev/sda8 and
> /dev/sdb1 have reported similar problems.
>
> No disk errors are reported when using non-RAID partitions which reside
> on the same physical disk. This is why I'm not 100% convinced it's a
> disk failure.
>
> Regards,
> Carlos

Well, the error is located on the RAID partition...
 
