Vegard Svanberg 03-04-2009 09:53 AM

file system, kernel or hardware raid failure?
 
I had a busy mailserver fail on me the other day. Below is what was
printed in dmesg. We first suspected a hardware failure (raid controller
or something else), so we moved the drives to another (identical
hardware) machine and ran fsck. Fsck complained ("short read while
reading inode") and asked if I wanted to ignore and rewrite (which I
did).

After booting up again, the problem came back immediately and root was
remounted read only. We moved the data from the read only drive to a new
machine. While copying the data, we got this message from time to time
(on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
unable to read inode block - inode=22561891, block=90243144".

I need to find the cause(s) of the problems. So far I have these
questions/concerns:

- Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
- Filesystem bug/failure?
- Did the RAID controller fail to detect a failing drive? This is an
Adaptec aoc-usas-s4ir running on a Supermicro motherboard.

I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
sure what to do from here.

Any ideas? Thanks in advance.

dmesg:

[ 38.907730] end_request: I/O error, dev sda, sector 284688831
[ 38.907802] EXT3-fs error (device dm-0): read_block_bitmap: Cannot read block bitmap - block_group = 1086, block_bitmap = 35586048
[ 38.907956] Aborting journal on device dm-0.
[ 38.919742] ext3_abort called.
[ 38.919798] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
[ 38.919942] Remounting filesystem read-only
[ 38.925855] __journal_remove_journal_head: freeing b_committed_data
[ 38.925915] journal commit I/O error
[ 38.925935] journal commit I/O error
[ 38.925953] journal commit I/O error
[ 38.943245] Remounting filesystem read-only
[ 38.958907] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[ 38.958988] EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
[ 38.959051] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[ 38.959137] EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
[ 38.959222] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[ 39.024087] journal commit I/O error
[ 39.024103] journal commit I/O error
[ 39.024117] journal commit I/O error
[ 39.024124] journal commit I/O error
[ 39.024181] journal commit I/O error
[ 39.024201] journal commit I/O error
[ 39.024208] journal commit I/O error
[ 39.024258] journal commit I/O error
[ 39.024275] journal commit I/O error
[ 39.024284] journal commit I/O error
[ 39.024330] journal commit I/O error
[ 39.024358] journal commit I/O error
[ 39.024384] journal commit I/O error
[ 39.024432] journal commit I/O error
[ 39.024481] journal commit I/O error
[ 45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 45.750012] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[ 45.750017] end_request: I/O error, dev sda, sector 721945599
[ 45.750079] Buffer I/O error on device dm-0, logical block 90243144
[ 45.750137] lost page write due to I/O error on dm-0
[ 87.970284] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 87.970292] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 87.970296] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[ 87.970302] end_request: I/O error, dev sda, sector 83324999
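
A quick way to see which device is reporting errors, and at which sectors, is to pull the `end_request` lines out of the log. The following is a minimal sketch only: the function name `failing_sectors` is hypothetical, and the regular expression matches the message wording shown in this log, which other kernel versions may phrase differently.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   [ 38.907730] end_request: I/O error, dev sda, sector 284688831
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failing_sectors(log_text):
    """Return {device: sorted list of sectors that reported I/O errors}."""
    found = defaultdict(set)
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            found[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(sectors) for dev, sectors in found.items()}


# The three end_request lines from the dmesg output above.
sample = """\
[ 38.907730] end_request: I/O error, dev sda, sector 284688831
[ 45.750017] end_request: I/O error, dev sda, sector 721945599
[ 87.970302] end_request: I/O error, dev sda, sector 83324999
"""

print(failing_sectors(sample))
# {'sda': [83324999, 284688831, 721945599]}
```

Here every error comes from sda, at three widely separated sectors.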

--
Vegard Svanberg <vegard@svanberg.no> [*Takapa@IRC (EFnet)]

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users

Eric Sandeen 03-04-2009 04:26 PM

Vegard Svanberg wrote:
> I had a busy mailserver fail on me the other day. Below is what was
> printed in dmesg. We first suspected a hardware failure (raid controller
> or something else), so we moved the drives to another (identical
> hardware) machine and ran fsck. Fsck complained ("short read while
> reading inode") and asked if I wanted to ignore and rewrite (which I
> did).
>
> After booting up again, the problem came back immediately and root was
> remounted read only. We moved the data from the read only drive to a new
> machine. While copying the data, we got this message from time to time
> (on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
> unable to read inode block - inode=22561891, block=90243144".
>
> I need to find the cause(s) of the problems. So far I have these
> questions/concerns:
>
> - Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
> - Filesystem bug/failure?
> - Did the RAID controller fail to detect a failing drive? This is an
> Adaptec aoc-usas-s4ir running on a Supermicro motherboard.
>
> I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
> sure what to do from here.
>
> Any ideas? Thanks in advance.
>
> dmesg:
>
> [ 38.907730] end_request: I/O error, dev sda, sector 284688831

Drive hardware on sda failing; I'd run smart tools or vendor
diagnostics, to be sure.

<snip>

> [ 45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [ 45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
                                             ^^^^^^^^^^^^^^

I can't speak to whether the raid controller should have detected this.

-Eric

Vegard Svanberg 03-16-2009 06:57 AM

* Eric Sandeen <sandeen@redhat.com> [2009-03-04 18:26]:

> > [ 38.907730] end_request: I/O error, dev sda, sector 284688831
>
> Drive hardware on sda failing; I'd run smart tools or vendor
> diagnostics, to be sure.

Late answer, but...

After posting this, we figured this had to be due to a power failure
that had occurred some weeks before. But yesterday another, identical
machine suddenly failed with exactly the same error messages:

[2834866.071770] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[2834866.071778] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[2834866.071782] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[2834866.071787] end_request: I/O error, dev sda, sector 302515639
[2834866.071823] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=9455580, block=37814399

Any ideas? Fsck will fix the errors, but report short reads while
fixing, and on reboot/remount, the problems are back.

As this probably isn't relevant to the ext3 list anymore (I guess it's more
relevant for kernel/SCSI subsystem developers), I'll try some other
lists.
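
The sector and block numbers posted in this thread can be checked against each other. The sketch below assumes 4096-byte ext3 blocks and 512-byte sectors (eight sectors per block), which the thread does not state; the names `pairs` and `sector_offset` are hypothetical. Under that assumption, each dm-0 block that ext3 complained about maps to the sda sector reported just before it, with the same fixed offset every time.

```python
SECTORS_PER_BLOCK = 8  # assumption: 4096-byte ext3 blocks, 512-byte sectors

# (filesystem block on dm-0, sector reported for sda), as posted in this thread
pairs = [
    (35586048, 284688831),  # block bitmap of block group 1086
    (90243144, 721945599),  # inode block for inode 22561891
    (37814399, 302515639),  # inode block for inode 9455580 (second machine)
]


def sector_offset(block, sector):
    """Sectors between the start of the block on dm-0 and the sda sector."""
    return sector - block * SECTORS_PER_BLOCK


offsets = [sector_offset(block, sector) for block, sector in pairs]
print(offsets)
# [447, 447, 447]
```

A constant offset like this is what one would expect if a partition table and LVM metadata sit in front of the filesystem, although the thread does not describe the disk layout. It suggests that the ext3 errors are a direct consequence of the failed reads on sda rather than separate filesystem corruption.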

--
Vegard Svanberg <vegard@svanberg.no> [*Takapa@IRC (EFnet)]

Christian Kujau 03-18-2009 06:53 AM

On Mon, 16 Mar 2009, Vegard Svanberg wrote:
> [2834866.071770] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [2834866.071778] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
                                                 ^^^^^^^^^^^^^^

> Any ideas? Fsck will fix the errors, but report short reads while

No, you don't want to run fsck on a faulty device; it'll probably make
things even worse. If the data is valuable, make two copies of it
(with dd(1) or dd_rescue), then run fsck on one of these.
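
To illustrate the idea behind such a copy, here is a minimal Python sketch of a block-by-block image that keeps going past unreadable blocks. It is not how dd or dd_rescue are implemented; `rescue_copy`, its parameters, and the paths in the usage note are hypothetical, and it relies on `os.pread`, which is available on Unix only.

```python
import os


def rescue_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block; zero-fill blocks that cannot be read.

    Returns the list of byte offsets at which a read failed.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end gives the size of a file or a Linux block device.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            for offset in range(0, size, block_size):
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    data = b""
                    bad_offsets.append(offset)
                # Pad short or failed reads so later data keeps its offset.
                dst.write(data.ljust(want, b"\0"))
    finally:
        os.close(src)
    return bad_offsets
```

A call might look like `rescue_copy("/dev/sda", "/mnt/backup/sda.img")`, with both paths standing in for the real source device and a destination on healthy storage. Purpose-built tools also retry and log bad regions, so this is only meant to show why the image, not the failing device, is the thing to run fsck on.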

Christian.
--
Bruce Schneier has found SHA-512 preimages of all these facts.
