I had a busy mailserver fail on me the other day. Below is what was
printed in dmesg. We first suspected a hardware failure (raid controller
or something else), so we moved the drives to another (identical
hardware) machine and ran fsck. Fsck complained ("short read while
reading inode") and asked if I wanted to ignore and rewrite (which I
did).
After booting up again, the problem came back immediately and root was
remounted read-only. We moved the data from the read-only drive to a new
machine. While copying the data, we got this message from time to time
(on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
unable to read inode block - inode=22561891, block=90243144".
I need to find the cause(s) of the problems. So far I have these
questions/concerns:
- Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
- Filesystem bug/failure?
- Did the RAID controller fail to detect a failing drive? This is an
Adaptec aoc-usas-s4ir running on a Supermicro motherboard.
I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
sure what to do from here.
_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
03-04-2009, 04:26 PM
Eric Sandeen
file system, kernel or hardware raid failure?
Vegard Svanberg wrote:
> I had a busy mailserver fail on me the other day. Below is what was
> printed in dmesg. We first suspected a hardware failure (raid controller
> or something else), so we moved the drives to another (identical
> hardware) machine and ran fsck. Fsck complained ("short read while
> reading inode") and asked if I wanted to ignore and rewrite (which I
> did).
>
> After booting up again, the problem came back immediately and root was
> remounted read only. We moved the data from the read only drive to a new
> machine. While copying the data, we got this message from time to time
> (on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
> unable to read inode block - inode=22561891, block=90243144.
>
> I need to find the cause(s) of the problems. So far I have these
> questions/concerns:
>
> - Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
> - Filesystem bug/failure?
> - Did the RAID controller fail to detect a failing drive? This is an
> Adaptec aoc-usas-s4ir running on a Supermicro motherboard.
>
> I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
> sure what to do from here.
>
> Any ideas? Thanks in advance.
>
> dmesg:
>
> [ 38.907730] end_request: I/O error, dev sda, sector 284688831
Drive hardware on sda is failing; I'd run SMART tools or vendor
diagnostics to be sure.
I can't speak to whether the raid controller should have detected this.
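Before running diagnostics, it can help to see which devices and sectors the kernel is actually complaining about. Here is a minimal Python sketch that tallies the "end_request: I/O error" lines from a saved dmesg dump. The function name summarize_io_errors and the regular expression are illustrative assumptions based only on the one log line quoted in this thread; other kernel versions may format the message differently.

```python
import re
from collections import defaultdict

# Matches lines shaped like the one quoted in this thread:
# [ 38.907730] end_request: I/O error, dev sda, sector 284688831
IO_ERROR = re.compile(r"end_request: I/O error, dev ([\w-]+), sector (\d+)")


def summarize_io_errors(log_text):
    """Return {device: sorted list of distinct failing sectors} (hypothetical helper)."""
    sectors = defaultdict(set)
    for line in log_text.splitlines():
        match = IO_ERROR.search(line)
        if match:
            sectors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(found) for dev, found in sectors.items()}


sample = "[ 38.907730] end_request: I/O error, dev sda, sector 284688831"
print(summarize_io_errors(sample))
```

Errors clustered on one device over a narrow sector range would be consistent with a failing disk, but the tally is only a hint; SMART data or the vendor's diagnostics are still what confirm it.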
-Eric
03-16-2009, 06:57 AM
Vegard Svanberg
file system, kernel or hardware raid failure?
* Eric Sandeen <sandeen@redhat.com> [2009-03-04 18:26]:
> > [ 38.907730] end_request: I/O error, dev sda, sector 284688831
>
> Drive hardware on sda failing; I'd run smart tools or vendor
> diagnostics, to be sure.
Late answer, but...
After posting this, we figured this had to be due to a power failure
occurring some weeks before. But yesterday, we suddenly had another
identical machine fail with exactly the same error messages:
03-18-2009, 06:53 AM
Christian Kujau
file system, kernel or hardware raid failure?
On Mon, 16 Mar 2009, Vegard Svanberg wrote:
> [2834866.071770] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [2834866.071778] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
                                                 ^^^^^^^^^^^^^^
> Any ideas? Fsck will fix the errors, but report short reads while
No, you don't want to run fsck on a faulty device; it'll probably make
things even worse. If the data is valuable, make two copies of it
(with dd(1) or dd_rescue), then run fsck on one of these.
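To illustrate the idea behind imaging a failing device first, here is a rough Python sketch of a block-by-block copy that zero-fills unreadable blocks instead of aborting, so offsets in the image stay aligned with the source. It is not a replacement for dd or dd_rescue. The name rescue_copy, the block size and the paths in the usage comment are assumptions made for the example, and os.pread is only available on Unix-like systems.

```python
import os


def rescue_copy(src_path, dst_path, block_size=64 * 1024):
    """Copy src to dst block by block, zero-filling blocks that cannot be read.

    Returns the byte offsets of the blocks that had to be zero-filled.
    (Hypothetical helper, not part of any tool mentioned in the thread.)
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        # Seeking to the end reports the size of regular files and block devices.
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(block_size, size - offset)
                try:
                    data = os.pread(src, want, offset)
                except OSError:
                    data = b""
                if not data:
                    # Unreadable block: record it and keep the image aligned.
                    bad_offsets.append(offset)
                    data = b"\0" * want
                dst.write(data)
                offset += len(data)
    finally:
        os.close(src)
    return bad_offsets


# Example use (paths are placeholders; run as root, device unmounted):
# bad = rescue_copy("/dev/sdX", "/mnt/spare/disk.img")
```

The second copy can then be made from the image rather than from the failing device, and fsck run against one of the copies only.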
Christian.
--
Bruce Schneier has found SHA-512 preimages of all these facts.