file system, kernel or hardware raid failure?
I had a busy mailserver fail on me the other day. Below is what was
printed in dmesg. We first suspected a hardware failure (raid controller
or something else), so we moved the drives to another (identical
hardware) machine and ran fsck. Fsck complained ("short read while
reading inode") and asked if I wanted to ignore and rewrite (which I
did).

After booting up again, the problem came back immediately and root was
remounted read only. We moved the data from the read only drive to a new
machine. While copying the data, we got this message from time to time
(on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
unable to read inode block - inode=22561891, block=90243144".

I need to find the cause(s) of the problems. So far I have these
questions/concerns:

- Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
- Filesystem bug/failure?
- Did the RAID controller fail to detect a failing drive? This is an
  Adaptec aoc-usas-s4ir running on a Supermicro motherboard.

I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
sure what to do from here.

Any ideas? Thanks in advance.

dmesg:

[ 38.907730] end_request: I/O error, dev sda, sector 284688831
[ 38.907802] EXT3-fs error (device dm-0): read_block_bitmap: Cannot read block bitmap - block_group = 1086, block_bitmap = 35586048
[ 38.907956] Aborting journal on device dm-0.
[ 38.919742] ext3_abort called.
[ 38.919798] EXT3-fs error (device dm-0): ext3_journal_start_sb: Detected aborted journal
[ 38.919942] Remounting filesystem read-only
[ 38.925855] __journal_remove_journal_head: freeing b_committed_data
[ 38.925915] journal commit I/O error
[ 38.925935] journal commit I/O error
[ 38.925953] journal commit I/O error
[ 38.943245] Remounting filesystem read-only
[ 38.958907] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[ 38.958988] EXT3-fs error (device dm-0) in ext3_truncate: Journal has aborted
[ 38.959051] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[ 38.959137] EXT3-fs error (device dm-0) in ext3_orphan_del: Journal has aborted
[ 38.959222] EXT3-fs error (device dm-0) in ext3_reserve_inode_write: Journal has aborted
[ 39.024087] journal commit I/O error
[ 39.024103] journal commit I/O error
[ 39.024117] journal commit I/O error
[ 39.024124] journal commit I/O error
[ 39.024181] journal commit I/O error
[ 39.024201] journal commit I/O error
[ 39.024208] journal commit I/O error
[ 39.024258] journal commit I/O error
[ 39.024275] journal commit I/O error
[ 39.024284] journal commit I/O error
[ 39.024330] journal commit I/O error
[ 39.024358] journal commit I/O error
[ 39.024384] journal commit I/O error
[ 39.024432] journal commit I/O error
[ 39.024481] journal commit I/O error
[ 45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 45.750012] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[ 45.750017] end_request: I/O error, dev sda, sector 721945599
[ 45.750079] Buffer I/O error on device dm-0, logical block 90243144
[ 45.750137] lost page write due to I/O error on dm-0
[ 87.970284] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[ 87.970292] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[ 87.970296] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[ 87.970302] end_request: I/O error, dev sda, sector 83324999

--
Vegard Svanberg <vegard@svanberg.no> [*Takapa@IRC (EFnet)]

_______________________________________________
Ext3-users mailing list
Ext3-users@redhat.com
https://www.redhat.com/mailman/listinfo/ext3-users
file system, kernel or hardware raid failure?
Vegard Svanberg wrote:
> I had a busy mailserver fail on me the other day. Below is what was
> printed in dmesg. We first suspected a hardware failure (raid controller
> or something else), so we moved the drives to another (identical
> hardware) machine and ran fsck. Fsck complained ("short read while
> reading inode") and asked if I wanted to ignore and rewrite (which I
> did).
>
> After booting up again, the problem came back immediately and root was
> remounted read only. We moved the data from the read only drive to a new
> machine. While copying the data, we got this message from time to time
> (on various files): "EXT3-fs error (device dm-0): ext3_get_inode_loc:
> unable to read inode block - inode=22561891, block=90243144.
>
> I need to find the cause(s) of the problems. So far I have these
> questions/concerns:
>
> - Kernel bug? (This is Ubuntu 8.10 with 2.6.27-7-server)
> - Filesystem bug/failure?
> - Did the RAID controller fail to detect a failing drive? This is an
>   Adaptec aoc-usas-s4ir running on a Supermicro motherboard.
>
> I suspect that one of the drives (RAID 6 btw) has failed, but I'm not
> sure what to do from here.
>
> Any ideas? Thanks in advance.
>
> dmesg:
>
> [ 38.907730] end_request: I/O error, dev sda, sector 284688831

Drive hardware on sda failing; I'd run smart tools or vendor
diagnostics, to be sure.

<snip>

> [ 45.749997] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [ 45.750008] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
                                             ^^^^^^^^^^^^^^

I can't speak to whether the raid controller should have detected this.

-Eric
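Before reaching for SMART or vendor diagnostics, it can help to pull the failing sectors out of the kernel log and see which device they all point at. What follows is only a minimal sketch: the function name failing_sectors and the regular expression are illustrative assumptions, and the pattern is written against the "end_request: I/O error" lines quoted in this thread, so other kernels may word the message differently.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   end_request: I/O error, dev sda, sector 284688831
# (wording taken from the dmesg output in this thread; an assumption elsewhere)
IO_ERROR = re.compile(r"end_request: I/O error, dev ([\w-]+), sector (\d+)")


def failing_sectors(dmesg_text):
    """Map each device name to the sorted list of sectors reported as failing."""
    sectors = defaultdict(set)
    for match in IO_ERROR.finditer(dmesg_text):
        sectors[match.group(1)].add(int(match.group(2)))
    return {dev: sorted(found) for dev, found in sectors.items()}


# The three I/O error lines from the original post.
sample = """\
[ 38.907730] end_request: I/O error, dev sda, sector 284688831
[ 45.750017] end_request: I/O error, dev sda, sector 721945599
[ 87.970302] end_request: I/O error, dev sda, sector 83324999
"""

print(failing_sectors(sample))
# {'sda': [83324999, 284688831, 721945599]}
```

All three errors land on sda and are spread far apart on it, which fits the reading that the problem sits at or below the device the controller exports rather than in ext3 itself.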
file system, kernel or hardware raid failure?
* Eric Sandeen <sandeen@redhat.com> [2009-03-04 18:26]:
> > [ 38.907730] end_request: I/O error, dev sda, sector 284688831
>
> Drive hardware on sda failing; I'd run smart tools or vendor
> diagnostics, to be sure.

Late answer, but...

After posting this, we figured this had to be due to a power failure
occurring some weeks before. But yesterday, we suddenly had one other,
identical, machine failing with exactly the same error messages:

[2834866.071770] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
[2834866.071778] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
[2834866.071782] sd 0:0:0:0: [sda] Add. Sense: Internal target failure
[2834866.071787] end_request: I/O error, dev sda, sector 302515639
[2834866.071823] EXT3-fs error (device dm-0): ext3_get_inode_loc: unable to read inode block - inode=9455580, block=37814399

Any ideas? Fsck will fix the errors, but report short reads while
fixing, and on reboot/remount, the problems are back.

As this probably isn't relevant to ext3 list anymore (I guess it's more
relevant for kernel/SCSI subsystem developers), I'll find some other
lists.

--
Vegard Svanberg <vegard@svanberg.no> [*Takapa@IRC (EFnet)]
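One cross-check on the logs in this thread: each failing sda sector appears right next to an ext3 block number on dm-0, so the two can be compared arithmetically. The sketch below assumes 512-byte sectors and a 4096-byte ext3 block size (neither is stated in the thread), and it assumes each sector really belongs to the block logged beside it. The name implied_offset is illustrative only.

```python
SECTOR_SIZE = 512   # bytes; assumed, not stated in the thread
BLOCK_SIZE = 4096   # bytes; assumed ext3 block size
SECTORS_PER_BLOCK = BLOCK_SIZE // SECTOR_SIZE

# (failing sector on sda, ext3 block on dm-0) pairs taken from the logs above.
pairs = [
    (284688831, 35586048),  # first machine: block bitmap read
    (721945599, 90243144),  # first machine: buffer I/O error / inode block
    (302515639, 37814399),  # second machine: inode block
]


def implied_offset(sda_sector, fs_block):
    """Sectors between the start of sda and the start of dm-0, if the pair matches."""
    return sda_sector - fs_block * SECTORS_PER_BLOCK


offsets = [implied_offset(sector, block) for sector, block in pairs]
print(offsets)
# [447, 447, 447]
```

Under those assumptions every pair gives the same 447-sector offset, on both machines. That suggests the ext3 errors are a direct consequence of the reads that failed on sda, not independent filesystem corruption. Reading 447 as the start of dm-0 within sda is itself an inference from the numbers, not something the posts confirm.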
file system, kernel or hardware raid failure?
On Mon, 16 Mar 2009, Vegard Svanberg wrote:
> [2834866.071770] sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
> [2834866.071778] sd 0:0:0:0: [sda] Sense Key : Hardware Error [current]
                                                 ^^^^^^^^^^^^^^
> Any ideas? Fsck will fix the errors, but report short reads while

No, you don't want to run fsck on a faulty device, it'll probably make
things even worse. If the data is valuable, make two copies of it (with
dd(1) or dd_rescue), then run fsck on one of these.

Christian.
--
Bruce Schneier has found SHA-512 preimages of all these facts.
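The copy-first idea can be illustrated with a small sketch of what an error-tolerant image copy does: read the source in fixed-size chunks, and where a read fails, write zeros in its place and carry on so that offsets in the image still line up. This is not a replacement for dd or dd_rescue, which handle retries and many other details. The function name rescue_copy and its parameters are hypothetical and appear nowhere in the thread.

```python
import os


def rescue_copy(src_path, dst_path, chunk_size=4096, read_chunk=os.pread):
    """Copy src to dst chunk by chunk, zero-filling chunks that cannot be read.

    Returns the byte offsets of the chunks that failed. read_chunk is
    injectable so that the error path can be exercised without bad hardware.
    """
    bad_offsets = []
    src = os.open(src_path, os.O_RDONLY)
    try:
        size = os.lseek(src, 0, os.SEEK_END)
        with open(dst_path, "wb") as dst:
            offset = 0
            while offset < size:
                want = min(chunk_size, size - offset)
                try:
                    data = read_chunk(src, want, offset)
                except OSError:
                    data = b""  # unreadable chunk (e.g. EIO from the device)
                if len(data) < want:
                    # Failed or short read: record it and pad with zeros
                    # so later data keeps its original offset.
                    bad_offsets.append(offset)
                dst.write(data.ljust(want, b"\0"))
                offset += want
    finally:
        os.close(src)
    return bad_offsets
```

A call might look like rescue_copy("/dev/sda", "/mnt/spare/sda.img"), where both paths are placeholders. The returned offsets show which parts of the image are zero-filled, and fsck would then be run against a copy of that image, not against the failing device.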