03-03-2010, 11:24 AM
Stroller

Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("

There seem to have been a few people posting with filesystem
corruption in the last week or two. It seems to be my turn, so I hope
it isn't contagious. The cause here is quite clear - whilst rummaging
in the server cupboard yesterday, power to the machine was
accidentally disconnected.


I have booted with a live CD & run `reiserfsck --fix-fixable` on the
filesystem, but nevertheless when I attempt to boot the system I get a
"failed to open the device... no such file or directory" message,
followed by another error as per subject line.


However, you will see from this screenshot (taken with an IP KVM) that
the filesystem does indeed seem to have been mounted successfully, if
read-only:


http://linux.stroller.uk.eu.org/fs-corruption.png

All I did here was log in with the root password.


When I boot with a live CD I can mount, read & write the filesystem:

root@sysresccd /root % mount -v -L root /mnt/gentoo
mount: you didn't specify a filesystem type for /dev/sda3
I will try type reiserfs
/dev/sda3 on /mnt/gentoo type reiserfs (rw)
root@sysresccd /root % ls /mnt/gentoo
bin boot dev etc home lib mnt opt proc root sbin sys tmp usr var

root@sysresccd /root % touch /mnt/gentoo/foo
root@sysresccd /root % echo foobar >> /mnt/gentoo/foo
root@sysresccd /root % ls -lh !!:$
ls -lh /mnt/gentoo/foo
-rw-r--r-- 1 root root 7 2010-03-03 11:18 /mnt/gentoo/foo
root@sysresccd /root % cat !!:$
cat /mnt/gentoo/foo
foobar
root@sysresccd /root % rm !!:$
rm /mnt/gentoo/foo
rm: remove regular file `/mnt/gentoo/foo'? y
root@sysresccd /root %

All the important system stuff on this PC is on a single partition. I
have two other drives attached at /mnt/space & /mnt/morespace - they
are XFS and I have run xfs_repair on both of them, which completes
quickly indicating no problems.


I'm not really sure how to proceed next. I feel the problem is indeed
on this reiserfs filesystem, the root filesystem with the label
"root". I can't help thinking that the problem is not that the system
"failed to open the device", but instead maybe that there's an
important system file missing that means the init script (or whatever
is responsible for mounting the filesystem) is not properly returning 0.
Does this seem possible? Maybe the reiserfs handler for mount is
somehow broken (performing the mount, but not returning 0, or perhaps
broken in such a way that it is able to mount read-only but not read-write).


I am tempted to chroot into the system and re-emerge system &
baselayout. If I'm correct in this above guess then re-emerging the
correct file will fix the problem. Right?
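
For reference, a minimal sketch of the chroot-and-re-emerge route from the live CD. It assumes the root filesystem is /dev/sda3 mounted at /mnt/gentoo, as in the transcript above. The `plan` helper and its `DRY_RUN` switch are hypothetical names, used so that the script only prints the commands by default. Whether re-emerging would actually cure the boot failure remains a guess.

```shell
#!/bin/sh
# Sketch only: prints each command; set DRY_RUN=0 to really execute them.
ROOT_DEV=${ROOT_DEV:-/dev/sda3}   # device name as seen from the live CD
MNT=${MNT:-/mnt/gentoo}           # mount point used in the transcript

# Hypothetical helper: echo the command, run it only when DRY_RUN=0.
plan() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = 0 ]; then "$@"; fi
}

chroot_reemerge() {
    plan mount "$ROOT_DEV" "$MNT" &&
    plan mount -t proc proc "$MNT/proc" &&
    plan mount --rbind /dev "$MNT/dev" &&
    # Rebuild baselayout first, then the whole system set.
    plan chroot "$MNT" /bin/bash -c \
        'env-update && source /etc/profile && emerge --oneshot sys-apps/baselayout && emerge --emptytree system'
}

chroot_reemerge
```

Run as-is, this prints the four planned commands and touches nothing.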


`reiserfsck --help` shows some other options besides the simple
--fix-fixable - I assume the "expert option" of --scan-whole-partition is
unsafe, but what about the --rebuild-sb or --rebuild-tree? Can I
safely run these? Am I advised to run these?


Stroller.
 
03-03-2010, 11:42 AM
Willie Wong

On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
> There seem to have been a few people posting with filesystem
> corruption in the last week or two. It seems to be my turn, so I hope
> it isn't contagious. The cause here is quite clear - whilst rummaging
> in the server cupboard yesterday, power to the machine was
> accidentally disconnected.
>
> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
> filesystem, but nevertheless when I attempt to boot the system I get a
> "failed to open the device... no such file or directory" message,
> followed by another error as per subject line.

from the output it looks like you are mounting by label? What if you
edit fstab to point to the device name /dev/hd?? instead of
LABEL=root? Check the filesystem label to make sure it is ok?
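
A rough sketch of that swap, done on a scratch copy rather than the real /etc/fstab. The fstab line is a made-up example, /dev/hda3 stands in for the unspecified /dev/hd??, and `label_to_device` is an illustrative helper, not an existing tool.

```shell
#!/bin/sh
# Works on a scratch file; the fstab line below is a made-up example.
fstab=$(mktemp)
printf '%s\n' 'LABEL=root  /  reiserfs  noatime  0 1' > "$fstab"

# Replace a LABEL= reference in the first field with an explicit device node.
label_to_device() {   # usage: label_to_device LABEL DEVICE FILE
    sed "s|^LABEL=$1\([[:space:]]\)|$2\1|" "$3"
}

label_to_device root /dev/hda3 "$fstab"

# On the real system, the label itself can be inspected with, e.g.:
#   blkid /dev/sda3
#   ls -l /dev/disk/by-label/
```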

Cheers,

W
--
Willie W. Wong wwong@math.princeton.edu
Data aequatione quotcunque fluentes quantitae involvente fluxiones invenire
et vice versa ~~~ I. Newton
 
03-03-2010, 12:28 PM
Stroller

On 3 Mar 2010, at 12:42, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
> > There seem to have been a few people posting with filesystem
> > corruption in the last week or two. It seems to be my turn, so I hope
> > it isn't contagious. The cause here is quite clear - whilst rummaging
> > in the server cupboard yesterday, power to the machine was
> > accidentally disconnected.
> >
> > I have booted with a live CD & run `reiserfsck --fix-fixable` on the
> > filesystem, but nevertheless when I attempt to boot the system I get a
> > "failed to open the device... no such file or directory" message,
> > followed by another error as per subject line.
>
> from the output it looks like you are mounting by label? What if you
> edit fstab to point to the device name /dev/hd?? instead of
> LABEL=root? Check the filesystem label to make sure it is ok?


Many thanks for this suggestion, however following it makes no
difference, except in the trivia that it says "failed to open the
device '/dev/hda3': No such file or directory" (instead of "LABEL=...").


I also tried editing grub to point to /dev/sda3 (although admittedly
with the LABEL= entry in /etc/fstab) but that makes no difference. I
have never tried (intentionally) reconfiguring this kernel to use
/dev/sdX instead of /dev/hdX and I'm pretty sure it's booted using the
current kernel & configuration in the past.


Stroller.
 
03-03-2010, 01:26 PM
Stroller

On 3 Mar 2010, at 14:00, Mark Knecht wrote:
> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> > There seem to have been a few people posting with filesystem
> > corruption in the last week or two. It seems to be my turn, so I hope
> > it isn't contagious. The cause here is quite clear - whilst rummaging
> > in the server cupboard yesterday, power to the machine was
> > accidentally disconnected.
> > ...
>
> Sorry for your problems. I've had a rash of machine problems over
> the last 6 weeks. No fun. I feel for you.
>
> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?
>
> It would be even more frustrating to chroot in, do all the work,
> think you had it fixed and then the underlying foundation of your
> house crumbles beneath you 3 weeks from now.


I don't think this is a problem. I would love to know what others
think of the `smartctl` output:



root@sysresccd /root % smartctl -H /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Seconds        0x0012   001   001   020    Old_age   Always   FAILING_NOW 44803h+12m+16s


root@sysresccd /root % smartctl -i /dev/sda
smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Fujitsu MPA..MPG series
Device Model: FUJITSU MPF3204AT
Serial Number: 05030567
Firmware Version: 0028
User Capacity: 20,496,236,544 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 5
ATA Standard is: ATA/ATAPI-5 T13 1321D revision 1
Local Time is: Wed Mar 3 14:14:31 2010 UTC
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

root@sysresccd /root %


This looks to me like smartctl is going "OMG! What an ancient drive!"
- it's a 20gig EIDE drive and if my pocket calculator is correct
(44803/24/365), it's seen 5 years of active use - and that's the
"marginal attribute" referred to.
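
The pocket-calculator figure checks out. A quick sketch of the same division, assuming 365-day years (the function name `hours_to_years` is only illustrative):

```shell
#!/bin/sh
# Convert SMART power-on hours to years, assuming 24 h/day and 365 days/year.
hours_to_years() {
    awk -v h="$1" 'BEGIN { printf "%.1f\n", h / 24 / 365 }'
}

hours_to_years 44803   # prints 5.1
```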


Like I said, the power plug was accidentally pulled on this drive, so
I'm inclined to attribute the corruption only to that, not to the
drive actually failing.


The drive is in a computer that has rarely been turned off in the last
couple of years, and is also in a warm environment; both conditions
are ideal. I appreciate the latter seems unintuitive, but in fact
studies have shown that drives in somewhat warm environments last
longer than those that are cooled.


That it passes the "SMART overall-health self-assessment test"
suggests to me that it is chugging away quite happily.


I would have dismissed your concerns were it not for the capitalised
"FAILING_NOW" in the output. Like I say, I think this is just smartctl
declaring "OMG! this drive is old!", but I open this matter to the
list for discussion (should you wish).


I think I'm actually nearly ready to migrate off this system. The
power was actually pulled as I installed 3 new (to me) rackmount
machines in the server cupboard - the plan is to have identical
machines running RAID, so that in the case of ANY problems I have
spares available. I take nightly backups of the important data on
this machine, however I'd prefer it to run just a couple or a few
weeks longer to allow me to migrate at my own leisure.


Stroller.
 
03-03-2010, 02:18 PM
Willie Wong

On Wed, Mar 03, 2010 at 01:28:11PM +0000, Stroller wrote:
> >from the output it looks like you are mounting by label? What if you
> >edit fstab to point to the device name /dev/hd?? instead of
> >LABEL=root? Check the filesystem label to make sure it is ok?
>
> Many thanks for this suggestion, however following it makes no
> difference, except in the trivia that it says "failed to open the
> device '/dev/hda3': No such file or directory" (instead of "LABEL=...").

If you try to boot, after the failure to check rootfs, it should dump
you to a recovery console, what happens if you issue ls /dev ?

Also check dmesg?
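
At the recovery console, a small loop like the one below would show which of the candidate device nodes actually exist. This is only a sketch: /dev/hda3 and /dev/sda3 are the two names that come up in this thread, and `check_nodes` is an illustrative helper, not an existing tool.

```shell
#!/bin/sh
# Report whether each candidate root device exists as a block device node.
check_nodes() {
    for node in "$@"; do
        if [ -b "$node" ]; then
            echo "$node: present"
        else
            echo "$node: missing"
        fi
    done
}

check_nodes /dev/hda3 /dev/sda3

# Kernel messages mentioning IDE (hdX) or SCSI/libata (sdX) disks, if any;
# "|| true" keeps the script going when nothing matches.
dmesg 2>/dev/null | grep -i -E 'hd[a-z]|sd[a-z]' || true
```

If the node named in /etc/fstab is reported missing while the other is present, the "No such file or directory" message would point at device naming rather than at the filesystem.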

Cheers,

W
 
03-03-2010, 02:28 PM
Willie Wong

On Wed, Mar 03, 2010 at 02:26:46PM +0000, Stroller wrote:
> I don't think this is a problem. I would love to know what others
> think of the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   9 Power_On_Seconds        0x0012   001   001   020    Old_age   Always   FAILING_NOW 44803h+12m+16s

You can always run the SMART long test to double check. The
FAILING_NOW just indicates that the normalised value falls below the
threshold. For Power_On_Seconds, this usually just indicates that you
are way past the warranty. If you really care about your data, swap it
out now or make frequent backups. Otherwise I don't see the harm of
keeping it until it actually dies.
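
To make the threshold comparison concrete, here is a small sketch using the numbers from the quoted output. `when_failed` is an illustrative helper, not part of smartmontools, and it is simplified: it looks only at VALUE and THRESH, ignores the WORST column, and treats a value equal to the threshold as failing.

```shell
#!/bin/sh
# Classify a SMART attribute roughly the way the WHEN_FAILED column does:
# failing when the normalised VALUE is at or below THRESH.
# awk is used so that leading zeros (001, 020) are read as decimal numbers.
when_failed() {   # usage: when_failed VALUE THRESH
    awk -v v="$1" -v t="$2" \
        'BEGIN { if (v + 0 <= t + 0) print "FAILING_NOW"; else print "-" }'
}

when_failed 001 020   # the Power_On_Seconds row: prints FAILING_NOW

# The long self-test would be started and its log read back with:
#   smartctl -t long /dev/sda
#   smartctl -l selftest /dev/sda
```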

Cheers,

W
 
03-03-2010, 02:29 PM
Harry Putnam

Mark Knecht <markknecht@gmail.com> writes:

> In my most recent case what looked like a simple disk corruption
> problem was really a prelude to the drive just plain going bad. Have
> you tried smartctl to see what it says about the drive at this point?

Sorry to butt in here... is that tool, smartctl in some pkg on portage?
 
03-03-2010, 02:31 PM
Willie Wong

On Wed, Mar 03, 2010 at 09:29:43AM -0600, Harry Putnam wrote:
> Mark Knecht <markknecht@gmail.com> writes:
>
> > In my most recent case what looked like a simple disk corruption
> > problem was really a prelude to the drive just plain going bad. Have
> > you tried smartctl to see what it says about the drive at this point?
>
> Sorry to butt in here... is that tool, smartctl in some pkg on portage?
>

sys-apps/smartmontools

W
 
03-03-2010, 03:16 PM
Stroller

Many thanks for your help, Willie!


On 3 Mar 2010, at 15:18, Willie Wong wrote:
> On Wed, Mar 03, 2010 at 01:28:11PM +0000, Stroller wrote:
> > > from the output it looks like you are mounting by label? What if you
> > > edit fstab to point to the device name /dev/hd?? instead of
> > > LABEL=root? Check the filesystem label to make sure it is ok?
> >
> > Many thanks for this suggestion, however following it makes no
> > difference, except in the trivia that it says "failed to open the
> > device '/dev/hda3': No such file or directory" (instead of
> > "LABEL=...").
>
> If you try to boot, after the failure to check rootfs, it should dump
> you to a recovery console, what happens if you issue ls /dev ?

About 13 items. Is this unlucky?

http://linux.stroller.uk.eu.org/fs-corruption-dev.png

> Also check dmesg?


I don't think this gives any clues:

http://linux.stroller.uk.eu.org/fs-corruption-dmesg.png

Stroller.
 
03-03-2010, 03:19 PM
Stroller

On 3 Mar 2010, at 14:01, Mick wrote:
> ... Once or twice
> things went hairy and I would get a message similar to yours. On
> these rare occasions I booted with a LiveCD and with the partitions
> unmounted I ran --check, then --fix-fixable and finally
> --rebuild-tree. You may want to use an external drive with dd to
> image the current / partition and do all your recovery work on that.
> If you don't care too much about the risk of catastrophic failure then
> just run --rebuild-tree with a LiveCD and see what you get.


That's a great idea. I'm (now) religious about backing up my
customers' computers, often using dd like this, but for some reason it
hadn't yet occurred to me today.
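
A minimal sketch of the image-first approach, demonstrated on a scratch file so that it is safe to run anywhere. `image_partition` is an illustrative helper and the image path in the comments is an assumption. The reiserfsck steps are left as comments because they need the real, unmounted partition.

```shell
#!/bin/sh
# Copy SRC to DEST block-for-block, then confirm the copy is identical.
image_partition() {   # usage: image_partition SRC DEST
    dd if="$1" of="$2" bs=64k 2>/dev/null && cmp -s "$1" "$2"
}

# Demonstration on a 256 KiB scratch file rather than a real partition.
src=$(mktemp)
img=$(mktemp)
dd if=/dev/urandom of="$src" bs=64k count=4 2>/dev/null

if image_partition "$src" "$img"; then
    echo "image verified"
else
    echo "image differs from source"
fi

# On the live CD, with the partition unmounted, the real thing would be
# along the lines of (the image path is an assumption):
#   image_partition /dev/sda3 /mnt/external/sda3.img
# followed by the sequence Mick describes:
#   reiserfsck --check /dev/sda3
#   reiserfsck --fix-fixable /dev/sda3
#   reiserfsck --rebuild-tree /dev/sda3
```

Verifying the copy with cmp before touching the original means a failed --rebuild-tree can be undone by writing the image back.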


Stroller.
 
