03-03-2010, 01:00 PM
Mark Knecht

Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("

On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote:
> There seem to have been a few people posting with filesystem corruption in
> the last week or two. It seems to be my turn, so I hope it isn't contagious.
> The cause here is quite clear - whilst rummaging in the server cupboard
> yesterday, power to the machine was accidentally disconnected.
>
> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
> filesystem, but nevertheless when I attempt to boot the system I get a
> "failed to open the device... no such file or directory" message, followed
> by another error as per subject line.
>
> However, you will see from this screenshot (taken with an IP KVM) that the
> filesystem does indeed seem to have been mounted successfully, if read-only:
>
> http://linux.stroller.uk.eu.org/fs-corruption.png
>
> All I did here was log in with the root password.
>
>
> When I boot with a live CD I can mount, read & write the filesystem:
>
> root@sysresccd /root % mount -v -L root /mnt/gentoo
> mount: you didn't specify a filesystem type for /dev/sda3
>        I will try type reiserfs
> /dev/sda3 on /mnt/gentoo type reiserfs (rw)
> root@sysresccd /root % ls /mnt/gentoo
> bin  boot  dev  etc  home  lib  mnt  opt  proc  root  sbin  sys  tmp  usr  var
> root@sysresccd /root % touch /mnt/gentoo/foo
> root@sysresccd /root % echo foobar >> /mnt/gentoo/foo
> root@sysresccd /root % ls -lh !!:$
> ls -lh /mnt/gentoo/foo
> -rw-r--r-- 1 root root 7 2010-03-03 11:18 /mnt/gentoo/foo
> root@sysresccd /root % cat !!:$
> cat /mnt/gentoo/foo
> foobar
> root@sysresccd /root % rm !!:$
> rm /mnt/gentoo/foo
> rm: remove regular file `/mnt/gentoo/foo'? y
> root@sysresccd /root %
>
> All the important system stuff on this PC is on a single partition. I have
> two other drives attached at /mnt/space & /mnt/morespace - they are XFS and
> I have run xfs_repair on both of them, which completes quickly indicating no
> problems.
>
> I'm not really sure how to proceed next. I feel the problem is indeed on
> this reiserfs filesystem, the root filesystem with the label "root". I can't
> help thinking that the problem is not that the system "failed to open the
> device", but instead maybe that there's an important system file missing
> that means the init script (or whatever is responsible for mounting the
> filesystem) is not properly returning 0. Does this seem possible? Maybe the
> reiserfs handler for mount is somehow broken (performing the mount, but not
> returning 0, or perhaps broken in such a way that it is able to mount
> read-only but not read-write).
>
> I am tempted to chroot into the system and re-emerge system & baselayout. If
> I'm correct in the above guess then re-emerging the correct files will fix
> the problem. Right?
>
> `reiserfsck --help` shows some other options besides the simple
> --fix-fixable - I assume the "expert option" of --scan-whole-partition is
> unsafe, but what about the --rebuild-sb or --rebuild-tree? Can I safely run
> these? Am I advised to run these?
>
> Stroller.

Hi Stroller,
Sorry for your problems. I've had a rash of machine problems over
the last 6 weeks. No fun. I feel for you.

In my most recent case what looked like a simple disk corruption
problem was really a prelude to the drive just plain going bad. Have
you tried smartctl to see what it says about the drive at this point?

It would be even more frustrating to chroot in, do all the work,
think you had it fixed and then the underlying foundation of your
house crumbles beneath you 3 weeks from now.

Good luck,
Mark
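
A minimal sketch of the kind of SMART check Mark suggests, run from the
live CD (the device name /dev/sda and the presence of smartmontools on
the CD are assumptions):

    smartctl -H /dev/sda           # overall health self-assessment
    smartctl -A /dev/sda           # full attribute table, including the power-on counter
    smartctl -t short /dev/sda     # start a short offline self-test (a few minutes)
    smartctl -l selftest /dev/sda  # read the self-test log once it completes

If the short test passes, smartctl -t long exercises the whole surface
and is a better indicator of a drive on its way out.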
 
03-03-2010, 01:01 PM
Mick

Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("

On 3 March 2010 13:28, Stroller <stroller@stellar.eclipse.co.uk> wrote:
>
> On 3 Mar 2010, at 12:42, Willie Wong wrote:
>
>> On Wed, Mar 03, 2010 at 12:24:42PM +0000, Stroller wrote:
>>>
>>> There seem to have been a few people posting with filesystem
>>> corruption in the last week or two. It seems to be my turn, so I hope
>>> it isn't contagious. The cause here is quite clear - whilst rummaging
>>> in the server cupboard yesterday, power to the machine was
>>> accidentally disconnected.
>>>
>>> I have booted with a live CD & run `reiserfsck --fix-fixable` on the
>>> filesystem, but nevertheless when I attempt to boot the system I get a
>>> "failed to open the device... no such file or directory" message,
>>> followed by another error as per subject line.
>>
>> from the output it looks like you are mounting by label? What if you
>> edit fstab to point to the device name /dev/hd?? instead of
>> LABEL=root? Check the filesystem label to make sure it is ok?
>
> Many thanks for this suggestion, however following it makes no difference,
> except for the trivial detail that it says "failed to open the device
> '/dev/hda3': No such file or directory" (instead of "LABEL=...").
>
> I also tried editing grub to point to /dev/sda3 (although admittedly with
> the LABEL= entry in /etc/fstab) but that makes no difference. I have never
> tried (intentionally) reconfiguring this kernel to use /dev/sdX instead of
> /dev/hdX and I'm pretty sure it's booted using the current kernel &
> configuration in the past.

In my experience reiserfs is a very stable fs. I had a dodgy memory
module once which I put up with for more than 9 months. The machine
would lock up hard on a daily basis and the only way to get it going
again would be to pull the plug. That would happen at random: mid-way
through emerge --sync, package updates, updatedb, etc. It survived
hundreds of crashes thanks to fsck at the next boot. Once or twice
things went hairy and I would get a message similar to yours. On
these rare occasions I booted with a LiveCD and with the partitions
unmounted I ran --check, then --fix-fixable and finally
--rebuild-tree. You may want to use an external drive with dd to
image the current / partition and do all your recovery work on that.
If you don't care too much about the risk of catastrophic failure then
just run --rebuild-tree with a LiveCD and see what you get.

Good luck.
--
Regards,
Mick
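
A rough sketch of the image-then-repair approach Mick describes; the
device and mount-point names (/dev/sda3 as the root partition,
/mnt/external as a scratch disk with enough free space) are assumptions:

    dd if=/dev/sda3 of=/mnt/external/sda3.img bs=4M conv=noerror,sync  # raw copy of the partition
    losetup /dev/loop0 /mnt/external/sda3.img      # expose the image as a block device
    reiserfsck --check /dev/loop0                  # read-only consistency check
    reiserfsck --fix-fixable /dev/loop0            # repair minor corruption
    reiserfsck --rebuild-tree /dev/loop0           # last resort: rebuilds the internal tree
    losetup -d /dev/loop0

If the repairs come back clean on the image, the same reiserfsck commands
can be run against /dev/sda3 itself, with the image kept as a fallback.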
 
03-03-2010, 02:26 PM
Mark Knecht

Filesystem corruption - reiserfs? - won't boot, "filesystem couldn't be fixed :("

On Wed, Mar 3, 2010 at 6:26 AM, Stroller <stroller@stellar.eclipse.co.uk> wrote:
>
> On 3 Mar 2010, at 14:00, Mark Knecht wrote:
>>
>> On Wed, Mar 3, 2010 at 4:24 AM, Stroller <stroller@stellar.eclipse.co.uk>
>> wrote:
>>>
>>> There seem to have been a few people posting with filesystem corruption
>>> in
>>> the last week or two. It seems to be my turn, so I hope it isn't
>>> contagious.
>>> The cause here is quite clear - whilst rummaging in the server cupboard
>>> yesterday, power to the machine was accidentally disconnected.
>>
>> ...
>> Sorry for your problems. I've had a rash of machine problems over
>> the last 6 weeks. No fun. I feel for you.
>>
>> In my most recent case what looked like a simple disk corruption
>> problem was really a prelude to the drive just plain going bad. Have
>> you tried smartctl to see what it says about the drive at this point?
>>
>> It would be even more frustrating to chroot in, do all the work,
>> think you had it fixed and then the underlying foundation of your
>> house crumbles beneath you 3 weeks from now.
>
> I don't think this is a problem. I would love to know what others think of
> the `smartctl` output:
>
>
> root@sysresccd /root % smartctl -H /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> Please note the following marginal Attributes:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   9 Power_On_Seconds        0x0012   001   001   020    Old_age   Always   FAILING_NOW 44803h+12m+16s
>
> root@sysresccd /root % smartctl -i /dev/sda
> smartctl version 5.38 [i486-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
>
> === START OF INFORMATION SECTION ===
> Model Family:     Fujitsu MPA..MPG series
> Device Model:     FUJITSU MPF3204AT
> Serial Number:    05030567
> Firmware Version: 0028
> User Capacity:    20,496,236,544 bytes
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   5
> ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> Local Time is:    Wed Mar  3 14:14:31 2010 UTC
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
>
> root@sysresccd /root %
>
>
> This looks to me like smartctl is going "OMG! What an ancient drive!" - it's
> a 20gig EIDE drive and if my pocket calculator is correct (44803/24/365),
> it's seen 5 years of active use - and that's the "marginal attribute"
> referred to.
>
> Like I said, the power plug was accidentally pulled on this drive, so I'm
> inclined to attribute the corruption only to that, not to the drive actually
> failing.
>
> The drive is in a computer that has rarely been turned off in the last
> couple of years, and is also in a warm environment, conditions which are
> ideal. I appreciate the latter seems unintuitive, but in fact studies have
> shown that drives in somewhat warm environments last longer than those that
> are cooled.
>
> That it passes the "SMART overall-health self-assessment test" suggests to
> me that it is chugging away quite happily.
>
> I would have dismissed your concerns were it not for the capitalised
> "FAILING_NOW" in the output. Like I say, I think this is just smartctl
> declaring "OMG! this drive is old!", but I open this matter to the list for
> discussion (should you wish).
>
> I think I'm actually nearly ready to migrate off this system. The power was
> actually pulled as I installed 3 new (to me) rackmount machines in the
> server cupboard - the plan is to have identical machines running RAID, so
> that in the case of ANY problems I have spares available. I take
> nightly backups of the important data on this machine, however I'd prefer it
> to run just a few weeks longer to allow me to migrate at my own
> leisure.
>
> Stroller.

I've had two machines go bad due to hard drive problems in the last 6
weeks. One drive was 4.5 years old, the other 6 years old. I have no
experience with SMART; I'm just learning about it. However, the data is
generated by the microcontroller in the hard drive, as per the drive
manufacturer's view, so if the drive is telling you it's failing
then...

My 4.5-year drive actually stopped producing SMART output somewhere
along the way before it failed. For the 6-year drive I wasn't using SMART
at the time, so I had no data from it, but it was in an environment
where the UPS went through a lot of abuse.

It sounds like you have good backups, so just make sure they are good
and do what you want.

- Mark
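
For what it's worth, the power-on conversion Stroller does above works
out as he says; a one-liner reproducing the arithmetic (the 44803 hours
are taken from the quoted smartctl output):

    awk 'BEGIN { printf "%.1f years\n", 44803 / 24 / 365 }'   # prints 5.1 years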
 
