02-26-2010, 04:59 PM
Volker Armin Hemmann

recovery from /var corruption?

On Friday 26 February 2010, Mark Knecht wrote:

>
> The machine _mostly_ crashed while running badblocks. I say mostly
> because the mouse is still alive but I can no longer ssh in and cannot
> open a terminal on my wife's desktop or get to the console.

because it is not crashed but waiting for the ide timeouts.

>
> I tried to Ctrl-C out of badblocks here (this is running shelled
> in) before I figured out it was a total crash which messed up the
> terminal a bit but you can see what it was reporting before the crash
>
> dragonfly ~ # badblocks -sv /dev/hda
> Checking blocks 0 to 156290903
> Checking for bad blocks (read-only test): 89360960done, 35:00 elapsed
> 89360961done, 35:09 elapsed
> 89360962
> 89360963
> ^C^C18% done, 35:27 elapsed
>
> So, there seem to be problems, possibly with the drive, or maybe it's
> some sort of overheating problem on the processor and this was just
> the way the processor failed before the crash?
>
> I ran memtest86 night before last for 8 hours and had no memory
> problems. I'll remove memory and PCI cards, reseat everything, and
> then see what happens.

protip: if you are running badblocks (or ddrescue) on a probably damaged
device, attach it with a USB adapter. That way your box is still usable.
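
A minimal sketch of such a rescue run, assuming GNU ddrescue is installed
and the suspect disk has been attached through a USB adapter. The device
name /dev/sdX and the image and mapfile paths are placeholders, not taken
from this thread. The function only prints the commands unless RUN=1 is set.

```shell
# Sketch only. SRC, IMG and MAP are placeholder names, not from the thread.
SRC=${SRC:-/dev/sdX}              # suspect disk as seen through the USB adapter
IMG=${IMG:-/mnt/backup/disk.img}  # image file on a healthy disk
MAP=${MAP:-/mnt/backup/disk.map}  # ddrescue mapfile, lets a run be resumed

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

rescue_disk() {
    # First pass: copy everything that reads cleanly, skip the scraping phase.
    run ddrescue -n "$SRC" "$IMG" "$MAP"
    # Second pass: retry the bad areas up to 3 times using direct disc access.
    run ddrescue -d -r3 "$SRC" "$IMG" "$MAP"
}
```

Calling rescue_disk shows what would be run; RUN=1 rescue_disk (as root)
actually runs it. Because the mapfile records progress, an interrupted run
can be restarted with the same arguments after unplugging and replugging.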

/me hates the Linux kernel for making processes in D state unkillable and
for being so bad at disk I/O.
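
For reference, "D" is the state ps reports for a process in uninterruptible
sleep, typically one waiting on disk I/O; signals are not acted on until the
I/O returns. A small sketch for spotting such processes, assuming a Linux
box with the procps ps (the function name filter_d_state is made up for this
example):

```shell
# Keep the header row plus every row whose STAT column starts with D
# (uninterruptible sleep, usually waiting on disk I/O).
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Typical use on a Linux box with procps:
#   ps -eo pid,stat,comm | filter_d_state
```

If badblocks shows up in that list while the box feels frozen, that is
consistent with the machine waiting on the drive rather than having crashed.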
 
02-26-2010, 05:19 PM
Paul Hartman

recovery from /var corruption?

On Fri, Feb 26, 2010 at 11:59 AM, Volker Armin Hemmann
<volkerarmin@googlemail.com> wrote:
> protip: if you are running badblocks (or ddrescue) on a probably damaged
> device, attach it with a USB adapter. That way your box is still usable.

+1, I had a bad drive and it's so much easier to unplug/replug the USB
instead of rebooting, etc.
 
02-26-2010, 05:26 PM
Mark Knecht

recovery from /var corruption?

On Fri, Feb 26, 2010 at 9:59 AM, Volker Armin Hemmann
<volkerarmin@googlemail.com> wrote:
> On Friday 26 February 2010, Mark Knecht wrote:
>
>>
>> The machine _mostly_ crashed while running badblocks. I say mostly
>> because the mouse is still alive but I can no longer ssh in and cannot
>> open a terminal on my wife's desktop or get to the console.
>
> because it is not crashed but waiting for the ide timeouts.

So if I let it continue running is it going to come back in the next
hour or two? I am assuming the IDE timeouts are because the drive is
having trouble, correct? That's the theory here? If so then unless the
software can mark them bad and somehow create good files out of bad
then I'm still left with a machine that is going to need serious work
done before it's a happy box again, correct?

On the other hand, because I have reasonably good user backups
(although no real system backups) right now if I bite the bullet and
build the machine then when my wife gets it back it's hopefully going
to be more reliable, wouldn't it?

I'm thinking that maybe I just copy a little stuff off the box - /etc
and the like - and then boot the machine with the Gentoo install CD or
System Rescue CD and see what the drive is doing?

That doesn't cost me anything to look around, but if SMART won't turn
on and badblocks is suggesting the drive is having trouble maybe
running something like badblocks and actually __marking__ blocks as
bad and then reloading Gentoo would work in the long run? (A lot of
work though.)
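
A minimal sketch of a SMART check from the rescue CD, assuming smartmontools
is available there and the drive is still /dev/hda as in the badblocks run
above. It only prints the commands unless RUN=1 is set.

```shell
DISK=${DISK:-/dev/hda}   # device name used earlier in this thread

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

check_smart() {
    run smartctl -s on "$DISK"   # try to switch SMART on
    run smartctl -H "$DISK"      # overall health self-assessment
    run smartctl -A "$DISK"      # attribute table (reallocated, pending sectors)
}
```

If SMART does turn on, growing reallocated or pending sector counts in the
attribute table would point at the drive rather than at the CPU or memory.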

I'm really not interested in buying a new drive because the machine is
ATA100/133 and if it's not the drive then the money is wasted for a
new machine. The cheapest at NewEgg is about $40. Why spend the buck
for an old Intel Centrino machine?

> protip: if you are running badblocks (or ddrescue) on a probably damaged
> device, attach it with a USB adapter. That way your box is still usable.
>
> /me hates the Linux kernel for making processes in D state unkillable and
> for being so bad at disk I/O.
>
>

Good inputs. Thanks!

Cheers,
Mark
 
02-26-2010, 05:37 PM
Volker Armin Hemmann

recovery from /var corruption?

On Friday 26 February 2010, Mark Knecht wrote:
> On Fri, Feb 26, 2010 at 9:59 AM, Volker Armin Hemmann
> <volkerarmin@googlemail.com> wrote:
> > On Friday 26 February 2010, Mark Knecht wrote:
> >> The machine _mostly_ crashed while running badblocks. I say mostly
> >> because the mouse is still alive but I can no longer ssh in and cannot
> >> open a terminal on my wife's desktop or get to the console.
> >
> > because it is not crashed but waiting for the ide timeouts.
>
> So if I let it continue running is it going to come back in the next
> hour or two?

yes

> I am assuming the IDE timeouts are because the drive is
> having trouble, correct? That's the theory here?

yes

> If so then unless the
> software can mark them bad and somehow create good files out of bad
> then I'm still left with a machine that is going to need serious work
> done before it's a happy box again, correct?

and with 'serious work' you mean 'replace the harddisk' ...

>
> On the other hand, because I have reasonably good user backups
> (although no real system backups) right now if I bite the bullet and
> build the machine then when my wife gets it back it's hopefully going
> to be more reliable, wouldn't it?

yes

>
> I'm thinking that maybe I just copy a little stuff off the box - /etc
> and the like - and then boot the machine with the Gentoo install CD or
> System Rescue CD and see what the drive is doing?

you could do that.

>
> That doesn't cost me anything to look around, but if SMART won't turn
> on and badblocks is suggesting the drive is having trouble maybe
> running something like badblocks and actually __marking__ blocks as
> bad and then reloading Gentoo would work in the long run? (A lot of
> work though.)

you would need to save the badblocks output to a file, then feed that file to
mkfs. And you are not even safe, because when a drive starts to have bad blocks
the chance that more will pop up soon is pretty high. So you might be lucky and
the drive is able to run for a long while (maybe even mapping out bad blocks
while testing them, so always run badblocks twice), but there is at least as
good a chance that the whole thing starts over in a couple of weeks.
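
A rough sketch of that sequence, assuming an ext3 filesystem and e2fsprogs.
The partition name /dev/hdaN, the list path and the 4096-byte block size are
placeholders, not from this thread. The block size is passed to both tools
because the mke2fs man page requires the bad block list to be generated with
the same block size mke2fs uses. Nothing is executed unless RUN=1 is set.

```shell
PART=${PART:-/dev/hdaN}            # placeholder partition, NOT from the thread
LIST=${LIST:-/root/badblocks.txt}  # list file, kept on a different filesystem
BS=${BS:-4096}                     # must be identical for badblocks and mke2fs

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

remake_fs_with_badblocks() {
    # Scan twice, as suggested above; the second run overwrites the list.
    run badblocks -sv -b "$BS" -o "$LIST" "$PART"
    run badblocks -sv -b "$BS" -o "$LIST" "$PART"
    # WARNING: this destroys everything on the partition.
    run mke2fs -j -b "$BS" -l "$LIST" "$PART"
}
```

The same man page describes mke2fs -c, which runs the bad block scan itself,
as the simpler and less error-prone route.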

>
> I'm really not interested in buying a new drive because the machine is
> ATA100/133 and if it's not the drive then the money is wasted for a
> new machine. The cheapest at NewEgg is about $40. Why spend the buck
> for an old Intel Centrino machine?

you could take the drive with you when you buy a new machine. Moving hard disks
is not that hard. Or put it in a USB enclosure when you don't need it
anymore. IDE-to-USB enclosures are cheap.
 
02-26-2010, 05:48 PM
Mark Knecht

recovery from /var corruption?

On Fri, Feb 26, 2010 at 10:26 AM, Mark Knecht <markknecht@gmail.com> wrote:
<SNIP>
>
> On the other hand, because I have reasonably good user backups
> (although no real system backups) right now if I bite the bullet and
> build the machine then when my wife gets it back it's hopefully going
> to be more reliable, wouldn't it?
>
> I'm thinking that maybe I just copy a little stuff off the box - /etc
> and the like - and then boot the machine with the Gentoo install CD or
> System Rescue CD and see what the drive is doing?
>
<SNIP>

As a related idea, I dug out an old copy of SpinRite, which I'll run on
all the partitions just to see what it says. However, if the problem is
currently limited to one partition (/var) which is still mostly readable,
could I not just create a new /var partition (the drive has free space),
copy the important stuff from the old /var to the new one, change fstab,
and then basically just go on from there?
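
Roughly, that plan could look like the sketch below. It assumes the work is
done from a rescue CD so nothing is writing to /var, that the new partition
already exists, and that ext3 is wanted. /dev/hdaX (old /var), /dev/hdaY (new
/var) and the mount points are placeholders. Nothing runs unless RUN=1 is set.

```shell
OLD=${OLD:-/dev/hdaX}   # placeholder: current /var partition
NEW=${NEW:-/dev/hdaY}   # placeholder: freshly created partition

# Print each command; execute it only when RUN=1 is set.
run() {
    echo "+ $*"
    if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

migrate_var() {
    run mke2fs -j "$NEW"                   # new ext3 filesystem (wipes NEW)
    run mkdir -p /mnt/oldvar /mnt/newvar
    run mount -o ro "$OLD" /mnt/oldvar     # old /var read-only
    run mount "$NEW" /mnt/newvar
    run cp -a /mnt/oldvar/. /mnt/newvar/   # keeps owners, modes, timestamps
    run umount /mnt/oldvar
    run umount /mnt/newvar
    echo "# finally, point the /var entry in /etc/fstab at $NEW"
}
```

cp will complain about any file it cannot read from the damaged area, which
gives a list of what has to be restored from backup or re-emerged.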

Cheers,
Mark
 
02-26-2010, 05:57 PM
Mark Knecht

recovery from /var corruption?

On Fri, Feb 26, 2010 at 9:38 AM, daid kahl <daidxor@gmail.com> wrote:
> On 26 February 2010 12:33, Mark Knecht <markknecht@gmail.com> wrote:
>> So I got my wife's machine booted today using an install disk and
>> played a bit with e2fsck. The machine stopped being happy last night
>> due to some sort of corruption on the /var partition. e2fsck
>> complained about 3 or 4 files and then repaired the partition. The
>> machine booted cleanly as far as I can tell.
>>
>> So, something went bad and I managed to sneak around it for a while
>> and now I'm sort of living with the machine wondering what to do.
>>
>> Do I just watch the logs looking for problems? I have no way of
>> knowing right now whether this was a disk problem that's going to come
>> back, a 1 time deal due to power, or something else entirely.
>>
>> On these cheap machines that don't use RAID, what's the right way to
>> go? emerge -e @world and then wait for the next event? Do nothing and
>> wait?
>>
>> We've got decent personal data backups as well as basic /etc data.
>>
>> Thanks,
>> Mark
>>
>
> I reconsidered your problem, and I actually wonder if emerging world
> is a valid notion in this case, as the world file is under /var and
> this is reported as corrupt.
>
> In this sense, it may be entirely non-trivial to regenerate (without
> backup) the correct world-file for a system.
>
> Am I out in the deep end, or is this, in fact, the critical point that
> needs consideration here?
>
> ~daid

Hi daid,
In general you are correct. If I didn't have a copy of the world
file then it would be a bit hit and miss. In this case I do have it
saved elsewhere so it's actually quite easy.
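
For anyone without such a copy, a minimal sketch of keeping one outside
/var. The world file lives at /var/lib/portage/world; the destination
directory /root/portage-backup is only an example name.

```shell
WORLD=${WORLD:-/var/lib/portage/world}   # Portage's world file
DEST=${DEST:-/root/portage-backup}       # example location outside /var

backup_world() {
    # Copy the world file to a dated name, preserving its timestamps.
    mkdir -p "$DEST" &&
    cp -p "$WORLD" "$DEST/world.$(date +%Y%m%d)"
}
```

Run from cron or by hand after installing packages, this leaves a dated copy
that can be put back in place before re-emerging.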

This failure is more (it seems) a few bad blocks on one partition
and not a total drive failure.

I'm leaning toward a new /var partition and just ignoring the
partition that has problems. It will sit on the disk but it's only
10GB out of 160GB so it's not the end of the world by any means.

Thanks!

- Mark
 
