Linux Archive > ArchLinux > ArchLinux General Discussion

Old 06-09-2010, 08:44 PM
Mauro Santos
 
Bad Magic Number in Superblock - Any trick for Arch or for new kernels?

On 06/09/2010 07:16 PM, David C. Rankin wrote:
>
> Before I add the drive to the heap of drives in my 'dead drive box' are there
> any other silver bullets I should try to try and resurrect the drive? (Data
> isn't an issue, it's all backed up :-)
>
> What say the gurus?
>

Try issuing a SMART short test and see if it goes well. If it finishes
without errors, then issue a full SMART self-test and check for any
SMART attribute changes and/or errors.

If that goes well, which would indicate the drive is still OK, then use
badblocks and let it do the 4 write-and-read passes. Keep an eye on the
SMART attributes, because any problem not detected before may show up
now. If you get no errors or SMART attribute changes, the drive should
be OK.
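[Editor's note: the sequence above can be sketched as a short script. The
device name /dev/sdX is a placeholder, and the badblocks write-mode pass
destroys all data on the drive, so the sketch defaults to a dry run that
only prints what it would execute.]

```shell
#!/bin/sh
# Sketch of the SMART + badblocks check sequence described above.
# DEV is a placeholder -- point it at the suspect drive.
# The badblocks -w pass DESTROYS all data, so default to a dry run.
DEV=${DEV:-/dev/sdX}
DRY_RUN=${DRY_RUN:-1}    # set DRY_RUN= (empty) to really run the commands

run() {
    echo "+ $*"                  # show each command before running it
    [ -n "$DRY_RUN" ] || "$@"
}

run smartctl -t short "$DEV"     # quick self-test, usually a few minutes
run smartctl -l selftest "$DEV"  # read the result once it finishes
run smartctl -t long "$DEV"      # full surface self-test
run smartctl -A "$DEV"           # note attribute values before badblocks
run badblocks -wsv "$DEV"        # 4 write+read passes (0xaa 0x55 0xff 0x00)
run smartctl -A "$DEV"           # compare attributes afterwards
```

The four patterns badblocks writes in -w mode (0xaa, 0x55, 0xff, 0x00) are
the "4 write and read passes" referred to above.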

--
Mauro Santos
 
Old 06-10-2010, 01:00 AM
"David C. Rankin"
 
Bad Magic Number in Superblock - Any trick for Arch or for new kernels?

On 06/09/2010 03:44 PM, Mauro Santos wrote:
> On 06/09/2010 07:16 PM, David C. Rankin wrote:
>>
>> Before I add the drive to the heap of drives in my 'dead drive box' are there
>> any other silver bullets I should try to try and resurrect the drive? (Data
>> isn't an issue, it's all backed up :-)
>>
>> What say the gurus?
>>
>
> Try issuing a SMART short test and see if it goes well. If it finishes
> without errors, then issue a full SMART self-test and check for any
> SMART attribute changes and/or errors.
>
> If that goes well, which would indicate the drive is still OK, then use
> badblocks and let it do the 4 write-and-read passes. Keep an eye on the
> SMART attributes, because any problem not detected before may show up
> now. If you get no errors or SMART attribute changes, the drive should
> be OK.
>

Thanks Mauro,

I do like badblocks. It saved my bacon once before. Rather than doing the
badblocks recovery (since I have the data), what I think I'll do is search a
bit more for the 'fdisk -l' info for the drive. If I find it, I'll try
recreating the partitions and see what is left of the drive. If not, I'll
just add the drive to the pile. Eventually I'll do some type of
chronological art exhibit with drives: everything from 8 'Meg' MFM/RLL
drives to the new Seagate 500-750G drives that drop like flies now for some
reason.
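[Editor's note: for next time, the partition layout can be dumped to a text
file before trouble hits, so lost partitions can be re-created exactly. A
sketch using sfdisk's dump format follows; /dev/sdX, the file name, and the
start/size values are all illustrative placeholders, and the example only
writes a local text file so nothing real is touched.]

```shell
# Saving the partition layout so it can be re-created later.
# /dev/sdX and the dump file name are placeholders:
#
#   sfdisk --dump /dev/sdX > sdX.layout     # save the layout
#   sfdisk /dev/sdX < sdX.layout            # restore onto a wiped disk
#
# The dump is plain text, e.g. (illustrative values only):
cat <<'EOF' > sdX.layout
label: dos
device: /dev/sdX
unit: sectors

/dev/sdX1 : start=2048, size=1048576, type=83, bootable
/dev/sdX2 : start=1050624, size=8388608, type=82
EOF
grep -c '^/dev/sdX' sdX.layout    # prints 2: two partition lines saved
```

Keeping such a dump alongside the backups makes the "search for the
'fdisk -l' info" step unnecessary.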

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
 
Old 06-10-2010, 01:46 PM
Mauro Santos
 
Bad Magic Number in Superblock - Any trick for Arch or for new kernels?

On 06/10/2010 02:00 AM, David C. Rankin wrote:
> I do like badblocks. It saved my bacon once before. Rather than doing the
> badblocks recovery (since I have the data), what I think I'll do is search
> a bit more for the 'fdisk -l' info for the drive. If I find it, I'll try
> recreating the partitions and see what is left of the drive. If not, I'll
> just add the drive to the pile. Eventually I'll do some type of
> chronological art exhibit with drives: everything from 8 'Meg' MFM/RLL
> drives to the new Seagate 500-750G drives that drop like flies now for
> some reason.
>

I guess that you can't recover much more from the drive as it is just by
trying to read from it (unless you get hold of some advanced tool to make
sense of the whole drive).

That problem may not be caused by a drive failure but by a combination of
factors. You said that this particular disk has been running for a few
years without problems, and there is no indication of failure in the SMART
attributes (I read that SMART catches only about 2/3 of failures).

In my experience, power supplies go bad after 2 or 3 years of continuous
use if they are consumer-grade hardware, so a bad power supply coupled with
the worst case for the hard disk can lead to problems. That is why I
suggested badblocks to look for problems while keeping an eye on the SMART
attributes. You may also have a hardware failure somewhere else; the
motherboard or the hardware connected directly to the disk are good
candidates (as much as anything else, actually, if the system is 5 or 6
years old).

From my experience only, I find it quite hard to know when a disk is about
to fail. Currently I am trying to figure out whether a hard disk in a
machine I manage (a 3.5" drive) is about to fail or not. SMART says it is,
yet badblocks can't find anything wrong with the drive (even after 2 full
write passes). One of the SMART attributes, the one flagged FAILING_NOW,
increases by one with each full read cycle, but the SMART attributes do not
report any reallocated sectors. This is a new drive (6 months old, give or
take), and the other drives assembled in the machine have exactly the same
usage and do not show any signs of trouble (the serial numbers of the
drives are all very close, almost sequential, all from the same
manufacturer).
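[Editor's note: a small awk filter can pull out exactly the attributes that
smartctl flags as failing. The input below is a canned, illustrative
fragment of 'smartctl -A' table output, since the real drive is not
available here; the attribute values are made up.]

```shell
# Pick out attributes that smartctl flags in its WHEN_FAILED column.
# The input is a canned, illustrative 'smartctl -A' fragment.
cat > sample_smart.txt <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
190 Airflow_Temperature_Cel 0x0022   045   041   045    Old_age   Always   FAILING_NOW 55
EOF
# Column 9 is WHEN_FAILED; anything other than '-' deserves attention.
awk 'NR > 1 && $9 != "-" { print $1, $2, $9 }' sample_smart.txt
```

Run periodically (e.g. from cron) and diffed against the previous output,
this catches attribute changes between badblocks passes.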

I have had some trouble with a drive from the same manufacturer before (a
2.5" drive), but things seem to go smoothly after I did just one 'dd
if=/dev/zero of=/dev/sd?' and then read it back. No SMART attribute said
the drive was failing that time, so it might be just a bad coincidence.
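[Editor's note: the zero-fill-and-read-back check is easy to sketch. Below
it runs against a scratch file so nothing is harmed; pointing TARGET at a
real device (e.g. /dev/sdX) instead would destroy everything on it.]

```shell
# Zero-fill a target and read it back, as described above.
# TARGET here is a scratch file; substituting a real device erases it.
TARGET=scratch.img

dd if=/dev/zero of="$TARGET" bs=1M count=4 2>/dev/null   # write pass
dd if="$TARGET" of=/dev/null bs=1M 2>/dev/null           # read-back pass

# Verify every byte really is zero: count the non-zero bytes.
nonzero=$(tr -d '\0' < "$TARGET" | wc -c)
echo "non-zero bytes: $nonzero"
```

On a real drive, a clean read-back with no SMART attribute changes after
the write is the reassuring outcome described above.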

As far as I can see, you have done the best thing you could have done,
which is keep backups of the important data, now all you can do is try
to decide if that drive can still be used and trust it a bit less (put
it in a raid array that can tolerate failures). Unless the drive fails
terribly with no margin for doubt it is hard to say, from the users
point of view, if it is really failing or not.

--
Mauro Santos
 
Old 06-10-2010, 04:06 PM
"David C. Rankin"
 
Bad Magic Number in Superblock - Any trick for Arch or for new kernels?

On 06/10/2010 08:46 AM, Mauro Santos wrote:
> From my experience only, I find it quite hard to know when a disk is
> about to fail. Currently I am trying to figure out whether a hard disk in
> a machine I manage (a 3.5" drive) is about to fail or not. SMART says it
> is, yet badblocks can't find anything wrong with the drive (even after 2
> full write passes). One of the SMART attributes, the one flagged
> FAILING_NOW, increases by one with each full read cycle, but the SMART
> attributes do not report any reallocated sectors. This is a new drive (6
> months old, give or take), and the other drives assembled in the machine
> have exactly the same usage and do not show any signs of trouble (the
> serial numbers of the drives are all very close, almost sequential, all
> from the same manufacturer).

Mauro,

Your experience sounds exactly like mine over the past year. I have had 4
Seagate drives supposedly "go bad" after 13-14 months of use (1-2 months
after the warranty runs out). The problem is always the same: SMART says
there is a badblock problem and it logs the time/date of the error.
Subsequent passes with smartctl -t long show no additional problems and the
drives always 'PASS'.

Where this behavior between badblocks/SMART/Seagate drives is killing me is
that most of my drives run in RAID1 sets with either dmraid or mdraid. The
dmraid installs seem to be the most sensitive to this problem. I know that
the hardware ought to provide badblock remapping on a per-drive basis on
the fly, but I still don't have a good feel for how dmraid handles this
internally.

Regardless, when I split an array where one drive is showing badblock
issues and then use the drive as a single drive, I don't have any more
problems with it. So, from what I'm seeing, there is a problem in the way
SMART/badblocks/dmraid play together. I don't have a clue what it is, but
I've been through that scenario 4 times in the past 12 months.

This failure is different. Here the drive was stand-alone to begin with
and, contrary to the earlier badblock/dmraid drives, this drive can no
longer be read with any power supply. (When I work on drives out of the
machine, they have a dedicated power source provided by the USB connection
kit.) I think the only way I will ever get an answer on this drive is if I
find my dump of the CHS partition info for the drive and then manually
re-create the partitions to tell the drive where to start looking.

C'est la vie... I'll provide a follow-up if I manage to uncover any more on
the reason for the failure. Thanks for your help.

--
David C. Rankin, J.D.,P.E.
Rankin Law Firm, PLLC
510 Ochiltree Street
Nacogdoches, Texas 75961
Telephone: (936) 715-9333
Facsimile: (936) 715-9339
www.rankinlawfirm.com
 
Old 06-10-2010, 06:16 PM
Mauro Santos
 
Bad Magic Number in Superblock - Any trick for Arch or for new kernels?

On 06/10/2010 05:06 PM, David C. Rankin wrote:
> Your experience sounds exactly like mine over the past year. I have had 4
> Seagate drives supposedly "go bad" after 13-14 months of use (1-2 months
> after the warranty runs out). The problem is always the same: SMART says
> there is a badblock problem and it logs the time/date of the error.
> Subsequent passes with smartctl -t long show no additional problems and
> the drives always 'PASS'.

I didn't want to name the manufacturer because I think it is not fair.
Failure due to normal wear is acceptable and expected (which seems to be
your case), and it is also normal to see some drives fail early after they
start working. This follows what we call here the bathtub curve: higher
failure rates at the start of life, then failure rates decrease
significantly, then rise again at the end of life.

As a side note, here in Europe the warranty is 24 months (at least where I
live), so I doubt the manufacturer would make drives that would last less
than that. Besides, some manufacturers are offering/advertising 3- or
5-year warranties on their websites, so I guess they must be quite sure
their drives are reliable enough. You may want to look at that too and see
if you are eligible for a free replacement from the manufacturer itself.

So far I have only seen consumer-grade drives being used in machines that
are working 24/7. That is clearly a mistake, but it's the cheapest option,
and I guess that most of the time it works fine, hence it is hard to
justify spending more money on server-grade hardware.

Like I said before, the problem may be caused or aggravated by some other
component. Even with my limited experience I've seen some weird problems
caused by components that would not seem suspect at first glance. The
latest trend, from my limited experience, seems to be power supply failure:
if not complete failure, at least not conforming to spec and causing
instability.

The trend seems to be to supply most components from 12 V to reduce the
current flowing from the power supply to the component being supplied (the
supply voltage keeps decreasing with the latest technology nodes), but hard
disks (the 3.5" ones at least) still rely on 12 V to make the platters
spin.

If you have to step the voltage down from, let's say, 12 V to 1 V or 3.3 V
or something close to that, you have lots of working margin. But because
the hard disk still requires 12 V +/-10% (which is what the ATX spec
requires), if the 12 V line goes out of spec, which it can if the power
supply is going bad, then the hard disk may not be able to work properly
while everything else will probably still work happily.
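[Editor's note: the margin argument is simple arithmetic; a one-liner makes
the ATX 12 V band explicit. The +/-10% figure is the tolerance quoted
above.]

```shell
# Compute the allowed ATX 12 V rail band from the +/-10% tolerance above.
awk 'BEGIN {
    v = 12; tol = 0.10
    printf "12 V rail must stay within %.1f V .. %.1f V\n", v*(1-tol), v*(1+tol)
}'
# prints: 12 V rail must stay within 10.8 V .. 13.2 V
```

A supply that sags below 10.8 V under load is out of spec for the drive
even if the low-voltage rails, with their much wider regulation margin
downstream, still look fine.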

All this to say: you may not have a bad drive on your hands, it may be just
an unfortunate coincidence. If you really have a backup of all the data,
try writing to the drive while it is connected to a "good" power supply and
the problems may be gone (that happened to me before with a 2.5" hard
disk). However, this is also a good opportunity to try to recover some data
from the drive, just to learn some tricks for the future, before you write
to it and wonder what caused the problem.

--
Mauro Santos
 
