Old 03-26-2008, 01:49 PM
John Summerfield
 
Catastrophic disk failure, where was smartd?

Tom Horsley wrote:
> On Wed, 26 Mar 2008 08:35:49 -0500
> "David G. Mackay" <mackay_d@bellsouth.net> wrote:
>
>> Shouldn't there have been some indication of problems prior to the
>> failure?

I suppose it depends on how fast the drive goes from "working" to "not
working."

> I have a feeling smartd is quite a lot like the emergency lights
> we have at work. They come on fine when you press the "test" button,
> but if the power actually goes out, they don't work at all :-).
>
> Most UPS boxes seem to be about the same. They'll be reporting
> self test OK and lots of battery life, then the power actually
> fails, and they fall over dead.

Really all it has to go on is the potential difference between positive
and negative connectors. The electronics aren't going to know how fast
it goes from 72V (about where my UPS should be) to something
unsatisfactory without actually running it (partially) down.








--

Cheers
John

-- Advice
http://webfoot.com/advice/email.top.php
http://www.catb.org/~esr/faqs/smart-questions.html
http://support.microsoft.com/kb/555375

You cannot reply off-list:-)

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
 
Old 03-26-2008, 02:01 PM
Les Mikesell
 

David G. Mackay wrote:
> I tried to install Centos 5.1 on a hard drive, and ran into errors
> formatting it. I downloaded and ran the dos version of the Seagate
> tools against the disk, which found all sorts of errors, and claimed
> that it was running at 253 degrees. This drive has been sitting idle,
> but powered on and mounted in a system running FC6 on a different drive.
> Looking at /var/log/messages, I see no smart warnings until after I
> rebooted into FC6. Prior to the failure I see messages indicating that
> the drive is in the smartd database, and that it has been added to the
> monitor list.
>
> Shouldn't there have been some indication of problems prior to the
> failure?

What did smartctl -A say about it? I'm not sure that the tests that it
needs for good diagnostics ever get run automatically.
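
For anyone wanting to check by hand, here is a minimal sketch of reading the attribute table, starting the extended self-test, and viewing its log afterwards. The smartctl options used (-A, -t long, -l selftest) are standard smartmontools options. The function names, the SMARTCTL override, and /dev/sdX are placeholders made up for this sketch.

```shell
#!/bin/sh
# Sketch only. The function names and the SMARTCTL override are hypothetical;
# the smartctl options themselves (-A, -t long, -l selftest) are standard.
# Setting SMARTCTL to another command allows a dry run without touching a drive.

# Print the vendor attribute table for the given device.
show_attributes() {
    "${SMARTCTL:-smartctl}" -A "$1"
}

# Ask the drive to start its extended (long) self-test.
# The test runs inside the drive; this call returns right away.
start_long_test() {
    "${SMARTCTL:-smartctl}" -t long "$1"
}

# Print the drive's self-test log, to be read once the test has finished.
show_selftest_log() {
    "${SMARTCTL:-smartctl}" -l selftest "$1"
}
```

Typical use, as root, would be "start_long_test /dev/sdX" followed some hours later by "show_selftest_log /dev/sdX". How long the test takes depends on the drive.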

--
Les Mikesell
lesmikesell@gmail.com



 
Old 03-26-2008, 04:27 PM
Bruno Wolff III
 

On Wed, Mar 26, 2008 at 08:35:49 -0500,
"David G. Mackay" <mackay_d@bellsouth.net> wrote:
>
> Shouldn't there have been some indication of problems prior to the
> failure?

Only if you are lucky. Someone at Google published some information about
smart around a year ago. In cases where catastrophic failures occur, for a high
percentage there is no warning from smart.

 
Old 03-26-2008, 05:28 PM
Roger Heflin
 

Bruno Wolff III wrote:
> On Wed, Mar 26, 2008 at 08:35:49 -0500,
> "David G. Mackay" <mackay_d@bellsouth.net> wrote:
>> Shouldn't there have been some indication of problems prior to the
>> failure?
>
> Only if you are lucky. Someone at Google published some information about
> smart around a year ago. In cases where catastrophic failures occur, for a high
> percentage there is no warning from smart.



The big issue is that most of the smart implementations don't scan the disk for
bad blocks. In my experience several years ago, with 1000+ disks in service,
the #1 failure was bad blocks, and smart did little to catch that. The #2
failure was failure to spin up at all, but this seemed to be confined to
certain batches.


One thing that I would do is run a simple "dd if=/dev/sdx of=/dev/null bs=1M" on
all of my disks, maybe once per week or once per month, to scan them myself. If
a disk detects a sector getting too many errors (still correctable with the
extra bits it has), it will move the data from the bad sector to a spare and
mark the bad sector bad, and I believe smart counts when this has been done.
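
The dd read scan above could be wrapped in a small script along these lines. This is only a sketch: read_scan is a made-up helper name and /dev/sdX stands in for a real device. Because the output goes to /dev/null, the scan only reads from the disk and never writes to it.

```shell
#!/bin/sh
# Sketch only: read_scan is a hypothetical helper name.
# Reads the whole device (or file) named in $1 and discards the data, so the
# drive has to read every sector. dd's own statistics and error text are
# discarded here; the function returns non-zero if dd reports a failure.
read_scan() {
    target="$1"
    if dd if="$target" of=/dev/null bs=1M 2>/dev/null; then
        echo "read scan OK: $target"
    else
        echo "read scan FAILED: $target" >&2
        return 1
    fi
}
```

Run as root, for example "read_scan /dev/sdX", once a week or once a month as suggested above.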


Roger

 
Old 03-26-2008, 06:37 PM
"David G. Mackay"
 

On Wed, 2008-03-26 at 23:49 +0900, John Summerfield wrote:
> Tom Horsley wrote:
> > On Wed, 26 Mar 2008 08:35:49 -0500
> > "David G. Mackay" <mackay_d@bellsouth.net> wrote:
> >
> >> Shouldn't there have been some indication of problems prior to the
> >> failure?
> I suppose it depends on how fast the drive goes from "working" to "not
> working."

That's the problem. I don't know how long it took. I suppose that all
of this could have happened at once without prior warning, but I doubt
it.

Dave


 
Old 03-26-2008, 06:41 PM
"David G. Mackay"
 

On Wed, 2008-03-26 at 12:27 -0500, Bruno Wolff III wrote:
> On Wed, Mar 26, 2008 at 08:35:49 -0500,
> "David G. Mackay" <mackay_d@bellsouth.net> wrote:
> >
> > Shouldn't there have been some indication of problems prior to the
> > failure?
>
> Only if you are lucky. Someone at Google published some information about
> smart around a year ago. In cases where catastrophic failures occur, for a high
> percentage there is no warning from smart.

Thanks, I'll have to look that up. Suddenly the feeling that I'm
getting about smart isn't quite so warm and fuzzy.

Dave


 
Old 03-26-2008, 08:16 PM
Claude Jones
 

On Wednesday March 26 2008 1:27:45 pm Bruno Wolff III wrote:
> > Shouldn't there have been some indication of problems prior
> > to the failure?
>
> Only if you are lucky. Someone at Google published some
> information about smart around a year ago. In cases where
> catastrophic failures occur, for a high percentage there is no
> warning from smart.

From the Google hard drive study:
********************************************
Our key findings are:
• Contrary to previously reported results, we found very little
  correlation between failure rates and either elevated temperature
  or activity levels.
• Some SMART parameters (scan errors, reallocation counts, offline
  reallocation counts, and probational counts) have a large impact
  on failure probability.
• Given the lack of occurrence of predictive SMART signals on a large
  fraction of failed drives, it is unlikely that an accurate
  predictive failure model can be built based on these signals alone.
********************************************

Their data set covered more than 100,000 hard drives. Details are
available here:
http://research.google.com/archive/disk_failures.pdf
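
To watch the reallocation-type counters the study singles out, something like the sketch below could filter "smartctl -A" output. It rests on several assumptions: that attribute IDs 5, 196, 197 and 198 are the relevant ones on a given drive (names and meanings vary by vendor), that the raw value is the tenth column of the usual attribute table, and that any non-zero raw value is worth a look. The name flag_sector_attrs is made up for this sketch.

```shell
#!/bin/sh
# Sketch only: flag_sector_attrs is a hypothetical helper name.
# Reads "smartctl -A" output on stdin and prints the ID, name and raw value of
# attributes 5, 196, 197 and 198 when their raw value (column 10) is non-zero.
# Which IDs matter, and what their raw values mean, varies by drive vendor.
flag_sector_attrs() {
    awk '
        NF >= 10 && ($1 == 5 || $1 == 196 || $1 == 197 || $1 == 198) && $10 + 0 > 0 {
            print $1, $2, $10
        }
    '
}
```

It would be used as "smartctl -A /dev/sdX | flag_sector_attrs", where /dev/sdX is a placeholder. No output means none of the listed counters is above zero. As the study itself notes, that is no guarantee the drive is healthy.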

--
Claude Jones
Brunswick, MD, USA

 
Old 03-26-2008, 09:36 PM
Bob Kinney
 

--- Tom Horsley <tom.horsley@att.net> wrote:

> On Wed, 26 Mar 2008 08:35:49 -0500
> "David G. Mackay" <mackay_d@bellsouth.net> wrote:
>
> > Shouldn't there have been some indication of problems prior to the
> > failure?
>
> I have a feeling smartd is quite a lot like the emergency lights
> we have at work. They come on fine when you press the "test" button,
> but if the power actually goes out, they don't work at all :-).
>
> Most UPS boxes seem to be about the same. They'll be reporting
> self test OK and lots of battery life, then the power actually
> fails, and they fall over dead.
>
>


Perhaps S.M.A.R.T. never mentioned a problem.

Unfortunately, there is no reliable indicator of imminent destruction.
S.M.A.R.T. was born to make an attempt at failure prediction, through
statistical analysis of certain performance factors. While this might
work alright for a relatively long, slow descent into oblivion, it cannot
possibly foresee a catastrophic sudden mechanical failure.

Google published an extensive study on the topic of disk failures.
Check this out:

http://labs.google.com/papers/disk_failures.html




 
Old 03-27-2008, 12:14 PM
Bruno Wolff III
 

On Wed, Mar 26, 2008 at 13:28:01 -0500,
Roger Heflin <rogerheflin@gmail.com> wrote:
>
> The big issue is that most of the smart implementations don't scan the disk
> for bad blocks, and in my experience several years ago with a 1000+ disks
> in services was that the #1 failure was bad blocks, and smart did little to
> catch that. The #2 failure was failure to spin up at all, but this
> seemed to be confined to certain batches.

Isn't that what the long surface scan test is supposed to do?

--
fedora-list mailing list
fedora-list@redhat.com
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
 
Old 03-27-2008, 02:05 PM
Roger Heflin
 

Bruno Wolff III wrote:
> On Wed, Mar 26, 2008 at 13:28:01 -0500,
> Roger Heflin <rogerheflin@gmail.com> wrote:
>> The big issue is that most of the smart implementations don't scan the disk
>> for bad blocks, and in my experience several years ago with a 1000+ disks
>> in services was that the #1 failure was bad blocks, and smart did little to
>> catch that. The #2 failure was failure to spin up at all, but this
>> seemed to be confined to certain batches.
>
> Isn't that what the long surface scan test is supposed to do?



Probably. I started using the dd test before disks, Linux, and other OSes
supported smart. It works on any disk (or array) whether smart works or not.


Roger

 
