10-31-2010, 06:27 PM
Keith Roberts

PATA Hard Drive woes

Hi All.

Yesterday I was installing CentOS 5.5 on my web server, and
it looks like the main hard drive has gone AWOL.

Fedora 12 put the file system into read-only (r/o) mode.

The drive is a Hitachi, still under warranty.

There are bad sectors on it, and running the Hitachi DFT
tool confirms this. Also, I cannot repair the bad sectors.

Would this be caused by a faulty I/O chip, or is it safe to
say it's definitely the HDD at fault?

Kind Regards,

Keith Roberts

--
In theory, theory and practice are the same;
in practice they are not.

This email was sent from my laptop with CentOS 5.5
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
10-31-2010, 06:31 PM
William Warren

On 10/31/2010 3:27 PM, Keith Roberts wrote:
> Would this be caused by a faulty I/O chip, or is it safe to
> say it's definitely the HDD at fault?
>
> Kind Regards,
>
> Keith Roberts
>
hdd at fault
 
10-31-2010, 07:28 PM
Keith Roberts

On Sun, 31 Oct 2010, William Warren wrote:

> hdd at fault

OK - thanks for confirming that Bill.

I'll remove it and take it back for replacement.

Keith


 
11-03-2010, 01:32 PM
Keith Roberts

On Sun, 31 Oct 2010, Keith Roberts wrote:

> OK - thanks for confirming that Bill.
>
> I'll remove it and take it back for replacement.
>
> Keith

There were about 79 Seek errors in the SMART logs of the
HDD.

I moved the drive from the Primary Master cable to the
Secondary Master cable, ran Hitachi's DFT tool, and did a
complete disk erase, which terminated with errors.

So to prepare the disk for returning under warranty, I used
another HDD utility to clean the disk again, still on Sec
Master cable.

I used vivard 0.4, from the HDD utils on the Ultimate Boot
CD (http://www.ultimatebootcd.com/index.html), to do a
complete disk erase.

vivard did not show any errors when doing a full disk erase.

So I ran an Advanced r/w scan again with Hitachi DFT, and
the result was OK.

Any ideas what's happening please?

Is this disk usable, or is it still in need of replacing?

Kind Regards,

Keith Roberts
 
11-03-2010, 02:46 PM
Todd Denniston

Keith Roberts wrote, On 11/03/2010 10:32 AM:
> On Sun, 31 Oct 2010, Keith Roberts wrote:
>
<SNIP>
> There were about 79 Seek errors in the SMART logs of the
> HDD.
>
<SNIP>
> vivard did not show any errors when doing a full disk erase.
>
> So I ran an Advanced r/w scan again with Hitachi DFT, and
> the result was OK.
>
> Any ideas what's happening please?

WFG: In writing it all, the seek motor knocked the dust out of its way? (What dust?)
How about checking all the SMART attributes and seeing if others are elevated?
http://en.wikipedia.org/wiki/S.M.A.R.T.#Known_ATA_S.M.A.R.T._attributes

Are you seeing any block "remap" activity?
http://en.wikipedia.org/wiki/Hard_disk_drive#Error_handling
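
As a rough illustration of the attribute check suggested above, here is a minimal Python sketch. It assumes the output of `smartctl -A /dev/yourdisk` has been saved as text. The SAMPLE table is made up for illustration (it is not from the drive in this thread), the helper names are hypothetical, and the WATCHED list is an assumption, since attribute names vary by drive and vendor.

```python
import re

# Attribute names (as printed by smartctl) that usually relate to remapping.
# Assumption: your drive reports them under these names; many do, not all.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row starts with "ID#" and is skipped by the isdigit test.
        if len(fields) >= 10 and fields[0].isdigit():
            match = re.match(r"\d+", fields[9])
            if match:
                attrs[fields[1]] = int(match.group())
    return attrs


def remap_activity(attrs):
    """Return the watched attributes whose raw value is above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


# Invented sample rows, only to show the expected layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       12
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       2
"""

if __name__ == "__main__":
    print(remap_activity(parse_attributes(SAMPLE)))
```

A non-empty result only says that the drive itself reports pending or reallocated sectors; what counts as "elevated" is still a judgment call.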

>
> Is this disk usable, or is it still in need of replacing?
>

http://en.wikipedia.org/wiki/S.M.A.R.T.#Background
You have gotten SMART errors from this drive already, so:
You have to ask yourself, 'Do you feel lucky?' Well, do ya...

And the other question: if this drive up and dies shortly, and I knew about the SMART errors, will
the data owner complain more about the drive's death later or about the replacement hassle now?

Only YOU (and the data owner) know the risk trade-off levels you have to consider.
--
Todd Denniston
Crane Division, Naval Surface Warfare Center (NSWC Crane)
Harnessing the Power of Technology for the Warfighter
 
11-03-2010, 03:01 PM
Keith Roberts

On Wed, 3 Nov 2010, Todd Denniston wrote:

> Are you seeing any block "remap" activity?
> http://en.wikipedia.org/wiki/Hard_disk_drive#Error_handling

Thanks Todd for the reply.

There were no sectors remapped, which is odd, as there were
bad sectors originally on the drive. I ran MemTest86+ out of
curiosity, and there are 5120 errors, some at 0.4 MB and
0.5 MB.

The BIOS has been playing up, not recognising the Primary
Master drive. This is the channel the Hitachi disk was on
when it developed the sector read errors.

Could a bad controller or bad RAM cause Hard Drive sector
errors?

The drive is as good as uninstalled, so I may as well send
it for replacement.

Regards,

Keith

NB: The box is down now, and I'll try to test and identify
the bad memory module next.
 
11-03-2010, 04:19 PM
Warren Young

On 11/3/2010 8:32 AM, Keith Roberts wrote:
>
> So to prepare the disk for returning under warranty, I used
> another HDD utility to clean the disk again

...

> So I ran an Advanced r/w scan again with Hitachi DFT, and
> the result was OK.

A complete disk wipe brings bad sectors to the drive's attention,
forcing it to remap them using spare sectors set aside for the purpose.

All drives can do this, and they do it without logging the change. You
can't tell, from the outside, when or whether the drive has done this.
All you can do is infer it, because a sector that once tested bad now
tests good.

As to why this happened only during a format, not during the previous
disk test, it's probably because the format zeroed the disk. That
particular drive may have a policy to only remap sectors on write, so as
to preserve the sector contents for potential recovery later. (See
below for one way this can be done.)

It may be that your drive is now fine.

If you put it back into service, at minimum I would set up smartd, from
the smartmontools package. Maybe run smartctl on it by hand daily or
weekly, too. If you find that errors start happening again, there is
something continually degrading the drive's integrity, so the automatic
sector remapping will eventually run the drive out of spare sectors.
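
A minimal Python sketch of that kind of periodic check, assuming you record a few SMART raw counters (for example from `smartctl -A`) on each run. The function name `grown`, the default attribute names, and the sample numbers are all hypothetical; drives that do not expose these counters would need a different signal.

```python
def grown(previous, current,
          watched=("Reallocated_Sector_Ct", "Current_Pending_Sector")):
    """Return {name: (old, new)} for watched counters that increased."""
    changes = {}
    for name in watched:
        old = previous.get(name, 0)
        new = current.get(name, 0)
        if new > old:
            changes[name] = (old, new)
    return changes


# Invented snapshots, one per run, keyed by attribute name.
last_week = {"Reallocated_Sector_Ct": 3, "Current_Pending_Sector": 0}
today = {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0}

if __name__ == "__main__":
    print(grown(last_week, today))
```

A counter that keeps climbing from one run to the next is the "continually degrading" case; a one-off bump that then stays flat is less alarming.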

SpinRite (http://spinrite.com/) does nondestructive sector remapping.
At level 4 and above, it reads each sector in and then writes it back
out to the drive. Because remapping is silent, it's possible for it to
appear to do nothing, yet improve data integrity by bringing dodgy
sectors to the drive's attention.

If a sector can't be read without error, SpinRite forces the drive to
ignore the CRC and return the data anyway, retrying many times, then
making a statistical guess about the most likely contents of the sector.
(Reading a bad sector won't necessarily give the same value each try.)
Then on writing the reconstructed data back out, the drive
automatically remaps the sector, repairing it.
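
To make the "statistical guess" idea concrete, here is a small Python sketch using a per-byte majority vote over several reads of the same sector. This is only an illustration of the general idea; it is an assumption, not SpinRite's actual method, which is not described in this thread. The function name and sample data are hypothetical.

```python
from collections import Counter


def majority_vote(reads):
    """Combine several same-length reads of one sector, byte by byte."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    result = bytearray()
    # zip(*reads) yields, for each byte position, the values seen across reads.
    for column in zip(*reads):
        value, _count = Counter(column).most_common(1)[0]
        result.append(value)
    return bytes(result)


# Invented example: five noisy reads of the same 11-byte "sector".
reads = [
    b"hello wxrld",
    b"hellp world",
    b"hello world",
    b"hexlo world",
    b"hello world",
]

if __name__ == "__main__":
    print(majority_vote(reads))
```

Each read here is wrong in at most one place, and never in the same place as most of the others, so the vote recovers the original bytes. With too few reads, or errors that repeat identically, the guess can be wrong.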

You might want to combine the SMART monitoring with periodic SpinRite
runs on the drive until you regain confidence in it.
 
11-03-2010, 04:27 PM
m.roth@5-cent.us

Warren Young wrote:
> If you put it back into service, at minimum I would set up smartd, from
> the smartmontools package. Maybe run smartctl on it by hand daily or
> weekly, too. If you find that errors start happening again, there is
> something continually degrading the drive's integrity, so the automatic
> sector remapping will eventually run the drive out of spare sectors.
<snip>
Yeah, but I have problems with smartmon: for example, I've got a drive in
one server that's got two bad sectors, which SMART reports. I've followed
the instructions on how to make the log messages go away, and fsck -c...
but on reboot, SMART seems to ignore what badblocks found, and the
irritating messages are back.

mark

 
11-03-2010, 04:41 PM
Warren Young

On 11/3/2010 11:27 AM, m.roth@5-cent.us wrote:
> Yeah, but I have problems with smartmon:

More likely, problems with SMART. S.M.A.R.T. is D.U.M.B.

It's better than nothing, but sometimes not by a whole lot.

> one server that's got two bad sectors, which SMART reports. I've followed
> the instructions on how to make the log messages go away, and fsck -c...
> but on reboot, SMART seems to ignore what badblocks found, and the
> irritating messages are back.

It may be that SpinRite could fix that by forcing a remap.

Another option -- which I didn't mention because it probably isn't an
option for the original poster, but which may work with your servers --
is that some high-end RAID systems can do something like SpinRite at
level 4+, as can ZFS. They call it scrubbing (resilvering is the related
rebuild onto a replacement disk). I don't think these systems do
statistical reconstruction, but periodic read-then-rewrite can stave off
the need to reconstruct.
 
11-03-2010, 05:51 PM
RedShift

On 11/03/10 17:01, Keith Roberts wrote:
>
> There were no sectors remapped, which is odd as there were
> bad sectors originally on the drive. I ran MemTest86+ out of
> curiosity, and there are 5120 Errors, some at 0.4 MB & 0.5
> MB.
>

You should fix that first.

> The BIOS has been playing up, not recognising the Primary
> Master drive. This is the channel the Hitachi disk was on
> when it developed the sector read errors.
>
> Could a bad controller or bad RAM cause Hard Drive sector
> errors?
>

Neither bad RAM nor a bad controller can physically damage a hard drive. A bad controller will not cause reallocated sectors. It can, however, cause UDMA CRC errors and other weird non-SMART-related behaviour.

> The drive is as good as uninstalled, so I may as well send
> it for replacement.
>

Send the output of smartctl -a /dev/yourdisk; that'll give us factual data rather than speculation.