11-03-2010, 06:59 PM
Keith Roberts

PATA Hard Drive woes

On Wed, 3 Nov 2010, RedShift wrote:

> To: CentOS mailing list <centos@centos.org>
> From: RedShift <redshift@pandora.be>
> Subject: Re: [CentOS] PATA Hard Drive woes
>
> On 11/03/10 17:01, Keith Roberts wrote:
>>
>> There were no sectors remapped, which is odd as there were
>> bad sectors originally on the drive. I ran MemTest86+ out of
>> curiosity, and there are 5120 Errors, some at 0.4MB & 0.5
>> MB.
>>
>
> You should fix that first.

Working on that one now


>> The BIOS has been playing up, not recognising the Primary
>> Master drive. This is the channel the Hitachi disk was on
>> when it developed the sector read errors.
>>
>> Could a bad controller or bad RAM cause Hard Drive sector
>> errors?
>>
>
> Neither bad RAM nor a bad controller can physically damage
> a hard drive. A bad controller will not cause reallocated
> sectors. It can however cause UDMA CRC errors and other
> weird non-SMART related behaviour.
>
>> The drive is as good as uninstalled, so I may as well send
>> it for replacement.
>>
>
> Send the output of smartctl -a /dev/yourdisk, that'll give us more factual data than speculation.

Will do as soon as the memory checks are done, and the
machine is up again.

Keith
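
Once the `smartctl -a` output is available, the counters RedShift distinguishes (reallocated sectors versus UDMA CRC errors) can be pulled out of the attribute table. A minimal sketch, assuming the usual smartmontools table layout; the sample text and its values are invented for illustration, and attribute names vary by drive vendor:

```python
# Hypothetical excerpt of a SMART attribute table; the values are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       12
"""

# Attributes relevant to this thread (names are an assumption; check your drive).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for rows of the attribute table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            raw = fields[9]
            attrs[fields[1]] = int(raw) if raw.isdigit() else raw
    return attrs


def nonzero_watched(attrs):
    """Return only the watched attributes whose raw value is above zero."""
    return {
        name: attrs[name]
        for name in WATCHED
        if isinstance(attrs.get(name), int) and attrs[name] > 0
    }


if __name__ == "__main__":
    print(nonzero_watched(parse_smart_attributes(SAMPLE)))
```

In this invented sample, a nonzero CRC count with zero reallocations would point towards the cable or controller rather than the platters, which is the distinction made above.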

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
11-03-2010, 07:16 PM
Lamar Owen

On Wednesday, November 03, 2010 02:51:02 pm RedShift wrote:
> On 11/03/10 17:01, Keith Roberts wrote:
> > Could a bad controller or bad RAM cause Hard Drive sector
> > errors?
> >
>
> Neither bad RAM nor a bad controller can physically damage a hard drive. A bad controller will not cause reallocated sectors. It can, however, cause UDMA CRC errors and other weird non-SMART related behaviour.

Might want to check the power supply as well. Bad/flakey power can indeed cause damage to the drive surface; been there, done that, have two Maxtor 250GB drives with scribbled servo data to prove it.
 
11-03-2010, 09:13 PM
Keith Roberts

On Wed, 3 Nov 2010, Lamar Owen wrote:

> To: CentOS mailing list <centos@centos.org>
> From: Lamar Owen <lowen@pari.edu>
> Subject: Re: [CentOS] PATA Hard Drive woes
>
> On Wednesday, November 03, 2010 02:51:02 pm RedShift wrote:
>> On 11/03/10 17:01, Keith Roberts wrote:
>>> Could a bad controller or bad RAM cause Hard Drive sector
>>> errors?
>>>
>>
>> Neither bad RAM nor a bad controller can physically damage
>> a hard drive. A bad controller will not cause reallocated
>> sectors. It can however cause UDMA CRC errors and other
>> weird non-SMART related behaviour.
>
> Might want to check the power supply as well. Bad/flakey
> power can indeed cause damage to the drive surface; been
> there, done that, have two Maxtor 250GB drives with
> scribbled servo data to prove it.

OK.

I'm running the server from an APC UPS Back-UPS 650, so
there should not be any glitches in the power supply, should
there?

Keith

--
In theory, theory and practice are the same;
in practice they are not.

This email was sent from my laptop with Centos 5.5
 
11-03-2010, 09:18 PM
Keith Roberts

On Wed, 3 Nov 2010, Warren Young wrote:

> To: CentOS mailing list <centos@centos.org>
> From: Warren Young <warren@etr-usa.com>
> Subject: Re: [CentOS] PATA Hard Drive woes
>
> On 11/3/2010 8:32 AM, Keith Roberts wrote:
>>
>> So to prepare the disk for returning under warranty, I used
>> another HDD utility to clean the disk again
>
> ...
>
>> So I ran an Advanced r/w scan again with Hitachi DFT, and
>> the result was OK.
>
> A complete disk wipe brings bad sectors to the drive's attention,
> forcing it to remap them using spare sectors set aside for the purpose.
>
> All drives can do this, and they do it without logging the change. You
> can't tell, from the outside, when or whether the drive has done this.
> All you can do is infer it, because a sector that once tested bad now
> tests good.
>
> As to why this happened only during a format, not during the previous
> disk test, it's probably because the format zeroed the disk. That
> particular drive may have a policy to only remap sectors on write, so as
> to preserve the sector contents for potential recovery later. (See
> below for one way this can be done.)
>
> It may be that your drive is now fine.
>
> If you put it back into service, at minimum I would set up smartd, from
> the smartmontools package. Maybe run smartctl on it by hand daily or
> weekly, too. If you find that errors start happening again, there is
> something continually degrading the drive's integrity, so the automatic
> sector remapping will eventually run the drive out of spare sectors.
>
> SpinRite (http://spinrite.com/) does nondestructive sector remapping.
> At level 4 and above, it reads each sector in and then writes it back
> out to the drive. Because remapping is silent, it's possible for it to
> appear to do nothing, yet improve data integrity by bringing dodgy
> sectors to the drive's attention.
>
> If a sector can't be read without error, SpinRite forces the drive to
> ignore the CRC and return the data anyway, retrying many times, then
> making a statistical guess about the most likely contents of the sector.
> (Reading a bad sector won't necessarily give the same value each try.)
> Then on writing the reconstructed data back out, the drive
> automatically remaps the sector, repairing it.
>
> You might want to combine the SMART monitoring with periodic SpinRite
> runs on the drive until you regain confidence in it.
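
The "statistical guess" idea above can be illustrated with a per-byte majority vote over repeated reads. This is only a sketch of the general technique; it is not SpinRite's actual algorithm, which the thread does not describe, and the function name is made up for the example:

```python
from collections import Counter


def majority_vote(reads):
    """Combine several same-length reads of one sector, byte by byte.

    For each byte offset, keep the value that appeared most often
    across the reads.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    out = bytearray()
    for column in zip(*reads):
        # column holds the byte seen at this offset in every read.
        out.append(Counter(column).most_common(1)[0][0])
    return bytes(out)


if __name__ == "__main__":
    # Three noisy reads of the same (tiny, invented) sector.
    print(majority_vote([b"hello", b"hexlo", b"hellp"]))
```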

Thanks Warren. I've read good reports about SpinRite.

I might shell out some dosh for a copy if it can
non-destructively repair bad sectors. I heard it's worth
running just to keep your HDDs in shape.

Regards,

Keith

 
11-03-2010, 09:23 PM
John R Pierce

On 11/03/10 3:13 PM, Keith Roberts wrote:
> I'm running the server from an APC UPS Back-UPS 650, so
> there should not be any glitches in the power supply, should
> there?

That's a simple standby kind of UPS; it acts like a 'surge protector'
when the AC is on, and only switches to the battery-powered inverter
when the AC is completely off.


 
11-03-2010, 09:25 PM
Benjamin Franz

On 11/03/2010 03:13 PM, Keith Roberts wrote:
> On Wed, 3 Nov 2010, Lamar Owen wrote:
>> Might want to check the power supply as well. Bad/flakey
>> power can indeed cause damage to the drive surface; been
>> there, done that, have two Maxtor 250GB drives with
>> scribbled servo data to prove it.
> OK.
>
> I'm running the server from an APC UPS Back-UPS 650, so
> there should not be any glitches in the power supply, should
> there?

Lamar was probably talking about the machine's *own* power supply. The
one inside the computer case. When they start to fail they can produce
incorrect DC voltages and then you can get all kinds of weird failures.

--
Benjamin Franz

 
11-03-2010, 10:00 PM
Warren Young

On 11/3/2010 4:18 PM, Keith Roberts wrote:
> I might shell out some dosh for a copy if it can
> non-destructively repair bad sectors.

Try fsck -cc first (or badblocks -n). These already do part of what
SpinRite does, so if they work, that's all you need. Step up only when
you need something that tries harder.
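
The read-then-write-back idea behind a non-destructive pass can be sketched as below. This is an illustration only, run here against an ordinary file or disk image; it goes through the operating system's cache and does no pattern testing, so it is not a substitute for the tools named above. The function name and block size are assumptions for the example:

```python
import os


def rewrite_in_place(path, block_size=4096):
    """Read each block of `path` and write the same bytes back.

    Returns the byte offsets of blocks whose read raised OSError.
    The contents are left unchanged for every block that reads cleanly.
    """
    failed = []
    with open(path, "r+b") as f:
        size = f.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            f.seek(offset)
            try:
                data = f.read(min(block_size, size - offset))
            except OSError:
                # Unreadable block: note it and move on to the next one.
                failed.append(offset)
                offset += block_size
                continue
            if not data:
                break
            # Writing the same data back gives the drive a chance to
            # remap the sector if it considers it marginal.
            f.seek(offset)
            f.write(data)
            offset += len(data)
        f.flush()
        os.fsync(f.fileno())
    return failed
```

On a real drive this would only be worth trying on an unmounted device with a current backup.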
 
11-03-2010, 10:38 PM
Robert Heller

At Wed, 3 Nov 2010 22:13:03 +0000 (GMT) CentOS mailing list <centos@centos.org> wrote:

>
> On Wed, 3 Nov 2010, Lamar Owen wrote:
>
> > To: CentOS mailing list <centos@centos.org>
> > From: Lamar Owen <lowen@pari.edu>
> > Subject: Re: [CentOS] PATA Hard Drive woes
> >
> > On Wednesday, November 03, 2010 02:51:02 pm RedShift wrote:
> >> On 11/03/10 17:01, Keith Roberts wrote:
> >>> Could a bad controller or bad RAM cause Hard Drive sector
> >>> errors?
> >>>
> >>
> >> Neither bad RAM nor a bad controller can physically damage
> >> a hard drive. A bad controller will not cause reallocated
> >> sectors. It can however cause UDMA CRC errors and other
> >> weird non-SMART related behaviour.
> >
> > Might want to check the power supply as well. Bad/flakey
> > power can indeed cause damage to the drive surface; been
> > there, done that, have two Maxtor 250GB drives with
> > scribbled servo data to prove it.
>
> OK.
>
> I'm running the server from an APC UPS Back-UPS 650, so
> there should not be any glitches in the power supply, should
> there?

Unless the power supply itself is failing.

>
> Keith
>

--
Robert Heller -- 978-544-6933 / heller@deepsoft.com
Deepwoods Software -- http://www.deepsoft.com/
() ascii ribbon campaign -- against html e-mail
/ www.asciiribbon.org -- against proprietary attachments



 
11-04-2010, 12:48 PM
Lamar Owen

>From: Keith Roberts <keith@karsites.net>

>On Wed, 3 Nov 2010, Lamar Owen wrote:
>> Might want to check the power supply as well. Bad/flakey
>> power can indeed cause damage to the drive surface; been
>> there, done that, have two Maxtor 250GB drives with
>> scribbled servo data to prove it.

>OK.

> I'm running the server from an APC UPS Back-UPS 650, so
> there should not be any glitches in the power supply, should
> there?

Probably not on the AC side, although the Back-UPS 650 isn't a full online UPS but a switching standby UPS. (Full online units, like the APC Symmetra 16KVA units I have here, rectify to DC, float the batteries at all times, and run the output from the inverter all of the time, unless they're switched to bypass.) The SmartUPS 1400RM I had in front of the PC that suffered the glitchy power is, unless I'm mistaken, also a full online pure sinewave UPS like the Symmetra, and is still in service (I checked its output on my oscilloscope first, though).

No, I was referring to the output DC voltages (+12V, +5V, +3.3V, -5V, and -12V) from the power supply inside the system.

In addition to my own personal RAID1 of 250GB drives, I also, a different time, lost a RAID5 array of 15K 36GB SCSI drives in a Dell 1600SC server; testing the power supply showed lots of noise and complete dropouts of a few milliseconds duration on the drive connectors' 5V supply pins. Completely and thoroughly scrambled the servo data on the Hitachi drives. Meaning they didn't just start showing bad sectors; they started getting seek errors. The 5V line on the drive connectors was reading an AC RMS of 4V superimposed on the +5V, yielding an effective DC voltage of 4V. Happened over a period of three weeks, during which time I had a number of mysterious failures (the Hitachi drives were error-correcting so well that by the time they started reporting errors, it was way past too late, and it became impossible for the Hitachi drives to even power up). I found that the power supply in question, upon investigation, provided the motherboard (where the DC power sensors on that box are) with clean 5V, and the drives were powered from a separate 5V rail, meaning the Dell management system wasn't seeing the power problems.

A simple power supply tester with a built-in meter can be bought for less than $20; a more thorough power analyzer will run more than that. But even the simple one caught the failing Dell 1600SC supply. It took an oscilloscope to test the Antec in my personal box; turned out it was a cold solder joint in the Antec. A new power supply is less expensive than the equivalent labor it took to fix the Antec. I keep a known good 500W ATX 12V server-grade (8 pin 12V plug with adapters, and 24-pin ATX plug with 20-pin adapter) around for testing; that's one of the very first things I check when a PC is brought in that is flaky. (The very first thing is the dust accumulation, and the second thing is the heatsink compound).

One of the first things I do on any CentOS system I put together is install lm_sensors and gkrellm (gkrellm from a third-party repo). I then enable all the motherboard sensors that are available in the gkrellm plugins, and run it (either local GUI or through ssh X forwarding to my central monitoring PC). On Supermicro boards I install SuperODoctor for Linux, available on the Supermicro site. The GUI runs well (there are some odd dependencies, however) and will e-mail you on alarm conditions that you can set. These include fan RPM, temperatures, and voltages. The CLI program isn't quite so sophisticated, but it can be run periodically and the result sent by e-mail for health checks.
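
A voltage alarm of the kind described amounts to comparing each measured rail against its nominal value. A minimal sketch, assuming a flat 5% tolerance (a commonly quoted figure for the positive rails; negative rails are usually allowed more, so treat the threshold as an assumption) and hypothetical readings entered by hand:

```python
# Nominal DC rail voltages, as listed earlier in this message.
NOMINAL_RAILS = {"+12V": 12.0, "+5V": 5.0, "+3.3V": 3.3, "-5V": -5.0, "-12V": -12.0}


def out_of_tolerance(readings, tolerance=0.05):
    """Return {rail: deviation in percent} for rails outside the tolerance.

    `readings` maps rail names to measured volts. The 5% default is an
    assumption for illustration, not a figure taken from this thread.
    """
    bad = {}
    for rail, measured in readings.items():
        nominal = NOMINAL_RAILS[rail]
        deviation = (measured - nominal) / abs(nominal)
        if abs(deviation) > tolerance:
            bad[rail] = round(deviation * 100, 1)
    return bad


if __name__ == "__main__":
    # Hypothetical readings; 4.0 V mirrors the sagging drive rail described above.
    print(out_of_tolerance({"+12V": 12.1, "+5V": 4.0, "+3.3V": 3.3}))
```

Note that motherboard sensors only see the rails that reach the motherboard, which is exactly why the Dell's management system missed the bad drive rail.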

Drives that are having trouble will show up with high iowaits; run iostat (from the sysstat package) and look at the await result. Long awaits mean the drive is having trouble (or it has firmware issues like WD's EARS and EADS drives have in RAID configurations).
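
The await check can be scripted against saved extended-statistics output from iostat. A minimal sketch, assuming a header row that starts with "Device" and one or more columns whose names end in "await"; the sample output, device names, and the 100 ms threshold are all invented for illustration:

```python
# Hypothetical extended iostat report; device names and numbers are invented.
SAMPLE = """\
Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
hda               0.10     1.20    0.50    2.00    12.00    25.60    15.04     0.01    4.20   1.10   0.28
hdc               0.00     0.40    3.10    0.90    88.00    10.40    24.60     2.75  687.50   9.80  39.20
"""


def slow_devices(text, threshold_ms=100.0):
    """Return {device: worst await in ms} for devices above threshold_ms."""
    header = None
    slow = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            header = fields
            continue
        # Skip anything that is not a data row matching the header.
        if header is None or len(fields) != len(header):
            continue
        # Some sysstat versions print a single "await" column, others split
        # it into r_await and w_await, so match on the suffix.
        await_cols = [i for i, name in enumerate(header) if name.endswith("await")]
        if not await_cols:
            continue
        worst = max(float(fields[i]) for i in await_cols)
        if worst > threshold_ms:
            slow[fields[0]] = worst
    return slow


if __name__ == "__main__":
    print(slow_devices(SAMPLE))
```

What counts as a "long" await depends on the drive and workload, so the threshold is best set by comparing against a known-good disk in the same machine.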
 
11-20-2010, 09:12 PM
Keith Roberts

Well, I'm getting there slowly but surely.

This home-built server machine is using hard drive caddies.

I've taken my working backup drive from the caddy (secondary
master), and replaced it with a small GB test drive.

The problem was originally with the drive connected to the
onboard IDE primary channel being intermittently
autodetected at boot time.

I have now swopped the IDE ribbon cables, so the cable that
was connected to the primary IDE channel is now plugged into
the secondary channel onboard IDE socket, and vice versa for
the secondary ribbon cable.

Now when I reboot the machine the problem of drives not
being detected now appears on the secondary channel, and the
ATA drive and CD/DVD-ROM drive are detected OK on the
primary channel.

I have also replaced the IDE ribbon cable for the channel
that was originally connected as primary.

So it appears the onboard IDE controller is working OK, and
the problem appears to be from the IDE ribbon cable, to one
of the HDD caddies.

Any suggestions, please?

Kind Regards,

Keith Roberts


--
In theory, theory and practice are the same;
in practice they are not.

This email was sent from my laptop with Centos 5.5