Linux Archive


Terry Barnaby 02-05-2011 06:57 PM

Advanced format drives with block errors
 
Hi,

Just a bit of info. I have some Western Digital Caviar Green (Adv. Format)
WD20EARS drives. These have the "new" 4096-byte physical sectors.
One of these drives had a faulty block which the drive had not been able
to automatically relocate.

I tried to force a relocation by overwriting the block with dd:
dd if=/dev/zero of=/dev/sdb count=8 seek=694341800

This failed with a write error and a kernel message:
sd 3:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed

Eventually I tried:
dd if=/dev/zero of=/dev/sdb bs=4096 count=1 seek=86792725

This worked. It makes sense, I guess: the first dd issues its writes in
512-byte blocks, so somewhere a read/modify/write cycle is needed on the
4096-byte physical sector, and that fails because the drive cannot read the
damaged 4096-byte block in order to modify the 512 bytes within it. The
second dd overwrites the whole aligned 4096-byte sector in one go, so
nothing needs to be read first.
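
For reference, the sector sizes can be checked from sysfs (assuming a
reasonably recent kernel; /dev/sdb is just my drive here):

# Should report 512 (logical) and 4096 (physical), although some early
# Advanced Format drives report 512 for both
cat /sys/block/sdb/queue/logical_block_size
cat /sys/block/sdb/queue/physical_block_size

The seek value in the second dd is just the 512-byte sector number divided
by 8: 694341800 / 8 = 86792725. It divides exactly, so the 4096-byte write
is aligned to the physical sector.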

I wonder what would happen if a program creates a file that ends up spanning
a duff block on one of these drives ? With a 512-byte sector drive, the drive
would automatically relocate the sector and no one would notice. What would
happen with a 4096-byte sector drive ?
Will the kernel issue 4096-byte writes or multiple 512-byte writes ?
If the latter, and I guess it depends on the program, then the file
write will fail and manual block repair would be needed. This would not
be good ...
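
One way to probe a suspect sector by hand (assuming a reasonably recent
hdparm; the sector number is the one from my drive above):

# Read a single 512-byte logical sector directly; a pending/duff sector
# returns an I/O error instead of data
hdparm --read-sector 694341800 /dev/sdb

If that fails, the aligned 4096-byte dd above should fix it, at the cost of
zeroing the whole block.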

Perhaps one thing to watch out for when using these 4096-byte sector drives.

"compdoc" 02-05-2011 07:28 PM

Advanced format drives with block errors
 
> One of these drives had a faulty block which the drive had not been able
> to automatically relocate.


Just curious, what's the reallocated sector count for the drive? And how
many bad sectors do you feel comfortable with?
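
(For reference, those counts come from the drive's SMART attributes;
with smartmontools installed, something like this shows them - the device
name is just an example:)

smartctl -A /dev/sdb | grep -E 'Reallocated|Pending|Offline_Uncorrectable'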

I used to live with drives with a low reallocated sector count, but I
recently noticed a problem: when a drive develops a new bad sector it can
cause one of two things to happen to a server:

1) if the drive is part of a 3ware raid5 array, it is dropped by the
controller, causing a degraded array.

2) if the drive is connected directly to the motherboard's onboard sata
ports, it causes the system to hang for a period of time. Completely
unresponsive.


So now I change the drive if a drive's reallocated sector count is greater
than zero.

By the way, I hear the WD green drives spin down after a period of time with
no access. Is this a problem for you?

Terry Barnaby 02-06-2011 05:18 PM

Advanced format drives with block errors
 
On 02/05/2011 08:28 PM, compdoc wrote:
>> One of these drives had a faulty block which the drive had not been able
>> to automatically relocate.
>
>
> Just curious, what's the reallocated sector count for the drive? And how
> many bad sectors do you feel comfortable with?
>
> I used to live with drives with a low reallocated sector count, but I
> recently noticed a problem: when a drive develops a new bad sector it can
> cause one of two things to happen to a server:
>
> 1) if the drive is part of a 3ware raid5 array, it is dropped by the
> controller, causing a degraded array.
>
> 2) if the drive is connected directly to the motherboard's onboard sata
> ports, it causes the system to hang for a period of time. Completely
> unresponsive.
>
>
> So now I change the drive if a drive's reallocated sector count is greater
> than zero.
>
> By the way, I hear the WD green drives spin down after a period of time with
> no access. Is this a problem for you?
The Reallocated_Event_Count is now 0 ...
I suspect that the drive managed to write over the duff sector.
When I wrote over the sector with dd, Current_Pending_Sector
went down to 0 and short tests returned no errors, but
Offline_Uncorrectable and Multi_Zone_Error_Rate went up to 1.
After some time (10 hours ?), the drive reset the
Offline_Uncorrectable and Multi_Zone_Error_Rate values to 0.

As to how many I feel comfortable with, 0 ...
I haven't had any known hard bad sectors on disks for a number of years
now, but just had a spate with WD20EARS drives. One relatively
new one (3 months) reported 17 bad sectors all in a row. This sounded
like a head crash to me so I asked Western Digital and they replaced
the drive. The new one developed
this single bad sector after I had copied 1.5T of data onto it, and I
thought it was worth fixing. The drive is going to be used as
an unpowered backup drive for my MythTV video archive and won't
be backing up any important data.

Yes, the WD green drives park the heads and, I think, lower the spin speed
after a time. They cannot be used in RAID arrays, at least if you want
more than a few megabytes/sec out of them. Been there, done that.
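
(The head parking on these drives is driven by an idle timer in the
firmware. Assuming idle3-tools is available, the timer can be read or
disabled - the change only takes effect after a power cycle:)

# Show the current idle3 (head-park) timer
idle3ctl -g /dev/sdb
# Disable it
idle3ctl -d /dev/sdb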

You have to make sure the drives are rated as suitable for RAID to
use them reliably in a RAID array. There are features needed, such as TLER
(time-limited error recovery). Modern desktop drives have many
optimisations and other features such as spinning faster/slower,
parking the heads, and trying for a long time to recover data from duff
sectors. The TLER issue will cause the problems you mentioned.
A RAID-rated drive should only try for a short time to recover a duff
sector; if it takes too long, the OS will assume the drive is dead and
drop it from the array. The OS is happy for the drive to report that the
sector is duff, as it may be able to work around this: with a RAID array
the OS can re-write the block, since it knows the data from the other
drive/drives. A desktop drive's own lengthy recovery attempts just get
in the way.
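
(On drives that support it, this timeout can be queried and set through SCT
Error Recovery Control with smartctl, though many desktop drives do not
honour it:)

# Query the current read/write error-recovery timeouts
smartctl -l scterc /dev/sdb
# Set both to 7 seconds (the values are in tenths of a second)
smartctl -l scterc,70,70 /dev/sdb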

At work we only use RAID rated drives. At home I have used normal drives
in the past.

I did try to use some Green drives in a RAID array on my home server,
to lower power usage, and it caused no end of problems, the main one being
performance: the data access speed would drop from around 90 MBytes/sec
to around 5 MBytes/sec and stay there. I'm not sure what was happening,
but I suspect an interaction between the Linux kernel and the two Green
drives as they sped up and down, or the drives re-ordering the kernel's
block requests to their own scheme. Whatever it was, after reading about
TLER and trying to sort out my performance issues, I will always make
sure the drives are rated for RAID now ...

Terry



