10-29-2008, 11:50 PM
MHR

Question re RHEL 5.3

On Wed, Oct 29, 2008 at 5:12 PM, Karanbir Singh <mail-lists@karan.org> wrote:
>
> do you have any bug report numbers for these issues ?
>
No, and from what I saw on the RH bugzilla list of SATA disk related
bugs, none of them seem to be that serious except w.r.t. specific
controllers.

I will go back and dig deeper.

Thanks.

mhr
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
10-29-2008, 11:59 PM
MHR

Question re RHEL 5.3

On Wed, Oct 29, 2008 at 5:25 PM, Jim Perrin <jperrin@gmail.com> wrote:
>
> The only issue I've ever seen has been with the onboard fakeraid stuff
> more and more vendors seem to be adding. I've been using SATA disks
> with centos since the early 4.x days without issue, so you have me at
> a bit of a loss here. I'd say if anything it's due to controller
> support, and much of that can be chalked up to what hardware vendors
> are pawning off as 'controllers' these days.
>

The one problem I've seen and posted here was w.r.t. smartd error
reports showing 2^32 - 1 errors on one of the disks (probably my
system disk) every few minutes. I thought this was more than just a
bit suspicious, since there are only 4,687,500,000 sectors on a 300GB
disk, and the likelihood of having errors on 4,294,967,295 (~92%) of
them is rather slim unless the whole system is crashing a lot (it's
not). It's a Seagate 300GB, so I ran Seagate's SeaTools on it in
lightweight mode, and no problems were reported, which is good because
the disk is only about a year and a half old and has my CentOS root,
swap, boot and home partitions on it.
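A quick check of the arithmetic above, as a minimal sketch. It assumes 512-byte sectors and a decimal 300 GB (300 * 10^9 bytes); neither is stated in the post, and the names are illustrative only. Under those assumptions the disk has roughly 586 million sectors (the 4,687,500,000 figure corresponds to dividing by 64 rather than 512), so a count of 2^32 - 1 would exceed the number of sectors on the disk several times over, which makes it even less plausible as a real error count.

```python
# Assumptions (not stated in the post): decimal gigabytes, 512-byte sectors.
DISK_BYTES = 300 * 10**9
SECTOR_BYTES = 512

reported_errors = 2**32 - 1                 # the value smartd logged
total_sectors = DISK_BYTES // SECTOR_BYTES  # sectors on the disk under the assumptions


def errors_per_sector(errors: int, sectors: int) -> float:
    """Ratio of reported errors to sectors on the disk."""
    return errors / sectors


ratio = errors_per_sector(reported_errors, total_sectors)
print(f"sectors on disk : {total_sectors:,}")
print(f"reported errors : {reported_errors:,}")
print(f"errors / sectors: {ratio:.2f}")
```

A ratio above 1 means the reported count cannot be a count of distinct bad sectors on this disk.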

I'll dig deeper on this one - sounds fishy to me, too, now....

mhr
 
10-29-2008, 11:59 PM
Ben Mills

Question re RHEL 5.3

Jim Perrin wrote:
> On Wed, Oct 29, 2008 at 8:01 PM, MHR <mhullrich@gmail.com> wrote:
>> I've heard now from more than one source about problems with CentOS
>> (and RH) at least up through 5.2 w.r.t. SATA drive handling, and I've
>> even reported on this myself in this list before.
>>
>> My question is, do we have any idea if 5.3 has any improvements in this area?
>>
>> One of my cohorts here, who happens to be a Fedora fan, says that
>> these problems are fixed in F9, but I have grave concerns about
>> putting an enterprise lifeline main application on any Fedora release.
>> If 5.3 solves these issues, I'd much rather go with that.
>>
>> Any ideas? Any places I might look to see for myself?
>
> The only issue I've ever seen has been with the onboard fakeraid stuff
> more and more vendors seem to be adding. I've been using SATA disks
> with centos since the early 4.x days without issue, so you have me at
> a bit of a loss here. I'd say if anything it's due to controller
> support, and much of that can be chalked up to what hardware vendors
> are pawning off as 'controllers' these days.

I recently set up a CentOS 5.2 server with RAID 1 (software) and 2 sata
drives. During burn-in I see no problems. I'm using a Supermicro PDSBM
series system board.

Ben
 
10-30-2008, 06:46 AM
Spike Turner

Question re RHEL 5.3

MHR wrote:

> I've heard now from more than one source about problems with CentOS
> (and RH) at least up through 5.2 w.r.t. SATA drive handling, and I've
> even reported on this myself in this list before.
>
> My question is, do we have any idea if 5.3 has any improvements in this area?
>
> One of my cohorts here, who happens to be a Fedora fan, says that
> these problems are fixed in F9, but I have grave concerns about
> putting an enterprise lifeline main application on any Fedora release.
> If 5.3 solves these issues, I'd much rather go with that.
>

You seem to be relying on hearsay and rumor-mongering.
Have you filed any bug reports on the CentOS bug tracker?

> Any ideas? Any places I might look to see for myself?
>

You have to install 5.3 beta and test it yourself. Then
again if your "alleged problems" only surface in a
conversation with your "fedora buddy" you have to have
more than your word for it.

Comprende?

Spike.




 
10-30-2008, 08:47 AM
William L. Maltby

Question re RHEL 5.3

On Wed, 2008-10-29 at 17:59 -0700, MHR wrote:
> On Wed, Oct 29, 2008 at 5:25 PM, Jim Perrin <jperrin@gmail.com> wrote:
> >
> > The only issue I've ever seen has been with the on-board fakeraid stuff
> > more and more vendors seem to be adding. I've been using SATA disks
> > with centos since the early 4.x days without issue, so you have me at
> > a bit of a loss here. I'd say if anything it's due to controller
> > support, and much of that can be chalked up to what hardware vendors
> > are pawning off as 'controllers' these days.
> >
>
> The one problem I've seen and posted here was w.r.t. smartd error
> reports showing 2^32 - 1 errors on one of the disks (probably my
> system disk) every few minutes. I thought this was more than just a
> bit suspicious, since there are only 4,687,500,000 sectors on a 300GB
> disk, and the likelihood of having errors on 4,294,967,295 (~92%) of
> them is rather slim unless the whole system is crashing a lot (it's
> not). It's a Seagate 300GB, so I ran Seagate's SeaTools on it in
> lightweight mode, and no problems were reported, which is good because
> the disk is only about a year and a half old and has my CentOS root,
> swap, boot and home partitions on it.
>
> I'll dig deeper on this one - sounds fishy to me, too, now....

With my usual jaundiced eye, my first thought is that the fault is not
the obvious one. So I suggest temporarily abandoning "The Usual
Suspects" (TM) - what a *great* movie.

Is it a consistent or sporadic issue? Is the controller on-board or
after-market? If on-board, is the BIOS the latest? Have you checked the
power/data cable connections? The number you mention makes
me think of a bad cable (or connections). Any pattern if it's recurring?
Temperature steady in the area? If you had a temporary rise/fall in
temperature it could have exposed weak connections, micro-fractures in
various cables, poor seating of memory, add-in cards, etc.

Any other messages, that might be related, in the log file when it
happens? I'm wondering if some spurious interrupt might be involved.

Have you memtested recently? ISTM that a memory error could "fool" the
system. Re-seated the memory?

How about the kernel version? On the latest kernel, 2.6.18-92.1.13.el5,
I recently got this.

----------------------------------------------------------
Oct 29 07:09:41 centos501 kernel: Uhhuh. NMI received for unknown reason 2c on CPU 0.
Oct 29 07:09:41 centos501 kernel: Do you have a strange power saving mode enabled?
Oct 29 07:09:41 centos501 kernel: Dazed and confused, but trying to continue
-----------------------------------------------------------

Never seen before. Only once, so far. I've not yet investigated this. No
recent changes to the system since 5.0 but normal yum updates to current
5.2 status. The case cover is off right now though, so it could be some
EMI (heh, or an EMP from the recent trash on this list) :-)

That's all I can think of ATM but for power from the utility company or
marginal power supply in the unit.

>
> mhr
> <snip sig stuff>

HTH
--
Bill

 
10-30-2008, 12:53 PM
Robert Nichols

Question re RHEL 5.3

MHR wrote:


> The one problem I've seen and posted here was w.r.t. smartd error
> reports showing 2^32 - 1 errors on one of the disks (probably my
> system disk) every few minutes. I thought this was more than just a
> bit suspicious, since there are only 4,687,500,000 sectors on a 300GB
> disk, and the likelihood of having errors on 4,294,967,295 (~92%) of
> them is rather slim unless the whole system is crashing a lot (it's
> not). It's a Seagate 300GB, so I ran Seagate's SeaTools on it in
> lightweight mode, and no problems were reported, which is good because
> the disk is only about a year and a half old and has my CentOS root,
> swap, boot and home partitions on it.


Precisely what error counters are alarming you? If these are the
raw numbers for Raw_Read_Error_Rate, Hardware_ECC_Recovered, and
Seek_Error_Rate, it is normal for Seagate drives. Look at the
normalized values for these attributes. As long as they are not
approaching their failure thresholds, the drive is OK. For further
reassurance you can run the SMART long offline tests ("smartctl -t
long /dev/whatever" -- see smartctl manpage for details) on the
drive.
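As a rough illustration of the check described above, the sketch below compares normalized values against thresholds in `smartctl -A` style output. The sample table is invented for illustration (it is not from the poster's drive), the function names and the `margin` parameter are hypothetical, and the column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, ...) is assumed to match what your smartctl version prints.

```python
# Hypothetical sample in the usual `smartctl -A` column layout; all values are invented.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       71305176
  7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       209719414
195 Hardware_ECC_Recovered  0x001a   061   052   000    Old_age   Always       -       71305176
"""


def parse_attributes(text):
    """Yield (name, value, worst, thresh, raw) for each attribute row."""
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the header row and any other text are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        # fields[9] is only the first token of the raw value.
        yield fields[1], int(fields[3]), int(fields[4]), int(fields[5]), fields[9]


def attributes_near_threshold(text, margin=10):
    """Return names whose normalized value is within `margin` of a nonzero threshold."""
    flagged = []
    for name, value, _worst, thresh, _raw in parse_attributes(text):
        if thresh > 0 and value - thresh <= margin:
            flagged.append(name)
    return flagged


for name, value, worst, thresh, raw in parse_attributes(SAMPLE):
    print(f"{name:24} value={value:3} worst={worst:3} thresh={thresh:3} raw={raw}")
print("near threshold:", attributes_near_threshold(SAMPLE))
```

The large raw numbers are ignored on purpose: only the distance between the normalized value and its threshold is used to decide whether an attribute deserves attention.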

You need to understand something about modern drives. In the past,
drives achieved the first level of redundancy by recording each bit
in a large enough area to include many magnetic domains. If some
percentage of the domains failed to hold the data (a highly likely
situation), that was OK because the read head would get enough
signal from the rest of the domains so that the bit would be
detected correctly. Fast forward to today. That multi-domain
redundancy is all but gone, having been replaced by more advanced
error correcting codes implemented in hardware. Seagate has elected
to have the raw number for Raw_Read_Error_Rate report each instance
of sectors needing this level of correction and let the normalized
values reflect whether these corrections are occurring at a rate
higher than expected.

A similar situation exists for Seek_Error_Rate. When a drive
performs a seek, there is a trade-off between speed and accuracy.
You can make it more likely that the heads go directly to the right
track by moving them more slowly and allowing more settling time.
Performance can be improved significantly by moving the heads more
abruptly and accepting that some percentage of the time a subsequent
small adjustment will be needed to get to the right track. Again,
it is the normalized value for Seek_Error_Rate that reports whether
these adjustments are becoming necessary more often than expected.


--
Bob Nichols "NOSPAM" is really part of my email address.
Do NOT delete it.

 
11-01-2008, 12:31 PM
Kai Schaetzl

Question re RHEL 5.3

Mhr wrote on Wed, 29 Oct 2008 17:59:40 -0700:

> The one problem I've seen and posted here was w.r.t. smartd error
> reports showing 2^32 - 1 errors on one of the disks (probably my
> system disk) every few minutes.

How has this anything to do with "SATA problems/drive handling"? And could
you please use a decent subject next time?

Regarding your problem: Have you done a smartctl selftest since then, did
you go to smartmontools.sf.net since then and read up on smartmon?
This may just be a problem with smartd not being able to handle the error
codes/number of errors from that disk. If you look at smartmontools.sf.net
and read the man you'll see that vendors are quite inconsistent in what
and how they report and a reversal of byte ordering every now and then
seems to be common. Not to mention that the smartmon shipping with CentOS
naturally doesn't include the latest code.


Kai

--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com



 
11-02-2008, 08:03 AM
MHR

Question re RHEL 5.3

On Sat, Nov 1, 2008 at 6:31 AM, Kai Schaetzl <maillists@conactive.com> wrote:
> Mhr wrote on Wed, 29 Oct 2008 17:59:40 -0700:
>
>> The one problem I've seen and posted here was w.r.t. smartd error
>> reports showing 2^32 - 1 errors on one of the disks (probably my
>> system disk) every few minutes.
>
> How has this anything to do with "SATA problems/drive handling"?

Possibly because my system drive is a SATA disk? (FTR, the drive does
not appear to be the slightest bit unstable and it runs just fine. In
fact, I recently modified the system so that it now runs on three
SATA-2 drives exclusively. For whatever reason, the WD drives do not
report any errors - see also below.)

> And could you please use a decent subject next time?

When I select the subject, I usually do. This was a reply to a
thread, so I didn't pick the subject. There's no need to be testy....

> Regarding your problem: Have you done a smartctl selftest since then, did
> you go to smartmontools.sf.net since then and read up on smartmon?

Yes and not until now, in that order. The smartctl selftest has the
same problem, IIRC, but the seatools test showed nothing wrong.

> This may just be a problem with smartd not being able to handle the error
> codes/number of errors from that disk. If you look at smartmontools.sf.net
> and read the man you'll see that vendors are quite inconsistent in what
> and how they report and a reversal of byte ordering every now and then
> seems to be common. Not to mention that the smartmon shipping with CentOS
> naturally doesn't include the latest code.

All good information, thank you. I did not see anything specific to
the issue I am seeing, which is that every half hour, smartd reports
the following:

Nov 2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 Currently unreadable (pending) sectors
Nov 2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors

In each case, it also sends a warning email to root, which is kind of
annoying since these do not appear to be legitimate error conditions.
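One way to see why the value looks like an artifact rather than a count: 4294967295 is exactly 0xFFFFFFFF, the largest unsigned 32-bit value, which is the same bit pattern as -1 in a signed 32-bit field. The sketch below pulls the number out of smartd-style log lines and flags that pattern. The regular expression and the helper names are illustrative assumptions, and treating an all-ones value as "not a real count" is only a heuristic, not documented smartd behavior.

```python
import re
import struct

LOG_LINES = [
    "Nov 2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 "
    "Currently unreadable (pending) sectors",
    "Nov 2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 "
    "Offline uncorrectable sectors",
]

# Assumed line shape: "... smartd[PID]: Device: <dev>, <count> <description>"
PATTERN = re.compile(r"smartd\[\d+\]: Device: (\S+), (\d+) (.+)$")
ALL_ONES_32 = 0xFFFFFFFF


def as_signed32(n):
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", n))[0]


def suspicious_counts(lines):
    """Return (device, description) pairs whose count is the all-ones 32-bit value."""
    found = []
    for line in lines:
        match = PATTERN.search(line)
        if match and int(match.group(2)) == ALL_ONES_32:
            found.append((match.group(1), match.group(3)))
    return found


print("2**32 - 1 as signed 32-bit:", as_signed32(ALL_ONES_32))
for device, description in suspicious_counts(LOG_LINES):
    print(f"{device}: '{description}' reports an all-ones value")
```

If the same device starts reporting small, changing counts instead, that would be worth taking more seriously than a fixed all-ones value.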

Someone mentioned that this is a recurring problem with Seagate drives
- more info, please?

Thanks.

mhr
 
11-02-2008, 10:34 AM
Akemi Yagi

Question re RHEL 5.3

On Sun, Nov 2, 2008 at 1:03 AM, MHR <mhullrich@gmail.com> wrote:
>
> Nov 2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 Currently unreadable (pending) sectors
> Nov 2 01:56:11 mhrichter smartd[3121]: Device: /dev/sda, 4294967295 Offline uncorrectable sectors
>
> In each case, it also sends a warning email to root, which is kind of
> annoying since these do not appear to be legitimate error conditions.
>
> Someone mentioned that this is a recurring problem with Seagate drives
> - more info, please?

You might want to check out the following CentOS forum thread. I,
too, had the same problem (see comment #3).

http://www.centos.org/modules/newbb/viewtopic.php?viewmode=flat&topic_id=15880&forum=39

Akemi / toracat
 
