Linux Archive


Steve Brooks 06-22-2012 11:58 AM

SATA errors in log
 
Hi,

I have a SATA PCIe 6Gbps 4 port controller card made by Startech. The
kernel (Linux viz1 2.6.32-220.4.1.el6.x86_64) sees it as

Marvell Technology Group Ltd. 88SE9123

I use it to provide extra SATA ports to a raid system.

The HD's are all "WD2003FYYS" and so run at 3Gbps on the 6Gbps controller.

However I am seeing lots of instances of errors like this

-----------------------------------------

Jun 22 03:13:23 viz1 kernel: ata13.00: exception Emask 0x10 SAct 0x4 SErr 0x400000 action 0x6 frozen
Jun 22 03:13:23 viz1 kernel: ata13.00: irq_stat 0x08000000, interface fatal error
Jun 22 03:13:23 viz1 kernel: ata13: SError: { Handshk }
Jun 22 03:13:23 viz1 kernel: ata13.00: failed command: WRITE FPDMA QUEUED
Jun 22 03:13:23 viz1 kernel: ata13.00: cmd 61/e8:10:98:05:1b/01:00:66:00:00/40 tag 2 ncq 249856 out
Jun 22 03:13:23 viz1 kernel: ata13.00: status: { DRDY }
Jun 22 03:13:23 viz1 kernel: ata13: hard resetting link
Jun 22 03:13:24 viz1 kernel: ata13: SATA link up 3.0 Gbps (SStatus 123 SControl 330)
Jun 22 03:13:24 viz1 kernel: ata13.00: configured for UDMA/133
Jun 22 03:13:24 viz1 kernel: ata13: EH complete

---------------------------------------
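
For readers decoding entries like the ones above: the hex fields are bitmasks and register dumps, and a few lines of shell are enough to unpack them. The sketch below is not part of the original post. The function names are made up for illustration, the bit labels are the ones libata prints inside "SError: { ... }", and a 512-byte sector size is assumed.

```sh
#!/bin/sh
# Decode the SError bitmask that libata logs as "SErr 0x...".
decode_serror() (
    value=$(( $1 ))
    out=
    for entry in 0:RecovData 1:RecovComm 8:UnrecovData 9:Persist 10:Proto \
                 11:HostInt 16:PHYRdyChg 17:PHYInt 18:CommWake 19:10B8B \
                 20:Dispar 21:BadCRC 22:Handshk 23:LinkSeq 24:TrStaTrns \
                 25:UnrecFIS 26:DevExch; do
        bit=${entry%%:*}
        name=${entry#*:}
        if [ $(( value & (1 << bit) )) -ne 0 ]; then
            out="${out}${out:+ }${name}"
        fi
    done
    printf '%s\n' "$out"
)

# Decode SStatus as logged (hex digits without a 0x prefix, e.g. "123").
decode_sstatus() (
    value=$(( 0x$1 ))
    det=$(( value & 0xf ))          # device detection
    spd=$(( (value >> 4) & 0xf ))   # negotiated link speed
    ipm=$(( (value >> 8) & 0xf ))   # interface power management state
    case $spd in
        1) speed="1.5 Gbps" ;;
        2) speed="3.0 Gbps" ;;
        3) speed="6.0 Gbps" ;;
        *) speed="unknown" ;;
    esac
    printf 'DET=%d SPD=%d (%s) IPM=%d\n' "$det" "$spd" "$speed" "$ipm"
)

# Decode the taskfile bytes of an NCQ command as logged on the "cmd" line:
# command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
# For NCQ the sector count lives in the feature bytes and the tag in nsect bits 7:3.
decode_ncq_cmd() (
    IFS='/:'
    set -- $1
    feature=$2 nsect=$3 lbal=$4 lbam=$5 lbah=$6 hob_feature=$7 hob_lbal=$9
    shift 9
    hob_lbam=$1 hob_lbah=$2
    sectors=$(( 0x$hob_feature$feature ))
    tag=$(( 0x$nsect >> 3 ))
    lba=$(( 0x$hob_lbah$hob_lbam$hob_lbal$lbah$lbam$lbal ))
    # 512 bytes per sector is an assumption, not something the log states.
    printf 'tag=%d sectors=%d bytes=%d lba=0x%x\n' \
        "$tag" "$sectors" "$(( sectors * 512 ))" "$lba"
)

decode_serror 0x400000
decode_sstatus 123
decode_ncq_cmd 61/e8:10:98:05:1b/01:00:66:00:00/40
```

SErr 0x400000 has only bit 22 set, which matches the "Handshk" label in the log. SAct 0x4 is bit 2, i.e. NCQ tag 2, and 0x01e8 sectors of 512 bytes gives the 249856 bytes shown on the cmd line.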

Vendor ID : 1b4b
Device ID : 9123

I tried to see which driver is currently being used, but the command
below returned nothing:

grep -i 1b4b /lib/modules/*/modules.alias | grep -i 9123
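
An empty result from that grep does not necessarily mean no driver claims the card. modules.alias only covers modular drivers, and a driver such as ahci can match on the PCI class code (0x0106xx for SATA) rather than on a vendor:device pair. A more direct check is to ask sysfs which driver each SATA-class PCI function is bound to. The sketch below is an illustration, not something from the thread; the function name and the optional root argument are assumptions made for the example.

```sh
#!/bin/sh
# List every PCI function whose class code is "mass storage, SATA" (0x0106xx)
# together with its vendor:device ID and the driver it is bound to.
# The optional argument overrides the sysfs directory (useful for testing).
pci_sata_drivers() (
    root=${1:-/sys/bus/pci/devices}
    for dev in "$root"/*; do
        [ -r "$dev/class" ] || continue
        class=$(cat "$dev/class")
        case $class in
            0x0106*) ;;          # SATA controller (prog-if 01 means AHCI)
            *) continue ;;
        esac
        vendor=$(cat "$dev/vendor")
        device=$(cat "$dev/device")
        if [ -L "$dev/driver" ]; then
            # The "driver" symlink points at the bound driver's directory.
            driver=$(basename "$(readlink "$dev/driver")")
        else
            driver="(none)"
        fi
        printf '%s %s:%s class=%s driver=%s\n' \
            "${dev##*/}" "${vendor#0x}" "${device#0x}" "$class" "$driver"
    done
)

pci_sata_drivers
```

On a machine with lspci available, "lspci -nnk" reports the same information (numeric IDs plus "Kernel driver in use").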

I have changed the card and cables but still get the same errors. I am
wondering whether the el6 kernel is using the correct driver. I also checked
"elrepo" for the vendor:device ID pair and it came up blank.

Any ideas would be much appreciated.

Regards,

Steve

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos

Steve Brooks 06-22-2012 12:06 PM

SATA errors in log
 
On Fri, 22 Jun 2012, Reindl Harald wrote:

> On 22.06.2012 13:58, Steve Brooks wrote:
>> I tried to see what drivers were currently being used but the command
>> below gave nothing
>
> why do you care for drivers?
>
> this is how dying hard drives always look in syslog

Hi Reindl,

I should have mentioned that I swapped out the hard drive and got the same
errors on the new drive. I checked the SMART attributes of the drive and
found nothing untoward. I also ran the

smartctl -t long ....

test, which came back error free.
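
A note on the self-test mentioned above: smartctl starts a long test with "-t long", and the test runs inside the drive, so a clean result probably says little about the cable or controller side of the link. The attribute usually watched for link trouble is 199, UDMA_CRC_Error_Count. A minimal sketch, with /dev/sdX as a placeholder device and crc_error_count as a made-up helper name:

```sh
#!/bin/sh
# Read "smartctl -A" output on stdin and print the raw value (10th column)
# of attribute 199, UDMA_CRC_Error_Count.
crc_error_count() {
    awk '$1 == 199 { print $10 }'
}

# Typical use (as root; sdX is a placeholder, not a device from the thread):
#   smartctl -t long /dev/sdX        # start the long self-test
#   smartctl -l selftest /dev/sdX    # read the self-test log afterwards
#   smartctl -A /dev/sdX | crc_error_count
```

A raw value that keeps climbing while the log shows link resets would point at the link rather than the platters.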

Steve


--
Dr Stephen Brooks

http://www-solar.mcs.st-and.ac.uk/
Solar MHD Theory Group
Tel :: 01334 463735
Fax :: 01334 463748
E-mail :: steveb@mcs.st-andrews.ac.uk
---------------------------------------
Mathematical Institute
North Haugh
University of St. Andrews
St Andrews, Fife KY16 9SS
SCOTLAND
---------------------------------------


m.roth@5-cent.us 06-22-2012 01:57 PM

SATA errors in log
 
Steve Brooks wrote:
<snip>
Crap. First question: what make & model are the drives on it? If they're
Caviar Green, you're hosed. WD, and *maybe* Seagate as well, disabled a
certain function you used to be able to set on the lower cost,
consumer-grade models (in '09, I believe), and so when a server controller
is trying to do i/o, and has a problem, in server-grade drives, it gives
up after something like 6 sec, and does error handling, I *think* to other
sectors. The consumer ones, on the other hand, keep trying for 1? 2?
*minutes*; the disabled function allowed a user to tell it to give up in a
shorter time. Meanwhile, a hardware controller will, as I said, have fits.

mark "you'd think I just spent months dealing with this...."


Steve Brooks 06-22-2012 02:10 PM

SATA errors in log
 
On Fri, 22 Jun 2012, m.roth@5-cent.us wrote:

> Steve Brooks wrote:
> <snip>
> Crap. First question: what make & model are the drives on it? If they're
> Caviar Green, you're hosed. WD, and *maybe* Seagate as well, disabled a
> certain function you used to be able to set on the lower cost,
> consumer-grade models (in '09, I believe), and so when a server controller
> is trying to do i/o, and has a problem, in server-grade drives, it gives
> up after something like 6 sec, and does error handling, I *think* to other
> sectors. The consumer ones, on the other hand, keep trying for 1? 2?
> *minutes*; the disabled function allowed a user to tell it to give up in a
> shorter time. Meanwhile, a hardware controller will, as I said, have fits.
>
> mark "you'd think I just spent months dealing with this...."
>

As mentioned in the original post the drives are all "WD2003FYYS". I am
convinced it has nothing to do with TLER enabled on the WD drives as we
run hundreds of them using linux mdadm raid on motherboard SATA
controllers with no problems in the last eight or so years. This appears
to be specific to the SATA PCIe 6Gbps 4 port controller card made by
Startech. There are four other HD's (WD2003FYYS) in the machine running on
an onboard "Intel Corporation Patsburg 6-Port SATA AHCI Controller" with
no problems.
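
For readers who want to check the setting under discussion: TLER is WD's name for what the ATA standard calls SCT Error Recovery Control, and smartmontools can query and set it with "-l scterc". The limits are given in tenths of a second. The helper below is a made-up name for illustration; it only prints the command it would run, and /dev/sdX is a placeholder.

```sh
#!/bin/sh
# Build (but do not run) the smartctl command that sets the SCT ERC read and
# write limits. Arguments: read limit in seconds, write limit in seconds, device.
scterc_cmd() {
    printf 'smartctl -l scterc,%d,%d %s\n' "$(( $1 * 10 ))" "$(( $2 * 10 ))" "$3"
}

scterc_cmd 7 7 /dev/sdX

# Query the current setting (as root; sdX is a placeholder):
#   smartctl -l scterc /dev/sdX
```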

Steve


m.roth@5-cent.us 06-22-2012 02:25 PM

SATA errors in log
 
Steve Brooks wrote:
> As mentioned in the original post the drives are all "WD2003FYYS". I am

Missed the original post; sorry.

> convinced it has nothing to do with TLER enabled on the WD drives as we

Thanks, that was the acronym I was trying to remember.

> run hundreds of them using linux mdadm raid on motherboard SATA
> controllers with no problems in the last eight or so years. This appears
> to be specific to the SATA PCIe 6Gbps 4 port controller card made by
> Startech. There are four other HD's (WD2003FYYS) in the machine running on
> an onboard "Intel Corporation Patsburg 6-Port SATA AHCI Controller" with
> no problems.

I also see those are "enterprise" drives, not consumer grade, which
implies that they ought to work. It still looks to me as though it's
timing out, which I'd think is a function of the RAID card. You might see
if it has any firmware configuration options.

mark


Steve Brooks 06-22-2012 02:32 PM

SATA errors in log
 
On Fri, 22 Jun 2012, m.roth@5-cent.us wrote:

> Steve Brooks wrote:
>> As mentioned in the original post the drives are all "WD2003FYYS". I am
>
> Missed the original post; sorry.
>
>> convinced it has nothing to do with TLER enabled on the WD drives as we
>
> Thanks, that was the acronym I was trying to remember.
>
>> run hundreds of them using linux mdadm raid on motherboard SATA
>> controllers with no problems in the last eight or so years. This appears
>> to be specific to the SATA PCIe 6Gbps 4 port controller card made by
>> Startech. There are four other HD's (WD2003FYYS) in the machine running on
>> an onboard "Intel Corporation Patsburg 6-Port SATA AHCI Controller" with
>> no problems.
>
> I also see those are "enterprise" drives, not consumer grade, which
> implies that they ought to work. It still looks to me as though it's
> timing out, which I'd think is a function of the RAID card. You might see
> if it has any firmware configuration options.


Thanks for the reply. The card is purely JBOD, with no RAID or other
configuration available. It simply presents the attached SATA devices to
the OS. I am wondering if it could be a strange symptom of running 3Gbps
drives on this particular 6Gbps controller, but that is just a stab in the
dark.
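
One way to test that guess, offered as a sketch rather than a known fix for this card: the failing command in the log is WRITE FPDMA QUEUED, which is an NCQ command, so it may be worth checking whether the errors persist with NCQ disabled or with the link forced to a lower speed. The helper name below is made up, sdX is a placeholder, and port 13 comes from "ata13" in the log.

```sh
#!/bin/sh
# Print the queue depth in use for each sd* disk.
# A depth of 1 means NCQ is effectively off for that disk.
# The optional argument overrides the sysfs directory (useful for testing).
show_queue_depth() (
    root=${1:-/sys/block}
    for q in "$root"/sd*/device/queue_depth; do
        [ -r "$q" ] || continue
        disk=${q#"$root"/}
        disk=${disk%%/*}
        printf '%s queue_depth=%s\n' "$disk" "$(cat "$q")"
    done
)

show_queue_depth

# Turn NCQ off for one disk at runtime (as root):
#   echo 1 > /sys/block/sdX/device/queue_depth
#
# Or apply a workaround at boot through the kernel command line:
#   libata.force=13:noncq      # disable NCQ on port ata13
#   libata.force=13:1.5Gbps    # limit the link on port ata13 to 1.5 Gbps
```

If the errors stop with either setting, that narrows the problem to the controller's handling of queued commands or of the link, rather than the drives.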

m.roth@5-cent.us 06-22-2012 02:46 PM

SATA errors in log
 
Steve Brooks wrote:
> I have a SATA PCIe 6Gbps 4 port controller card made by Startech. The
> kernel (Linux viz1 2.6.32-220.4.1.el6.x86_64) sees it as
>
> Marvell Technology Group Ltd. 88SE9123
>
Is this your card?

<http://www.startech.com/Cards-Adapters/HDD-Controllers/SATA-Cards/4-Port-PCI-Express-SATA-III-6Gbps-Controller-Card-with-eSATA-PCIe-4-Line~PEXSAT34>

mark


Steve Brooks 06-22-2012 02:59 PM

SATA errors in log
 
On Fri, 22 Jun 2012, m.roth@5-cent.us wrote:

> Is this your card?


Hi Mark,

Yes, that is the very card. The page says the chipset is Marvell 88SE9128,
but "lspci" shows

Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller

Steve


Steve Brooks 06-22-2012 03:12 PM

SATA errors in log
 
On Fri, 22 Jun 2012, Steve Brooks wrote:

>> Is this your card?
>
>
> Hi Mark,
>
> Yes that is the very card, the page says the chipset is Marvell 88SE9128
> but "lspci" shows
>
> Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller

It is odd: the kernel reports it as "88SE9123", but the web page says it
is "88SE9128", as does the manual supplied with the card. The motherboard
already has an onboard Marvell "88SE9128" controller, which is correctly
identified by the kernel and works properly, so I know the correct drivers
are in the kernel, but the Startech card does not seem to be using them.

[root@viz1 ~]# lspci | grep SATA
00:1f.2 SATA controller: Intel Corporation Patsburg 6-Port SATA AHCI Controller (rev 05)
04:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
05:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9123 PCIe SATA 6.0 Gb/s controller (rev 11)
0f:00.0 SATA controller: ASMedia Technology Inc. ASM1062 Serial ATA Controller (rev 01)
10:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9128 PCIe SATA 6 Gb/s RAID controller with HyperDuo (rev 11)


m.roth@5-cent.us 06-22-2012 03:33 PM

SATA errors in log
 
Hi, Steve,

Steve Brooks wrote:
> It is odd because the kernel reports it as "88SE9123" the web page says it
> is "88SE9128" as does the manual supplied with the card. Now the

Yeah, I noticed that too, and thought it odd.
<snip>
I looked at the "manual", and the only thing that came to mind was to try
going into the BIOS and making sure that it was set to AHCI rather than,
say, IDE, or whatever.

mark
