01-02-2009, 11:00 AM
Bengt Samuelsson

Corrupt data - RAID sata_sil 3114 chip

Hi,

I need some help with this software RAID system.

I am running RAID5 on four Samsung SpinPoint 500 GB SATA300 disks, about 1.3 TB in total.

It runs on http://sm7jqb.dnsalias.com
I use mdadm on Debian Linux.
CPU 1.2 GHz, 1 GB memory (my older 433 MHz / 512 MB machine doesn't work at all).

I have some corrupt data, and I don't understand why or how to fix it.
Maybe I should slow it down more, but how do I slow it down?

Does anyone have experience with this cheap way of building RAID systems?

Ask for more information and I can provide it: logs, setup files, and whatever
else you want to know.

--
Bengt Samuelsson
Nydalavägen 30 A
352 48 Växjö

+46(0)703686441

http://sm7jqb.se


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
01-02-2009, 11:42 AM
Justin Piszcz

If this is an mdadm-based RAID (not dmraid), please show all relevant md
info, e.g. the output of mdadm -D /dev/md0. I have cc'd linux-raid on this
thread for you.


You'll want to read md.txt in /usr/src/linux/Documentation and read up on the
check and repair commands.


In addition, have you run memtest86 on your system first to make sure it's
not memory related?


Justin.
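
The check and repair commands described in md.txt are driven through sysfs files. The following is only a minimal sketch, assuming the array is md0 and its sysfs directory is /sys/block/md0/md; the helper names are hypothetical. Writing "check" to sync_action starts a read-only scan, and mismatch_cnt reports how many sectors did not match once sync_action reads "idle" again.

```python
from pathlib import Path

# Assumed location of the md sysfs directory for /dev/md0.
MD_SYSFS = "/sys/block/md0/md"


def start_md_check(md_sysfs=MD_SYSFS):
    """Ask md to start a read-only consistency check (needs root on a real array)."""
    # "check" only reads and compares; writing "repair" instead would also
    # rewrite inconsistent stripes.
    (Path(md_sysfs) / "sync_action").write_text("check\n")


def read_md_status(md_sysfs=MD_SYSFS):
    """Return (current sync_action, mismatch_cnt) as reported by md."""
    md = Path(md_sysfs)
    action = (md / "sync_action").read_text().strip()
    mismatches = int((md / "mismatch_cnt").read_text().strip())
    return action, mismatches
```

A non-zero mismatch count after a completed check means some stripes are inconsistent, but it does not say which disk returned the bad data.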
 
01-02-2009, 08:30 PM
Bernd Schubert

Hello Bengt,

sil3114 is known to cause data corruption with some disks. So far I only know
about Seagate, but maybe there are issues with newer Samsungs as well?

http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html

Unfortunately this issue has simply been ignored by the SATA developers.
So if you want to be on the safe side, go and get another controller.

I hope I won't frighten you too much, but it is also possible that one of
your disks has a problem. I have seen a few broken disks that don't
return what you write to them...


Cheers,
Bernd
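
One rough way to look for disks or controllers that "don't return what you write" is a write-then-read-back test. This is only a sketch under stated assumptions: the function name, the seed and the sizes are made up for illustration, and the test file should be larger than RAM (or the page cache dropped before reading), otherwise the read may be served from memory instead of the disks.

```python
import hashlib
import os


def write_and_verify(path, blocks=64, block_size=1 << 20, seed=b"raid-test"):
    """Write deterministic pseudo-random blocks, read them back, compare SHA-256."""
    written = hashlib.sha256()
    with open(path, "wb") as f:
        for i in range(blocks):
            # Derive each block from the seed and block index so the data
            # is reproducible but not trivially compressible.
            block = hashlib.shake_256(seed + i.to_bytes(8, "big")).digest(block_size)
            f.write(block)
            written.update(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the process and to the device

    read_back = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(block_size):
            read_back.update(chunk)

    # True means the bytes read back match the bytes written.
    return written.hexdigest() == read_back.hexdigest()
```

A False result on a file stored on the array would point at the storage path (disk, cable, controller or memory), but it cannot tell these apart on its own.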


 
01-02-2009, 08:47 PM
Twigathy

Hi,

I also had problems with the sata_sil driver with more than one
Silicon Image card in the same machine about a year or two back. I don't
remember the specifics, but basically the cards would occasionally
drop the SATA link. This was with Western Digital drives. With a
Samsung 750GB disk, the disk and controller absolutely refused to talk
to each other.

I've since got rid of all but one Silicon Image card and swapped out the
cables, and haven't had problems since. Coincidence? No idea.

04:01.0 RAID bus controller: Silicon Image, Inc. SiI 3512 [SATALink/SATARaid] Serial ATA Controller (rev 01)
Currently running kernel 2.6.24-21

Not much fun when disks don't work properly, is it? :-(

T

 
01-03-2009, 01:31 AM
Redeeman

On Fri, 2009-01-02 at 22:30 +0100, Bernd Schubert wrote:
> Hello Bengt,
>
> sil3114 is known to cause data corruption with some disks. So far I only know
> about Seagate, but maybe there are issues with newer Samsungs as well?
>
> http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html
>
> Unfortunately this issue has simply been ignored by the SATA developers.
> So if you want to be on the safe side, go and get another controller.

Are you sure? Is this not the "15" or "slow_down" thing mentioned here:
http://ata.wiki.kernel.org/index.php/Sata_sil ?
 
01-03-2009, 08:06 AM
Bengt Samuelsson

Justin Piszcz wrote:



> If this is an mdadm-related raid (not dmraid) please show all relevant
> md info, mdadm -D /dev/md0, I have cc'd linux-raid on this thread for you.




~# mdadm -D /dev/md0
------------------------------
/dev/md0:
        Version : 00.90.03
  Creation Time : Fri Sep 12 19:08:22 2008
     Raid Level : raid5
     Array Size : 1465151616 (1397.28 GiB 1500.32 GB)
    Device Size : 488383872 (465.76 GiB 500.11 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Jan  2 16:53:10 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 128K

           UUID : 68439662:90431c4a:5e66217b:5a1a585f (local to host sm7jqb.dnsalias.com)
         Events : 0.13406

    Number   Major   Minor   RaidDevice   State
       0       8       1         0        active sync   /dev/sda1
       1       8      17         1        active sync   /dev/sdb1
       2       8      33         2        active sync   /dev/sdc1
       3       8      49         3        active sync   /dev/sdd1
------------------------------
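
As a sanity check on the numbers above: RAID5 keeps one device's worth of parity, so the usable size should be (devices - 1) times the per-device size. The short sketch below redoes that arithmetic, assuming the sizes mdadm prints are in KiB; the function name is only illustrative.

```python
def raid5_usable_kib(device_kib, raid_devices):
    """Usable RAID5 capacity: one device's worth of space goes to parity."""
    return (raid_devices - 1) * device_kib


array_kib = raid5_usable_kib(488383872, 4)
print(array_kib)                           # 1465151616, matches "Array Size"
print(round(array_kib / 1024**2, 2))       # size in GiB
print(round(array_kib * 1024 / 1e9, 2))    # size in GB
```

The result agrees with the reported Array Size, so the array geometry itself looks consistent with four active 500 GB members.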


> You'll want to read md.txt in /usr/src/linux/Documentation and read on
> the check and repair commands.
>
> In addition, have you run memtest86 on your system first to make sure
> it's not memory related?

I am working on this.

> Justin.


--
Bengt Samuelsson
Nydalavägen 30 A
352 48 Växjö

+46(0)703686441

http://sm7jqb.se


 
01-03-2009, 12:13 PM
Bernd Schubert

On Saturday 03 January 2009 03:31:57 Redeeman wrote:
> On Fri, 2009-01-02 at 22:30 +0100, Bernd Schubert wrote:
> > Hello Bengt,
> >
> > sil3114 is known to cause data corruption with some disks. So far I only
> > know about Seagate, but maybe there are issues with newer Samsungs as well?
> >
> > http://lkml.indiana.edu/hypermail/linux/kernel/0710.2/2035.html
> >
> > Unfortunately this issue has simply been ignored by the SATA developers.
> > So if you want to be on the safe side, go and get another controller.
>
> Are you sure? Is this not the "15" or "slow_down" thing mentioned here:
> http://ata.wiki.kernel.org/index.php/Sata_sil ?
>

According to Jeff Garzik and Tejun Heo, the 3114 is not affected by the mod15 bug.
The mod15 workaround also helps in our case, but we are probably just lucky.

https://kerneltrap.org/mailarchive/linux-kernel/2007/10/11/334985/thread


Cheers,
Bernd

--
Bernd Schubert
Q-Leap Networks GmbH


 
01-03-2009, 12:39 PM
Alan Cox

On Fri, 2 Jan 2009 22:30:07 +0100
Bernd Schubert <bs@q-leap.de> wrote:

> Hello Bengt,
>
> sil3114 is known to cause data corruption with some disks.

News to me. There are a few people with lots of SI and other devices
jammed into the same mainboard who had problems but that doesn't appear
to be an SI problem as far as I can tell.

There are some incompatibilities between certain Silicon Image chips and
Nvidia chipsets needing BIOS workarounds according to the errata docs.

Alan


 
01-03-2009, 03:20 PM
Bernd Schubert

On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
> On Fri, 2 Jan 2009 22:30:07 +0100
> Bernd Schubert <bs@q-leap.de> wrote:
>
> > Hello Bengt,
> >
> > sil3114 is known to cause data corruption with some disks.
>
> News to me. There are a few people with lots of SI and other devices

No no, you just forgot about it, since you even reviewed the patches:

http://lkml.org/lkml/2007/10/11/137

> jammed into the same mainboard who had problems but that doesn't appear
> to be an SI problem as far as I can tell.
>
> There are some incompatibilities between certain silicon image chips and
> Nvidia chipsets needing BIOS workarounds according to the errata docs.

Well, I already posted the links to the discussion we had in the past.
The corruption issue is easily reproducible on Tyan S2882 with AMD-8111,
SiI 3114 and ST3250820AS disks. This is on a compute cluster, and we ran into
the problem when a few ST3200822AS disks failed and got replaced by newer 250GB
disks. The 200GB ST3200822AS work perfectly fine, while the 250GB ST3250820AS
disks cause data corruption.

Presently the cluster is empty, so if you want to help me, your help to
properly solve the issue would be highly appreciated (*).


Cheers,
Bernd

PS: The patches I posted work fine on these systems, but they are not upstream
and I really would prefer to find a way in vanilla Linux to prevent this
data corruption.

PPS: It's a bit funny with this cluster, since it is located at my university
group and I did and do many calculations on it myself. But presently I work
for the company we bought it from and which is responsible for maintaining it...


 
01-03-2009, 05:31 PM
Robert Hancock

Bernd Schubert wrote:
> On Sat, Jan 03, 2009 at 01:39:36PM +0000, Alan Cox wrote:
>> On Fri, 2 Jan 2009 22:30:07 +0100
>> Bernd Schubert <bs@q-leap.de> wrote:
>>
>>> Hello Bengt,
>>>
>>> sil3114 is known to cause data corruption with some disks.
>>
>> News to me. There are a few people with lots of SI and other devices
>
> No no, you just forgot about it, since you even reviewed the patches
>
> http://lkml.org/lkml/2007/10/11/137


And Jeff explained why they were not merged:

http://lkml.org/lkml/2007/10/11/166

All the patches do is try to reduce the speed impact of the workaround.
But as was pointed out, they don't reliably solve the problem the
workaround is trying to fix, and besides, the workaround is already not
applied to the SiI3114 at all, as it is apparently not applicable to that
controller (only the 3112).





>> jammed into the same mainboard who had problems but that doesn't appear
>> to be an SI problem as far as I can tell.
>>
>> There are some incompatibilities between certain Silicon Image chips and
>> Nvidia chipsets needing BIOS workarounds according to the errata docs.

Do you have details of these, Alan?

> Well, I already posted the links to the discussion we had in the past.
> The corruption issue is easily reproducible on Tyan S2882 with AMD-8111,
> SiI 3114 and ST3250820AS disks. This is on a compute cluster, and we ran into
> the problem when a few ST3200822AS failed and got replaced by newer 250GB
> disks. The 200GB ST3200822AS work perfectly fine, while the 250GB ST3250820AS
> disks cause data corruption.
>
> Presently the cluster is empty, so if you want to help me, your help to
> properly solve the issue would be highly appreciated (*).
>
> Cheers,
> Bernd
>
> PS: The patches I posted work fine on these systems, but they are not upstream
> and I really would prefer to find a way in vanilla linux to prevent this
> data corruption.


Some people have tried turning on the slow_down option or adding their
drive to the mod15 blacklist and found that problems went away, but that
in no way implies that their setup actually needs this workaround, only
that it slows down the IO enough that the problem no longer shows up.
It's a big hammer that can cover up all kinds of other issues and has
confused a lot of people into thinking the mod15write problem is bigger
than it actually is.
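
When interpreting results, it helps to know whether the slow_down option is currently active. The sketch below reads the module parameter from sysfs; it assumes the sata_sil module is loaded and exports the parameter under /sys/module/sata_sil/parameters/slow_down, and the function name is hypothetical.

```python
from pathlib import Path


def read_sata_sil_slow_down(sysfs_root="/sys"):
    """Return the sata_sil slow_down value, or None if it is not exposed."""
    # Assumed path: module parameters normally appear under
    # /sys/module/<module>/parameters/ when the module is loaded.
    param = Path(sysfs_root) / "module" / "sata_sil" / "parameters" / "slow_down"
    try:
        return int(param.read_text().strip())
    except FileNotFoundError:
        # Module not loaded, or parameter not exported on this kernel.
        return None
```

If this returns 1 and the corruption disappears, that only shows the problem is hidden at lower I/O rates, as described above, not that the mod15write workaround was actually needed.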




 
