 
03-09-2012, 10:05 AM
Robert Gstoehl

SATA SB700/SB800 io errors

Hey there,

Our DRBD primary machine experienced a rather spontaneous reboot some time ago.

We were happily starting / stopping kvm virtual machines, syncing a
new drbd resource and
then this happened:

...
Feb 29 06:53:47 node2 kernel: [217385.578661] ata3.00: disabled
Feb 29 06:53:47 node2 kernel: [217385.578703] sd 2:0:0:0: [sda]
Unhandled error code
Feb 29 06:53:47 node2 kernel: [217385.578707] sd 2:0:0:0: [sda]
Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 29 06:53:47 node2 kernel: [217385.578712] sd 2:0:0:0: [sda] CDB:
Read(10): 28 00 19 74 18 00 00 01 38 00
Feb 29 06:53:47 node2 kernel: [217385.661238] sd 2:0:0:0: [sda] Stopping disk
Feb 29 06:53:47 node2 kernel: [217385.661977] sd 2:0:0:0: [sda]
START_STOP FAILED
Feb 29 06:53:47 node2 kernel: [217385.661981] sd 2:0:0:0: [sda]
Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 29 06:53:47 node2 kernel: [217385.662391] ata4.00: disabled
Feb 29 06:53:47 node2 kernel: [217385.668821] sd 3:0:0:0: [sdb] Stopping disk
Feb 29 06:53:47 node2 kernel: [217385.668864] sd 3:0:0:0: [sdb]
START_STOP FAILED
Feb 29 06:53:47 node2 kernel: [217385.668867] sd 3:0:0:0: [sdb]
Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 29 06:53:47 node2 kernel: [217385.669000] ata5.00: disabled
Feb 29 06:53:47 node2 kernel: [217385.686506] md: super_written gets
error=-5, uptodate=0
Feb 29 06:53:47 node2 kernel: [217385.755989] md: super_written gets
error=-5, uptodate=0
Feb 29 06:53:47 node2 kernel: [217385.756202] sd 4:0:0:0: [sdc] Stopping disk
Feb 29 06:53:47 node2 kernel: [217385.756257] sd 4:0:0:0: [sdc]
START_STOP FAILED
Feb 29 06:53:47 node2 kernel: [217385.756260] sd 4:0:0:0: [sdc]
Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 29 06:53:47 node2 kernel: [217385.756779] ata6.00: disabled
Feb 29 06:53:47 node2 kernel: [217385.816675] md: super_written gets
error=-5, uptodate=0
Feb 29 06:53:47 node2 kernel: [217385.900415] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.900418]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.900421]  disk 0, o:0, dev:sda
Feb 29 06:53:47 node2 kernel: [217385.900424]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.900426]  disk 2, o:0, dev:sdc
Feb 29 06:53:47 node2 kernel: [217385.900429]  disk 3, o:0, dev:sdd
Feb 29 06:53:47 node2 kernel: [217385.900771] sd 5:0:0:0: [sdd] Stopping disk
Feb 29 06:53:47 node2 kernel: [217385.901157] sd 5:0:0:0: [sdd]
START_STOP FAILED
Feb 29 06:53:47 node2 kernel: [217385.901162] sd 5:0:0:0: [sdd]
Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Feb 29 06:53:47 node2 kernel: [217385.901487] ahci 0000:00:11.0: PCI
INT A disabled
Feb 29 06:53:47 node2 kernel: [217385.902756] pci-stub 0000:00:11.0:
claimed by stub
Feb 29 06:53:47 node2 kernel: [217385.904721] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.904727]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.904732]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.904735]  disk 2, o:0, dev:sdc
Feb 29 06:53:47 node2 kernel: [217385.904738]  disk 3, o:0, dev:sdd
Feb 29 06:53:47 node2 kernel: [217385.904752] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.904755]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.904757]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.904759]  disk 2, o:0, dev:sdc
Feb 29 06:53:47 node2 kernel: [217385.904762]  disk 3, o:0, dev:sdd
Feb 29 06:53:47 node2 kernel: [217385.916029] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.916035]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.916040]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.916042]  disk 2, o:0, dev:sdc
Feb 29 06:53:47 node2 kernel: [217385.916056] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.916058]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.916060]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.916062]  disk 2, o:0, dev:sdc
Feb 29 06:53:47 node2 kernel: [217385.932427] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.932432]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.932437]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.932450] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.932452]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.932455]  disk 1, o:0, dev:sdb
Feb 29 06:53:47 node2 kernel: [217385.948162] RAID5 conf printout:
Feb 29 06:53:47 node2 kernel: [217385.948168]  --- rd:4 wd:0
Feb 29 06:53:47 node2 kernel: [217385.949817] block drbd0: Barriers
not supported on meta data device - disabling
Feb 29 06:53:47 node2 kernel: [217385.950177] block drbd0: read:
error=-5 s=232535040s
Feb 29 06:53:47 node2 kernel: [217385.950184] block drbd0: Resync aborted.
Feb 29 06:53:47 node2 kernel: [217385.950189] block drbd0: conn(
SyncSource -> Connected ) disk( UpToDate -> Failed )
Feb 29 06:53:47 node2 kernel: [217385.981468] block drbd0: read:
error=-5 s=232536064s
Feb 29 06:53:47 node2 kernel: [217385.981479] block drbd0: read:
error=-5 s=232534016s
Feb 29 06:53:47 node2 kernel: [217385.981648] block drbd0: read:
error=-5 s=232535048s

<snip ~ 600 more lines like this...>

Feb 29 06:53:47 node2 kernel: [217385.985444] block drbd0: p write: error=-5
Feb 29 06:53:47 node2 kernel: [217386.016978] block drbd0: p write: error=-5
Feb 29 06:53:47 node2 kernel: [217386.136316] block drbd0: helper
command: /sbin/drbdadm pri-on-incon-degr minor-0
Feb 29 06:53:47 node2 kernel: [217386.153546] block drbd0: read:
error=-5 s=232539272s

Feb 29 06:53:47 node2 notify-pri-on-incon-degr.sh[25841]: invoked for lv0

Feb 29 06:53:48 node2 kernel: [217386.403458] lost page write due to
I/O error on drbd0
Feb 29 06:53:48 node2 kernel: [217386.471193] lost page write due to
I/O error on drbd0

Feb 29 06:53:48 node2 kernel: [217386.511306] block drbd1: p write: error=-5
Feb 29 06:53:48 node2 kernel: [217386.526164] block drbd1: disk(
UpToDate -> Failed )
Feb 29 06:53:48 node2 kernel: [217386.585614] block drbd1: p write: error=-5
Feb 29 06:53:48 node2 kernel: [217386.624749] block drbd1: disk(
Failed -> Diskless )
Feb 29 06:53:48 node2 kernel: [217386.624764] block drbd1: Notified
peer that my disk is broken.

Feb 29 06:53:48 node2 kernel: [217386.917071] ahci 0000:00:11.0: PCI
INT A -> GSI 19 (level, low) -> IRQ 19
Feb 29 06:53:48 node2 kernel: [217386.917872] ahci 0000:00:11.0: AHCI
0001.0200 32 slots 4 ports 3 Gbps 0xf impl SATA mode
Feb 29 06:53:48 node2 kernel: [217386.917879] ahci 0000:00:11.0:
flags: 64bit ncq sntf ilck pm led clo pmp pio slum part
Feb 29 06:53:48 node2 kernel: [217386.918363] scsi7 : ahci
Feb 29 06:53:48 node2 kernel: [217386.918492] scsi8 : ahci
Feb 29 06:53:48 node2 kernel: [217386.918571] scsi9 : ahci
Feb 29 06:53:48 node2 kernel: [217386.919291] scsi10 : ahci
Feb 29 06:53:48 node2 kernel: [217386.919361] ata7: SATA max UDMA/133
abar m1024@0xfe4ffc00 port 0xfe4ffd00 irq 30
Feb 29 06:53:48 node2 kernel: [217386.919367] ata8: SATA max UDMA/133
abar m1024@0xfe4ffc00 port 0xfe4ffd80 irq 30
Feb 29 06:53:48 node2 kernel: [217386.919372] ata9: SATA max UDMA/133
abar m1024@0xfe4ffc00 port 0xfe4ffe00 irq 30
Feb 29 06:53:48 node2 kernel: [217386.919377] ata10: SATA max UDMA/133
abar m1024@0xfe4ffc00 port 0xfe4ffe80 irq 30
Feb 29 06:53:49 node2 kernel: [217387.404053] ata9: SATA link up 3.0
Gbps (SStatus 123 SControl 300)
Feb 29 06:53:49 node2 kernel: [217387.404091] ata7: SATA link up 3.0
Gbps (SStatus 123 SControl 300)
Feb 29 06:53:49 node2 kernel: [217387.404116] ata8: SATA link up 3.0
Gbps (SStatus 123 SControl 300)
Feb 29 06:53:49 node2 kernel: [217387.404141] ata10: SATA link up 3.0
Gbps (SStatus 123 SControl 300)
Feb 29 06:53:49 node2 kernel: [217387.409780] ata9.00: ATA-8: SAMSUNG
HD103SJ, 1AJ10001, max UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.409786] ata9.00: 1953525168
sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Feb 29 06:53:49 node2 kernel: [217387.409823] ata8.00: ATA-8: SAMSUNG
HD103SJ, 1AJ10001, max UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.409828] ata8.00: 1953525168
sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Feb 29 06:53:49 node2 kernel: [217387.410142] ata7.00: ATA-8: SAMSUNG
HD103SJ, 1AJ10001, max UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.410149] ata7.00: 1953525168
sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Feb 29 06:53:49 node2 kernel: [217387.410198] ata10.00: ATA-8: SAMSUNG
HD103SJ, 1AJ10001, max UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.410203] ata10.00: 1953525168
sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Feb 29 06:53:49 node2 kernel: [217387.415580] ata9.00: configured for UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.415615] ata8.00: configured for UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.415939] ata7.00: configured for UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.415980] ata10.00: configured for UDMA/133
Feb 29 06:53:49 node2 kernel: [217387.428686] scsi 7:0:0:0:
Direct-Access     ATA      SAMSUNG HD103SJ  1AJ1 PQ: 0 ANSI: 5
Feb 29 06:53:49 node2 kernel: [217387.429015] sd 7:0:0:0: [sdf]
1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Feb 29 06:53:49 node2 kernel: [217387.429450] scsi 8:0:0:0:
Direct-Access     ATA      SAMSUNG HD103SJ  1AJ1 PQ: 0 ANSI: 5
Feb 29 06:53:49 node2 kernel: [217387.429756] scsi 9:0:0:0:
Direct-Access     ATA      SAMSUNG HD103SJ  1AJ1 PQ: 0 ANSI: 5
Feb 29 06:53:49 node2 kernel: [217387.430666] sd 9:0:0:0: [sdh]
1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Feb 29 06:53:49 node2 kernel: [217387.430741] sd 9:0:0:0: [sdh] Write
Protect is off
Feb 29 06:53:49 node2 kernel: [217387.430774] sd 9:0:0:0: [sdh] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Feb 29 06:53:49 node2 kernel: [217387.430974]  sdh:
Feb 29 06:53:49 node2 kernel: [217387.431199] sd 8:0:0:0: [sdg]
1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Feb 29 06:53:49 node2 kernel: [217387.431278] sd 8:0:0:0: [sdg] Write
Protect is off
Feb 29 06:53:49 node2 kernel: [217387.431313] sd 8:0:0:0: [sdg] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Feb 29 06:53:49 node2 kernel: [217387.436193]  sdg:
Feb 29 06:53:49 node2 kernel: [217387.436382] scsi 10:0:0:0:
Direct-Access     ATA      SAMSUNG HD103SJ  1AJ1 PQ: 0 ANSI: 5
Feb 29 06:53:49 node2 kernel: [217387.436580] sd 10:0:0:0: [sdi]
1953525168 512-byte logical blocks: (1.00 TB/931 GiB)
Feb 29 06:53:49 node2 kernel: [217387.436649] sd 10:0:0:0: [sdi] Write
Protect is off
Feb 29 06:53:49 node2 kernel: [217387.436682] sd 10:0:0:0: [sdi] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Feb 29 06:53:49 node2 kernel: [217387.436888]  sdi:
Feb 29 06:53:49 node2 kernel: [217387.437033] sd 7:0:0:0: [sdf] Write
Protect is off
Feb 29 06:53:49 node2 kernel: [217387.437064] sd 7:0:0:0: [sdf] Write
cache: disabled, read cache: enabled, doesn't support DPO or FUA
Feb 29 06:53:49 node2 kernel: [217387.437212]  sdf: unknown partition table
Feb 29 06:53:49 node2 kernel: [217387.439934] sd 9:0:0:0: [sdh]
Attached SCSI disk
Feb 29 06:53:49 node2 kernel: [217387.445677]  unknown partition table
Feb 29 06:53:49 node2 kernel: [217387.446324] sd 10:0:0:0: [sdi]
Attached SCSI disk
Feb 29 06:53:49 node2 kernel: [217387.451006]  unknown partition table
Feb 29 06:53:49 node2 kernel: [217387.451309] sd 8:0:0:0: [sdg]
Attached SCSI disk
Feb 29 06:53:49 node2 kernel: [217387.451325]
Feb 29 06:53:49 node2 kernel: [217387.452053] sd 7:0:0:0: [sdf]
Attached SCSI disk

<snip drbd probably does the right thing and initiates a reboot>

Feb 29 06:53:50 node2 notify-emergency-reboot.sh[25900]: invoked for lv0
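
For anyone triaging a similar excerpt, here is a minimal sketch (not part of the original post) that tallies which ATA ports the kernel disabled and how many DRBD I/O errors were logged per device. It assumes each syslog entry has been rejoined onto a single line, since the mail client wrapped the lines above; the function name `summarize` and the sample list are illustrative only. The `error=-5` in these messages corresponds to -EIO, a generic I/O error.

```python
import re
from collections import Counter

# Patterns for unwrapped syslog lines like those in the excerpt above.
ATA_DISABLED = re.compile(r"\b(ata\d+)\.\d+: disabled")
DRBD_IO_ERROR = re.compile(r"block (drbd\d+): (?:p )?(read|write): error=(-?\d+)")

def summarize(lines):
    """Return (ATA ports disabled, in log order; Counter keyed by (device, op, errno))."""
    disabled = []
    errors = Counter()
    for line in lines:
        m = ATA_DISABLED.search(line)
        if m:
            disabled.append(m.group(1))
            continue
        m = DRBD_IO_ERROR.search(line)
        if m:
            errors[(m.group(1), m.group(2), int(m.group(3)))] += 1
    return disabled, errors

# A few entries from the log above, rejoined onto single lines.
sample = [
    "Feb 29 06:53:47 node2 kernel: [217385.578661] ata3.00: disabled",
    "Feb 29 06:53:47 node2 kernel: [217385.662391] ata4.00: disabled",
    "Feb 29 06:53:47 node2 kernel: [217385.950177] block drbd0: read: error=-5 s=232535040s",
    "Feb 29 06:53:47 node2 kernel: [217385.985444] block drbd0: p write: error=-5",
]
disabled, errors = summarize(sample)
print(disabled)
print(dict(errors))
```

Run against the full log, the order of the disabled ports shows whether all four links dropped together, which would point at the controller rather than at any single disk.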

Setup

Both nodes run squeeze stock 2.6.32-5-amd64 kernel.

node2 drbd primary
HP Proliant Micro Server
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA
Controller [AHCI mode] (rev 40)

4 sata disks sd[a-d]
1 vg "data" 2.73TB
1 lv "export" 500GB / /dev/drbd1
1 lv "lv0" 500GB / /dev/drbd0

node3 drbd secondary
HP Proliant Micro Server
00:11.0 SATA controller: ATI Technologies Inc SB700/SB800 SATA
Controller [AHCI mode] (rev 40)

4 sata disks sd[a-d]
1 vg "data" 2.73TB
1 lv "export" 500GB / /dev/drbd1
1 lv "lv0" 500GB / /dev/drbd0

drbd resources

resource r0 {
    device    /dev/drbd1;
    disk      /dev/mapper/data-export;
    meta-disk internal;
    startup { wfc-timeout 90; }
    # net { on-disconnect reconnect; }
    disk { on-io-error detach; }
    on node2 { address 10.1.5.2:7789; }
    on node3 { address 10.1.5.3:7789; }
}

resource lv0 {
    device    /dev/drbd0;
    disk      /dev/mapper/data-lv0;
    meta-disk internal;
    startup { wfc-timeout 90; }
    # net { on-disconnect reconnect; }
    disk { on-io-error detach; }
    on node2 { address 10.1.5.2:7790; }
    on node3 { address 10.1.5.3:7790; }
}
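
With `on-io-error detach` set, DRBD is expected to drop the failing backing device and carry on diskless, which appears consistent with the `disk( UpToDate -> Failed )` and `disk( Failed -> Diskless )` lines in the log above. A rough sketch for pulling those state changes out of a log follows; it assumes unwrapped log lines, and `transitions` is a hypothetical helper name, not a DRBD tool.

```python
import re

DEVICE = re.compile(r"block (drbd\d+):")
# Matches state changes such as "disk( UpToDate -> Failed )".
CHANGE = re.compile(r"\b(conn|disk)\( (\w+) -> (\w+) \)")

def transitions(lines):
    """Return (device, field, old_state, new_state) tuples in log order."""
    out = []
    for line in lines:
        dev = DEVICE.search(line)
        if not dev:
            continue
        # One log line can carry several changes (conn and disk together).
        for field, old, new in CHANGE.findall(line):
            out.append((dev.group(1), field, old, new))
    return out

# Messages taken from the log above, rejoined onto single lines.
sample = [
    "block drbd0: conn( SyncSource -> Connected ) disk( UpToDate -> Failed )",
    "block drbd1: disk( UpToDate -> Failed )",
    "block drbd1: disk( Failed -> Diskless )",
]
result = transitions(sample)
for item in result:
    print(item)
```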

switched gigabit ethernet hooks all this together

I have since been able to reproduce the problem on another HP MicroServer,
identical to this one but with slower and bigger disks in it. Same SATA
controller, though.

Other people might be having issues with this sata controller too:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/550559

The machines are deployed at the customer's site and currently run mostly
stable, as long as we keep the I/O load down...

Any help is appreciated to get this sorted out.

Cheers Robert


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/CAMPMgM9ozKVs=K5hH5H7T4JPgH8tNxhSphAYCv-OMFnjWeOZVg@mail.gmail.com
 
03-09-2012, 09:58 PM
Stan Hoeppner

On 3/9/2012 5:05 AM, Robert Gstoehl wrote:
> Hey there,
>
> Our DRBD primary machine experienced a rather spontaneous reboot some time ago.
>
> We were happily starting / stopping kvm virtual machines, syncing a
> new drbd resource and
> then this happened:

[snipped logs]

> node2 drbd primary
> HP Proliant Micro Server

> node3 drbd secondary
> HP Proliant Micro Server

/me tries not to laugh

> Samsung Spinpoint drives

/me tries harder not to laugh

You've taken two cheap college student dorm room 'servers', clustered
them, and sold the solution to a commercial client...

> Any help is appreciated to get this sorted out.

I have advice to offer, but you probably won't consider it helpful. And
you'll likely find it offensive/insulting, as it is my intention to
publicly shame/humiliate you into a proper way of thinking:

Don't sell your commercial clients $300 USD consumer desktops
masquerading as 'servers', and the cheapest consumer drives on the
planet for commercial use. I still have trouble digesting the fact you
did such a thing. There could have been many reasons for you doing so,
and none of them reflect positively on you, your company, or the way you
conduct business.

The only way to "fix" this, technically, is to drop a real SAS/SATA HBA
and 4 enterprise class SATA drives into each existing box. This will
cost ~$2400 USD w/8x1TB drives. The other option is to throw all the
junk hardware out and replace it with a single commercial quality box
with a quad core CPU, 8GB RAM, enterprise HBA or RAID card, and 4
enterprise SATA or SAS drives. The cost is roughly the same as the
"upgrade solution" above using HBA+SATA, a couple hundred more for RAID,
and another couple hundred for 600GB 10k SAS drives. With those you'll
have 1.8TB total in RAID5, or 1.2TB less than the current system, but
random disk IOPS will be doubled vs SATA, which is desirable for VM
workloads.
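
The capacity figures above can be checked with a quick calculation. This sketch assumes the current array is a four-disk RAID5 of 1 TB drives (as the log and the 2.73TB volume group suggest) and that the proposed one uses four 600 GB drives; `raid5_usable_gb` is an illustrative helper, not something from the thread.

```python
def raid5_usable_gb(drives, size_gb):
    """Usable RAID5 capacity in GB: one drive's worth of space holds parity."""
    if drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (drives - 1) * size_gb

current = raid5_usable_gb(4, 1000)   # assumed: 4 x 1 TB SATA (existing boxes)
proposed = raid5_usable_gb(4, 600)   # assumed: 4 x 600 GB 10k SAS
difference = current - proposed

# Decimal GB expressed in binary TiB, which is what LVM tools usually report.
current_tib = round(current * 10**9 / 2**40, 2)

print(current, proposed, difference, current_tib)
```

That gives 3000 GB now against 1800 GB proposed, a difference of 1200 GB, and 3000 GB is about 2.73 TiB, matching the volume group size quoted earlier in the thread.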

--
Stan


--
Archive: http://lists.debian.org/4F5A8B0A.3070500@hardwarefreak.com
 
03-10-2012, 11:00 AM
Jude DaShiell

The nvidia SATA drivers aren't working with the latest kernel either. I
wrote about that earlier and was told the drivers are present but the
drive isn't detected correctly by the installer. As of now, both
Slackware 13.0 and Arch Linux install on my SATA drive without problems.
In a while I'll probably try Fedora and Gentoo as well, to round out the
accessible Linux distributions and see whether they can install too. I
have more than one SATA drive I can do this with, though the others are
smaller, and this box has drive sleds, so swapping drives is no problem
for me.



---------------------------------------------------------------- Jude
<jdashiel-at-shellworld-dot-net>
<http://www.shellworld.net/~jdashiel/nj.html>


--
Archive: http://lists.debian.org/alpine.BSF.2.01.1203100656090.72815@freire1.furyyj beyq.arg
 
