Linux Archive > Redhat > Device-mapper Development

 
 
 
Old 01-26-2010, 09:09 PM
Jakov Sosic
 
Trouble with StorageTek 2530 (SAS) and RDAC

Hi!

I contacted the list almost half a year ago about this storage array, and
I still haven't figured out how to set it up... I have 3 nodes connected
to it, with 2 volumes shared across all 3 nodes. I'm using CentOS 5.4.
Here is my multipath.conf:


defaults {
udev_dir /dev
polling_interval 10
selector "round-robin 0"
path_grouping_policy multibus
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout /bin/true
path_checker readsector0
rr_min_io 100
max_fds 8192
rr_weight priorities
failback immediate
no_path_retry fail
}
blacklist {
devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
devnode "^hd[a-z]"
devnode "^sda"
}
multipaths {
multipath {
wwid 3600a0b80003abc5c000011504b52f919
alias sas-qd
}
multipath {
wwid 3600a0b80002fcd1800001a374b52fa1e
alias sas-data
}
}

devices {
device {
vendor "SUN"
product "LCSM100_S"
getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
prio_callout "/sbin/mpath_prio_rdac /dev/%n"
features "0"
hardware_handler "1 rdac"
path_grouping_policy group_by_prio
failback immediate
path_checker rdac
rr_weight uniform
no_path_retry 300
rr_min_io 1000
}
}


And here is multipath -ll:
# multipath -ll sas-data
sas-data (3600a0b80002fcd1800001a374b52fa1e) dm-1 SUN,LCSM100_S
[size=2.7T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=100][enabled]
\_ 1:0:3:1 sde 8:64 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:0:1 sdc 8:32 [active][ghost]


On that volume I have set up CLVM, and I have created one clustered
logical volume. If I try to format it with ext3, here is what I end up
with:


Jan 26 23:00:43 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360267648
Jan 26 23:00:43 node01 kernel: device-mapper: multipath: Failing path 8:64.
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360269696
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360527744
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360528768
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360529792
Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:43 node01 multipathd: 8:64: mark as failed
Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
1360530816
Jan 26 23:00:44 node01 multipathd: sas-data: remaining active paths: 1
Jan 26 23:00:44 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:44 node01 multipathd: dm-1: add map (uevent)
Jan 26 23:00:44 node01 kernel: end_request: I/O error, dev sde, sector
1360531840
Jan 26 23:00:44 node01 multipathd: dm-1: devmap already registered
Jan 26 23:00:44 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:44 node01 multipathd: sdd: remove path (uevent)
Jan 26 23:00:44 node01 kernel: end_request: I/O error, dev sde, sector
1360789888
Jan 26 23:00:44 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:44 node01 kernel: end_request: I/O error, dev sde, sector
1360790912
[... lots of similar messages ...]

Jan 26 23:00:50 node01 kernel: sd 1:0:1:1: SCSI error: return code =
0x00010000
Jan 26 23:00:50 node01 kernel: end_request: I/O error, dev sde, sector
1358694784
Jan 26 23:00:50 node01 kernel: mptsas: ioc1: removing ssp device,
channel 0, id 4, phy 7
Jan 26 23:00:50 node01 kernel: scsi 1:0:1:0: rdac Dettached
Jan 26 23:00:50 node01 kernel: scsi 1:0:1:1: rdac Dettached
Jan 26 23:00:50 node01 kernel: sd 1:0:0:1: queueing MODE_SELECT command.
Jan 26 23:00:50 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 26 23:00:51 node01 kernel: sd 1:0:0:0: rdac Dettached
Jan 26 23:00:51 node01 multipathd: sas-qd: load table [0 204800
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:16 1000]
Jan 26 23:00:51 node01 multipathd: sde: remove path (uevent)
Jan 26 23:00:51 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 26 23:00:52 node01 kernel: sd 1:0:0:1: rdac Dettached
Jan 26 23:00:52 node01 multipathd: sas-data: load table [0 5855165440
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:32 1000]
Jan 26 23:00:52 node01 multipathd: dm-0: add map (uevent)
Jan 26 23:00:52 node01 multipathd: dm-0: devmap already registered
Jan 26 23:00:52 node01 multipathd: dm-1: add map (uevent)
Jan 26 23:00:52 node01 multipathd: dm-1: devmap already registered
Jan 26 23:00:52 node01 kernel: device-mapper: multipath: Cannot failover
device because scsi_dh_rdac was not loaded.


Any ideas?
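One thing worth checking, given the final "Cannot failover device because scsi_dh_rdac was not loaded" message above, is whether the RDAC device handler module is loaded at all. A minimal check (an illustrative sketch, not from the original thread):

```shell
# Check whether the scsi_dh_rdac device handler module is loaded,
# and suggest loading it if not.
check_dh_module() {
    MOD=scsi_dh_rdac
    if [ -r /proc/modules ] && grep -q "^$MOD " /proc/modules; then
        echo "$MOD is loaded"
    else
        echo "$MOD is not loaded; try: modprobe $MOD"
        echo "(and add it to the initrd so it is loaded at boot)"
    fi
}
check_dh_module
```

If the module only gets pulled in on demand, loading it up front (and rebuilding the initrd) avoids the situation the log hints at, where a failover is attempted before the handler is available.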


--
| Jakov Sosic | ICQ: 28410271 | PGP: 0x965CAE2D |
=================================================
| start fighting cancer -> http://www.worldcommunitygrid.org/ |

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 01-27-2010, 01:23 AM
Chandra Seetharaman
 
Trouble with StorageTek 2530 (SAS) and RDAC

On Tue, 2010-01-26 at 23:09 +0100, Jakov Sosic wrote:
> Hi!
>
> I have contacted list almost a half a year ago about this storage. I
> still haven't figured out how to set it up... I have 3 nodes connected
> to it, and 2 volumes shared across all 3 nodes. I'm using CentOS 5.4.
> Here is my multipath.conf:
>
>
> [multipath.conf snipped]
>
>
> And here is multipath -ll:
> # multipath -ll sas-data
> sas-data (3600a0b80002fcd1800001a374b52fa1e) dm-1 SUN,LCSM100_S
> [size=2.7T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
> \_ round-robin 0 [prio=100][enabled]
> \_ 1:0:3:1 sde 8:64 [active][ready]
> \_ round-robin 0 [prio=0][enabled]
> \_ 1:0:0:1 sdc 8:32 [active][ghost]
>
>
> On that volume, I have set up CLVM, and I have created one logical
> clustered volume. If I try to format it with ext3, here is what I finish
> with:
>
>
> Jan 26 23:00:43 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
> Originator={PL}, Code={IO Executed}, SubCode(0x0000)
> Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
> 0x00010000
> Jan 26 23:00:43 node01 kernel: end_request: I/O error, dev sde, sector
> 1360267648

Are these messages from the same node where you got the multipath -ll
output?

From these messages it looks like sde is 1:0:1:1, but from the multipath
-ll output it looks like it is 1:0:3:1.

> Jan 26 23:00:43 node01 kernel: device-mapper: multipath: Failing path 8:64.
> Jan 26 23:00:43 node01 kernel: sd 1:0:1:1: SCSI error: return code =
> 0x00010000

This return code means that the host is returning DID_NO_CONNECT, which
means that the host is not able to connect to the end point.

I would suggest you go step by step:
1. Try to access both paths of a LUN (on all nodes);
one should succeed and the other should fail.
2. Try to access the multipath device and see if all is good.
3. Create an LVM volume on a single node (not clustered) and see if that works.
4. Create a clustered LVM volume on top of all the active (non-ghost) sd
devices and see if it works.

When you send the results, include the output of "dmsetup table" and
"dmsetup ls".
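Steps 1 and 2 above can be sketched roughly as follows. The device names are passed as arguments (for example the sdc/sde paths from the earlier multipath -ll output), and missing devices are skipped, so this is a generic sketch rather than something tied to this exact setup:

```shell
# Try a small read from each given path of a LUN (step 1) or from the
# multipath device itself (step 2). On an active/passive array, one raw
# path is expected to succeed and the ghost path to fail.
check_paths() {
    for dev in "$@"; do
        if [ -b "/dev/$dev" ]; then
            if dd if="/dev/$dev" of=/dev/null bs=512 count=1 2>/dev/null; then
                echo "/dev/$dev: read OK"
            else
                echo "/dev/$dev: read FAILED (expected on the ghost path)"
            fi
        else
            echo "/dev/$dev: no such block device, skipping"
        fi
    done
}
check_paths sdc sde            # raw paths of the LUN (step 1)
check_paths mapper/sas-data    # the multipath device (step 2)
```

Running this as root on each node, and keeping the output alongside "dmsetup table" and "dmsetup ls", gives the full picture asked for above.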


> [rest of the quoted log snipped]

 
Old 01-28-2010, 11:24 PM
Jakov Sosic
 
Trouble with StorageTek 2530 (SAS) and RDAC

On 01/27/2010 03:23 AM, Chandra Seetharaman wrote:

> This return code means that the host is returning DID_NO_CONNECT. which
> means that the host is not able to connect to the end point.
>
> I would suggest you to go step-by-step.
> 1. Try to access both the paths of a lun (in all nodes).
> one should succeed and other should fail.
> 2. Try to access the multipath device and see if all is good.
> 3. Create a LVM on a single node (not clusters) and see if that works.
> 4. Create a clustered LVM on top of all the Active (non-ghost) sd
> devices and see if it works.
>
> When you send the results include o/p "dmsetup table" and "dmsetup ls"


Thank you! I've solved the multipath problems with a new kernel I built
with my device added to scsi_dh_rdac.c! I added the "SUN"
"LCSM100_S" entry, just as Charlie Brady suggested to me a few months
back. That was the solution for the multipath problems.
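For anyone hitting the same issue: the kernel-side change amounts to adding the array to the device whitelist in drivers/scsi/device_handler/scsi_dh_rdac.c. The table name and the neighbouring entries vary by kernel version, so the following is only a sketch of what the addition looks like, not a tested patch:

```diff
--- a/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ b/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ static const struct scsi_dh_devlist rdac_dev_list[] = {
 	{"SUN", "LCSM100_F"},
+	{"SUN", "LCSM100_S"},
```

After rebuilding, scsi_dh_rdac then attaches to the LCSM100_S paths automatically.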

Now multipath is able to do its part. But after the failover, the
secondary path works for just a bit and then hangs... When I disconnect
the active SAS cable from the server, multipath and scsi_dh_rdac do
their thing, but if I have active read/write processes (for example,
copying a file from the mounted storage volume back onto the exact same
partition), everything hangs a few seconds after the multipath failover.



Very strange behaviour indeed. This is what happens now:

Jan 28 20:26:12 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:26:12 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:26:12 node01 kernel: sd 1:0:0:1: SCSI error: return code =
0x00010000
Jan 28 20:26:12 node01 kernel: end_request: I/O error, dev sdc, sector
7012168
Jan 28 20:26:12 node01 kernel: device-mapper: multipath: Failing path 8:32.
Jan 28 20:26:12 node01 kernel: sd 1:0:0:1: SCSI error: return code =
0x00010000
Jan 28 20:26:12 node01 kernel: end_request: I/O error, dev sdc, sector
7012424

So, multipath failover kicked in... Lots of similar SCSI I/O error
messages follow, and in between I see this:

Jan 28 20:26:12 node01 multipathd: dm-1: add map (uevent)
Jan 28 20:26:12 node01 multipathd: dm-1: devmap already registered
Jan 28 20:26:12 node01 multipathd: 8:32: mark as failed
Jan 28 20:26:12 node01 multipathd: sas-data: remaining active paths: 1
Jan 28 20:26:12 node01 multipathd: sdb: remove path (uevent)


and then

Jan 28 20:26:13 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:26:13 node01 last message repeated 61 times



Jan 28 20:26:18 node01 multipathd: sas-qd: load table [0 204800
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:80 3000]
Jan 28 20:26:18 node01 multipathd: sdc: remove path (uevent)
Jan 28 20:26:18 node01 multipathd: sas-data: load table [0 3774873600
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:96 1000]
Jan 28 20:26:18 node01 multipathd: sdd: remove path (uevent)
Jan 28 20:26:18 node01 kernel: mptsas: ioc1: removing ssp device,
channel 0, id 1, phy 3
Jan 28 20:26:18 node01 multipathd: sas-os: load table [0 2080291840
multipath 0 1 rdac 1 1 round-robin 0 1 1 8:112 3000]
Jan 28 20:26:18 node01 multipathd: sde: remove path (uevent)
Jan 28 20:26:18 node01 kernel: scsi 1:0:0:0: rdac Dettached
Jan 28 20:26:19 node01 multipathd: sde: spurious uevent, path not in pathvec
Jan 28 20:26:19 node01 kernel: scsi 1:0:0:1: rdac Dettached
Jan 28 20:26:19 node01 multipathd: uevent trigger error
Jan 28 20:26:19 node01 kernel: scsi 1:0:0:2: rdac Dettached
Jan 28 20:26:19 node01 multipathd: dm-0: add map (uevent)
Jan 28 20:26:19 node01 kernel: sd 1:0:3:1: queueing MODE_SELECT command.
Jan 28 20:26:19 node01 multipathd: dm-0: devmap already registered
Jan 28 20:26:19 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 28 20:26:19 node01 multipathd: dm-1: add map (uevent)
Jan 28 20:26:19 node01 multipathd: dm-1: devmap already registered
Jan 28 20:26:19 node01 multipathd: dm-2: add map (uevent)
Jan 28 20:26:19 node01 kernel: scsi 1:0:0:1: rejecting I/O to dead device
Jan 28 20:26:19 node01 multipathd: dm-2: devmap already registered
Jan 28 20:26:19 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 28 20:26:19 node01 kernel: device-mapper: multipath: Using scsi_dh
module scsi_dh_rdac for failover/failback and device management.
Jan 28 20:26:20 node01 multipathd: 8:96: reinstated
Jan 28 20:27:08 node01 multipathd: dm-1: add map (uevent)
Jan 28 20:27:08 node01 multipathd: dm-1: devmap already registered
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29045144
Jan 28 20:27:08 node01 kernel: device-mapper: multipath: Failing path 8:96.
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29089224
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29090248
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29091272
Jan 28 20:27:08 node01 multipathd: 8:96: mark as failed
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 multipathd: sas-data: Entering recovery mode:
max_retries=300
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29092296
Jan 28 20:27:08 node01 multipathd: sas-data: remaining active paths: 0
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 multipathd: sdf: remove path (uevent)
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29093320
Jan 28 20:27:08 node01 multipathd: sas-qd: stop event checker thread
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 multipathd: sdg: remove path (uevent)
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29094344
Jan 28 20:27:08 node01 multipathd: sas-data: map in use
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 multipathd: sas-data: can't flush
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29095368
Jan 28 20:27:08 node01 multipathd: sdh: remove path (uevent)
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 multipathd: sas-os: stop event checker thread
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29096400
Jan 28 20:27:08 node01 multipathd: sdi: remove path (uevent)
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 multipathd: sdi: spurious uevent, path not in pathvec
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 multipathd: uevent trigger error
Jan 28 20:27:08 node01 kernel: mptbase: ioc1: LogInfo(0x31140000):
Originator={PL}, Code={IO Executed}, SubCode(0x0000)
Jan 28 20:27:08 node01 last message repeated 60 times
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000
Jan 28 20:27:08 node01 kernel: end_request: I/O error, dev sdg, sector
29097424
Jan 28 20:27:08 node01 kernel: sd 1:0:3:1: SCSI error: return code =
0x00010000


[... lots of SCSI errors ...]


Jan 28 20:27:14 node01 kernel: mptsas: ioc1: removing ssp device,
channel 0, id 4, phy 7
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:0: rdac Dettached
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:1: rdac Dettached
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:2: rdac Dettached
Jan 28 20:27:14 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:28:18 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:28:18 node01 multipathd: sdg: rdac checker reports path is down
Jan 28 20:29:29 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:29:29 node01 multipathd: sdg: rdac checker reports path is down
Jan 28 20:30:40 node01 kernel: scsi 1:0:3:1: rejecting I/O to dead device
Jan 28 20:30:40 node01 multipathd: sdg: rdac checker reports path is down


And that's it... all paths lost. The node is still alive, I can access
it, read from it, write to it, but commands like "multipath -ll" just
hang forever... And if I try to restart the server, it hangs too.

I do use a CLVM partition, but I'm willing to try a raw SAS volume if
you think that would be the solution.

And about your suggestions:

1. Try to access both paths of a LUN (on all nodes);
one should succeed and the other should fail.
This works OK. No problems noticed.

2. Try to access the multipath device and see if all is good.
This works too, as long as I don't disconnect one of the two cables.

3. Create an LVM volume on a single node (not clustered) and see if that works.
4. Create a clustered LVM volume on top of all the active (non-ghost) sd
devices and see if it works.
I did not try 3 & 4.
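For reference, steps 3 and 4 would amount to something like the following. The volume-group and LV names here are made up, and "-c y" is what marks the VG as clustered for CLVM; the commands are printed rather than executed, as a sketch only:

```shell
# Print the LVM/CLVM command sequence for steps 3-4 (sketch only:
# vgdata and lvdata are placeholder names, sas-data is the multipath
# alias from the multipath.conf earlier in the thread).
show_lvm_steps() {
    cat <<'EOF'
pvcreate /dev/mapper/sas-data
vgcreate -c n vgdata /dev/mapper/sas-data   # step 3: single node, non-clustered VG
vgcreate -c y vgdata /dev/mapper/sas-data   # step 4: clustered VG (CLVM)
lvcreate -n lvdata -l 100%FREE vgdata
mkfs.ext3 /dev/vgdata/lvdata
EOF
}
show_lvm_steps
```

Doing step 3 first isolates whether the hang is a multipath problem or something CLVM-specific.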


The problem is that after I get the errors, I lose all the volumes from
the nodes. It is OK to lose one path, but on the secondary path I get
something like

# # # # (failed)(failed)

in the multipath -ll output... Also, all the other volumes are simply
lost; there are no devices present. It seems to me like the controller
itself, or maybe the mptsas driver, goes berserk in the process.


Any ideas?

 
