10-24-2008, 11:11 PM
"Moger, Babu"

i/o error due to all path failure with rdac

Hi,

I am running a controller online/offline test. I have two paths to the controller: one is active and one is passive. When I fail (offline) the active path (sde 8:64), device-mapper also fails the passive path (sdf 8:80), which leads to an all-paths-down failure. Any ideas or hints?

Here is the output of multipath -ll. I have only one LUN.

[root@localhost ~]# multipath -ll
mpathie (3600a0b80000f6a7d0000cff048fed59c) dm-2 LSI,INF-01-00
[size=10G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
\_ round-robin 0 [prio=2][enabled]
 \_ 3:0:0:0 sde 8:64 [active][undef]
\_ round-robin 0 [prio=1][enabled]
 \_ 3:0:1:0 sdf 8:80 [active][undef]


Here is the detailed log.

Oct 24 16:50:50 localhost multipathd: sdf: rdac prio = 0
Oct 24 16:51:06 localhost kernel: sd 3:0:0:0: [sde] Result: hostbyte=DID_BUS_BUSY driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 16:51:06 localhost kernel: end_request: I/O error, dev sde, sector 1047072
Oct 24 16:51:06 localhost kernel: device-mapper: multipath: Failing path 8:64.
Oct 24 16:51:06 localhost multipathd: mpathie: rr_weight = 2 (controller setting)
Oct 24 16:51:06 localhost multipathd: mpathie: pgfailback = 100 (controller setting)
Oct 24 16:51:06 localhost multipathd: mpathie: no_path_retry = 10 (controller setting)
Oct 24 16:51:06 localhost multipathd: pg_timeout = NONE (internal default)
Oct 24 16:51:06 localhost multipathd: 8:64: mark as failed
Oct 24 16:51:06 localhost multipathd: uevent 'change' from '/block/dm-2'
Oct 24 16:51:06 localhost multipathd: UDEV_LOG=3
Oct 24 16:51:06 localhost multipathd: ACTION=change
Oct 24 16:51:06 localhost multipathd: DEVPATH=/block/dm-2
Oct 24 16:51:06 localhost multipathd: SUBSYSTEM=block
Oct 24 16:51:06 localhost multipathd: DM_TARGET=multipath
Oct 24 16:51:06 localhost multipathd: DM_ACTION=PATH_FAILED
Oct 24 16:51:06 localhost multipathd: DM_SEQNUM=1
Oct 24 16:51:06 localhost multipathd: DM_PATH=8:64
Oct 24 16:51:06 localhost multipathd: DM_NR_VALID_PATHS=1
Oct 24 16:51:06 localhost multipathd: DM_NAME=mpathie
Oct 24 16:51:06 localhost multipathd: DM_UUID=mpath-3600a0b80000f6a7d0000cff048fed59c
Oct 24 16:51:06 localhost multipathd: MAJOR=253
Oct 24 16:51:06 localhost multipathd: MINOR=2
Oct 24 16:51:06 localhost multipathd: DEVTYPE=disk
Oct 24 16:51:06 localhost multipathd: SEQNUM=1254
Oct 24 16:51:06 localhost multipathd: UDEVD_EVENT=1
Oct 24 16:51:06 localhost multipathd: dm-2: add map (uevent)
Oct 24 16:51:08 localhost kernel: device-mapper: multipath: Failing path 8:80.
Oct 24 16:51:08 localhost multipathd: mpathie: devmap event #3
Oct 24 16:51:08 localhost multipathd: mpathie: discover
Oct 24 16:51:08 localhost multipathd: mpathie: rr_weight = 2 (controller setting)
Oct 24 16:51:08 localhost multipathd: mpathie: pgfailback = 100 (controller setting)
Oct 24 16:51:08 localhost multipathd: mpathie: no_path_retry = 10 (controller setting)
Oct 24 16:51:08 localhost multipathd: pg_timeout = NONE (internal default)
Oct 24 16:51:08 localhost multipathd: 8:80: mark as failed
Oct 24 16:51:08 localhost multipathd: mpathie: Entering recovery mode: max_retries=10
Oct 24 16:51:08 localhost multipathd: uevent 'change' from '/block/dm-2'
Oct 24 16:51:08 localhost multipathd: UDEV_LOG=3
Oct 24 16:51:08 localhost multipathd: ACTION=change
Oct 24 16:51:08 localhost multipathd: DEVPATH=/block/dm-2
Oct 24 16:51:08 localhost multipathd: SUBSYSTEM=block
Oct 24 16:51:08 localhost multipathd: DM_TARGET=multipath
Oct 24 16:51:08 localhost multipathd: DM_ACTION=PATH_FAILED
Oct 24 16:51:08 localhost multipathd: DM_SEQNUM=2
Oct 24 16:51:08 localhost multipathd: DM_PATH=8:80
Oct 24 16:51:08 localhost multipathd: DM_NR_VALID_PATHS=0
Oct 24 16:51:08 localhost multipathd: DM_NAME=mpathie
Oct 24 16:51:08 localhost multipathd: DM_UUID=mpath-3600a0b80000f6a7d0000cff048fed59c
Oct 24 16:51:08 localhost multipathd: MAJOR=253
Oct 24 16:51:08 localhost multipathd: MINOR=2
Oct 24 16:51:08 localhost multipathd: DEVTYPE=disk
Oct 24 16:51:08 localhost multipathd: SEQNUM=1255
Oct 24 16:51:08 localhost multipathd: UDEVD_EVENT=1
Oct 24 16:51:08 localhost multipathd: dm-2: add map (uevent)
Oct 24 16:51:36 localhost kernel: rport-3:0-2: blocked FC remote port time out: removing target and saving binding
Oct 24 16:51:36 localhost multipathd: sde: rdac checker reports path is down
Oct 24 16:51:36 localhost multipathd: sde: mask = 0x8
Oct 24 16:51:36 localhost kernel: sd 3:0:0:0: [sde] Synchronizing SCSI cache
Oct 24 16:51:36 localhost kernel: sd 3:0:0:0: [sde] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK,SUGGEST_OK
Oct 24 16:51:36 localhost kernel: scsi 3:0:0:0: rdac: Detached
Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_generic/sg5'
Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
Oct 24 16:51:36 localhost multipathd: ACTION=remove
Oct 24 16:51:36 localhost multipathd: DEVPATH=/class/scsi_generic/sg5
Oct 24 16:51:36 localhost multipathd: SUBSYSTEM=scsi_generic
Oct 24 16:51:36 localhost multipathd: MAJOR=21
Oct 24 16:51:36 localhost multipathd: MINOR=5
Oct 24 16:51:36 localhost multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:06:00.3/0000:0b:01.0/host3/rport-3:0-2/target3:0:0/3:0:0:0
Oct 24 16:51:36 localhost multipathd: PHYSDEVBUS=scsi
Oct 24 16:51:36 localhost multipathd: PHYSDEVDRIVER=sd
Oct 24 16:51:36 localhost multipathd: SEQNUM=1256
Oct 24 16:51:36 localhost multipathd: UDEVD_EVENT=1
Oct 24 16:51:36 localhost multipathd: DEVNAME=/dev/sg5
Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_device/3:0:0:0'
Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
Oct 24 16:51:36 localhost kernel: device-mapper: multipath: Failing path 8:80.
Oct 24 16:51:36 localhost multipathd: ACTION=remove
Oct 24 16:51:36 localhost UnixSmash4[9200]: 7:UnixSmash has experienced a write failure.

Thanks
Babu Moger


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
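
For readers following the log above: the point where the map runs out of paths is the uevent that carries DM_NR_VALID_PATHS=0 (16:51:08, right after path 8:80 is failed). A minimal sketch of a filter that finds that line is shown here. The helper names valid_paths_in_line and first_all_paths_down are made up for illustration and are not part of multipath-tools; the sketch only assumes the KEY=value layout visible in the log.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of DM_NR_VALID_PATHS found in one syslog line,
 * or -1 if the line does not carry that key.
 * Hypothetical helper for illustration; not part of multipath-tools.
 */
int valid_paths_in_line(const char *line)
{
	static const char key[] = "DM_NR_VALID_PATHS=";
	const char *p;
	char *end;
	long n;

	assert(line != NULL);
	p = strstr(line, key);
	if (!p)
		return -1;
	p += sizeof(key) - 1;		/* skip past the key and '=' */
	n = strtol(p, &end, 10);
	if (end == p || n < 0)		/* no usable number after '=' */
		return -1;
	return (int)n;
}

/*
 * Scan an array of log lines and return the index of the first line
 * that reports zero valid paths, or -1 if that never happens.
 */
int first_all_paths_down(const char *const *lines, int count)
{
	int i;

	for (i = 0; i < count; i++)
		if (valid_paths_in_line(lines[i]) == 0)
			return i;
	return -1;
}
```

Feeding the lines of /var/log/messages to first_all_paths_down() gives the position of the first all-paths-down event; the kernel "Failing path" message just before it names the path whose failure triggered it.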
 
10-30-2008, 04:34 PM
Mike Anderson

i/o error due to all path failure with rdac

Moger, Babu <Babu.Moger@lsi.com> wrote:
>
> Hi,
>
> I am running an online/offline test. I have two paths to the controller. One is active and one is passive. When I fail (offline) the active path (sde 8:64), the Device mapper is failing passive path(sdf 8:80) as well leading to all path failure. Any ideas or hints?
>

What version of multipath tools and kernel are you running? If this is a
newer kernel I would have expected to see "queueing MODE_SELECT command"
during failover.


-andmike
--
Michael Anderson
andmike@linux.vnet.ibm.com

 
10-30-2008, 05:56 PM
Chandra Seetharaman

i/o error due to all path failure with rdac

Hi Babu,

As Mike asked, can you provide the kernel version and the multipath-tools
version?

Can you also provide the /var/log/messages file from the start (before
you fail the first path) to the finish of your test?

Also, what kind of I/O are you running?

BTW, if you are running mainline code, can you apply the attached patch
and see whether the behavior improves?

regards,

chandra

On Fri, 2008-10-24 at 17:11 -0600, Moger, Babu wrote:
> Hi,
>
> I am running an online/offline test. I have two paths to the controller. One is active and one is passive. When I fail (offline) the active path (sde 8:64), the Device mapper is failing passive path(sdf 8:80) as well leading to all path failure. Any ideas or hints?

Retry mode select.

Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

Index: linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
===================================================================
--- linux-2.6.27.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -24,6 +24,7 @@
 #include <scsi/scsi_dh.h>
 
 #define RDAC_NAME "rdac"
+#define RDAC_RETRY_COUNT 5
 
 /*
  * LSI mode page stuff
@@ -476,21 +477,27 @@ static int send_mode_select(struct scsi_
 {
 	struct request *rq;
 	struct request_queue *q = sdev->request_queue;
-	int err = SCSI_DH_RES_TEMP_UNAVAIL;
+	int err, retry_cnt = RDAC_RETRY_COUNT;
 
+retry:
+	err = SCSI_DH_RES_TEMP_UNAVAIL;
 	rq = rdac_failover_get(sdev, h);
 	if (!rq)
 		goto done;
 
-	sdev_printk(KERN_INFO, sdev, "queueing MODE_SELECT command.\n");
+	sdev_printk(KERN_INFO, sdev, "%s MODE_SELECT command.\n",
+		(retry_cnt == RDAC_RETRY_COUNT) ? "queueing" : "retrying");
 
 	err = blk_execute_rq(q, NULL, rq, 1);
-	if (err != SCSI_DH_OK)
+	blk_put_request(rq);
+	if (err != SCSI_DH_OK) {
 		err = mode_select_handle_sense(sdev, h->sense);
+		if (err == SCSI_DH_RETRY && retry_cnt--)
+			goto retry;
+	}
 	if (err == SCSI_DH_OK)
 		h->state = RDAC_STATE_ACTIVE;
 
-	blk_put_request(rq);
 done:
 	return err;
 }
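
The control flow the patch adds is a bounded retry: build a fresh request, send it, and try again only while the sense handling asks for a retry and the budget is not used up. A userspace sketch of just that loop follows. It is not the driver code: the result codes, the attempt_fn callback type and the two example callbacks are stand-ins invented for illustration, and only the retry budget of 5 and the "retry_cnt--" test are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define RETRY_COUNT 5	/* same budget as RDAC_RETRY_COUNT in the patch */

/* Stand-in result codes; the kernel's SCSI_DH_* values are not reproduced. */
enum result { RES_OK, RES_RETRY, RES_FAIL };

/* Hypothetical callback that performs one command attempt. */
typedef enum result (*attempt_fn)(void *ctx);

/*
 * Call attempt() until it returns something other than RES_RETRY or the
 * retry budget is used up. The number of attempts made is stored in
 * *attempts. At most RETRY_COUNT + 1 attempts are made: one initial try
 * plus RETRY_COUNT retries, mirroring the "retry_cnt--" test in the patch.
 */
enum result send_with_retry(attempt_fn attempt, void *ctx, int *attempts)
{
	int retry_cnt = RETRY_COUNT;
	enum result res;

	assert(attempt != NULL && attempts != NULL);
	*attempts = 0;
	for (;;) {
		res = attempt(ctx);
		(*attempts)++;
		if (res != RES_RETRY || retry_cnt-- == 0)
			return res;
	}
}

/* Example callback: the target keeps asking for a retry. */
enum result always_retry(void *ctx)
{
	(void)ctx;
	return RES_RETRY;
}

/* Example callback: succeeds once the counter behind ctx reaches 3. */
enum result ok_on_third(void *ctx)
{
	int *calls = ctx;

	return (++*calls >= 3) ? RES_OK : RES_RETRY;
}
```

With always_retry the loop gives up after six attempts and returns RES_RETRY to the caller; with ok_on_third it stops at the third attempt. In the patch the equivalent of attempt() also allocates and releases the request on every pass, which is why blk_put_request() moves up to directly after blk_execute_rq().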
 
10-30-2008, 06:17 PM
"Moger, Babu"

i/o error due to all path failure with rdac

I am running multipath-tools v0.4.8 (pulled from upstream last week) and kernel version 2.6.27-rc7.

I am not seeing "queueing MODE_SELECT command" because this is an online/offline test: when you offline the controller, the LUNs are automatically transferred to the alternate controller.

Thanks
Babu Moger

-----Original Message-----
From: dm-devel-bounces@redhat.com [mailto:dm-devel-bounces@redhat.com] On Behalf Of Mike Anderson
Sent: Thursday, October 30, 2008 12:35 PM
To: device-mapper development; Chandra Seetharaman
Cc: linux-scsi@vger.kernel.org
Subject: Re: [dm-devel] i/o error due to all path failure with rdac

Moger, Babu <Babu.Moger@lsi.com> wrote:
>
> Hi,
>
> I am running an online/offline test. I have two paths to the controller. One is active and one is passive. When I fail (offline) the active path (sde 8:64), the Device mapper is failing passive path(sdf 8:80) as well leading to all path failure. Any ideas or hints?
>

What version of multipath tools and kernel are you running? If this is a
newer kernel I would have expected to see "queueing MODE_SELECT command"
during failover.

 
10-30-2008, 07:03 PM
Chandra Seetharaman

i/o error due to all path failure with rdac

On Thu, 2008-10-30 at 13:17 -0600, Moger, Babu wrote:
> I am running multipath-tools v0.4.8 (I just pulled from mainstream last week) and kernel version 2.6.27-rc7.
>
> I am not seeing "queueing MODE_SELECT command", because this is online/offline test. When you offline
> the controller the luns are automatically transferred to alt controller.

No, the LUNs are moved to the other controller by the rdac hardware
handler, which does this by sending a MODE_SELECT to that controller.

So we should be seeing a MODE_SELECT sent to the passive controller.
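
The ordering described here (activate the passive side through the hardware handler first, then route I/O to it) can be pictured with a toy model. This is a sketch under loud assumptions: struct path_group, struct hw_handler and select_group are hypothetical names, not dm-multipath or scsi_dh_rdac structures, and handler_activate merely counts where the real handler would send MODE_SELECT.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: names and fields are hypothetical, not dm-multipath's. */
struct path_group {
	int usable;	/* 0 once every path in the group has failed */
	int owns_lun;	/* 1 if this group's controller currently owns the LUN */
};

struct hw_handler {
	int mode_selects_sent;	/* ownership transfers requested so far */
};

/* Stand-in for the handler's MODE_SELECT: move LUN ownership to groups[target]. */
static void handler_activate(struct hw_handler *h, struct path_group *groups,
			     size_t n, size_t target)
{
	size_t i;

	h->mode_selects_sent++;
	for (i = 0; i < n; i++)
		groups[i].owns_lun = (i == target);
}

/*
 * Pick the first usable group. If its controller does not own the LUN,
 * activate it through the handler before returning it for I/O.
 * Returns the group index, or -1 when no group is usable.
 */
int select_group(struct hw_handler *h, struct path_group *groups, size_t n)
{
	size_t i;

	assert(h != NULL && groups != NULL);
	for (i = 0; i < n; i++) {
		if (!groups[i].usable)
			continue;
		if (!groups[i].owns_lun)
			handler_activate(h, groups, n, i);
		return (int)i;
	}
	return -1;
}
```

In this model, failing the only path of the owning group makes the next select_group() call bump mode_selects_sent before the passive group is returned. That is the step whose log message ("queueing MODE_SELECT command") is missing from the log posted earlier in the thread.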

> > Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_generic/sg5'
> > Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
> > Oct 24 16:51:36 localhost multipathd: ACTION=remove
> > Oct 24 16:51:36 localhost multipathd: DEVPATH=/class/scsi_generic/sg5
> > Oct 24 16:51:36 localhost multipathd: SUBSYSTEM=scsi_generic
> > Oct 24 16:51:36 localhost multipathd: MAJOR=21
> > Oct 24 16:51:36 localhost multipathd: MINOR=5
> > Oct 24 16:51:36 localhost multipathd: PHYSDEVPATH=/devices/pci0000:00/0000:00:02.0/0000:06:00.3/0000:0b:01.0/host3/rport-3:0-2/target3:0:0/3:0:0:0
> > Oct 24 16:51:36 localhost multipathd: PHYSDEVBUS=scsi
> > Oct 24 16:51:36 localhost multipathd: PHYSDEVDRIVER=sd
> > Oct 24 16:51:36 localhost multipathd: SEQNUM=1256
> > Oct 24 16:51:36 localhost multipathd: UDEVD_EVENT=1
> > Oct 24 16:51:36 localhost multipathd: DEVNAME=/dev/sg5
> > Oct 24 16:51:36 localhost multipathd: uevent 'remove' from '/class/scsi_device/3:0:0:0'
> > Oct 24 16:51:36 localhost multipathd: UDEV_LOG=3
> > Oct 24 16:51:36 localhost kernel: device-mapper: multipath: Failing path 8:80.
> > Oct 24 16:51:36 localhost multipathd: ACTION=remove
> > Oct 24 16:51:36 localhost UnixSmash4[9200]: 7:UnixSmash has experienced a write failure.
> >
> > Thanks
> > Babu Moger
> >
> >
> > --
> > dm-devel mailing list
> > dm-devel@redhat.com
> > https://www.redhat.com/mailman/listinfo/dm-devel
>
> -andmike
> --
> Michael Anderson
> andmike@linux.vnet.ibm.com
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

 
Old 10-30-2008, 07:30 PM
"Moger, Babu"
 
Default i/o error due to all path failure with rdac

This is what happens in my case:

When the active path fails, the device handler calls rdac_activate to activate the passive path, which in turn calls check_ownership. As you know, check_ownership sends an inquiry (page 0xC9). In my case, based on the response, it sets the LUN state (h->lun_state) to RDAC_LUN_OWNED.

If lun_state is RDAC_LUN_OWNED, send_mode_select is not called, so no MODE_SELECT is issued.

PS: You are right. In the case of link failures we need to transfer the LUNs by sending the mode select. But if you offline (or fail) the controller, the LUNs are automatically transferred to the alternate controller. This is known behavior.
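
The activation flow described above can be sketched in user-space C. This is only a minimal model under stated assumptions, not the driver code: sim_handler, sim_check_ownership, and sim_activate are hypothetical names, the inquiry is replaced by a boolean argument, and sending MODE_SELECT is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the LUN states named in the thread. */
enum sim_lun_state { SIM_LUN_UNOWNED, SIM_LUN_OWNED };

struct sim_handler {
	enum sim_lun_state lun_state;
	int mode_selects_sent;	/* counts simulated MODE_SELECT commands */
};

/* Simulated inquiry: record whether this controller already owns the LUN. */
static void sim_check_ownership(struct sim_handler *h, bool controller_owns_lun)
{
	h->lun_state = controller_owns_lun ? SIM_LUN_OWNED : SIM_LUN_UNOWNED;
}

/* Activate the path: request a LUN transfer only if the LUN is not owned. */
static void sim_activate(struct sim_handler *h, bool controller_owns_lun)
{
	sim_check_ownership(h, controller_owns_lun);
	if (h->lun_state == SIM_LUN_UNOWNED)
		h->mode_selects_sent++;
}
```

Calling sim_activate with the LUN already owned leaves mode_selects_sent at 0, which is the case where no MODE_SELECT shows up in the log. With an unowned LUN the counter goes to 1.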


Thanks
Babu Moger

-----Original Message-----
From: Chandra Seetharaman [mailto:sekharan@us.ibm.com]
Sent: Thursday, October 30, 2008 3:03 PM
To: Moger, Babu
Cc: device-mapper development; linux-scsi@vger.kernel.org
Subject: RE: [dm-devel] i/o error due to all path failure with rdac


On Thu, 2008-10-30 at 13:17 -0600, Moger, Babu wrote:
> I am running multipath-tools v0.4.8 (I just pulled from mainstream last week) and kernel version 2.6.27-rc7.
>
> I am not seeing "queueing MODE_SELECT command", because this is online/offline test. When you offline
> the controller the luns are automatically transferred to alt controller.

No, moving the luns to the other controller is done by the rdac hardware
handler by way of sending a MODE_SELECT to the controller.

So, we should be seeing a MODE_SELECT to the passive controller.

 
Old 10-30-2008, 09:23 PM
Chandra Seetharaman
 
Default i/o error due to all path failure with rdac

On Thu, 2008-10-30 at 14:30 -0600, Moger, Babu wrote:
> This is what happens in my case
>
> When the active path is failed, the dh handler calls rdac_activate to activate the passive path.
> Then check_ownership is called. As you know check_ownership sends inquiry (page c9). Looking at the
> response this function sets the lun_state(h->lun_state) to RDAC_LUN_OWNED.
>
> If lun_state is set to RDAC_LUN_OWNED then send_mode_select will not be called. This is what
> happens in my case.

Ok. Now it is clear. I thought the port was disabled.

The rport failure in the log you sent made me think that it was a port
disable. Why was there an rport failure?

Can you add these two lines at the top of the
scsi_dh_rdac.c:rdac_check_sense() function, retest, and send me the log?
------------
sdev_printk(KERN_ERR, sdev, "sense_key:%x; asc %x; ascq %x\n",
	sense_hdr->sense_key, sense_hdr->asc, sense_hdr->ascq);
-----------
I want to see if we are getting any special sense that we are not
handling.
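
For reference, the kind of line such a debug statement would log can be sketched in user-space C. This is an illustrative sketch only: sense_fields and format_sense are hypothetical names, and snprintf stands in for sdev_printk. The three fields are numeric codes, so the sketch prints each one in hexadecimal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical user-space mirror of the three sense fields being logged. */
struct sense_fields {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Format the three fields as hexadecimal numbers into buf and return buf. */
static const char *format_sense(char *buf, size_t len,
				const struct sense_fields *s)
{
	snprintf(buf, len, "sense_key:%x; asc %x; ascq %x",
		 (unsigned)s->sense_key, (unsigned)s->asc, (unsigned)s->ascq);
	return buf;
}
```

For a UNIT ATTENTION (sense key 0x6) with asc 0x29 and ascq 0x00, format_sense produces "sense_key:6; asc 29; ascq 0".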


 
Old 10-30-2008, 10:35 PM
Chandra Seetharaman
 
Default i/o error due to all path failure with rdac

Can you try this patch.
---------
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>

Index: linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
===================================================================
--- linux-2.6.27.orig/drivers/scsi/device_handler/scsi_dh_rdac.c
+++ linux-2.6.27/drivers/scsi/device_handler/scsi_dh_rdac.c
@@ -386,6 +386,7 @@ static int check_ownership(struct scsi_d
 	struct c9_inquiry *inqp;
 
 	h->lun_state = RDAC_LUN_UNOWNED;
+	h->state = RDAC_STATE_ACTIVE;
 	err = submit_inquiry(sdev, 0xC9, sizeof(struct c9_inquiry), h);
 	if (err == SCSI_DH_OK) {
 		inqp = &h->inq.c9;
---
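
The effect of the one-line change can be illustrated with a small user-space model. Everything here is an assumption made for illustration: the model_* names are hypothetical, the inquiry is replaced by a boolean, and the idea that I/O is gated on the handler state being active is a guess that the thread does not confirm.

```c
#include <assert.h>
#include <stdbool.h>

enum model_state { MODEL_PASSIVE, MODEL_ACTIVE };
enum model_lun { MODEL_LUN_UNOWNED, MODEL_LUN_OWNED };

struct model_handler {
	enum model_state state;
	enum model_lun lun_state;
};

/*
 * Simulated ownership check. reset_state selects the patched behaviour
 * (state set to active before the inquiry) or the unpatched one
 * (state left as it was).
 */
static void model_check_ownership(struct model_handler *h, bool owned,
				  bool reset_state)
{
	h->lun_state = MODEL_LUN_UNOWNED;
	if (reset_state)
		h->state = MODEL_ACTIVE;
	if (owned)
		h->lun_state = MODEL_LUN_OWNED;
}

/* ASSUMPTION: I/O is only let through while the handler state is active. */
static bool model_io_allowed(const struct model_handler *h)
{
	return h->state == MODEL_ACTIVE;
}
```

In this model, a handler that starts out passive and finds the LUN already owned stays passive without the reset, so model_io_allowed returns false. With the reset it returns true.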

On Thu, 2008-10-30 at 17:21 -0600, Moger, Babu wrote:
> It looks like we eventually get an rport failure after the controller goes offline.
>
>
> I have attached the messages file. I am running raw IO.
>
> Also, please note that I have added the following lines in check_sense. This condition (quiescence) should be retried.
>
>
> 	case UNIT_ATTENTION:
> 		if ((sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) ||
> 		    (sense_hdr->asc == 0x8b && sense_hdr->ascq == 0x02))
> 			/*
> 			 * Power On, Reset, or Bus Device Reset (0x29/0x00),
> 			 * or quiescence (0x8b/0x02): just retry.
> 			 */
> 			return ADD_TO_MLQUEUE;
>
>
>
> Thanks
> Babu Moger
>
 
Old 10-31-2008, 03:05 PM
"Moger, Babu"
 
Default i/o error due to all path failure with rdac

Yes, it is working fine with this patch. My online/offline test now runs without errors. Thank you very much. That was really a quick fix.

I am still learning the device mapper. Hopefully I will be of some help in the future.

Thanks
Babu Moger

 
Old 10-31-2008, 07:21 PM
Chandra Seetharaman
 
Default i/o error due to all path failure with rdac

That is good to know. I will push the patch upstream.

BTW, please be advised that you might see some unexpected problems if you
use the multipath tools from mainline on your distro release.

The libraries in distro releases and the tools in mainline may not be
fully compatible, which can cause such problems.

I would suggest that you use the tools from your distro itself. The kernel
changes for SCSI_DH should work with no changes to the tools (in your
distro). IOW, you do not need the latest multipath tools.

chandra
 
