Linux Archive

"Daniel Bakken" 02-07-2008 09:45 PM

qla2xxx mailbox timeout crashes lenny
 
When running rsnapshot backups from an IBM fibre channel disk system using LVM2 snapshots to a Promise fibre channel disk system, the qla2xxx driver causes a system crash and reboot. I'm running Lenny with kernel 2.6.22-3-vserver-amd64 and the stock Debian qla2xxx module. I've already replaced the QLogic HBA and the QLogic switch connecting to the storage. Three other servers with similar hardware running the same Debian version don't have this problem. These events were logged with the ql2xextended_error_logging parameter enabled:


Feb  6 13:40:28 hqhost kernel: qla2xxx_eh_abort(0): aborting sp ffff8101d01aa7c0 from RISC. pid=111928.
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling abort_isp
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling abort_isp
Feb  6 13:40:58 hqhost kernel: qla2xxx 0000:08:01.0: Mailbox command timeout occured. Issuing ISP abort.
Feb  6 13:40:58 hqhost kernel: qla2xxx 0000:08:01.0: Performing ISP error recovery - ha= ffff810225a84530.
Feb  6 13:40:58 hqhost kernel: scsi(0): **** Load RISC code ****
Feb  6 13:40:58 hqhost kernel: scsi(0): Verifying Checksum of loaded RISC code.
Feb  6 13:40:58 hqhost kernel: scsi(0): Checksum OK, start firmware.
Feb  6 13:40:58 hqhost kernel: scsi(0): Issue init firmware.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous P2P MODE received.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous LOOP UP (4 Gbps).
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: LOOP UP detected (4 Gbps).
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE.
Feb  6 13:40:59 hqhost kernel: scsi(0): Port database changed ffff 0006 0000.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE ignored 0000/0004/0600.
Feb  6 13:40:59 hqhost kernel: scsi(0): Asynchronous PORT UPDATE ignored 0000/0007/0b00.
Feb  6 13:40:59 hqhost kernel: scsi(0): F/W Ready - OK
Feb  6 13:40:59 hqhost kernel: scsi(0): fw_state=3 curr time=1001756ca.
Feb  6 13:40:59 hqhost kernel: qla2x00_restart_isp(): Start configure loop, status = 0
Feb  6 13:40:59 hqhost kernel: scsi(0): Configure loop -- dpc flags =0x4080048
Feb  6 13:40:59 hqhost kernel: scsi(0): RSCN queue entry[0] = [00/000000].
Feb  6 13:40:59 hqhost kernel: scsi(0): device_resync: rscn overflow.
Feb  6 13:40:59 hqhost kernel: scsi(0): RFT_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register FC-4 TYPE failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): RFF_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register FC-4 Features failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): RNN_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): Register Node Name failed.
Feb  6 13:40:59 hqhost kernel: scsi(0): GID_PT failed, completion status (180).
Feb  6 13:40:59 hqhost kernel: scsi(0): GA_NXT failed, rejected request:
Feb  6 13:40:59 hqhost kernel:  0   1   2   3   4   5   6   7   8   9  Ah  Bh  Ch  Dh  Eh  Fh
Feb  6 13:40:59 hqhost kernel: --------------------------------------------------------------
Feb  6 13:40:59 hqhost kernel: 14  00  00  00  00  10  97  23  02  00  00  00  10  08  00  00
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: SNS scan failed -- assuming zero-entry result...
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-0 - port retry count: 29 remaining
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-1 - port retry count: 29 remaining
Feb  6 13:40:59 hqhost kernel: scsi(0): fcport-2 - port retry count: 29 remaining
Feb  6 13:40:59 hqhost kernel: qla24xx_fabric_logout(0): failed to complete IOCB -- completion status (2)  ioparam=0/810031.
Feb  6 13:40:59 hqhost kernel: scsi(0): LOOP READY
Feb  6 13:40:59 hqhost kernel: qla2x00_restart_isp(): Configure loop done, status = 0x0
Feb  6 13:40:59 hqhost kernel: APIC error on CPU5: 00(40)
Feb  6 13:40:59 hqhost kernel: qla2x00_abort_isp(0): exiting.
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): finished abort_isp
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): finished abort_isp
Feb  6 13:40:59 hqhost kernel: qla2x00_mailbox_command(0): **** FAILED. mbx0=54, mbx1=0, mbx2=2397, cmd=54 ****
Feb  6 13:40:59 hqhost kernel: qla2x00_issue_iocb(0): failed rval 0x100
Feb  6 13:40:59 hqhost kernel: qla2x00_issue_iocb(0): failed rval 0x100
Feb  6 13:40:59 hqhost kernel: qla24xx_abort_command(0): failed to issue IOCB (100).
Feb  6 13:40:59 hqhost kernel: qla2xxx_eh_abort(0): abort_command mbx failed.
Feb  6 13:40:59 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): Abort command issued -- 0 1b538 2002.
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-0 - port retry count: 28 remaining
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-1 - port retry count: 28 remaining
Feb  6 13:41:00 hqhost kernel: scsi(0): fcport-2 - port retry count: 28 remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-0 - port retry count: 27 remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-1 - port retry count: 27 remaining
Feb  6 13:41:01 hqhost kernel: scsi(0): fcport-2 - port retry count: 27 remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-0 - port retry count: 26 remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-1 - port retry count: 26 remaining
Feb  6 13:41:02 hqhost kernel: scsi(0): fcport-2 - port retry count: 26 remaining
...(25 more port retries)...
Feb  6 13:41:33 hqhost kernel:  rport-0:0-0: blocked FC remote port time out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel:  rport-0:0-4: blocked FC remote port time out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel:  rport-0:0-5: blocked FC remote port time out: removing target and saving binding
Feb  6 13:41:33 hqhost kernel: qla2xxx 0000:08:01.0: scsi(0:0:0): DEVICE RESET ISSUED.
Feb  6 13:41:33 hqhost kernel: qla2x00_wait_for_hba_online return_status=0



Is this a hardware problem, a kernel problem, a QLogic driver problem, or perhaps all three at once? Thanks in advance,
--
Daniel Bakken
Systems Administrator
Economic Modeling Specialists Inc
Moscow, Idaho
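
For anyone triaging a log like the one above, here is a minimal Python sketch that measures the gap between the first SCSI abort and the first mailbox timeout, and counts the fabric name-server requests that failed. It is only an illustration, not a diagnosis: the helper names (`parse_line`, `summarize`) are made up for this example, the sample is a handful of lines from the post, and the year is assumed to be 2008 because syslog timestamps carry no year.

```python
import re
from collections import Counter
from datetime import datetime

# Matches a classic syslog line: "Feb  6 13:40:28 host kernel: message".
LINE_RE = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) \S+ kernel: ?(.*)$")


def parse_line(line, year=2008):
    """Return (timestamp, message) for a kernel syslog line, or None.

    The year is an assumption; syslog does not record it.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    month, day, clock, message = m.groups()
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, message.strip()


def summarize(lines):
    """Time the first abort-to-timeout gap and count '<X> failed' requests."""
    failures = Counter()
    abort_at = timeout_at = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not a kernel syslog line
        stamp, message = parsed
        if abort_at is None and "qla2xxx_eh_abort" in message and "aborting sp" in message:
            abort_at = stamp
        if timeout_at is None and "timeout calling abort_isp" in message:
            timeout_at = stamp
        m = re.search(r"(\w+) failed, completion status", message)
        if m:
            failures[m.group(1)] += 1
    gap = None
    if abort_at is not None and timeout_at is not None:
        gap = (timeout_at - abort_at).total_seconds()
    return gap, failures


# A few lines copied from the log in the post.
SAMPLE = """\
Feb  6 13:40:28 hqhost kernel: qla2xxx_eh_abort(0): aborting sp ffff8101d01aa7c0 from RISC. pid=111928.
Feb  6 13:40:58 hqhost kernel: qla2x00_mailbox_command(0): timeout calling abort_isp
Feb  6 13:40:59 hqhost kernel: scsi(0): RFT_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): RFF_ID failed, completion status (280).
Feb  6 13:40:59 hqhost kernel: scsi(0): GID_PT failed, completion status (180).
""".splitlines()

gap, failures = summarize(SAMPLE)
print(f"abort -> mailbox timeout: {gap:.0f} s")
print(dict(failures))
```

On the sample lines this reports a 30 second gap between the abort at 13:40:28 and the mailbox timeout at 13:40:58, followed by one failure each for RFT_ID, RFF_ID, and GID_PT. Feeding it the full log file instead of `SAMPLE` would only require passing an open file object to `summarize`.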
