01-23-2012, 05:43 PM
"Moger, Babu"

scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)

This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE.

The function __scsi_error_from_host_byte tries to reset the host byte to DID_OK, but that
does not happen because of the OR operation.

Here is the flow:

scsi_softirq_done -> scsi_decide_disposition -> __scsi_error_from_host_byte

Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, the host byte of result
is set to DID_NEXUS_FAILURE (=0x11). Then, in __scsi_error_from_host_byte, result is ORed with
DID_OK. The purpose is to reset the host byte back to DID_OK, but that does not happen. This patch fixes the issue.

Signed-off-by: Babu Moger <babu.moger@netapp.com>

---

diff -uprN -X linux-3.3-rc1/Documentation/dontdiff linux-3.3-rc1//drivers/scsi/scsi_error.c linux-3.3-rc1-new//drivers/scsi/scsi_error.c
--- linux-3.3-rc1//drivers/scsi/scsi_error.c	2012-01-19 17:04:48.000000000 -0600
+++ linux-3.3-rc1-new//drivers/scsi/scsi_error.c	2012-01-23 11:55:26.000000000 -0600
@@ -1540,7 +1540,7 @@ int scsi_decide_disposition(struct scsi_
 			 * Need to modify host byte to signal a
 			 * permanent target failure
 			 */
-			scmd->result |= (DID_TARGET_FAILURE << 16);
+			set_host_byte(scmd, DID_TARGET_FAILURE);
 			rtn = SUCCESS;
 		}
 		/* if rtn == FAILED, we have no sense information;
@@ -1560,7 +1560,7 @@ int scsi_decide_disposition(struct scsi_
 	case RESERVATION_CONFLICT:
 		sdev_printk(KERN_INFO, scmd->device,
 			    "reservation conflict\n");
-		scmd->result |= (DID_NEXUS_FAILURE << 16);
+		set_host_byte(scmd, DID_NEXUS_FAILURE);
 		return SUCCESS; /* causes immediate i/o error */
 	default:
 		return FAILED;
diff -uprN -X linux-3.3-rc1/Documentation/dontdiff linux-3.3-rc1//drivers/scsi/scsi_lib.c linux-3.3-rc1-new//drivers/scsi/scsi_lib.c
--- linux-3.3-rc1//drivers/scsi/scsi_lib.c	2012-01-19 17:04:48.000000000 -0600
+++ linux-3.3-rc1-new//drivers/scsi/scsi_lib.c	2012-01-23 11:50:25.000000000 -0600
@@ -682,11 +682,11 @@ static int __scsi_error_from_host_byte(s
 		error = -ENOLINK;
 		break;
 	case DID_TARGET_FAILURE:
-		cmd->result |= (DID_OK << 16);
+		set_host_byte(cmd, DID_OK);
 		error = -EREMOTEIO;
 		break;
 	case DID_NEXUS_FAILURE:
-		cmd->result |= (DID_OK << 16);
+		set_host_byte(cmd, DID_OK);
 		error = -EBADE;
 		break;
 	default:




--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
01-24-2012, 07:38 PM
"Moger, Babu"

scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)

Resubmitting, as my previous post had formatting issues and did not reach linux-scsi.

This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE.
The function __scsi_error_from_host_byte tries to reset the host byte to DID_OK, but that
does not happen because of the OR operation.

Here is the flow:
scsi_softirq_done -> scsi_decide_disposition -> __scsi_error_from_host_byte

Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, the host byte of result
is set to DID_NEXUS_FAILURE (=0x11). Then, in __scsi_error_from_host_byte, result is ORed with
DID_OK. The purpose is to reset the host byte back to DID_OK, but that does not happen. This patch fixes the issue.

Signed-off-by: Babu Moger <babu.moger@netapp.com>
---
diff -uprN -X linux-3.3-rc1/Documentation/dontdiff linux-3.3-rc1//drivers/scsi/scsi_error.c linux-3.3-rc1-new//drivers/scsi/scsi_error.c
--- linux-3.3-rc1//drivers/scsi/scsi_error.c	2012-01-19 17:04:48.000000000 -0600
+++ linux-3.3-rc1-new//drivers/scsi/scsi_error.c	2012-01-23 11:55:26.000000000 -0600
@@ -1540,7 +1540,7 @@ int scsi_decide_disposition(struct scsi_
 			 * Need to modify host byte to signal a
 			 * permanent target failure
 			 */
-			scmd->result |= (DID_TARGET_FAILURE << 16);
+			set_host_byte(scmd, DID_TARGET_FAILURE);
 			rtn = SUCCESS;
 		}
 		/* if rtn == FAILED, we have no sense information;
@@ -1560,7 +1560,7 @@ int scsi_decide_disposition(struct scsi_
 	case RESERVATION_CONFLICT:
 		sdev_printk(KERN_INFO, scmd->device,
 			    "reservation conflict\n");
-		scmd->result |= (DID_NEXUS_FAILURE << 16);
+		set_host_byte(scmd, DID_NEXUS_FAILURE);
 		return SUCCESS; /* causes immediate i/o error */
 	default:
 		return FAILED;
diff -uprN -X linux-3.3-rc1/Documentation/dontdiff linux-3.3-rc1//drivers/scsi/scsi_lib.c linux-3.3-rc1-new//drivers/scsi/scsi_lib.c
--- linux-3.3-rc1//drivers/scsi/scsi_lib.c	2012-01-19 17:04:48.000000000 -0600
+++ linux-3.3-rc1-new//drivers/scsi/scsi_lib.c	2012-01-23 11:50:25.000000000 -0600
@@ -682,11 +682,11 @@ static int __scsi_error_from_host_byte(s
 		error = -ENOLINK;
 		break;
 	case DID_TARGET_FAILURE:
-		cmd->result |= (DID_OK << 16);
+		set_host_byte(cmd, DID_OK);
 		error = -EREMOTEIO;
 		break;
 	case DID_NEXUS_FAILURE:
-		cmd->result |= (DID_OK << 16);
+		set_host_byte(cmd, DID_OK);
 		error = -EBADE;
 		break;
 	default:



01-24-2012, 10:03 PM
Mike Snitzer

scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)

Hi Babu,

Thanks for finding this.

On Tue, Jan 24 2012 at 3:38pm -0500,
Moger, Babu <Babu.Moger@netapp.com> wrote:

> Resubmitting, as my previous post had formatting issues and did not reach linux-scsi.
>
> This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE.
> The function __scsi_error_from_host_byte tries to reset the host byte to DID_OK, but that
> does not happen because of the OR operation.
>
> Here is the flow:
> scsi_softirq_done -> scsi_decide_disposition -> __scsi_error_from_host_byte

or more accurately:

scsi_softirq_done -> scsi_decide_disposition
scsi_softirq_done -> scsi_finish_command -> scsi_io_completion -> __scsi_error_from_host_byte

> Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, the host byte of result
> is set to DID_NEXUS_FAILURE (=0x11). Then, in __scsi_error_from_host_byte, result is ORed with
> DID_OK. The purpose is to reset the host byte back to DID_OK, but that does not happen. This patch fixes the issue.

We clearly aren't properly resetting to DID_OK but I'm not seeing an
obvious "nasty bug" that is lurking due to this. Am I missing
something?

__scsi_error_from_host_byte() is setting error which is passed back up
via blk_end_request() and blk_end_request_all(). And in my previous
testing I know that corresponding errors are making it out to
dm-multipath (e.g. -EREMOTEIO).

Also, your patch header is missing the location where DID_OK is not
properly matched (because it wasn't set exclusively due to being
or'd). Looks like scsi_noretry_cmd() will be made more efficient
because it will match DID_OK immediately. Any other locations? Would
be good to call them out.

Mike

01-25-2012, 08:39 PM
"Moger, Babu"

scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)

> -----Original Message-----
> From: Mike Snitzer [mailto:snitzer@redhat.com]
> Sent: Tuesday, January 24, 2012 5:03 PM
> To: Moger, Babu
> Cc: linux-scsi@vger.kernel.org; device-mapper development (dm-devel@redhat.com)
> Subject: Re: [PATCH 2/2] scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)
>
> Hi Babu,
>
> Thanks for finding this.
>
> On Tue, Jan 24 2012 at 3:38pm -0500,
> Moger, Babu <Babu.Moger@netapp.com> wrote:
>
> > Resubmitting, as my previous post had formatting issues and did not reach linux-scsi.
> >
> > This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE.
> > The function __scsi_error_from_host_byte tries to reset the host byte to DID_OK, but that
> > does not happen because of the OR operation.
> >
> > Here is the flow:
> > scsi_softirq_done -> scsi_decide_disposition -> __scsi_error_from_host_byte
>
> or more accurately:
>
> scsi_softirq_done -> scsi_decide_disposition
> scsi_softirq_done -> scsi_finish_command -> scsi_io_completion -> __scsi_error_from_host_byte
>
> > Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, the host byte of result
> > is set to DID_NEXUS_FAILURE (=0x11). Then, in __scsi_error_from_host_byte, result is ORed with
> > DID_OK. The purpose is to reset the host byte back to DID_OK, but that does not happen. This patch fixes the issue.
>
> We clearly aren't properly resetting to DID_OK but I'm not seeing an
> obvious "nasty bug" that is lurking due to this. Am I missing
> something?

Yes. It is causing some issues in our proprietary multipath driver. Normally, we assume that
the host status overrides all other statuses: if the host status is anything other than DID_OK,
we ignore the other statuses (for example, we do not read the check-condition sense data). We have
worked around this. My assumption is that most user-level code does the same thing, so a stale
host byte might give the wrong impression about the kind of error.

One question: did the newlines get wrapped in this patch as well?

>
> __scsi_error_from_host_byte() is setting error which is passed back up
> via blk_end_request() and blk_end_request_all(). And in my previous
> testing I know that corresponding errors are making it out to
> dm-multipath (e.g. -EREMOTEIO).
>
> Also, your patch header is missing the location where DID_OK is not
> properly matched (because it wasn't set exclusively due to being

I am not sure what you meant here.

> or'd). Looks like scsi_noretry_cmd() will be made more efficient
> because it will match DID_OK immediately. Any other locations? Would
> be good to call them out.
>
> Mike

01-25-2012, 09:47 PM
Mike Snitzer

scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)

On Wed, Jan 25 2012 at 4:39pm -0500,
Moger, Babu <Babu.Moger@netapp.com> wrote:

> > -----Original Message-----
> > From: Mike Snitzer [mailto:snitzer@redhat.com]
> > Sent: Tuesday, January 24, 2012 5:03 PM
> > To: Moger, Babu
> > Cc: linux-scsi@vger.kernel.org; device-mapper development (dm-devel@redhat.com)
> > Subject: Re: [PATCH 2/2] scsi : fixing the new host byte settings (DID_TARGET_FAILURE and DID_NEXUS_FAILURE)
> >
> > Hi Babu,
> >
> > Thanks for finding this.
> >
> > On Tue, Jan 24 2012 at 3:38pm -0500,
> > Moger, Babu <Babu.Moger@netapp.com> wrote:
> >
> > > Resubmitting, as my previous post had formatting issues and did not reach linux-scsi.
> > >
> > > This patch fixes the host byte settings DID_TARGET_FAILURE and DID_NEXUS_FAILURE.
> > > The function __scsi_error_from_host_byte tries to reset the host byte to DID_OK, but that
> > > does not happen because of the OR operation.
> > >
> > > Here is the flow:
> > > scsi_softirq_done -> scsi_decide_disposition -> __scsi_error_from_host_byte
> >
> > or more accurately:
> >
> > scsi_softirq_done -> scsi_decide_disposition
> > scsi_softirq_done -> scsi_finish_command -> scsi_io_completion -> __scsi_error_from_host_byte
> >
> > > Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition, the host byte of result
> > > is set to DID_NEXUS_FAILURE (=0x11). Then, in __scsi_error_from_host_byte, result is ORed with
> > > DID_OK. The purpose is to reset the host byte back to DID_OK, but that does not happen. This patch fixes the issue.
> >
> > We clearly aren't properly resetting to DID_OK but I'm not seeing an
> > obvious "nasty bug" that is lurking due to this. Am I missing
> > something?
>
> Yes. It is causing some issues in our proprietary multipath driver. Normally, we assume that
> the host status overrides all other statuses: if the host status is anything other than DID_OK,
> we ignore the other statuses (for example, we do not read the check-condition sense data). We have
> worked around this. My assumption is that most user-level code does the same thing, so a stale
> host byte might give the wrong impression about the kind of error.
>
> One question: did the newlines get wrapped in this patch as well?

Looks fine to me.

> > __scsi_error_from_host_byte() is setting error which is passed back up
> > via blk_end_request() and blk_end_request_all(). And in my previous
> > testing I know that corresponding errors are making it out to
> > dm-multipath (e.g. -EREMOTEIO).
> >
> > Also, your patch header is missing the location where DID_OK is not
> > properly matched (because it wasn't set exclusively due to being
>
> I am not sure what you meant here.

Well like I said, it is clear that scsi_noretry_cmd() won't match the
DID_OK case in the host_byte switch statement. I was just wondering
where else this improperly set DID_OK was causing a DID_OK match to not
happen.

But the fact that this is impacting your proprietary multipath driver
basically answers my question (I was trying to understand what was
ultimately broken as a result of us improperly resetting to DID_OK).

All said:

Acked-by: Mike Snitzer <snitzer@redhat.com>
