Eddie Williams 12-10-2007 02:06 PM

device-mapper multipath retry IO errors
 
It looks to me like device-mapper multipath will retry IO errors
indefinitely, no matter what the error, if no_path_retry is set to
anything other than 0 and the path checker does not detect the failure.

Say you run into a medium error: that particular IO will fail. The path
will be marked failed and the IO retried on another path. This will
exhaust the list of paths, since the medium error will happen on every
path. If no_path_retry is set to 1 or more, the IO will then be queued.
The path checker will come along, and because the TUR (or the IO to
block 0) passes, it will mark the path as good, clearing the error. The
IO will then be reissued, marking the paths failed again, and so on.

Am I missing something in the code that will catch this?

I don't have a means to force a medium error, but I was able to create
something similar by provoking a reservation conflict.

Eddie

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

James Bottomley 12-10-2007 02:18 PM

device-mapper multipath retry IO errors
 
On Mon, 2007-12-10 at 10:06 -0500, Eddie Williams wrote:
> It looks to me like device-mapper multipath will retry IO errors
> indefinitely, no matter what the error, if no_path_retry is set to
> anything other than 0 and the path checker does not detect the failure.
>
> Say you run into a medium error: that particular IO will fail. The path
> will be marked failed and the IO retried on another path. This will
> exhaust the list of paths, since the medium error will happen on every
> path. If no_path_retry is set to 1 or more, the IO will then be queued.
> The path checker will come along, and because the TUR (or the IO to
> block 0) passes, it will mark the path as good, clearing the error. The
> IO will then be reissued, marking the paths failed again, and so on.
>
> Am I missing something in the code that will catch this?

I've been advocating for some time that we need to split our errors into
transport-related (and therefore potentially retryable over a different
path) and device-related (and therefore path-independent and needing to
be reported to the user).

> I don't have a means to force a medium error, but I was able to create
> something similar by provoking a reservation conflict.

Actually, just for the record, there is a way to force devices to report
a medium error using the READ LONG/WRITE LONG commands (these let you
pull the "real" data off the drive, including the CRC information). If
you save the old data and fill it with random bits before writing it
back, the CRC will inevitably mismatch and the device will signal a
medium error for that sector (on a read, anyway).

James
