If we get a transient error, we may not want to fail the path
right away. This patch instead fails the path after X seconds.

I am not sure how valuable this is. If users just set the no_path_retry
option, we end up with similar results. Without the patch, and with
no_path_retry set, the IO is quickly sent to a new path and has a smaller
chance of being sent to a queue that is blocked. With the patch, we might
avoid some of the path failure messages that scare users. But most users
are not setting no_path_retry. Will they set this new timer?
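
To make the throttling in the map_io() hunk easier to reason about, here is a
rough userspace model of it. This is only a sketch: the model_* structs and
functions are hypothetical stand-ins and not the kernel structures. Only the
two conditions are copied from the hunk. It assumes repeat_count starts at 1
or more and that queue_io is not set.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for struct pgpath / struct multipath. */
struct model_path {
	unsigned curr_fail_count;	/* errors seen since the path last worked */
};

struct model_mpath {
	struct model_path *current_pgpath;
	unsigned repeat_count;		/* IOs left before a new path is selected */
	bool queue_io;
};

/*
 * Mirrors the two checks from the map_io() hunk. Returns true when the
 * caller would have to select a new path before mapping this IO.
 */
static bool model_needs_new_path(struct model_mpath *m)
{
	/* Throttle: a path that is seeing errors gets at most 2 more IOs. */
	if (m->current_pgpath && m->current_pgpath->curr_fail_count > 0 &&
	    m->repeat_count > 1)
		m->repeat_count = 2;

	return !m->current_pgpath ||
	       (!m->queue_io && (m->repeat_count && --m->repeat_count == 0));
}

/*
 * Number of map calls up to and including the one that asks for a new
 * path. Assumes repeat_count >= 1; with 0 the check never fires and this
 * loop would not terminate.
 */
static unsigned model_ios_until_reselect(unsigned repeat_count,
					 unsigned fail_count)
{
	struct model_path p = { .curr_fail_count = fail_count };
	struct model_mpath m = {
		.current_pgpath = &p,
		.repeat_count = repeat_count,
		.queue_io = false,
	};
	unsigned n = 1;

	while (!model_needs_new_path(&m))
		n++;
	return n;
}
```

In this model a healthy path with repeat_count = 100 is kept for 100 map
calls, while the same path with a nonzero curr_fail_count is kept for only 2,
so a struggling path receives little further IO before selection runs again.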
struct dm_path path;
};
@@ -313,6 +320,14 @@ static int map_io(struct multipath *m, struct bio *bio,
spin_lock_irqsave(&m->lock, flags);
+ /*
+ * If the path is experiencing problems but is not marked failed,
+ * then throttle it until IO starts to execute correctly again.
+ */
+ if (m->current_pgpath && m->current_pgpath->curr_fail_count > 0 &&
+ m->repeat_count > 1)
+ m->repeat_count = 2;
+
/* Do we need to select a new pgpath? */
if (!m->current_pgpath ||
(!m->queue_io && (m->repeat_count && --m->repeat_count == 0)))
@@ -847,7 +862,15 @@ static int fail_path(struct pgpath *pgpath)
if (!pgpath->path.is_active)
goto out;