02-05-2009, 09:52 PM
Mikulas Patocka

2.6.28.2 & dm-snapshot or kcopyd Oops

On Thu, 5 Feb 2009, Jacky Kim wrote:

> Hi Mikulas,
>
> Please check my kernel config file in the attachment.
> I have tested with your patch, and got the following message:
>
> # dmesg

Hi

Try this patch, with the same workload and kernel config.

See if it fixes the crash.

Mikulas

---

Under specific circumstances, the kcopyd callback could be called from
the thread that called dm_kcopyd_copy rather than from the kcopyd workqueue.

dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()

This code path is taken if the thread calling dm_kcopyd_copy is delayed due to
scheduling inside split_job/segment_complete and the subjobs complete before
the loop in split_job finishes.

Snapshots depend on callbacks being invoked from the single-threaded kcopyd
workqueue and assume that there is no racing between individual callbacks.
Racing callbacks can corrupt the exception store, and they can also cause
exception-store callbacks to be invoked twice for the same exception --- a
likely cause of the crashes inside pending_complete() / remove_exception().


When I reviewed kcopyd further, I found four problems in total:

1. job->fn being called from the thread that submitted the job (see above).

2. job->fn(read_err, write_err, job->context); in segment_complete
reports the error of the last subjob, not the union of all errors.

3. This comment is bogus (the jobs are not cancelable), plus the code
is prone to deadlock because the callback may allocate another job from
the same mempool.
/*
* To avoid a race we must keep the job around
* until after the notify function has completed.
* Otherwise the client may try and stop the job
* after we've completed.
*/
job->fn(read_err, write_err, job->context);
mempool_free(job, job->kc->job_pool);

4. Master jobs and subjobs are allocated from the same mempool.
Completion and freeing of a master job depend on successful allocation of
all subjobs -> deadlock.


This patch moves completion of master jobs to run_complete_job (which is
called from the kcopyd workqueue). The patch fixes problems 1, 2 and 3.

Problem 4 will need more changes and another patch.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
drivers/md/dm-kcopyd.c | 15 ++++++---------
1 file changed, 6 insertions(+), 9 deletions(-)

Index: linux-2.6.29-rc3-devel/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.29-rc3-devel.orig/drivers/md/dm-kcopyd.c 2009-02-05 23:01:50.000000000 +0100
+++ linux-2.6.29-rc3-devel/drivers/md/dm-kcopyd.c 2009-02-05 23:28:56.000000000 +0100
@@ -297,7 +297,8 @@ static int run_complete_job(struct kcopy
dm_kcopyd_notify_fn fn = job->fn;
struct dm_kcopyd_client *kc = job->kc;

- kcopyd_put_pages(kc, job->pages);
+ if (job->pages)
+ kcopyd_put_pages(kc, job->pages);
mempool_free(job, kc->job_pool);
fn(read_err, write_err, context);

@@ -508,14 +509,8 @@ static void segment_complete(int read_er

} else if (atomic_dec_and_test(&job->sub_jobs)) {

- /*
- * To avoid a race we must keep the job around
- * until after the notify function has completed.
- * Otherwise the client may try and stop the job
- * after we've completed.
- */
- job->fn(read_err, write_err, job->context);
- mempool_free(job, job->kc->job_pool);
+ push(&job->kc->complete_jobs, job);
+ wake(job->kc);
}
}

@@ -528,6 +523,8 @@ static void split_job(struct kcopyd_job
{
int i;

+ atomic_inc(&job->kc->nr_jobs);
+
atomic_set(&job->sub_jobs, SPLIT_COUNT);
for (i = 0; i < SPLIT_COUNT; i++)
segment_complete(0, 0u, job);

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
02-06-2009, 02:26 AM
Mikulas Patocka

2.6.28.2 & dm-snapshot or kcopyd Oops

> + push(&job->kc->complete_jobs, job);
> + wake(job->kc);

I found one more bug in my patch: when the job is pushed onto the
complete_jobs queue, it may immediately be taken off the queue and cease to
exist, so the subsequent job->kc dereference is wrong.

A new patch:



Under specific circumstances, the kcopyd callback could be called from
the thread that called dm_kcopyd_copy rather than from the kcopyd workqueue.

dm_kcopyd_copy -> split_job -> segment_complete -> job->fn()

This code path is taken if the thread calling dm_kcopyd_copy is delayed due to
scheduling inside split_job/segment_complete and the subjobs complete before
the loop in split_job finishes.

Snapshots depend on callbacks being invoked from the single-threaded kcopyd
workqueue and assume that there is no racing between individual callbacks.
Racing callbacks can corrupt the exception store, and they can also cause
exception-store callbacks to be invoked twice for the same exception --- a
likely cause of the crashes inside pending_complete() / remove_exception().


When I reviewed kcopyd further, I found four problems in total:

1. job->fn being called from the thread that submitted the job (see above).

2. job->fn(read_err, write_err, job->context); in segment_complete
reports the error of the last subjob, not the union of all errors.

3. This comment is bogus (the jobs are not cancelable), plus the code
is prone to deadlock because the callback may allocate another job from
the same mempool.
/*
* To avoid a race we must keep the job around
* until after the notify function has completed.
* Otherwise the client may try and stop the job
* after we've completed.
*/
job->fn(read_err, write_err, job->context);
mempool_free(job, job->kc->job_pool);

4. Master jobs and subjobs are allocated from the same mempool.
Completion and freeing of a master job depend on successful allocation of
all subjobs -> deadlock.


This patch moves completion of master jobs to run_complete_job (which is
called from the kcopyd workqueue). The patch fixes problems 1, 2 and 3.

Problem 4 will need more changes and another patch.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

---
drivers/md/dm-kcopyd.c | 18 ++++++++----------
1 file changed, 8 insertions(+), 10 deletions(-)

Index: linux-2.6.29-rc3-devel/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.29-rc3-devel.orig/drivers/md/dm-kcopyd.c 2009-02-05 23:01:50.000000000 +0100
+++ linux-2.6.29-rc3-devel/drivers/md/dm-kcopyd.c 2009-02-06 04:15:52.000000000 +0100
@@ -297,7 +297,8 @@ static int run_complete_job(struct kcopy
dm_kcopyd_notify_fn fn = job->fn;
struct dm_kcopyd_client *kc = job->kc;

- kcopyd_put_pages(kc, job->pages);
+ if (job->pages)
+ kcopyd_put_pages(kc, job->pages);
mempool_free(job, kc->job_pool);
fn(read_err, write_err, context);

@@ -461,6 +462,7 @@ static void segment_complete(int read_er
sector_t progress = 0;
sector_t count = 0;
struct kcopyd_job *job = (struct kcopyd_job *) context;
+ struct dm_kcopyd_client *kc = job->kc;

mutex_lock(&job->lock);

@@ -490,7 +492,7 @@ static void segment_complete(int read_er

if (count) {
int i;
- struct kcopyd_job *sub_job = mempool_alloc(job->kc->job_pool,
+ struct kcopyd_job *sub_job = mempool_alloc(kc->job_pool,
GFP_NOIO);

*sub_job = *job;
@@ -508,14 +510,8 @@ static void segment_complete(int read_er

} else if (atomic_dec_and_test(&job->sub_jobs)) {

- /*
- * To avoid a race we must keep the job around
- * until after the notify function has completed.
- * Otherwise the client may try and stop the job
- * after we've completed.
- */
- job->fn(read_err, write_err, job->context);
- mempool_free(job, job->kc->job_pool);
+ push(&kc->complete_jobs, job);
+ wake(kc);
}
}

@@ -528,6 +524,8 @@ static void split_job(struct kcopyd_job
{
int i;

+ atomic_inc(&job->kc->nr_jobs);
+
atomic_set(&job->sub_jobs, SPLIT_COUNT);
for (i = 0; i < SPLIT_COUNT; i++)
segment_complete(0, 0u, job);

02-06-2009, 10:17 PM
Mikulas Patocka

2.6.28.2 & dm-snapshot or kcopyd Oops

On Fri, 6 Feb 2009, Jacky Kim wrote:

> Hi Mikulas,
>
> I applied the patch to kernel 2.6.28.2, but it still has a problem:
>
> [ 493.509650] ------------[ cut here ]------------
> [ 493.509654] kernel BUG at drivers/md/dm-snap.c:946!
> [ 493.509656] invalid opcode: 0000 [#1] SMP
> [ 493.509658] last sysfs file: /sys/devices/virtual/block/dm-11/dev
> [ 493.509660] Modules linked in: iscsi_trgt arcmsr bonding e1000
> [ 493.509664]
> [ 493.509666] Pid: 7520, comm: kcopyd Not tainted (2.6.28.2-dm #5) S5000PSL
> [ 493.509668] EIP: 0060:[<c03c6726>] EFLAGS: 00010246 CPU: 1
> [ 493.509677] EIP is at commit_callback+0x16/0x30
> [ 493.509679] EAX: f021f188 EBX: 0000002b ECX: fac54150 EDX: 00000001
> [ 493.509680] ESI: 021c44dc EDI: 00000000 EBP: eed384c0 ESP: efacdf10
> [ 493.509682] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 493.509686] Process kcopyd (pid: 7520, ti=efacc000 task=f2c95b40 task.ti=efacc000)
> [ 493.509687] Stack:
> [ 493.509688] c03c7799 c03c6710 0005aa3b 00000000 efa8fd48 f75b3ec0 00000000 c03c66b0
> [ 493.509692] c03c66e3 efa8fd48 effca9a0 eed3c1c0 c03c13f8 f2c95b40 efa8fd48 00000000
> [ 493.509695] effca9a0 effca9a4 eed3c1fc eed3c200 c03c119c c03c13a0 eed3c1c0 00000000
> [ 493.509699] Call Trace:
> [ 493.509701] [<c03c7799>] persistent_commit+0xb9/0x130
> [ 493.509703] [<c03c6710>] commit_callback+0x0/0x30
> [ 493.509709] [<c03c66b0>] copy_callback+0x0/0x60
> [ 493.509712] [<c03c66e3>] copy_callback+0x33/0x60
> [ 493.509714] [<c03c13f8>] run_complete_job+0x58/0xa0
> [ 493.509716] [<c03c119c>] process_jobs+0x4c/0xe0
> [ 493.509718] [<c03c13a0>] run_complete_job+0x0/0xa0
> [ 493.509720] [<c03c1230>] do_work+0x0/0x50
> [ 493.509722] [<c03c124e>] do_work+0x1e/0x50
> [ 493.509724] [<c012ef32>] run_workqueue+0x72/0x100
> [ 493.509728] [<c0132570>] prepare_to_wait+0x20/0x60
> [ 493.509732] [<c012f840>] worker_thread+0x0/0xb0
> [ 493.509734] [<c012f8b9>] worker_thread+0x79/0xb0
> [ 493.509736] [<c01323d0>] autoremove_wake_function+0x0/0x50
> [ 493.509738] [<c012f840>] worker_thread+0x0/0xb0
> [ 493.509741] [<c01320d2>] kthread+0x42/0x70
> [ 493.509742] [<c0132090>] kthread+0x0/0x70
> [ 493.509744] [<c0103eff>] kernel_thread_helper+0x7/0x18
> [ 493.509748] Code: 90 8d 74 26 00 0f 0b eb fe 8d b6 00 00 00 00 8d bf 00 00 00 00 81 38 00 01 10 00 74 0e 81 78 04 00 02 20 00 74 0f e9 6a fc ff ff <0f> 0b eb fe 8d b6 00 00 00 00 0f 0b eb fe 8d b6 00 00 00 00 8d
> [ 493.509766] EIP: [<c03c6726>] commit_callback+0x16/0x30 SS:ESP 0068:efacdf10
> [ 493.509794] ---[ end trace 05237464c530eabb ]---
> [ 517.575348] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:0 by sid:281475899523136 (Function Complete)
>
> Jacky
> .

Hi

So apply this on top of both patches (on 2.6.28.x).

(This is not supposed to fix the bug; it just adds more debug checks.)

BTW. what is the chunk size of the snapshot store you are testing?

Mikulas

---
drivers/md/dm-exception-store.c | 85 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 84 insertions(+), 1 deletion(-)

Index: linux-2.6.28-clean/drivers/md/dm-exception-store.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-exception-store.c 2009-02-06 23:26:55.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-exception-store.c 2009-02-06 23:39:25.000000000 +0100
@@ -131,6 +131,8 @@ struct pstore {
struct dm_io_client *io_client;

struct workqueue_struct *metadata_wq;
+
+ struct task_struct *process;
};

static unsigned sectors_to_pages(unsigned sectors)
@@ -577,6 +579,48 @@ static int persistent_prepare(struct exc
return 0;
}

+struct dm_snap_pending_exception {
+ struct dm_snap_exception e;
+
+ /*
+ * Origin buffers waiting for this to complete are held
+ * in a bio list
+ */
+ struct bio_list origin_bios;
+ struct bio_list snapshot_bios;
+
+ /*
+ * Short-term queue of pending exceptions prior to submission.
+ */
+ struct list_head list;
+
+ /*
+ * The primary pending_exception is the one that holds
+ * the ref_count and the list of origin_bios for a
+ * group of pending_exceptions. It is always last to get freed.
+ * These fields get set up when writing to the origin.
+ */
+ struct dm_snap_pending_exception *primary_pe;
+
+ /*
+ * Number of pending_exceptions processing this chunk.
+ * When this drops to zero we must complete the origin bios.
+ * If incrementing or decrementing this, hold pe->snap->lock for
+ * the sibling concerned and not pe->primary_pe->snap->lock unless
+ * they are the same.
+ */
+ atomic_t ref_count;
+
+ /* Pointer back to snapshot context */
+ struct dm_snapshot *snap;
+
+ /*
+ * 1 indicates the exception has already been sent to
+ * kcopyd.
+ */
+ int started;
+};
+
static void persistent_commit(struct exception_store *store,
struct dm_snap_exception *e,
void (*callback) (void *, int success),
@@ -586,6 +630,16 @@ static void persistent_commit(struct exc
struct pstore *ps = get_info(store);
struct disk_exception de;
struct commit_callback *cb;
+ struct dm_snap_pending_exception *pe;
+
+ if (!ps->process)
+ ps->process = current;
+
+ BUG_ON(ps->process != current);
+
+ pe = callback_context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);

de.old_chunk = e->old_chunk;
de.new_chunk = e->new_chunk;
@@ -601,13 +655,30 @@ static void persistent_commit(struct exc
cb->callback = callback;
cb->context = callback_context;

+ for (i = 0; i < ps->callback_count; i++) {
+ cb = ps->callbacks + i;
+ pe = cb->context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ }
/*
* If there are exceptions in flight and we have not yet
* filled this metadata area there's nothing more to do.
*/
if (!atomic_dec_and_test(&ps->pending_count) &&
- (ps->current_committed != ps->exceptions_per_area))
+ (ps->current_committed != ps->exceptions_per_area)) {
+ pe = callback_context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
return;
+ }
+ for (i = 0; i < ps->callback_count; i++) {
+ cb = ps->callbacks + i;
+ pe = cb->context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ }
+

/*
* If we completely filled the current area, then wipe the next one.
@@ -633,6 +704,16 @@ static void persistent_commit(struct exc

for (i = 0; i < ps->callback_count; i++) {
cb = ps->callbacks + i;
+ pe = cb->context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ }
+
+ for (i = 0; i < ps->callback_count; i++) {
+ cb = ps->callbacks + i;
+ pe = cb->context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
cb->callback(cb->context, ps->valid);
}

@@ -668,6 +749,8 @@ int dm_create_persistent(struct exceptio
atomic_set(&ps->pending_count, 0);
ps->callbacks = NULL;

+ ps->process = NULL;
+
ps->metadata_wq = create_singlethread_workqueue("ksnaphd");
if (!ps->metadata_wq) {
kfree(ps);

02-09-2009, 07:13 AM
Mikulas Patocka

2.6.28.2 & dm-snapshot or kcopyd Oops

On Sat, 7 Feb 2009, Jacky Kim wrote:

> Hi
>
> I created the PV over a RAID set, and its stripe size is 64KB.
> The chunk size of the snapshot is 4KB; is it too small? If so, what size
> is better?

It means that there is another bug besides the one I have just fixed :-(

So try another patch (on top of them all) with even more debug points.

Mikulas

> I tested with kernel 2.6.28.2, and got the following message:
>
> [ 531.209879] ------------[ cut here ]------------
> [ 531.209884] kernel BUG at drivers/md/dm-exception-store.c:715!
> [ 531.209886] invalid opcode: 0000 [#1] SMP
> [ 531.209888] last sysfs file: /sys/devices/virtual/block/dm-11/dev
> [ 531.209890] Modules linked in: iscsi_trgt arcmsr bonding e1000
> [ 531.209893]
> [ 531.209896] Pid: 8241, comm: kcopyd Not tainted (2.6.28.2-dm #6) S5000PSL
> [ 531.209898] EIP: 0060:[<c03c7dc2>] EFLAGS: 00010246 CPU: 1
> [ 531.209903] EIP is at persistent_commit+0x222/0x280
> [ 531.209905] EAX: f5385708 EBX: 00000006 ECX: fabab030 EDX: 00000000
> [ 531.209906] ESI: 00000000 EDI: ef35f840 EBP: 00000075 ESP: f4fc3f14
> [ 531.209908] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 531.209910] Process kcopyd (pid: 8241, ti=f4fc2000 task=f4f16800 task.ti=f4fc2000)
> [ 531.209911] Stack:
> [ 531.209912] c03c6710 000165da 00000000 f5015348 f482f2c0 00000000 c03c66b0 c03c66e3
> [ 531.209916] f5015348 f487f160 ef52e1c0 c03c13f8 f4f16800 f5015348 00000000 f487f160
> [ 531.209919] f487f164 ef52e1fc ef52e200 c03c119c c03c13a0 ef52e1c0 00000001 ef52e1ec
> [ 531.209923] Call Trace:
> [ 531.209925] [<c03c6710>] commit_callback+0x0/0x30
> [ 531.209928] [<c03c66b0>] copy_callback+0x0/0x60
> [ 531.209935] [<c03c66e3>] copy_callback+0x33/0x60
> [ 531.209938] [<c03c13f8>] run_complete_job+0x58/0xa0
> [ 531.209945] [<c03c119c>] process_jobs+0x4c/0xe0
> [ 531.209947] [<c03c13a0>] run_complete_job+0x0/0xa0
> [ 531.209950] [<c03c1230>] do_work+0x0/0x50
> [ 531.209951] [<c03c124e>] do_work+0x1e/0x50
> [ 531.209953] [<c012ef32>] run_workqueue+0x72/0x100
> [ 531.209962] [<c0132570>] prepare_to_wait+0x20/0x60
> [ 531.209965] [<c012f840>] worker_thread+0x0/0xb0
> [ 531.209972] [<c012f8b9>] worker_thread+0x79/0xb0
> [ 531.209974] [<c01323d0>] autoremove_wake_function+0x0/0x50
> [ 531.209976] [<c012f840>] worker_thread+0x0/0xb0
> [ 531.209978] [<c01320d2>] kthread+0x42/0x70
> [ 531.209980] [<c0132090>] kthread+0x0/0x70
> [ 531.209982] [<c0103eff>] kernel_thread_helper+0x7/0x18
> [ 531.209984] Code: 0b eb fe 0f 0b eb fe ba 01 00 00 00 89 f8 e8 d6 f8 ff ff 85 c0 0f 84 18 ff ff ff c7 47 08 00 00 00 00 e9 0c ff ff ff 0f 0b eb fe <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 0f 0b eb fe 83
> [ 531.210006] EIP: [<c03c7dc2>] persistent_commit+0x222/0x280 SS:ESP 0068:f4fc3f14
> [ 531.210010] ---[ end trace fc1bc1bb8712a6ff ]---
> [ 556.042136] iscsi_trgt: Logical Unit Reset (05) issued on tid:1 lun:0 by sid:281475899523136 (Function Complete)
>
> Jacky
> .

---
drivers/md/dm-exception-store.c | 7 +++++++
drivers/md/dm-kcopyd.c | 34 +++++++++++++++++++++++++++++++++-
drivers/md/dm-snap.c | 4 ++++
3 files changed, 44 insertions(+), 1 deletion(-)

Index: linux-2.6.28-clean/drivers/md/dm-exception-store.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-exception-store.c 2009-02-09 08:43:40.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-exception-store.c 2009-02-09 08:43:46.000000000 +0100
@@ -645,6 +645,13 @@ static void persistent_commit(struct exc
de.new_chunk = e->new_chunk;
write_exception(ps, ps->current_committed++, &de);

+ for (i = 0; i < ps->callback_count; i++) {
+ cb = ps->callbacks + i;
+ pe = cb->context;
+ BUG_ON(pe->e.hash_list.next == LIST_POISON1);
+ BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ BUG_ON(pe == callback_context);
+ }
/*
* Add the callback to the back of the array. This code
* is the only place where the callback array is
Index: linux-2.6.28-clean/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-snap.c 2009-02-09 08:43:40.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-snap.c 2009-02-09 08:43:46.000000000 +0100
@@ -979,6 +979,10 @@ static void start_copy(struct dm_snap_pe
struct dm_io_region src, dest;
struct block_device *bdev = s->origin->bdev;
sector_t dev_size;
+ BUG_ON(!pe->started);
+ BUG_ON(pe->started == 2);
+ BUG_ON(pe->started != 1);
+ pe->started = 2;

dev_size = get_dev_size(bdev);

Index: linux-2.6.28-clean/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-kcopyd.c 2009-02-09 08:43:40.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-kcopyd.c 2009-02-09 08:44:47.000000000 +0100
@@ -60,6 +60,7 @@ struct dm_kcopyd_client {
struct list_head complete_jobs;
struct list_head io_jobs;
struct list_head pages_jobs;
+ struct list_head all_jobs;
};

static void wake(struct dm_kcopyd_client *kc)
@@ -209,6 +210,8 @@ struct kcopyd_job {
dm_kcopyd_notify_fn fn;
void *context;

+ struct list_head list_all;
+
/*
* These fields are only used if the job has been split
* into more manageable parts.
@@ -280,6 +283,9 @@ static void push_head(struct list_head *
spin_unlock_irqrestore(&kc->job_lock, flags);
}

+static void segment_complete(int read_err, unsigned long write_err,
+ void *context);
+
/*
* These three functions process 1 item from the corresponding
* job list.
@@ -291,6 +297,8 @@ static void push_head(struct list_head *
*/
static int run_complete_job(struct kcopyd_job *job)
{
+ struct kcopyd_job *jobb;
+ unsigned long flags;
void *context = job->context;
int read_err = job->read_err;
unsigned long write_err = job->write_err;
@@ -299,6 +307,18 @@ static int run_complete_job(struct kcopy

if (job->pages)
kcopyd_put_pages(kc, job->pages);
+
+ if (fn != segment_complete) {
+ spin_lock_irqsave(&kc->job_lock, flags);
+ list_del(&job->list_all);
+ spin_unlock_irqrestore(&kc->job_lock, flags);
+
+ spin_lock_irqsave(&kc->job_lock, flags);
+ list_for_each_entry(jobb, &kc->all_jobs, list_all)
+ BUG_ON(jobb->fn == fn && jobb->context == context);
+ spin_unlock_irqrestore(&kc->job_lock, flags);
+ }
+
mempool_free(job, kc->job_pool);
fn(read_err, write_err, context);

@@ -535,7 +555,8 @@ int dm_kcopyd_copy(struct dm_kcopyd_clie
unsigned int num_dests, struct dm_io_region *dests,
unsigned int flags, dm_kcopyd_notify_fn fn, void *context)
{
- struct kcopyd_job *job;
+ struct kcopyd_job *job, *jobb;
+ unsigned long fflags;

/*
* Allocate a new job.
@@ -563,6 +584,15 @@ int dm_kcopyd_copy(struct dm_kcopyd_clie
job->fn = fn;
job->context = context;

+ spin_lock_irqsave(&kc->job_lock, fflags);
+ list_for_each_entry(jobb, &kc->all_jobs, list_all)
+ BUG_ON(jobb->fn == fn && jobb->context == context);
+ spin_unlock_irqrestore(&kc->job_lock, fflags);
+
+ spin_lock_irqsave(&kc->job_lock, fflags);
+ list_add_tail(&job->list_all, &kc->all_jobs);
+ spin_unlock_irqrestore(&kc->job_lock, fflags);
+
if (job->source.count < SUB_JOB_SIZE)
dispatch_job(job);

@@ -603,6 +633,7 @@ int dm_kcopyd_client_create(unsigned int

spin_lock_init(&kc->lock);
spin_lock_init(&kc->job_lock);
+ INIT_LIST_HEAD(&kc->all_jobs);
INIT_LIST_HEAD(&kc->complete_jobs);
INIT_LIST_HEAD(&kc->io_jobs);
INIT_LIST_HEAD(&kc->pages_jobs);
@@ -652,6 +683,7 @@ void dm_kcopyd_client_destroy(struct dm_
/* Wait for completion of all jobs submitted by this client. */
wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs));

+ BUG_ON(!list_empty(&kc->all_jobs));
BUG_ON(!list_empty(&kc->complete_jobs));
BUG_ON(!list_empty(&kc->io_jobs));
BUG_ON(!list_empty(&kc->pages_jobs));

02-10-2009, 10:36 PM
Mikulas Patocka

2.6.28.2 & dm-snapshot or kcopyd Oops

On Tue, 10 Feb 2009, Jacky Kim wrote:

> Hi Mikulas,
>
> The latest debug messages are as follows:
>
> [ 2006.004571] ------------[ cut here ]------------
> [ 2006.004576] kernel BUG at drivers/md/dm-exception-store.c:653!
> [ 2006.004583] invalid opcode: 0000 [#1] SMP
> [ 2006.004585] last sysfs file: /sys/devices/virtual/block/dm-9/dev
> [ 2006.004587] Modules linked in: iscsi_trgt arcmsr bonding e1000
> [ 2006.004590]
> [ 2006.004593] Pid: 23442, comm: kcopyd Not tainted (2.6.28.2-dm #7) S5000PSL
> [ 2006.004595] EIP: 0060:[<c03c7ede>] EFLAGS: 00010246 CPU: 3
> [ 2006.004605] EIP is at persistent_commit+0x1ce/0x2c0
> [ 2006.004606] EAX: ef347948 EBX: fa7f5000 ECX: 00000001 EDX: 00000007
> [ 2006.004608] ESI: 00000000 EDI: f0f44a40 EBP: ef347948 ESP: efa1df04
> [ 2006.004609] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 2006.004611] Process kcopyd (pid: 23442, ti=efa1c000 task=f0c76460 task.ti=efa1c000)
> [ 2006.004613] Stack:
> [ 2006.004614] c03c6880 ef347948 000174b7 00000000 ef347948 f6f4c080 f0e535e4 f6e71240
> [ 2006.004617] c03c6853 ef347948 f6e71298 c03c6820 c03c1858 f6e7127c f6e7127c ef347948
> [ 2006.004621] 00000000 00000000 00000202 f0e535e4 f0e535e8 f6e7127c f6e71280 c03c119c
> [ 2006.004625] Call Trace:
> [ 2006.004626] [<c03c6880>] commit_callback+0x0/0x30
> [ 2006.004629] [<c03c6853>] copy_callback+0x33/0x60
> [ 2006.004632] [<c03c6820>] copy_callback+0x0/0x60
> [ 2006.004634] [<c03c1858>] run_complete_job+0x108/0x140
> [ 2006.004637] [<c03c119c>] process_jobs+0x4c/0xe0
> [ 2006.004639] [<c03c1750>] run_complete_job+0x0/0x140
> [ 2006.004641] [<c03c1230>] do_work+0x0/0x50
> [ 2006.004643] [<c03c124e>] do_work+0x1e/0x50
> [ 2006.004645] [<c012ef32>] run_workqueue+0x72/0x100
> [ 2006.004649] [<c0132570>] prepare_to_wait+0x20/0x60
> [ 2006.004652] [<c012f840>] worker_thread+0x0/0xb0
> [ 2006.004654] [<c012f8b9>] worker_thread+0x79/0xb0
> [ 2006.004656] [<c01323d0>] autoremove_wake_function+0x0/0x50
> [ 2006.004658] [<c012f840>] worker_thread+0x0/0xb0
> [ 2006.004660] [<c01320d2>] kthread+0x42/0x70
> [ 2006.004662] [<c0132090>] kthread+0x0/0x70
> [ 2006.004664] [<c0103eff>] kernel_thread_helper+0x7/0x18
> [ 2006.004667] Code: fe 8b 4f 28 3b 4f 0c 0f 84 6c ff ff ff 81 7d 00 00 01 10 00 74 4d 81 7d 04 00 02 20 00 0f 84 c8 00 00 00 83 c4 10 5b 5e 5f 5d c3 <0f> 0b eb fe 0f 0b eb fe 0f 0b eb fe 8d b6 00 00 00 00 64 a1 00
> [ 2006.004686] EIP: [<c03c7ede>] persistent_commit+0x1ce/0x2c0 SS:ESP 0068:efa1df04
> [ 2006.005020] ---[ end trace 6a5aa6c590d2cad7 ]---
>
> Jacky
> .

Hi

I have found another bug in dm-snapshot. I think fixing it won't resolve your
crash, but it is a bug anyway, so it should be fixed. Apply the patch below.

I'll send you some more debug patches to pinpoint the crash you are
seeing.

Mikulas

---

__find_pending_exception drops the snapshot lock for the duration of the
allocation (this is correct and required behavior, because the lock is also
taken when freeing an exception back into the mempool, so sleeping in the
allocation while holding the lock could deadlock).

After __find_pending_exception regains the lock, it checks if the exception
was already placed into &s->pending hash and it checks if the snapshot was
already invalidated.

But __find_pending_exception doesn't check whether the exception was already
completed and placed into the &s->complete hash. If the process waiting in
__find_pending_exception is delayed at this point because of scheduling
latency and the exception meanwhile completes, __find_pending_exception will
miss that and allocate another pending exception for an already completed
chunk.

This leads to a situation where two records exist for the same chunk, and it
can cause data corruption, because multiple snapshot I/Os to the affected
chunk could be redirected to different locations in the snapshot.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>

Index: linux-2.6.28-clean/drivers/md/dm-snap.c
================================================== =================
--- linux-2.6.28-clean.orig/drivers/md/dm-snap.c 2009-02-11 00:26:47.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-snap.c 2009-02-11 00:27:54.000000000 +0100
@@ -1007,6 +1007,10 @@ static void start_copy(struct dm_snap_pe
* for this chunk, otherwise it allocates a new one and inserts
* it into the pending table.
*
+ * Returns: pointer to a pending exception
+ * ERR_PTR(-ENOMEM) pointer --- the snapshot is invalidated
+ * NULL --- the exception was already completed, the caller should recheck
+ *
* NOTE: a write lock must be held on snap->lock before calling
* this.
*/
@@ -1037,6 +1041,12 @@ __find_pending_exception(struct dm_snaps

if (!s->valid) {
free_pending_exception(pe);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ e = lookup_exception(&s->complete, chunk);
+ if (e) {
+ free_pending_exception(pe);
return NULL;
}

@@ -1056,7 +1066,7 @@ __find_pending_exception(struct dm_snaps

if (s->store.prepare_exception(&s->store, &pe->e)) {
free_pending_exception(pe);
- return NULL;
+ return ERR_PTR(-ENOMEM);
}

get_pending_exception(pe);
@@ -1100,6 +1110,7 @@ static int snapshot_map(struct dm_target
goto out_unlock;
}

+lookup_again:
/* If the block is already remapped - use that, else remap it */
e = lookup_exception(&s->complete, chunk);
if (e) {
@@ -1114,8 +1125,10 @@ static int snapshot_map(struct dm_target
*/
if (bio_rw(bio) == WRITE) {
pe = __find_pending_exception(s, bio);
- if (!pe) {
- __invalidate_snapshot(s, -ENOMEM);
+ if (!pe)
+ goto lookup_again;
+ if (IS_ERR(pe)) {
+ __invalidate_snapshot(s, PTR_ERR(pe));
r = -EIO;
goto out_unlock;
}
@@ -1248,8 +1261,10 @@ static int __origin_write(struct list_he
goto next_snapshot;

pe = __find_pending_exception(snap, bio);
- if (!pe) {
- __invalidate_snapshot(snap, -ENOMEM);
+ if (!pe)
+ goto next_snapshot;
+ if (IS_ERR(pe)) {
+ __invalidate_snapshot(snap, PTR_ERR(pe));
goto next_snapshot;
}


02-11-2009, 09:54 AM
Mikulas Patocka

2.6.28.2 & dm-snapshot or kcopyd Oops

> Hi
>
> I have found another bug in dm-snapshots, I think it won't fix your crash
> but it is a bug anyway, so it should be fixed. Apply the below patch.
>
> I'll send you some more debug patches to pinpoint the crash you are
> seeing.
>
> Mikulas

Hi

Here is another patch to pinpoint your crashes, so apply it and run the
same workload. Thanks for testing it.

Mikulas

---
drivers/md/dm-exception-store.c | 63 +++++++++++++++++++++++++++++++++++-----
drivers/md/dm-snap.c | 51 +++++++++++++++++++++++++++++---
drivers/md/dm-snap.h | 3 +
3 files changed, 106 insertions(+), 11 deletions(-)

Index: linux-2.6.28-clean/drivers/md/dm-exception-store.c
================================================== =================
--- linux-2.6.28-clean.orig/drivers/md/dm-exception-store.c 2009-02-11 00:39:23.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-exception-store.c 2009-02-11 10:33:11.000000000 +0100
@@ -619,14 +619,18 @@ struct dm_snap_pending_exception {
* kcopyd.
*/
int started;
+
+ struct list_head list_all;
};

+static DEFINE_SPINLOCK(callback_spinlock);
+
static void persistent_commit(struct exception_store *store,
struct dm_snap_exception *e,
void (*callback) (void *, int success),
void *callback_context)
{
- unsigned int i;
+ unsigned int i, n;
struct pstore *ps = get_info(store);
struct disk_exception de;
struct commit_callback *cb;
@@ -641,32 +645,41 @@ static void persistent_commit(struct exc
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);

- de.old_chunk = e->old_chunk;
- de.new_chunk = e->new_chunk;
- write_exception(ps, ps->current_committed++, &de);
-
for (i = 0; i < ps->callback_count; i++) {
cb = ps->callbacks + i;
pe = cb->context;
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
BUG_ON(pe == callback_context);
+ BUG_ON(pe->started != 2);
+ }
+ for (i = 0; i < ps->current_committed; i++) {
+ read_exception(ps, i, &de);
+ BUG_ON(de.old_chunk == e->old_chunk);
+ BUG_ON(de.new_chunk == e->new_chunk);
}
+ de.old_chunk = e->old_chunk;
+ de.new_chunk = e->new_chunk;
+ write_exception(ps, ps->current_committed++, &de);
+
/*
* Add the callback to the back of the array. This code
* is the only place where the callback array is
* manipulated, and we know that it will never be called
* multiple times concurrently.
*/
+ spin_lock_irq(&callback_spinlock);
cb = ps->callbacks + ps->callback_count++;
cb->callback = callback;
cb->context = callback_context;
+ spin_unlock_irq(&callback_spinlock);

for (i = 0; i < ps->callback_count; i++) {
cb = ps->callbacks + i;
pe = cb->context;
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ BUG_ON(pe->started != 2);
}
/*
* If there are exceptions in flight and we have not yet
@@ -677,6 +690,7 @@ static void persistent_commit(struct exc
pe = callback_context;
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ BUG_ON(pe->started != 2);
return;
}
for (i = 0; i < ps->callback_count; i++) {
@@ -684,6 +698,7 @@ static void persistent_commit(struct exc
pe = cb->context;
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ BUG_ON(pe->started != 2);
}


@@ -714,17 +729,45 @@ static void persistent_commit(struct exc
pe = cb->context;
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
+ BUG_ON(pe->started != 2);
}

- for (i = 0; i < ps->callback_count; i++) {
+ spin_lock_irq(&callback_spinlock);
+ n = ps->callback_count;
+ ps->callback_count = 0;
+ spin_unlock_irq(&callback_spinlock);
+ for (i = 0; i < n; i++) {
cb = ps->callbacks + i;
pe = cb->context;
+ BUG_ON(pe->started != 2);
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
cb->callback(cb->context, ps->valid);
}
+}

- ps->callback_count = 0;
+static void persistent_check(struct exception_store *store, void *context, int line)
+{
+ struct pstore *ps = get_info(store);
+ unsigned long flags;
+ unsigned i;
+ struct commit_callback *cb;
+ struct dm_snap_pending_exception *pe;
+
+ spin_lock_irqsave(&callback_spinlock, flags);
+ for (i = 0; i < ps->callback_count; i++) {
+ cb = ps->callbacks + i;
+ pe = cb->context;
+ if (pe == context) {
+ printk("CALLED FROM LINE %d\n", line);
+ BUG();
+ }
+ if (pe->started != 2) {
+ printk("CALLED FROM LINE %d\n", line);
+ BUG();
+ }
+ }
+ spin_unlock_irqrestore(&callback_spinlock, flags);
}

static void persistent_drop(struct exception_store *store)
@@ -769,6 +812,7 @@ int dm_create_persistent(struct exceptio
store->read_metadata = persistent_read_metadata;
store->prepare_exception = persistent_prepare;
store->commit_exception = persistent_commit;
+ store->check_pending_exception = persistent_check;
store->drop_snapshot = persistent_drop;
store->fraction_full = persistent_fraction_full;
store->context = ps;
@@ -824,6 +868,10 @@ static void transient_fraction_full(stru
*denominator = get_dev_size(store->snap->cow->bdev);
}

+static void transient_check(struct exception_store *store, void *context, int line)
+{
+}
+
int dm_create_transient(struct exception_store *store)
{
struct transient_c *tc;
@@ -832,6 +880,7 @@ int dm_create_transient(struct exception
store->read_metadata = transient_read_metadata;
store->prepare_exception = transient_prepare;
store->commit_exception = transient_commit;
+ store->check_pending_exception = transient_check;
store->drop_snapshot = NULL;
store->fraction_full = transient_fraction_full;

Index: linux-2.6.28-clean/drivers/md/dm-snap.h
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-snap.h 2009-02-11 00:39:36.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-snap.h 2009-02-11 09:54:45.000000000 +0100
@@ -114,6 +114,9 @@ struct exception_store {
void (*callback) (void *, int success),
void *callback_context);

+ void (*check_pending_exception)(struct exception_store *store,
+ void *callback_context, int line);
+
/*
* The snapshot is invalid, note this in the metadata.
*/
Index: linux-2.6.28-clean/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-snap.c 2009-02-11 03:15:26.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-snap.c 2009-02-11 10:20:55.000000000 +0100
@@ -88,6 +88,8 @@ struct dm_snap_pending_exception {
* kcopyd.
*/
int started;
+
+ struct list_head list_all;
};

/*
@@ -365,20 +367,45 @@ static void free_exception(struct dm_sna
kmem_cache_free(exception_cache, e);
}

+static DEFINE_SPINLOCK(pending_spinlock);
+static LIST_HEAD(pending_all);
+
static struct dm_snap_pending_exception *alloc_pending_exception(struct dm_snapshot *s)
{
struct dm_snap_pending_exception *pe = mempool_alloc(s->pending_pool,
GFP_NOIO);
+ struct dm_snap_pending_exception *pe2;

atomic_inc(&s->pending_exceptions_count);
pe->snap = s;

+ spin_lock(&pending_spinlock);
+ list_for_each_entry(pe2, &pending_all, list_all) {
+ BUG_ON(pe2 == pe);
+ }
+ list_add(&pe->list_all, &pending_all);
+ spin_unlock(&pending_spinlock);
+
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
+
return pe;
}

static void free_pending_exception(struct dm_snap_pending_exception *pe)
{
struct dm_snapshot *s = pe->snap;
+ struct dm_snap_pending_exception *pe2;
+
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
+
+ spin_lock(&pending_spinlock);
+ list_for_each_entry(pe2, &pending_all, list_all) {
+ if (pe2 == pe) goto found;
+ }
+ BUG();
+ found:
+ list_del(&pe->list_all);
+ spin_unlock(&pending_spinlock);

mempool_free(pe, s->pending_pool);
smp_mb__before_atomic_dec();
@@ -831,6 +858,8 @@ static struct bio *put_pending_exception
struct dm_snap_pending_exception *primary_pe;
struct bio *origin_bios = NULL;

+ pe->snap->store.check_pending_exception(&pe->snap->store, pe, __LINE__);
+
primary_pe = pe->primary_pe;

/*
@@ -862,12 +891,15 @@ static void pending_complete(struct dm_s
struct bio *snapshot_bios = NULL;
int error = 0;

+ s->store.check_pending_exception(&s->store, pe, __LINE__);
+
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);

if (!success) {
/* Read/write error - snapshot is unusable */
down_write(&s->lock);
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
__invalidate_snapshot(s, -EIO);
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);
@@ -885,6 +917,7 @@ static void pending_complete(struct dm_s

if (!e) {
down_write(&s->lock);
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
__invalidate_snapshot(s, -ENOMEM);
error = 1;
goto out;
@@ -892,6 +925,7 @@ static void pending_complete(struct dm_s
*e = pe->e;

down_write(&s->lock);
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);

@@ -943,6 +977,7 @@ static void commit_callback(void *contex
{
struct dm_snap_pending_exception *pe = context;

+ pe->snap->store.check_pending_exception(&pe->snap->store, pe, __LINE__);
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);

@@ -961,13 +996,15 @@ static void copy_callback(int read_err,
BUG_ON(pe->e.hash_list.next == LIST_POISON1);
BUG_ON(pe->e.hash_list.prev == LIST_POISON2);

- if (read_err || write_err)
+ if (read_err || write_err) {
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
pending_complete(pe, 0);
-
- else
+ } else {
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
/* Update the metadata if we are persistent */
s->store.commit_exception(&s->store, &pe->e, commit_callback,
pe);
+ }
}

/*
@@ -979,6 +1016,7 @@ static void start_copy(struct dm_snap_pe
struct dm_io_region src, dest;
struct block_device *bdev = s->origin->bdev;
sector_t dev_size;
+ s->store.check_pending_exception(&s->store, pe, __LINE__);
BUG_ON(!pe->started);
BUG_ON(pe->started == 2);
BUG_ON(pe->started != 1);
@@ -1038,6 +1076,7 @@ __find_pending_exception(struct dm_snaps
up_write(&s->lock);
pe = alloc_pending_exception(s);
down_write(&s->lock);
+ s->store.check_pending_exception(&s->store, pe, __LINE__);

if (!s->valid) {
free_pending_exception(pe);
@@ -1071,6 +1110,7 @@ __find_pending_exception(struct dm_snaps

get_pending_exception(pe);
insert_exception(&s->pending, &pe->e);
+ s->store.check_pending_exception(&s->store, pe, __LINE__);

out:
return pe;
@@ -1291,6 +1331,7 @@ static int __origin_write(struct list_he
}

if (!pe->started) {
+ snap->store.check_pending_exception(&snap->store, pe, __LINE__);
pe->started = 1;
list_add_tail(&pe->list, &pe_queue);
}
@@ -1319,8 +1360,10 @@ static int __origin_write(struct list_he
/*
* Now that we have a complete pe list we can start the copying.
*/
- list_for_each_entry_safe(pe, next_pe, &pe_queue, list)
+ list_for_each_entry_safe(pe, next_pe, &pe_queue, list) {
+ pe->snap->store.check_pending_exception(&pe->snap->store, pe, __LINE__);
start_copy(pe);
+ }

return r;
}
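[Editor's note: the debug patch above threads every pending exception onto a global `pending_all` list under `pending_spinlock`, so that allocating an object already on the list, or freeing one that is not, hits a BUG() instead of silently corrupting the mempool. Below is a minimal userspace sketch of that same live-object-tracking technique; the names (`tracked`, `alloc_tracked`, `free_tracked`) are illustrative, not the kernel's, and a plain pointer list stands in for the spinlock-protected `list_head`.]

```c
/* Minimal sketch of live-object tracking: every allocation goes on a
 * global "live" list; a free of something not on the list (double free,
 * or free of a never-allocated pointer) trips an assertion immediately,
 * rather than corrupting the allocator later. */
#include <assert.h>
#include <stdlib.h>

struct tracked {
    struct tracked *next;
    int payload;
};

/* the kernel patch guards its equivalent list with a spinlock */
static struct tracked *all_live;

static struct tracked *alloc_tracked(void)
{
    struct tracked *t = malloc(sizeof(*t));
    assert(t);
    /* a fresh allocation must not already be on the live list */
    for (struct tracked *p = all_live; p; p = p->next)
        assert(p != t);
    t->next = all_live;
    all_live = t;
    return t;
}

static void free_tracked(struct tracked *t)
{
    struct tracked **pp;

    /* freeing something that is not on the live list is a bug */
    for (pp = &all_live; *pp; pp = &(*pp)->next)
        if (*pp == t)
            break;
    assert(*pp == t);
    *pp = t->next;
    free(t);
}
```

The cost is a linear scan per allocation and free, which is acceptable only in a debug build; the kernel patch pays the same cost deliberately, since the goal is to catch the corruption as early as possible.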

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 02-13-2009, 05:41 AM
Mikulas Patocka
 
Default 2.6.28.2 & dm-snapshot or kcopyd Oops

Hi

Few more questions:
- do you write to the snapshots?
- do you create/delete snapshots while the crash happens?
- do all the snapshots have the same chunk size?
- how many snapshots do you have?

Mikulas

> Hi
>
> With yesterday's two patches, the latest debug information is as follows:
>
> [ 197.073630] ------------[ cut here ]------------
> [ 197.073633] kernel BUG at drivers/md/dm-snap.c:405!
> [ 197.073635] invalid opcode: 0000 [#1] SMP
> [ 197.073637] last sysfs file: /sys/devices/virtual/block/dm-6/dev
> [ 197.073639] Modules linked in: iscsi_trgt arcmsr bonding e1000
> [ 197.073642]
> [ 197.073645] Pid: 4268, comm: kcopyd Not tainted (2.6.28.2-dm #8) S5000PSL
> [ 197.073647] EIP: 0060:[<c03c5da8>] EFLAGS: 00010246 CPU: 2
> [ 197.073653] EIP is at free_pending_exception+0x88/0x90
> [ 197.073654] EAX: c05a0ce8 EBX: f55449f0 ECX: 00000282 EDX: ef3056d0
> [ 197.073656] ESI: f6c65bc0 EDI: 00000000 EBP: ef2a34e0 ESP: ef3d5ecc
> [ 197.073657] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 197.073659] Process kcopyd (pid: 4268, ti=ef3d4000 task=f6d77bc0 task.ti=ef3d4000)
> [ 197.073660] Stack:
> [ 197.073661] f55449f0 00000000 c03c6918 f75fbf70 ef2843c8 f75fbec0 00000000 ef2409c0
> [ 197.073665] 00000001 00000001 ef2a34e0 c03c82ca c03c6a20 002689a4 00000000 000030d8
> [ 197.073668] 00000000 f75fbf2c ef2a34e0 f75fbec0 eeded3c0 c03c69e2 ef2a34e0 eeded418
> [ 197.073672] Call Trace:
> [ 197.073674] [<c03c6918>] pending_complete+0x2d8/0x360
> [ 197.073677] [<c03c82ca>] persistent_commit+0x34a/0x380
> [ 197.073679] [<c03c6a20>] commit_callback+0x0/0x40
> [ 197.073681] [<c03c69e2>] copy_callback+0x42/0x80
> [ 197.073683] [<c03c69a0>] copy_callback+0x0/0x80
> [ 197.073686] [<c03c1858>] run_complete_job+0x108/0x140
> [ 197.073688] [<c03c119c>] process_jobs+0x4c/0xe0
> [ 197.073690] [<c03c1750>] run_complete_job+0x0/0x140
> [ 197.073692] [<c03c1230>] do_work+0x0/0x50
> [ 197.073694] [<c03c124e>] do_work+0x1e/0x50
> [ 197.073696] [<c012ef32>] run_workqueue+0x72/0x100
> [ 197.073700] [<c0132570>] prepare_to_wait+0x20/0x60
> [ 197.073703] [<c012f840>] worker_thread+0x0/0xb0
> [ 197.073705] [<c012f8b9>] worker_thread+0x79/0xb0
> [ 197.073707] [<c01323d0>] autoremove_wake_function+0x0/0x50
> [ 197.073710] [<c012f840>] worker_thread+0x0/0xb0
> [ 197.073712] [<c01320d2>] kthread+0x42/0x70
> [ 197.073713] [<c0132090>] kthread+0x0/0x70
> [ 197.073715] [<c0103eff>] kernel_thread_helper+0x7/0x18
> [ 197.073718] Code: 04 89 42 04 89 10 c7 41 04 00 02 20 00 c7 43 40 00 01 10 00 fe 05 94 41 65 c0 8b 56 48 89 d8 e8 3f 30 d8 ff f0 ff 4e 4c 5b 5e c3 <0f> 0b eb fe 8d 74 26 00 55 89 c5 57 56 53 83 ec 0c 8b 75 30 8b
> [ 197.073737] EIP: [<c03c5da8>] free_pending_exception+0x88/0x90 SS:ESP 0068:ef3d5ecc
> [ 197.073754] ---[ end trace beb1c34d6186bec5 ]---
>
> Jacky Kim
> .

Old 02-13-2009, 06:31 AM
Mikulas Patocka
 
Default 2.6.28.2 & dm-snapshot or kcopyd Oops

On Fri, 13 Feb 2009, Mikulas Patocka wrote:

> Hi
>
> Few more questions:
> - do you write to the snapshots?
> - do you create/delete snapshots while the crash happens?
> - do all the snapshots have the same chunk size?
> - how many snapshots do you have?

another question: do you resize the origin device? or the snapshots?

One more test patch. (The code looks so terrible that I am thinking about
sending you a patch that removes this logic entirely and reimplements it.)

Mikulas

---
drivers/md/dm-exception-store.c | 2 +
drivers/md/dm-snap.c | 52 ++++++++++++++++++++++++++++++++++------
2 files changed, 47 insertions(+), 7 deletions(-)

Index: linux-2.6.28-clean/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-snap.c 2009-02-13 07:49:49.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-snap.c 2009-02-13 08:26:54.000000000 +0100
@@ -90,6 +90,8 @@ struct dm_snap_pending_exception {
int started;

struct list_head list_all;
+
+ int magic;
};

/*
@@ -378,10 +380,13 @@ static struct dm_snap_pending_exception

atomic_inc(&s->pending_exceptions_count);
pe->snap = s;
+ pe->primary_pe = NULL;
+ pe->magic = 0x12345678;

spin_lock(&pending_spinlock);
list_for_each_entry(pe2, &pending_all, list_all) {
BUG_ON(pe2 == pe);
+ BUG_ON(pe2->primary_pe == pe);
}
list_add(&pe->list_all, &pending_all);
spin_unlock(&pending_spinlock);
@@ -400,13 +405,21 @@ static void free_pending_exception(struc

spin_lock(&pending_spinlock);
list_for_each_entry(pe2, &pending_all, list_all) {
+ if (pe2->primary_pe == pe && pe2 != pe) {
+ printk(KERN_ALERT "Freeing referenced exception, pe %p, pe->magic %x, pe2 %p, pe2->magic %x, pe2->primary_pe %p, pe->ref_count %d\n", pe, pe->magic, pe2, pe2->magic, pe2->primary_pe, atomic_read(&pe->ref_count));
+ BUG();
+ }
+ }
+ list_for_each_entry(pe2, &pending_all, list_all) {
if (pe2 == pe) goto found;
}
+ printk(KERN_ALERT "Freeing free exception pe %p, magic %x, pe->primary_pe %p, pe->ref_count %d\n", pe, pe->magic, pe->primary_pe, atomic_read(&pe->ref_count));
BUG();
found:
list_del(&pe->list_all);
spin_unlock(&pending_spinlock);

+ pe->magic = 0x90abcdef;
mempool_free(pe, s->pending_pool);
smp_mb__before_atomic_dec();
atomic_dec(&s->pending_exceptions_count);
@@ -862,6 +875,13 @@ static struct bio *put_pending_exception

primary_pe = pe->primary_pe;

+ if (primary_pe) {
+ if (atomic_read(&primary_pe->ref_count) <= 0 || primary_pe->magic != 0x12345678) {
+ printk(KERN_ALERT "Bad ref count, pe %p, pe->magic %x, primary_pe %p, primary_pe->magic %x, primary_pe->ref_count %d\n", pe, pe->magic, pe->primary_pe, primary_pe->magic, atomic_read(&primary_pe->ref_count));
+ BUG();
+ }
+ }
+
/*
* If this pe is involved in a write to the origin and
* it is the last sibling to complete then release
@@ -1313,11 +1333,19 @@ static int __origin_write(struct list_he
* Either every pe here has same
* primary_pe or none has one yet.
*/
- if (pe->primary_pe)
+ if (pe->primary_pe) {
primary_pe = pe->primary_pe;
- else {
+ if (atomic_read(&primary_pe->ref_count) <= 0 || primary_pe->magic != 0x12345678) {
+ printk(KERN_ALERT "Bad ref count, pe %p, pe->magic %x, primary_pe %p, primary_pe->magic %x, primary_pe->ref_count %d\n", pe, pe->magic, primary_pe, primary_pe->magic, atomic_read(&primary_pe->ref_count));
+ BUG();
+ }
+ } else {
primary_pe = pe;
first = 1;
+ if (atomic_read(&primary_pe->ref_count) <= 0 || primary_pe->magic != 0x12345678) {
+ printk(KERN_ALERT "Bad ref count, pe %p, pe->magic %x, primary_pe %p, primary_pe->magic %x, primary_pe->ref_count %d\n", pe, pe->magic, pe->primary_pe, primary_pe->magic, atomic_read(&primary_pe->ref_count));
+ BUG();
+ }
}

bio_list_add(&primary_pe->origin_bios, bio);
@@ -1327,6 +1355,10 @@ static int __origin_write(struct list_he

if (!pe->primary_pe) {
pe->primary_pe = primary_pe;
+ if (atomic_read(&primary_pe->ref_count) <= 0 || primary_pe->magic != 0x12345678) {
+ printk(KERN_ALERT "Bad ref count, pe %p, pe->magic %x, primary_pe %p, primary_pe->magic %x, primary_pe->ref_count %d\n", pe, pe->magic, primary_pe, primary_pe->magic, atomic_read(&primary_pe->ref_count));
+ BUG();
+ }
get_pending_exception(primary_pe);
}

@@ -1350,11 +1382,17 @@ static int __origin_write(struct list_he
* us here to remove the primary_pe and submit any origin_bios.
*/

- if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
- flush_bios(bio_list_get(&primary_pe->origin_bios));
- free_pending_exception(primary_pe);
- /* If we got here, pe_queue is necessarily empty. */
- return r;
+ if (first) {
+ if (atomic_read(&primary_pe->ref_count) <= 0 || primary_pe->magic != 0x12345678) {
+ printk(KERN_ALERT "Bad ref count, primary_pe %p, primary_pe->magic %x, primary_pe->primary_pe %p, primary_pe->ref_count %d\n", primary_pe, primary_pe->magic, primary_pe->primary_pe, atomic_read(&primary_pe->ref_count));
+ BUG();
+ }
+ if (atomic_dec_and_test(&primary_pe->ref_count)) {
+ flush_bios(bio_list_get(&primary_pe->origin_bios));
+ free_pending_exception(primary_pe);
+ /* If we got here, pe_queue is necessarily empty. */
+ return r;
+ }
}

/*
Index: linux-2.6.28-clean/drivers/md/dm-exception-store.c
===================================================================
--- linux-2.6.28-clean.orig/drivers/md/dm-exception-store.c 2009-02-13 08:20:45.000000000 +0100
+++ linux-2.6.28-clean/drivers/md/dm-exception-store.c 2009-02-13 08:20:50.000000000 +0100
@@ -621,6 +621,8 @@ struct dm_snap_pending_exception {
int started;

struct list_head list_all;
+
+ int magic;
};

static DEFINE_SPINLOCK(callback_spinlock);
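[Editor's note: the patch above also adds a classic magic-number poisoning check: a live `dm_snap_pending_exception` carries `0x12345678`, and `free_pending_exception()` stamps it with `0x90abcdef` before the memory returns to the mempool, so a stale pointer dereferenced later fails the cheap `magic` comparison. A minimal userspace sketch of the pattern, with the same constants but an illustrative struct:]

```c
/* Magic-number poisoning: a live object is stamped ALIVE_MAGIC, a freed
 * one DEAD_MAGIC, so any code holding a stale pointer can be caught by a
 * one-word check instead of a mysterious crash much later. */
#include <assert.h>
#include <stdlib.h>

#define ALIVE_MAGIC 0x12345678u
#define DEAD_MAGIC  0x90abcdefu

struct pending {
    unsigned magic;
    /* ... real payload would follow ... */
};

static struct pending *alloc_pending(void)
{
    struct pending *pe = malloc(sizeof(*pe));
    assert(pe);
    pe->magic = ALIVE_MAGIC;
    return pe;
}

/* the userspace equivalent of the BUG_ON()s sprinkled through the patch */
static void check_pending(const struct pending *pe)
{
    assert(pe->magic == ALIVE_MAGIC);
}

static void release_pending(struct pending *pe)
{
    check_pending(pe);
    /* poison before returning the memory; in the kernel patch the object
     * goes back to a mempool, so the poisoned magic stays readable long
     * enough to catch a use-after-free */
    pe->magic = DEAD_MAGIC;
    free(pe);
}
```

The technique only catches reuse while the poisoned memory is still intact, which is exactly why pairing it with a mempool (where freed objects are recycled, not unmapped) makes it effective as a debugging aid.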

Old 02-16-2009, 06:14 AM
Mikulas Patocka
 
Default 2.6.28.2 & dm-snapshot or kcopyd Oops

On Fri, 13 Feb 2009, Jacky Kim wrote:

> Hi,
>
> 1. I test with the origin LV, and don't write to the snapshots.
> 2. I don't delete snapshots while copying big files to the origin LV, but I do create snapshots sometimes.
> 3. The snapshots have the same chunk size: 4KB or 64KB.
> 4. It seems to work well with 1 snapshot, but it always crashes with 2 snapshots.
> 5. I don't resize the origin device or the snapshots.
>
> The latest debug information is as follows:

Ohh, my last patch was wrong and could produce a false positive, so its
result is unusable. Apply this one on top of it.

Mikulas

---
drivers/md/dm-snap.c | 1 +
1 file changed, 1 insertion(+)

Index: linux-2.6.28-snap-debug/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.28-snap-debug.orig/drivers/md/dm-snap.c 2009-02-16 08:09:22.000000000 +0100
+++ linux-2.6.28-snap-debug/drivers/md/dm-snap.c 2009-02-16 08:09:47.000000000 +0100
@@ -874,6 +874,7 @@ static struct bio *put_pending_exception
pe->snap->store.check_pending_exception(&pe->snap->store, pe, __LINE__);

primary_pe = pe->primary_pe;
+ pe->primary_pe = NULL;

if (primary_pe) {
if (atomic_read(&primary_pe->ref_count) <= 0 || primary_pe->magic != 0x12345678) {

Old 02-16-2009, 06:40 AM
Mikulas Patocka
 
Default 2.6.28.2 & dm-snapshot or kcopyd Oops

On Fri, 13 Feb 2009, Jacky Kim wrote:

> Hi,
>
> 1. I test with the origin LV, and don't write to the snapshots.
> 2. I don't delete snapshots while copying big files to the origin LV, but I do create snapshots sometimes.
> 3. The snapshots have the same chunk size: 4KB or 64KB.
> 4. It seems to work well with 1 snapshot, but it always crashes with 2 snapshots.
> 5. I don't resize the origin device or the snapshots.
>
> The latest debug information is as follows:

Hi

As an alternative, try this patch on a clean 2.6.28.* kernel. It combines
the patches for the two bugs found so far with a complete rewrite of the
chunk handling. (The old code was really terrible; I found multiple bugs in
it, and there is no way to improve it except to rewrite it.)

Don't erase the old kernel, and please test my previous patch too: I intend
to backport it to RHEL (once I find the bug with your help), and I intend to
submit this rewrite patch for the upstream kernel.

Mikulas

Index: linux-2.6.28-snap-rewrite/drivers/md/dm-kcopyd.c
===================================================================
--- linux-2.6.28-snap-rewrite.orig/drivers/md/dm-kcopyd.c 2009-02-16 02:54:36.000000000 +0100
+++ linux-2.6.28-snap-rewrite/drivers/md/dm-kcopyd.c 2009-02-13 08:44:05.000000000 +0100
@@ -297,7 +297,8 @@ static int run_complete_job(struct kcopy
dm_kcopyd_notify_fn fn = job->fn;
struct dm_kcopyd_client *kc = job->kc;

- kcopyd_put_pages(kc, job->pages);
+ if (job->pages)
+ kcopyd_put_pages(kc, job->pages);
mempool_free(job, kc->job_pool);
fn(read_err, write_err, context);

@@ -461,6 +462,7 @@ static void segment_complete(int read_er
sector_t progress = 0;
sector_t count = 0;
struct kcopyd_job *job = (struct kcopyd_job *) context;
+ struct dm_kcopyd_client *kc = job->kc;

mutex_lock(&job->lock);

@@ -490,7 +492,7 @@ static void segment_complete(int read_er

if (count) {
int i;
- struct kcopyd_job *sub_job = mempool_alloc(job->kc->job_pool,
+ struct kcopyd_job *sub_job = mempool_alloc(kc->job_pool,
GFP_NOIO);

*sub_job = *job;
@@ -508,14 +510,8 @@ static void segment_complete(int read_er

} else if (atomic_dec_and_test(&job->sub_jobs)) {

- /*
- * To avoid a race we must keep the job around
- * until after the notify function has completed.
- * Otherwise the client may try and stop the job
- * after we've completed.
- */
- job->fn(read_err, write_err, job->context);
- mempool_free(job, job->kc->job_pool);
+ push(&kc->complete_jobs, job);
+ wake(kc);
}
}

@@ -528,6 +524,8 @@ static void split_job(struct kcopyd_job
{
int i;

+ atomic_inc(&job->kc->nr_jobs);
+
atomic_set(&job->sub_jobs, SPLIT_COUNT);
for (i = 0; i < SPLIT_COUNT; i++)
segment_complete(0, 0u, job);
Index: linux-2.6.28-snap-rewrite/drivers/md/dm-snap.c
===================================================================
--- linux-2.6.28-snap-rewrite.orig/drivers/md/dm-snap.c 2009-02-13 08:44:00.000000000 +0100
+++ linux-2.6.28-snap-rewrite/drivers/md/dm-snap.c 2009-02-13 08:47:21.000000000 +0100
@@ -58,28 +58,6 @@ struct dm_snap_pending_exception {
struct bio_list origin_bios;
struct bio_list snapshot_bios;

- /*
- * Short-term queue of pending exceptions prior to submission.
- */
- struct list_head list;
-
- /*
- * The primary pending_exception is the one that holds
- * the ref_count and the list of origin_bios for a
- * group of pending_exceptions. It is always last to get freed.
- * These fields get set up when writing to the origin.
- */
- struct dm_snap_pending_exception *primary_pe;
-
- /*
- * Number of pending_exceptions processing this chunk.
- * When this drops to zero we must complete the origin bios.
- * If incrementing or decrementing this, hold pe->snap->lock for
- * the sibling concerned and not pe->primary_pe->snap->lock unless
- * they are the same.
- */
- atomic_t ref_count;
-
/* Pointer back to snapshot context */
struct dm_snapshot *snap;

@@ -788,6 +766,28 @@ static void flush_queued_bios(struct wor
flush_bios(queued_bios);
}

+static int do_origin(struct dm_dev *origin, struct bio *bio);
+
+/*
+ * Flush a list of buffers.
+ */
+static void retry_origin_bios(struct dm_snapshot *s, struct bio *bio)
+{
+ struct bio *n;
+ int r;
+
+ while (bio) {
+ n = bio->bi_next;
+ bio->bi_next = NULL;
+ r = do_origin(s->origin, bio);
+ if (r == DM_MAPIO_REMAPPED)
+ generic_make_request(bio);
+ else
+ BUG_ON(r != DM_MAPIO_SUBMITTED);
+ bio = n;
+ }
+}
+
/*
* Error a list of buffers.
*/
@@ -821,39 +821,6 @@ static void __invalidate_snapshot(struct
dm_table_event(s->ti->table);
}

-static void get_pending_exception(struct dm_snap_pending_exception *pe)
-{
- atomic_inc(&pe->ref_count);
-}
-
-static struct bio *put_pending_exception(struct dm_snap_pending_exception *pe)
-{
- struct dm_snap_pending_exception *primary_pe;
- struct bio *origin_bios = NULL;
-
- primary_pe = pe->primary_pe;
-
- /*
- * If this pe is involved in a write to the origin and
- * it is the last sibling to complete then release
- * the bios for the original write to the origin.
- */
- if (primary_pe &&
- atomic_dec_and_test(&primary_pe->ref_count)) {
- origin_bios = bio_list_get(&primary_pe->origin_bios);
- free_pending_exception(primary_pe);
- }
-
- /*
- * Free the pe if it's not linked to an origin write or if
- * it's not itself a primary pe.
- */
- if (!primary_pe || primary_pe != pe)
- free_pending_exception(pe);
-
- return origin_bios;
-}
-
static void pending_complete(struct dm_snap_pending_exception *pe, int success)
{
struct dm_snap_exception *e;
@@ -902,7 +869,8 @@ static void pending_complete(struct dm_s
out:
remove_exception(&pe->e);
snapshot_bios = bio_list_get(&pe->snapshot_bios);
- origin_bios = put_pending_exception(pe);
+ origin_bios = bio_list_get(&pe->origin_bios);
+ free_pending_exception(pe);

up_write(&s->lock);

@@ -912,7 +880,7 @@ static void pending_complete(struct dm_s
else
flush_bios(snapshot_bios);

- flush_bios(origin_bios);
+ retry_origin_bios(s, origin_bios);
}

static void commit_callback(void *context, int success)
@@ -970,15 +938,19 @@ static void start_copy(struct dm_snap_pe
* for this chunk, otherwise it allocates a new one and inserts
* it into the pending table.
*
+ * Returns: pointer to a pending exception
+ * ERR_PTR(-ENOMEM) pointer --- the snapshot is invalidated
+ * NULL --- the exception was already completed, the caller should recheck
+ *
* NOTE: a write lock must be held on snap->lock before calling
* this.
*/
static struct dm_snap_pending_exception *
-__find_pending_exception(struct dm_snapshot *s, struct bio *bio)
+__find_pending_exception(struct dm_snapshot *s, sector_t sector)
{
struct dm_snap_exception *e;
struct dm_snap_pending_exception *pe;
- chunk_t chunk = sector_to_chunk(s, bio->bi_sector);
+ chunk_t chunk = sector_to_chunk(s, sector);

/*
* Is there a pending exception for this already ?
@@ -1000,6 +972,12 @@ __find_pending_exception(struct dm_snaps

if (!s->valid) {
free_pending_exception(pe);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ e = lookup_exception(&s->complete, chunk);
+ if (e) {
+ free_pending_exception(pe);
return NULL;
}

@@ -1013,16 +991,13 @@ __find_pending_exception(struct dm_snaps
pe->e.old_chunk = chunk;
bio_list_init(&pe->origin_bios);
bio_list_init(&pe->snapshot_bios);
- pe->primary_pe = NULL;
- atomic_set(&pe->ref_count, 0);
pe->started = 0;

if (s->store.prepare_exception(&s->store, &pe->e)) {
free_pending_exception(pe);
- return NULL;
+ return ERR_PTR(-ENOMEM);
}

- get_pending_exception(pe);
insert_exception(&s->pending, &pe->e);

out:
@@ -1063,6 +1038,7 @@ static int snapshot_map(struct dm_target
goto out_unlock;
}

+lookup_again:
/* If the block is already remapped - use that, else remap it */
e = lookup_exception(&s->complete, chunk);
if (e) {
@@ -1076,9 +1052,11 @@ static int snapshot_map(struct dm_target
* writeable.
*/
if (bio_rw(bio) == WRITE) {
- pe = __find_pending_exception(s, bio);
- if (!pe) {
- __invalidate_snapshot(s, -ENOMEM);
+ pe = __find_pending_exception(s, bio->bi_sector);
+ if (!pe)
+ goto lookup_again;
+ if (IS_ERR(pe)) {
+ __invalidate_snapshot(s, PTR_ERR(pe));
r = -EIO;
goto out_unlock;
}
@@ -1170,14 +1148,20 @@ static int snapshot_status(struct dm_tar
/*-----------------------------------------------------------------
* Origin methods
*---------------------------------------------------------------*/
-static int __origin_write(struct list_head *snapshots, struct bio *bio)
+
+/*
+ * Returns:
+ * DM_MAPIO_REMAPPED: bio may be submitted to origin device
+ * DM_MAPIO_SUBMITTED: bio was queued on queue on one of exceptions
+ */
+
+static int __origin_write(struct list_head *snapshots, sector_t sector, struct bio *bio)
{
- int r = DM_MAPIO_REMAPPED, first = 0;
+ int r = DM_MAPIO_REMAPPED;
struct dm_snapshot *snap;
struct dm_snap_exception *e;
- struct dm_snap_pending_exception *pe, *next_pe, *primary_pe = NULL;
+ struct dm_snap_pending_exception *pe, *pe_to_start = NULL;
chunk_t chunk;
- LIST_HEAD(pe_queue);

/* Do all the snapshots on this origin */
list_for_each_entry (snap, snapshots, list) {
@@ -1189,86 +1173,65 @@ static int __origin_write(struct list_he
goto next_snapshot;

/* Nothing to do if writing beyond end of snapshot */
- if (bio->bi_sector >= dm_table_get_size(snap->ti->table))
+ if (sector >= dm_table_get_size(snap->ti->table))
goto next_snapshot;

/*
* Remember, different snapshots can have
* different chunk sizes.
*/
- chunk = sector_to_chunk(snap, bio->bi_sector);
+ chunk = sector_to_chunk(snap, sector);

/*
* Check exception table to see if block
* is already remapped in this snapshot
* and trigger an exception if not.
- *
- * ref_count is initialised to 1 so pending_complete()
- * won't destroy the primary_pe while we're inside this loop.
*/
e = lookup_exception(&snap->complete, chunk);
if (e)
goto next_snapshot;

- pe = __find_pending_exception(snap, bio);
- if (!pe) {
- __invalidate_snapshot(snap, -ENOMEM);
+ pe = __find_pending_exception(snap, sector);
+ if (!pe)
+ goto next_snapshot;
+ if (IS_ERR(pe)) {
+ __invalidate_snapshot(snap, PTR_ERR(pe));
goto next_snapshot;
}

- if (!primary_pe) {
- /*
- * Either every pe here has same
- * primary_pe or none has one yet.
- */
- if (pe->primary_pe)
- primary_pe = pe->primary_pe;
- else {
- primary_pe = pe;
- first = 1;
- }
-
- bio_list_add(&primary_pe->origin_bios, bio);
-
- r = DM_MAPIO_SUBMITTED;
- }
+ r = DM_MAPIO_SUBMITTED;

- if (!pe->primary_pe) {
- pe->primary_pe = primary_pe;
- get_pending_exception(primary_pe);
+ if (bio) {
+ bio_list_add(&pe->origin_bios, bio);
+ bio = NULL;
+
+ if (!pe->started) {
+ pe->started = 1;
+ pe_to_start = pe;
+ }
}

if (!pe->started) {
pe->started = 1;
- list_add_tail(&pe->list, &pe_queue);
+ start_copy(pe);
}

next_snapshot:
up_write(&snap->lock);
}

- if (!primary_pe)
- return r;
-
/*
- * If this is the first time we're processing this chunk and
- * ref_count is now 1 it means all the pending exceptions
- * got completed while we were in the loop above, so it falls to
- * us here to remove the primary_pe and submit any origin_bios.
+ * pe_to_start is a small performance improvement:
+ * To avoid calling __origin_write N times for N snapshots, we start
+ * the snapshot where we queued the bio as the last one.
+ *
+ * If we start it as the last one, it finishes most likely as the last
+ * one and exceptions in other snapshots will be already finished when
+ * the bio will be retried.
*/

- if (first && atomic_dec_and_test(&primary_pe->ref_count)) {
- flush_bios(bio_list_get(&primary_pe->origin_bios));
- free_pending_exception(primary_pe);
- /* If we got here, pe_queue is necessarily empty. */
- return r;
- }
-
- /*
- * Now that we have a complete pe list we can start the copying.
- */
- list_for_each_entry_safe(pe, next_pe, &pe_queue, list)
- start_copy(pe);
+ if (pe_to_start)
+ start_copy(pe_to_start);

return r;
}
@@ -1284,7 +1247,7 @@ static int do_origin(struct dm_dev *orig
down_read(&_origins_lock);
o = __lookup_origin(origin->bdev);
if (o)
- r = __origin_write(&o->snapshots, bio);
+ r = __origin_write(&o->snapshots, bio->bi_sector, bio);
up_read(&_origins_lock);

return r;

