Old 09-30-2011, 03:51 PM
Christophe Saout
 
Block regression since 3.1-rc3

Hello,

Today I was trying to boot a new 3.1 RC kernel on one of our server
machines and ran into a weird block-related crash.

Whenever I was trying to access some filesystem on the external RAID, I
would soon run into the crash seen below, followed by a total lockup a
bit later.

The external RAID is accessed via multipath and simple logical volumes
(both device-mapper) on top. The local hard disks work just fine (LVM on
top of an Areca hardware RAID).

The bug seems to have been introduced between rc2 and rc3: up to rc2 everything
is fine, while rc3 and the latest kernel both show the problem. From the git
log I can see that the block/ directory received a larger update in that
window, but that is just a wild guess.
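
One way to pin down the offending commit would be to bisect between the two
tags. A rough sketch, assuming a local kernel git tree and using the
write/umount test on the multipath volume as the reproducer:

  cd linux                      # local kernel tree (path is just an example)
  git bisect start
  git bisect bad v3.1-rc3       # first rc known to crash
  git bisect good v3.1-rc2      # last rc known to be fine
  # At each step: build, boot, try to write to / unmount the filesystem,
  # then mark the result so git can narrow the range:
  #   git bisect good           # no crash
  #   git bisect bad            # blk_insert_flush() BUG triggers
  git bisect reset              # restore the tree when finished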

The crash has been seen with both ext3 and GFS2, apparently while processing
write requests (writing files or on umount). All of the crashes are identical
up to generic_make_request.

(oh, and this machine runs as Dom0 under XEN, but I can reproduce the
issue when booting natively too, so that's unrelated)

I think you guys should know - maybe you have an idea or have something
for me to test. I would have tried to file a bug somewhere, but half of
the infrastructure is down at the moment.

Thanks a lot,

Christophe


Sep 30 10:05:26 vserv80 kernel: CPU 2
Sep 30 10:05:26 vserv80 kernel: Modules linked in: xenfs gfs2 fuse dlm configfs dummy bonding xt_TPROXY nf_tproxy_core xt_socket xt_owner xt_connmark iptable_mangle ipt_MASQUERADE iptable_nat ipt_ULOG ipt_REJECT xt_mark iptable_filter iptable_raw ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state xt_tcpudp xt_physdev xt_hashlimit xt_multiport ip6table_filter ip6_tables ebt_redirect ebt_ip ebt_mark_m ebtable_broute bridge stp llc ebtable_nat ext4 jbd2 dm_round_robin dm_multipath ebtable_filter ebtables x_tables nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_h323 nf_conntrack_h323 nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack xen_netback xen_blkback xen_gntdev ipmi_si ipmi_devintf ipmi_msghandler loop e1000e ohci_hcd tpm_tis tpm ehci_hcd qla2xxx pata_atiixp tpm_bios igb i2c_piix4 scsi_transport_fc
Sep 30 10:05:26 vserv80 kernel:
Sep 30 10:05:26 vserv80 kernel: Pid: 27551, comm: umount Not tainted 3.1.0-rc3 #1 Supermicro H8DG6/H8DGi/H8DG6/H8DGi
Sep 30 10:05:26 vserv80 kernel: RIP: e030:[<ffffffff8121a2e4>] [<ffffffff8121a2e4>] blk_insert_flush+0x134/0x140
Sep 30 10:05:26 vserv80 kernel: RSP: e02b:ffff880098c89988 EFLAGS: 00010046
Sep 30 10:05:26 vserv80 kernel: RAX: 0000000000000000 RBX: ffff880098e4c018 RCX: ffff88009a158698
Sep 30 10:05:26 vserv80 kernel: RDX: 0000000000000005 RSI: 0000000000000001 RDI: ffff880098e4c018
Sep 30 10:05:26 vserv80 kernel: RBP: ffff880098c89988 R08: 0000000000000001 R09: 0000000000000001
Sep 30 10:05:26 vserv80 kernel: R10: ffff8800a0e44080 R11: 0000000000000000 R12: ffff88009a158698
Sep 30 10:05:26 vserv80 kernel: R13: 00000000ffff00fb R14: 0000000000000005 R15: ffff88009ed37c44
Sep 30 10:05:26 vserv80 kernel: FS: 00007faaaf8e6740(0000) GS:ffff8800d75f8000(0000) knlGS:0000000000000000
Sep 30 10:05:26 vserv80 kernel: CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 30 10:05:26 vserv80 kernel: CR2: 00007faaaf01f0b0 CR3: 000000009aa99000 CR4: 0000000000000660
Sep 30 10:05:26 vserv80 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 30 10:05:26 vserv80 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 30 10:05:26 vserv80 kernel: Process umount (pid: 27551, threadinfo ffff880098c88000, task ffff8800924a8000)
Sep 30 10:05:26 vserv80 kernel: ffff880098c899b8 ffffffff812110a2 ffff880098e4c018 ffff88009a158698
Sep 30 10:05:26 vserv80 kernel: 00000000ffff00fb ffff88009ed37c00 ffff880098c899e8 ffffffff81215bd4
Sep 30 10:05:26 vserv80 kernel: ffff880098c899e8 ffff880000000005 ffff880098e4c018 ffff88009a158698
Sep 30 10:05:26 vserv80 kernel: [<ffffffff812110a2>] __elv_add_request+0x172/0x240
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81215bd4>] add_acct_request+0x34/0x40
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81215c4b>] blk_insert_cloned_request+0x6b/0x90
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8138ec39>] dm_dispatch_request+0x39/0x70
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8139011b>] dm_request_fn+0x1ab/0x2b0
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81211a56>] __blk_run_queue+0x16/0x20
Sep 30 10:05:26 vserv80 kernel: [<ffffffff812182ed>] __make_request+0x2ed/0x330
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8138fd3e>] dm_request+0x16e/0x1e0
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81216077>] generic_make_request+0x2d7/0x520
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8104be91>] ? get_parent_ip+0x11/0x50
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81216321>] submit_bio+0x61/0xd0
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81146c86>] submit_bh+0xe6/0x120
Sep 30 10:05:26 vserv80 kernel: [<ffffffffa034c338>] log_write_header+0x208/0x4b0 [gfs2]
Sep 30 10:05:26 vserv80 kernel: [<ffffffff814c85c5>] ? sub_preempt_count+0x95/0xd0
Sep 30 10:05:26 vserv80 kernel: [<ffffffffa034c698>] gfs2_log_shutdown+0xb8/0x280 [gfs2]
Sep 30 10:05:26 vserv80 kernel: [<ffffffffa0362908>] gfs2_make_fs_ro+0x78/0xc0 [gfs2]
Sep 30 10:05:26 vserv80 kernel: [<ffffffffa03628e9>] ? gfs2_make_fs_ro+0x59/0xc0 [gfs2]
Sep 30 10:05:26 vserv80 kernel: [<ffffffffa0362b38>] gfs2_put_super+0x1e8/0x210 [gfs2]
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8111c64d>] generic_shutdown_super+0x5d/0xe0
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8111c6fc>] kill_block_super+0x2c/0x80
Sep 30 10:05:26 vserv80 kernel: [<ffffffffa035465c>] gfs2_kill_sb+0x5c/0x90 [gfs2]
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8111ca08>] deactivate_locked_super+0x38/0x90
Sep 30 10:05:26 vserv80 kernel: [<ffffffff8111d4e5>] deactivate_super+0x45/0x60
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81137a77>] mntput_no_expire+0xe7/0x130
Sep 30 10:05:26 vserv80 kernel: [<ffffffff81138489>] sys_umount+0x79/0x390
Sep 30 10:05:26 vserv80 kernel: [<ffffffff814cbed2>] system_call_fastpath+0x16/0x1b
Sep 30 10:05:26 vserv80 kernel: RSP <ffff880098c89988>
Sep 30 10:05:26 vserv80 kernel: ---[ end trace 906f3285f4cc5ae9 ]---
Sep 30 10:05:26 vserv80 kernel: note: umount[27551] exited with preempt_count 1

Message from syslogd@vserv80 at Sep 30 17:37:08 ...
kernel:------------[ cut here ]------------

Message from syslogd@vserv80 at Sep 30 17:37:08 ...
kernel:invalid opcode: 0000 [#1] PREEMPT SMP

Message from syslogd@vserv80 at Sep 30 17:37:08 ...
kernel:Stack:

Message from syslogd@vserv80 at Sep 30 17:37:08 ...
kernel:Call Trace:

Message from syslogd@vserv80 at Sep 30 17:37:08 ...
kernel:Code: d2 e8 31 c6 ff ff c9 c3 0f 1f 80 00 00 00 00 48 8b 41 08 48 89 79 08 48 89 0f 48 89 47 08 48 89 38 48 89 cf e8 2e d8 ff ff c9 c3 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 89 f2 be 02 00 00 00 48

(half of the output ended up on the console and not in the logfile, I'm not sure why)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 09-30-2011, 06:02 PM
Christophe Saout
 
Block regression since 3.1-rc3

Hello again,

> Today I was trying to boot a new 3.1 RC kernel on one of our server
> machines and ran into a weird block-related crash.
>
> Whenever I was trying to access some filesystem on the external RAID, I
> would soon run into the crash seen below, followed by a total lockup a
> bit later.
>
> The external RAID is accessed via multipath and simple logical volumes
> on top. [...]
>
> The bug seems to be introduced between rc2 and rc3. Up to rc2 everything
> is fine, and rc3 and also the latest kernel shows this bug. From the git
> log I can see that the block/ directory has seen some larger update, but
> this is just a wild guess.

Actually, I just found out that reverting

commit 4853abaae7e4a2af938115ce9071ef8684fb7af4
Author: Jeff Moyer <jmoyer@redhat.com>
Date: Mon Aug 15 21:37:25 2011 +0200

block: fix flush machinery for stacking drivers with differring flush flags

makes the problem disappear.
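
For completeness, this is roughly how the revert can be tested on top of the
affected tree (build and install steps vary per distro, so treat this only as
a sketch):

  git checkout v3.1-rc3         # or any later rc that shows the problem
  git revert 4853abaae7e4a2af938115ce9071ef8684fb7af4
  make -j"$(nproc)" && make modules_install && make install
  # reboot into the reverted kernel and repeat the write/umount test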

Let me know if I can help with further information.

Cheers,
Christophe


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-03-2011, 02:43 PM
Jeff Moyer
 
Block regression since 3.1-rc3

Christophe Saout <christophe@saout.de> writes:

> I think you guys should know - maybe you have an idea or have something
> for me to test. I would have tried to file a bug somewhere, but half of
> the infrastructure is down at the moment.

You didn't include the top few lines, which tell us what went wrong. I
tried to reproduce this myself, but I ran into other issues with GFS2.
Anyway, it would help a great deal if you could retrigger the failure
and provide the full failure output. You can get that by issuing the
'dmesg' command and redirecting it to a file. And please CC me on any
future correspondence on this issue.
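
For example, after retriggering the failure (the file name is arbitrary):

  dmesg > dmesg-3.1-rc3-oops.txt   # capture the whole kernel log in one file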

Thanks!
Jeff

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-04-2011, 12:02 PM
Christophe Saout
 
Block regression since 3.1-rc3

Hi Jeff,

> Anyway, it would help a great deal if you could retrigger the failure
> and provide the full failure output. You can get that by issuing the
> 'dmesg' command and redirecting it to a file.

Oh, sorry, yes, there's a line missing.

Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail);

The bug is not specific to GFS2; I also get it when using an ext3 mount
on that multipath storage pool when writing to a file.
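
For reference, the trigger is nothing exotic; roughly the following, where the
device and mount point names are just examples:

  mount -t ext3 /dev/mapper/RAID_A-test.root /mnt/test   # linear LV on top of dm-multipath
  dd if=/dev/zero of=/mnt/test/scratch bs=1M count=16    # ordinary buffered writes
  sync                                                   # the resulting flush reaches blk_insert_flush()
  umount /mnt/test                                       # either the writes or the umount hits the BUG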

Thanks,
Christophe

------------[ cut here ]------------
kernel BUG at block/blk-flush.c:323!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU 18
Modules linked in: gfs2 fuse dlm configfs dummy bonding xt_TPROXY
nf_tproxy_core xt_socket xt_owner xt_connmark iptable_mangle
ipt_MASQUERADE iptable_nat ipt_ULOG ipt_REJECT xt_mark iptable_filter
iptable_raw ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6
xt_state xt_tcpudp xt_physdev xt_hashlimit xt_multiport ip6table_filter
ip6_tables ebt_redirect ebt_ip ebt_mark_m ebtable_broute bridge stp llc
ebtable_nat ext4 jbd2 dm_round_robin dm_multipath ebtable_filter
ebtables x_tables nf_nat_tftp nf_conntrack_tftp nf_nat_sip
nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_h323
nf_conntrack_h323 nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_conntrack_ftp nf_conntrack ipmi_si ipmi_devintf ipmi_msghandler loop
e1000e qla2xxx igb pata_atiixp ohci_hcd ehci_hcd tpm_tis tpm
scsi_transport_fc tpm_bios i2c_piix4

Pid: 5282, comm: umount Not tainted 3.1.0-rc3 #1 Supermicro H8DG6/H8DGi/H8DG6/H8DGi
RIP: 0010:[<ffffffff8121a2e4>] [<ffffffff8121a2e4>] blk_insert_flush+0x134/0x140
RSP: 0018:ffff8808155f3988 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880c149f0018 RCX: ffff881014170000
RDX: 0000000000000005 RSI: 0000000000000001 RDI: ffff880c149f0018
RBP: ffff8808155f3988 R08: 0000000000000001 R09: 0000000000000001
R10: ffff880415cf0440 R11: 0000000000000000 R12: ffff881014170000
R13: 0000000000000082 R14: 0000000000000005 R15: ffff880c13763044
FS: 00007f2d80611740(0000) GS:ffff880c17c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f2d7fd4c0b0 CR3: 0000000c0fa3e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process umount (pid: 5282, threadinfo ffff8808155f2000, task ffff8808155ed320)
Stack:
ffff8808155f39b8 ffffffff812110a2 ffff880c149f0018 ffff881014170000
0000000000000082 ffff880c13763000 ffff8808155f39e8 ffffffff81215bd4
ffff8808155f39e8 ffff880c00000005 ffff880c149f0018 ffff881014170000
Call Trace:
[<ffffffff812110a2>] __elv_add_request+0x172/0x240
[<ffffffff81215bd4>] add_acct_request+0x34/0x40
[<ffffffff81215c4b>] blk_insert_cloned_request+0x6b/0x90
[<ffffffff8138ec39>] dm_dispatch_request+0x39/0x70
[<ffffffff8139011b>] dm_request_fn+0x1ab/0x2b0
[<ffffffff81211a56>] __blk_run_queue+0x16/0x20
[<ffffffff812182ed>] __make_request+0x2ed/0x330
[<ffffffff8138fd3e>] dm_request+0x16e/0x1e0
[<ffffffff81216077>] generic_make_request+0x2d7/0x520
[<ffffffff8104be91>] ? get_parent_ip+0x11/0x50
[<ffffffff81216321>] submit_bio+0x61/0xd0
[<ffffffff81146c86>] submit_bh+0xe6/0x120
[<ffffffffa034c338>] log_write_header+0x208/0x4b0 [gfs2]
[<ffffffff814c85c5>] ? sub_preempt_count+0x95/0xd0
[<ffffffffa034c698>] gfs2_log_shutdown+0xb8/0x280 [gfs2]
[<ffffffffa0362908>] gfs2_make_fs_ro+0x78/0xc0 [gfs2]
[<ffffffffa03628e9>] ? gfs2_make_fs_ro+0x59/0xc0 [gfs2]
[<ffffffffa0362b38>] gfs2_put_super+0x1e8/0x210 [gfs2]
[<ffffffff8111c64d>] generic_shutdown_super+0x5d/0xe0
[<ffffffff8111c6fc>] kill_block_super+0x2c/0x80
[<ffffffffa035465c>] gfs2_kill_sb+0x5c/0x90 [gfs2]
[<ffffffff8111ca08>] deactivate_locked_super+0x38/0x90
[<ffffffff8111d4e5>] deactivate_super+0x45/0x60
[<ffffffff81137a77>] mntput_no_expire+0xe7/0x130
[<ffffffff81138489>] sys_umount+0x79/0x390
[<ffffffff814cbed2>] system_call_fastpath+0x16/0x1b
Code: d2 e8 51 c7 ff ff c9 c3 0f 1f 80 00 00 00 00 48 8b 41 08 48 89 79 08 48 89 0f 48 89 47 08 48 89 38 48 89 cf e8 3e d9 ff ff c9 c3 <0f> 0b eb fe 0f 1f 84 00 00 00 00 00 55 89 f2 be 02 00 00 00 48
RIP [<ffffffff8121a2e4>] blk_insert_flush+0x134/0x140
RSP <ffff8808155f3988>
---[ end trace d37a65fe4ba84f7e ]---
note: umount[5282] exited with preempt_count 1



--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-04-2011, 01:32 PM
Jeff Moyer
 
Block regression since 3.1-rc3

Christophe Saout <christophe@saout.de> writes:

> Hi Jeff,
>
>> Anyway, it would help a great deal if you could retrigger the failure
>> and provide the full failure output. You can get that by issuing the
>> 'dmesg' command and redirecting it to a file.
>
> Oh, sorry, yes, there's a line missing.
>
> Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail);
>
> The bug is not specific to GFS2, I also get it when using an ext3 mount
> on that multipath storage pool when writing to a file.

Great, thanks! I tried ext3 and ext4 as well, and was unable to trigger
this problem. I set up a linear volume on top of a multipath device. Is
that the configuration you are using? Anyway, this may be enough to get
me on the right path...

Cheers,
Jeff

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-04-2011, 03:06 PM
Christophe Saout
 
Block regression since 3.1-rc3

Hi Jeff,

> > Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail);
> >
> > The bug is not specific to GFS2, I also get it when using an ext3 mount
> > on that multipath storage pool when writing to a file.
>
> Great, thanks! I tried ext3 and ext4 as well, and was unable to trigger
> this problem. I setup a linear volume on top of a multipath device. Is
> that the configuration you are using? Anyway, this may be enough to get
> me on the right path...

Yes, the device-mapper tables look like this:

vserv81:/root # dmsetup table RAID_A-test.root
0 20971520 linear 253:6 384

(this is one of the filesystems that causes the crash when mounted via ext3)

And 253:6 is the multipath device:

vserv81:/root # dmsetup table 3600d023100065bef0000000009f337c0
0 15626018816 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:48 1000

And 8:16, 8:48 are two qla2xxx FC ports.
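
If it helps with reproducing, an equivalent stack can be put together by hand
with dmsetup. A rough sketch, reusing the table lines above (the 8:16/8:48
paths and the sizes obviously have to match the test machine):

  # multipath device across the two FC paths
  dmsetup create mpath_test --table \
    '0 15626018816 multipath 0 0 2 1 round-robin 0 1 1 8:16 1000 round-robin 0 1 1 8:48 1000'
  # linear volume starting at sector 384 of the multipath device
  dmsetup create lv_test --table '0 20971520 linear /dev/mapper/mpath_test 384'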

Cheers,
Christophe


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-08-2011, 11:02 AM
Shaohua Li
 
Block regression since 3.1-rc3

2011/10/7 Jeff Moyer <jmoyer@redhat.com>:
> Christophe Saout <christophe@saout.de> writes:
>
>> Hi Jeff,
>>
>>> Anyway, it would help a great deal if you could retrigger the failure
>>> and provide the full failure output. You can get that by issuing the
>>> 'dmesg' command and redirecting it to a file.
>>
>> Oh, sorry, yes, there's a line missing.
>>
>> Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail);
>
> OK, it turns out my testing was incomplete. I only tested targets that
> had a write-through cache, so I didn't hit this problem. It reproduces
> pretty easily with just multipath involved (no linear target on top) when
> running against the right storage.
>
> So, here's a patch, but I don't have a full explanation for it just yet.
> What I observed was that, on fsync, blkdev_issue_flush was called.
> Eventually, the flush request gets cloned, and blk_insert_cloned_request
> is called. This cloned request never actually gets issued to the
> q->request_fn (scsi_request_fn in my case). So, it may be that there is
> no plug list for this, so the queue isn't goosed? I'll try to come up
> with a better explanation, or Tejun may just know off the top of his
> head what's going on.
>
> So, the patch works for me, but is very much just an RFC.
>
> Cheers,
> Jeff
>
> Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 491eb30..7aa4736 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -320,7 +320,7 @@ void blk_insert_flush(struct request *rq)
>                  return;
>          }
>
> -        BUG_ON(!rq->bio || rq->bio != rq->biotail);
> +        BUG_ON(rq->bio && rq->bio != rq->biotail);
>
>          /*
>           * If there's data but flush is not necessary, the request can be
> @@ -345,6 +345,12 @@ void blk_insert_flush(struct request *rq)
>          rq->end_io = flush_data_end_io;
>
>          blk_flush_complete_seq(rq, REQ_FSEQ_ACTIONS & ~policy, 0);
> +
> +        /*
> +         * A cloned empty flush needs a queue kick to make progress.
> +         */
> +        if (!rq->bio)
> +                blk_run_queue_async(q);
>  }
Looks like the request-based DM flush logic is broken:

saved_make_request_fn
  __make_request
    blk_insert_flush

but blk_insert_flush doesn't put the original request on the queue list;
instead, q->flush_rq ends up on the list. Then:

dm_request_fn
  blk_peek_request
    dm_prep_fn
      clone_rq
  map_request
    blk_insert_cloned_request

so q->flush_rq is cloned and gets dispatched. But we can't clone q->flush_rq
and use it to do the flush; map_request could even assign a different block
device to the cloned request.

Cloning q->flush_rq is absolutely wrong. We can clear the original request's
flush bit and set the cloned request's flush bit instead, so the
blk_insert_flush logic is skipped for the original request.

Can you please try this patch? It is only compile-tested.

Thanks,
Shaohua
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-08-2011, 12:05 PM
Christophe Saout
 
Block regression since 3.1-rc3

Hi Shaohua,

> >> Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail);
> >
> > OK, it turns out my testing was incomplete. I only tested targets that
> > had a write-through cache, so I didn't hit this problem. It reproduces
> > pretty easily with just multipath involved (no linear target on top) when
> > running against the right storage.
> >
> > So, here's a patch, but I don't have a full explanation for it just yet.
> > What I observed was that, on fsync, blkdev_issue_flush was called.
> > Eventually, the flush request gets cloned, and blk_insert_cloned_request
> > is called. This cloned request never actually gets issued to the
> > q->request_fn (scsi_request_fn in my case). So, it may be that there is
> > no plug list for this, so the queue isn't goosed? I'll try to come up
> > with a better explanation, or Tejun may just know off the top of his
> > head what's going on.
> >
> > So, the patch works for me, but is very much just an RFC.
> [...]
> Looks the dm request based flush logic is broken.
>
> saved_make_request_fn
> __make_request
> blk_insert_flush
> but blk_insert_flush doesn't put the original request to list, instead, the
> q->flush_rq is in list.
> then
> dm_request_fn
> blk_peek_request
> dm_prep_fn
> clone_rq
> map_request
> blk_insert_cloned_request
> so q->flush_rq is cloned, and get dispatched. but we can't clone q->flush_rq
> and use it to do flush. map_request even could assign a different blockdev to
> the cloned request.
>
> > Clone q->flush_rq is absolutely wrong. we can clear original request's flush bit
> and set cloned request flush bit, so we can skip the blk_insert_flush logic for
> original request.
>
> can you please try this patch. Only compilation tested.

Now I get a different BUG when doing what triggered the original BUG
(e.g. unmounting the filesystem):

------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1152!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU 7
Modules linked in: xenfs gfs2 fuse dlm configfs dummy bonding xt_TPROXY nf_tproxy_core xt_socket xt_owner xt_connmark iptable_mangle ipt_MASQUERADE iptable_nat ipt_ULOG ipt_REJECT xt_mark iptable_filter iptable_raw ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state xt_tcpudp xt_physdev xt_hashlimit xt_multiport ip6table_filter ip6_tables ebt_redirect ebt_ip ebt_mark_m ebtable_broute bridge stp llc ebtable_nat ext4 jbd2 dm_round_robin dm_multipath ebtable_filter ebtables x_tables nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_h323 nf_conntrack_h323 nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack xen_netback xen_blkback xen_gntdev ipmi_si ipmi_devintf ipmi_msghandler loop ohci_hcd ehci_hcd e1000e tpm_tis igb qla2xxx tpm tpm_bios scsi_transport_fc pata_atiixp i2c_piix4

Pid: 5056, comm: kworker/7:2 Not tainted 3.1.0-rc8 #1 Supermicro H8DG6/H8DGi/H8DG6/H8DGi
RIP: e030:[<ffffffff812f7eae>] [<ffffffff812f7eae>] scsi_setup_fs_cmnd+0x9e/0xe0
RSP: e02b:ffff88009b299b70 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88009b9cc018 RCX: 0000000000000000
RDX: ffff88009f193c00 RSI: ffff88009b9cc018 RDI: ffff88009b905800
RBP: ffff88009b299b80 R08: 0000000000000000 R09: 0000000000000001
R10: 0000000000000001 R11: ffff8800a0e40700 R12: ffff88009b905800
R13: 0000000000000001 R14: ffff88009b905800 R15: 0000000000000000
FS: 00007fc76ab77700(0000) GS:ffff8800d767f000(0000) knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fc76a1d0beb CR3: 0000000001805000 CR4: 0000000000000660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/7:2 (pid: 5056, threadinfo ffff88009b298000, task ffff88009ed68000)
Stack:
ffff88009b9cc018 ffff88009e5f1a60 ffff88009b299bf0 ffffffff8130675f
ffff88009b299ba0 ffffffff810ceef0 ffff88009f193c00 ffffffff00000000
ffffffff8100ab6d ffff88009b299c60 ffff88009ed68000 ffff88009b9cc018
Call Trace:
[<ffffffff8130675f>] sd_prep_fn+0x14f/0xa70
[<ffffffff810ceef0>] ? mempool_alloc_slab+0x10/0x20
[<ffffffff8100ab6d>] ? xen_force_evtchn_callback+0xd/0x10
[<ffffffff81215db2>] blk_peek_request+0xd2/0x270
[<ffffffff812f8aeb>] scsi_request_fn+0x4b/0x3c0
[<ffffffff8120fa46>] __blk_run_queue+0x16/0x20
[<ffffffff8120ef03>] __elv_add_request+0xb3/0x270
[<ffffffff81213494>] add_acct_request+0x34/0x40
[<ffffffff8121350b>] blk_insert_cloned_request+0x6b/0x90
[<ffffffff8138c049>] dm_dispatch_request+0x39/0x70
[<ffffffff8138db9c>] dm_request_fn+0x17c/0x280
[<ffffffff8100b322>] ? check_events+0x12/0x20
[<ffffffff81222db0>] ? cfq_init_queue+0x430/0x430
[<ffffffff8120fa46>] __blk_run_queue+0x16/0x20
[<ffffffff81222de4>] cfq_kick_queue+0x34/0x50
[<ffffffff8106e039>] process_one_work+0x129/0x4a0
[<ffffffff81070401>] worker_thread+0x161/0x340
[<ffffffff810702a0>] ? manage_workers.clone.22+0x230/0x230
[<ffffffff81074d36>] kthread+0x96/0xa0
[<ffffffff814cb7b4>] kernel_thread_helper+0x4/0x10
[<ffffffff814c9873>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff814c2dc0>] ? retint_restore_args+0x5/0x6
[<ffffffff814cb7b0>] ? gs_change+0x13/0x13
Code: ff 5b 41 5c c9 c3 0f 1f 80 00 00 00 00 4c 89 e7 be 20 00 00 00 e8 a3 a2 ff ff 48 85 c0 48 89 c7 74 36 48 89 83 e0 00 00 00 eb 9b <0f> 0b 48 8b 00 48 85 c0 0f 84 77 ff ff ff 48 8b 40 50 48 85 c0
RIP [<ffffffff812f7eae>] scsi_setup_fs_cmnd+0x9e/0xe0
RSP <ffff88009b299b70>
---[ end trace 35f29bc1fdb897a6 ]---
note: kworker/7:2[5056] exited with preempt_count 1

(followed by an oops and the machine is dead in an instant)

Christophe


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-08-2011, 04:14 PM
Mike Snitzer
 
Block regression since 3.1-rc3

On Sat, Oct 08 2011 at 7:02am -0400,
Shaohua Li <shli@kernel.org> wrote:

> Looks the dm request based flush logic is broken.
>
> saved_make_request_fn
> __make_request
> blk_insert_flush
> but blk_insert_flush doesn't put the original request to list, instead, the
> q->flush_rq is in list.
> then
> dm_request_fn
> blk_peek_request
> dm_prep_fn
> clone_rq
> map_request
> blk_insert_cloned_request
> so q->flush_rq is cloned, and get dispatched. but we can't clone q->flush_rq
> and use it to do flush. map_request even could assign a different blockdev to
> the cloned request.

You haven't explained why cloning q->flush_rq is broken. What is the
problem with map_request changing the blockdev? For the purposes of
request-based DM the flush machinery has already managed the processing
of the flush at the higher level request_queue.

By the time request-based DM is cloning a flush request it really has no
need to reenter the flush machinery (even though Tejun wants it to --
but in practice it doesn't buy us anything because we never stack
request-based DM at the moment. Instead it showcases how brittle this
path is).

> Clone q->flush_rq is absolutely wrong.

I'm still missing the _why_.

Taking a step back:

Unless others have an immediate ah-ha moment, I'd suggest we revert
commit 4853abaae7e4a2a (block: fix flush machinery for stacking drivers
with differring flush flags), thereby avoiding unnecessarily reentering
the flush machinery.

If commit ed8b752bccf256 (dm table: set flush capability based on
underlying devices) is in place the flush gets fed directly to
scsi_request_fn, which is fine because the request-based DM's
request_queue's flush_flags reflect the flush capabilities of the
underlying device(s).

We are then covered relative to the only request-based DM use-case
people care about (e.g. dm-multipath, which doesn't use stacked
request-based DM).

We can revisit upholding the purity of the flush machinery for stacked
devices in >= 3.2.

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
Old 10-09-2011, 04:26 AM
Shaohua Li
 
Block regression since 3.1-rc3

2011/10/8 Christophe Saout <christophe@saout.de>:
> Hi Shaohua,
>
>> >> Line 323 is this one: BUG_ON(!rq->bio || rq->bio != rq->biotail);
>> >
>> > OK, it turns out my testing was incomplete. I only tested targets that
>> > had a write-through cache, so I didn't hit this problem. It reproduces
>> > pretty easily with just multipath involved (no linear target on top) when
>> > running against the right storage.
>> >
>> > So, here's a patch, but I don't have a full explanation for it just yet.
>> > What I observed was that, on fsync, blkdev_issue_flush was called.
>> > Eventually, the flush request gets cloned, and blk_insert_cloned_request
>> > is called. This cloned request never actually gets issued to the
>> > q->request_fn (scsi_request_fn in my case). So, it may be that there is
>> > no plug list for this, so the queue isn't goosed? I'll try to come up
>> > with a better explanation, or Tejun may just know off the top of his
>> > head what's going on.
>> >
>> > So, the patch works for me, but is very much just an RFC.
>> [...]
>> Looks the dm request based flush logic is broken.
>>
>> saved_make_request_fn
>>   __make_request
>>     blk_insert_flush
>> but blk_insert_flush doesn't put the original request to list, instead, the
>> q->flush_rq is in list.
>> then
>> dm_request_fn
>>   blk_peek_request
>>     dm_prep_fn
>>       clone_rq
>>   map_request
>>     blk_insert_cloned_request
>> so q->flush_rq is cloned, and get dispatched. but we can't clone q->flush_rq
>> and use it to do flush. map_request even could assign a different blockdev to
>> the cloned request.
>>
>> Clone q->flush_rq is absolutely wrong. we can clear original request's flush bit
>> and set cloned request flush bit, so we can skip the blk_insert_flush logic for
>> original request.
>>
>> can you please try this patch. Only compilation tested.
>
> Now I get a different BUG when doing what triggered the original BUG
> (e.g. unmounting the filesystem):
>
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/scsi_lib.c:1152!
> invalid opcode: 0000 [#1] PREEMPT SMP
> CPU 7
> Modules linked in: xenfs gfs2 fuse dlm configfs dummy bonding xt_TPROXY nf_tproxy_core xt_socket xt_owner xt_connmark iptable_mangle ipt_MASQUERADE iptable_nat ipt_ULOG ipt_REJECT xt_mark iptable_filter iptable_raw ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state xt_tcpudp xt_physdev xt_hashlimit xt_multiport ip6table_filter ip6_tables ebt_redirect ebt_ip ebt_mark_m ebtable_broute bridge stp llc ebtable_nat ext4 jbd2 dm_round_robin dm_multipath ebtable_filter ebtables x_tables nf_nat_tftp nf_conntrack_tftp nf_nat_sip nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_h323 nf_conntrack_h323 nf_nat_ftp nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack_ftp nf_conntrack xen_netback xen_blkback xen_gntdev ipmi_si ipmi_devintf ipmi_msghandler loop ohci_hcd ehci_hcd e1000e tpm_tis igb qla2xxx tpm tpm_bios scsi_transport_fc pata_atiixp i2c_piix4
>
> Pid: 5056, comm: kworker/7:2 Not tainted 3.1.0-rc8 #1 Supermicro H8DG6/H8DGi/H8DG6/H8DGi
> RIP: e030:[<ffffffff812f7eae>] [<ffffffff812f7eae>] scsi_setup_fs_cmnd+0x9e/0xe0
> RSP: e02b:ffff88009b299b70 EFLAGS: 00010046
> RAX: 0000000000000000 RBX: ffff88009b9cc018 RCX: 0000000000000000
> RDX: ffff88009f193c00 RSI: ffff88009b9cc018 RDI: ffff88009b905800
> RBP: ffff88009b299b80 R08: 0000000000000000 R09: 0000000000000001
> R10: 0000000000000001 R11: ffff8800a0e40700 R12: ffff88009b905800
> R13: 0000000000000001 R14: ffff88009b905800 R15: 0000000000000000
> FS: 00007fc76ab77700(0000) GS:ffff8800d767f000(0000) knlGS:0000000000000000
> CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 00007fc76a1d0beb CR3: 0000000001805000 CR4: 0000000000000660
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kworker/7:2 (pid: 5056, threadinfo ffff88009b298000, task ffff88009ed68000)
> Stack:
> ffff88009b9cc018 ffff88009e5f1a60 ffff88009b299bf0 ffffffff8130675f
> ffff88009b299ba0 ffffffff810ceef0 ffff88009f193c00 ffffffff00000000
> ffffffff8100ab6d ffff88009b299c60 ffff88009ed68000 ffff88009b9cc018
> Call Trace:
> [<ffffffff8130675f>] sd_prep_fn+0x14f/0xa70
> [<ffffffff810ceef0>] ? mempool_alloc_slab+0x10/0x20
> [<ffffffff8100ab6d>] ? xen_force_evtchn_callback+0xd/0x10
> [<ffffffff81215db2>] blk_peek_request+0xd2/0x270
> [<ffffffff812f8aeb>] scsi_request_fn+0x4b/0x3c0
> [<ffffffff8120fa46>] __blk_run_queue+0x16/0x20
> [<ffffffff8120ef03>] __elv_add_request+0xb3/0x270
> [<ffffffff81213494>] add_acct_request+0x34/0x40
> [<ffffffff8121350b>] blk_insert_cloned_request+0x6b/0x90
> [<ffffffff8138c049>] dm_dispatch_request+0x39/0x70
> [<ffffffff8138db9c>] dm_request_fn+0x17c/0x280
> [<ffffffff8100b322>] ? check_events+0x12/0x20
> [<ffffffff81222db0>] ? cfq_init_queue+0x430/0x430
> [<ffffffff8120fa46>] __blk_run_queue+0x16/0x20
> [<ffffffff81222de4>] cfq_kick_queue+0x34/0x50
> [<ffffffff8106e039>] process_one_work+0x129/0x4a0
> [<ffffffff81070401>] worker_thread+0x161/0x340
> [<ffffffff810702a0>] ? manage_workers.clone.22+0x230/0x230
> [<ffffffff81074d36>] kthread+0x96/0xa0
> [<ffffffff814cb7b4>] kernel_thread_helper+0x4/0x10
> [<ffffffff814c9873>] ? int_ret_from_sys_call+0x7/0x1b
> [<ffffffff814c2dc0>] ? retint_restore_args+0x5/0x6
> [<ffffffff814cb7b0>] ? gs_change+0x13/0x13
> Code: ff 5b 41 5c c9 c3 0f 1f 80 00 00 00 00 4c 89 e7 be 20 00 00 00 e8 a3 a2 ff ff 48 85 c0 48 89 c7 74 36 48 89 83 e0 00 00 00 eb 9b <0f> 0b 48 8b 00 48 85 c0 0f 84 77 ff ff ff 48 8b 40 50 48 85 c0
> RIP [<ffffffff812f7eae>] scsi_setup_fs_cmnd+0x9e/0xe0
> RSP <ffff88009b299b70>
> ---[ end trace 35f29bc1fdb897a6 ]---
> note: kworker/7:2[5056] exited with preempt_count 1
>
> (followed by an oops and the machine is dead in an instant)
>
There is a typo in there; please check the new one.
--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
