11-22-2011, 12:26 AM
NeilBrown

raid1d crash at boot

On Tue, 22 Nov 2011 01:50:37 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
wrote:

> On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> > On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
> > <James.Bottomley@HansenPartnership.com> wrote:
> > > On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > > > Thanks for the report.
> > > > However, as this crash is clearly in the SCSI layer it makes sense to report
> > > > it to linux-scsi - so I have cc:ed this reply there.
> > > >
> > > > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > > wrote:
> > > > > I get the following BUG_ON tripped while booting, before rootfs is mounted by
> > > > > Debian's initrd. This started to happen with kernels from sometime
> > > > > during 3.1-rcX.
> > > > >
> > > > > [ 6.246170] ------------[ cut here ]------------
> > > > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> > >
> > > I can tell you what it is:
> > >
> > > 	/*
> > > 	 * Filesystem requests must transfer data.
> > > 	 */
> > > 	BUG_ON(!req->nr_phys_segments);
> > >
> > > But the fault is in the layer above SCSI. It means something sent a
> > > request with REQ_TYPE_FS but no actual data attached ... this is
> > > supposed to be impossible, hence the bug on.
> >
> > Thanks.... that sounds strangely familiar, but I cannot be sure and google
> > doesn't help.
> >
> > Michał: what are you using on the RAID1 - some filesystem (which one?), swap, or something else?
>
> The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
> drives. The boot doesn't survive to the point where the initrd script asks
> for dm-crypt's key password.
>

That gives us lots of room for pointing the finger of blame, doesn't it?
I think it is -> his problem. :-)

From the md part of the stack trace it looks most like a write request. It
could be a retried read, but that is extremely unlikely that early in boot.

So presumably it is some sort of zero-length REQ_FLUSH or something like that.
md/raid1 will just pass those unchanged down.
My guess is that ext4 is generating this and something in the stack is
stripping the REQ_FLUSH .... though why it even tries before asking for a
password is beyond me.
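
As a rough illustration of that guess, here is a minimal userspace sketch in C. It is not kernel code: the struct, the SK_* flag values and the function names are all hypothetical stand-ins, and the SCSI check is simplified to "no flush flag and no data". It only shows how a zero-length flush that loses its flush flag somewhere in the stack would turn into a data-less request of the kind the BUG_ON rejects.

```c
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real REQ_* values differ. */
#define SK_REQ_WRITE (1u << 0)
#define SK_REQ_FLUSH (1u << 1)
#define SK_REQ_FUA   (1u << 2)

/* Hypothetical, heavily reduced stand-in for a block-layer bio. */
struct sk_bio {
	unsigned int rw;         /* request flags */
	unsigned int nr_sectors; /* payload size; 0 for an empty flush */
};

/* A flush/FUA bio that carries no data. */
static bool is_empty_flush(const struct sk_bio *bio)
{
	return (bio->rw & (SK_REQ_FLUSH | SK_REQ_FUA)) && bio->nr_sectors == 0;
}

/* Models a layer that drops the flush flags but still passes the bio on. */
static void strip_flush(struct sk_bio *bio)
{
	bio->rw &= ~(SK_REQ_FLUSH | SK_REQ_FUA);
}

/*
 * Simplified stand-in for the SCSI check: a request that is not a flush
 * is treated as a filesystem request and must transfer data.
 */
static bool would_trip_bug_on(const struct sk_bio *bio)
{
	return !(bio->rw & SK_REQ_FLUSH) && bio->nr_sectors == 0;
}
```

Starting from a bio with SK_REQ_WRITE | SK_REQ_FLUSH and zero sectors, is_empty_flush() is true and would_trip_bug_on() is false; after strip_flush() the two results swap, which is the situation described above.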

Maybe someone on dm-devel can help?

If not we might need to try a debugging patch like this:


diff --git a/block/blk-core.c b/block/blk-core.c
index f43c8a5..59cb2ad 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
 			goto end_io;
 		}
 	}
-
+	WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
 	if ((bio->bi_rw & REQ_DISCARD) &&
 	    (!blk_queue_discard(q) ||
 	     ((bio->bi_rw & REQ_SECURE) &&


NeilBrown

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
 
11-22-2011, 11:03 AM
Michał Mirosław

raid1d crash at boot

On Tue, Nov 22, 2011 at 12:26:57PM +1100, NeilBrown wrote:
> That gives us lots of room for pointing the finger of blame, doesn't it?
> I think it is -> his problem. :-)
>
> From the md part of the stack trace it looks most like a write request. It
> could be a retried read, but that is extremely unlikely that early in boot.
>
> So presumably it is some sort of zero-length REQ_FLUSH or something like that.
> md/raid1 will just pass those unchanged down.
> My guess is that ext4 is generating this and something in the stack is
> stripping the REQ_FLUSH .... though why it even tries before asking for a
> password is beyond me.

I pointed the finger at md because, while dm-crypt is not yet set up,
the only thing working is the array. All filesystems need the
dm-crypt mapping first.

From the dmesg on 3.0, I see that NCQ is enabled but FUA is not:

[ 2.269487] ata1: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25100 irq 64
[ 2.588395] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.588979] ata1.00: ATA-8: KINGSTON SV100S264G, D110225a, max UDMA/100
[ 2.589037] ata1.00: 125045424 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.589321] ata1.00: configured for UDMA/100
[ 2.589440] scsi 1:0:0:0: Direct-Access ATA KINGSTON SV100S2 D110 PQ: 0 ANSI: 5
[ 2.631113] sd 1:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)
[ 2.631265] sd 1:0:0:0: [sda] Write Protect is off
[ 2.631267] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.631296] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.632119] sd 1:0:0:0: [sda] Attached SCSI disk

[ 2.269557] ata2: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25180 irq 64
[ 2.588916] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.628336] ata2.00: ATA-8: ST9500420AS, 0002SDM1, max UDMA/133
[ 2.628396] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 2.630143] ata2.00: configured for UDMA/133
[ 2.630238] scsi 2:0:0:0: Direct-Access ATA ST9500420AS 0002 PQ: 0 ANSI: 5
[ 2.631236] sd 2:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 2.631792] sd 2:0:0:0: [sdb] Write Protect is off
[ 2.632031] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 2.632050] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.636038] sd 2:0:0:0: [sdb] Attached SCSI disk
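
For reference, the "Write Protect" and "DPO or FUA" lines follow from two bits of the device-specific parameter byte in the MODE SENSE header. The sketch below is a small userspace C illustration with hypothetical names, not the sd driver's code. Which of the four logged bytes is that parameter depends on the header layout (byte 2 in the 6-byte header, byte 3 in the 10-byte one); that choice is an assumption here, but in the "00 3a 00 00" shown above both candidates are 0x00, so the result is the same either way.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the device-specific parameter byte for direct-access block devices. */
#define MODE_DSP_WP     0x80u /* medium is write protected */
#define MODE_DSP_DPOFUA 0x10u /* device accepts the DPO and FUA bits */

/* True if the device reports its medium as write protected. */
static bool dsp_write_protected(uint8_t dsp)
{
	return (dsp & MODE_DSP_WP) != 0;
}

/* True if the device reports support for DPO and FUA. */
static bool dsp_supports_fua(uint8_t dsp)
{
	return (dsp & MODE_DSP_DPOFUA) != 0;
}
```

With dsp = 0x00, both helpers return false, matching "Write Protect is off" and "doesn't support DPO or FUA" for sda and sdb.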

There are two RAID1 arrays spanning both of the disks, and one more RAID1
(with its second leg missing) on sdb.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index f43c8a5..59cb2ad 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
>  			goto end_io;
>  		}
>  	}
> -
> +	WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
>  	if ((bio->bi_rw & REQ_DISCARD) &&
>  	    (!blk_queue_discard(q) ||
>  	     ((bio->bi_rw & REQ_SECURE) &&

I'll try that. I hope it can be caught through netconsole.

Best Regards,
Michał Mirosław

 
11-22-2011, 11:10 AM
Michał Mirosław

raid1d crash at boot

On Tue, Nov 22, 2011 at 01:03:37PM +0100, Michał Mirosław wrote:
> There are two RAID1 arrays spanning both of the disks, and one more RAID1
> (with its second leg missing) on sdb.

I just remembered that the sdb leg of the main array has the write-mostly flag
set. I checked /proc/mdstat on the running system and it turns out that now
both legs are marked write-mostly. Does this ring a bell?

cat /proc/mdstat
Personalities : [raid1]
md2 : active (auto-read-only) raid1 sdb3[0]
      425862712 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[3](W) sdb2[2](W)
      62396688 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      123892 blocks super 1.2 [2/2] [UU]

unused devices: <none>
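
To spot this condition without reading the output by eye, a small C helper along the following lines could be used. It is only a sketch under stated assumptions: the function and struct names are hypothetical, a member is taken to be any name[N] token on an array line, and "(W)" is assumed to follow the closing bracket directly, as in the output above.

```c
#include <stdbool.h>
#include <string.h>

/* Member counts for one array line of /proc/mdstat. */
struct md_members {
	int total;        /* tokens of the form name[N] */
	int write_mostly; /* members followed directly by "(W)" */
};

/*
 * Count members on a line such as
 *   "md1 : active raid1 sda2[3](W) sdb2[2](W)"
 * A '[' preceded by a space (as in "[2/2] [UU]" or "[raid1]") is not a
 * member and is skipped.
 */
static struct md_members count_members(const char *line)
{
	struct md_members m = { 0, 0 };
	const char *p = line;

	while ((p = strchr(p, '[')) != NULL) {
		const char *close = strchr(p, ']');

		if (close == NULL)
			break;
		if (p > line && p[-1] != ' ') {
			m.total++;
			if (strncmp(close + 1, "(W)", 3) == 0)
				m.write_mostly++;
		}
		p = close + 1;
	}
	return m;
}

/* True when the line has at least one member and every member is write-mostly. */
static bool all_write_mostly(const char *line)
{
	struct md_members m = count_members(line);

	return m.total > 0 && m.write_mostly == m.total;
}
```

Fed the md1 line above, all_write_mostly() returns true; for the md0 and md2 lines it returns false.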

Best Regards,
Michał Mirosław

 
