09-03-2012, 02:01 AM
Jonathan Nieder

Bug#682233: mpt2sas: kernel crash under load with hanged disks

Hi George,

George Shuklin wrote:

> Tags: upstream

Which upstream version did you test?

[...]
> That bug is found in the 3.2 and 3.3 kernel versions, but does not
> reproduce in 3.0.
[...]
> 1) Set up a large raid10.
> 2) Start a rebuild.
> 3) Run additional IO on the raid (dd if=/dev/md0 of=/dev/md0)
> 4) Somehow slow down IO on two or more disks. We found that
> bug in the wild under normal load, but the following scripts allow
> seeing it in a few minutes:
[...]
> end_request: I/O error, dev sdf, sector 729088
> ------------[ cut here ]------------
> kernel BUG at [...]/linux-3.4.4/drivers/scsi/scsi_lib.c:1154!
[...]
> Pid: 343, comm: kworker/5:1 Not tainted 3.4-trunk-amd64 #1 Supermicro X8DTN+-F/X8DTN+-F
[...]
> Call Trace:
> [<ffffffffa00dbafa>] ? sd_prep_fn+0x2e9/0xb8e [sd_mod]
> [<ffffffff811ace28>] ? cfq_dispatch_requests+0x722/0x880
> [<ffffffff81196589>] ? create_io_context+0x5a/0x5a
> [<ffffffff811993dd>] ? blk_peek_request+0xcf/0x1ac
[...]
> Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 48 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd e0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 62 ec ff ff 48 85 c0 48 89 c2 74 20
> RIP [<ffffffffa0076104>] scsi_setup_fs_cmnd+0x45/0x83 [scsi_mod]

Thanks for a clear report, and sorry for the slow reply.

This is "BUG_ON(!req->nr_phys_segments)". Smells similar to [1],
which bisected to v3.1-rc1~131^2~31 and was fixed by v3.2.2~91
(md/raid1: perform bad-block tests for WriteMostly devices too,
2012-01-09), aka v3.3-rc3~3^2~2.

But that wouldn't explain triggering the same trace in a 3.4.y kernel.

Is this reproducible with 3.5.2 or newer from experimental? Which
3.2.y kernel did you use to experience it?

Curious,
Jonathan

[1] http://thread.gmane.org/gmane.linux.raid/36732


--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120903020130.GA2719@mannheim-rule.local
 
09-03-2012, 02:09 AM
George Shuklin

Bug#682233: mpt2sas: kernel crash under load with hanged disks

We've tested it with vanilla 3.2.12; the problem was the same.


Archive: http://lists.debian.org/50441173.9020906@gmail.com
 
09-03-2012, 02:30 AM
Jonathan Nieder

Bug#682233: mpt2sas: kernel crash under load with hanged disks

George Shuklin wrote:

> We've tested it with vanilla 3.2.12; the problem was the same.

Thanks for the quick feedback. Please send a summary of symptoms to
linux-raid@vger.kernel.org, cc-ing Neil Brown <neilb@suse.de> and
either me or this bug log so we can track it.

Be sure to mention:

- steps to reproduce, expected result, actual result, and how
the difference indicates a bug (should be simple enough ---
the summary you sent here would work fine)

- which kernel versions you have tested and what happened with
each

- full "dmesg" output from booting and reproducing the bug, as
an attachment

- any other weird symptoms or observations

- what you would be able to do to track it down (can you run commands
if provided? try patches? bisect to find which commit introduced
the regression?)

If we're lucky, the symptoms will ring a bell for Neil or someone else
on-list or someone will have an idea for a test to try to track it
down further. Otherwise, the best we can do is probably to bisect to
find which specific change introduced the bug, as described at [1].
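
For orientation, a bisection limited to the driver directory could be started roughly as sketched below. The endpoints (v3.0 good, v3.2 bad) come from the versions reported in this thread; restricting the search to drivers/scsi/mpt2sas is only an assumption that the regression lives in that driver, and the helper name bisect_start_path is made up for illustration.

```shell
# Sketch: start a bisection between a known-good and a known-bad
# revision, considering only commits that touch one path.
# Usage: bisect_start_path <good-rev> <bad-rev> <path>
bisect_start_path() {
    good=$1
    bad=$2
    path=$3
    # "git bisect start <bad> <good> -- <path>" marks both endpoints
    # and checks out the first candidate commit to test.
    git bisect start "$bad" "$good" -- "$path"
}

# In a kernel checkout (endpoints assumed from this thread):
#   bisect_start_path v3.0 v3.2 drivers/scsi/mpt2sas
#   ... build, boot, run the reproduction ...
#   git bisect good     # the kernel survived the test
#   git bisect bad      # the kernel hit the BUG
#   git bisect reset    # when finished
```

If the path restriction turns out to be wrong, bisect ends on a commit that does not explain the crash; rerunning without the "-- <path>" part is the fallback.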

Regards,
Jonathan

[1] http://kernel-handbook.alioth.debian.org/ch-bugs.html#s9.2.1


Archive: http://lists.debian.org/20120903023018.GC2769@mannheim-rule.local
 
09-03-2012, 03:56 AM
George Shuklin

Bug#682233: mpt2sas: kernel crash under load with hanged disks

I think the problem is specific to the LSI drivers, not to linux-raid,
because the same tests with Adaptec (aacraid) and a few onboard HBAs show
no signs of crashing (hung disks are just marked as 'failed' and all
systems behave as expected).


I'll try to bisect it at 3.5, but I think it's fairly simple to say where
the problem is:


linux-3.0 has mpt2sas 08.100.00.02 and linux-3.2 has 10.100.00.00

Also note that mpt2sas shows strange behavior in linux-2.6.32 (version
02.100.03.00) under high load.


Archive: http://lists.debian.org/50442A55.5050706@gmail.com
 
09-03-2012, 05:38 AM
Jonathan Nieder

Bug#682233: mpt2sas: kernel crash under load with hanged disks

George Shuklin wrote:

> I think the problem is specific to the LSI drivers, not to linux-raid,
> because the same tests with Adaptec (aacraid) and a few onboard HBAs show
> no signs of crashing (hung disks are just marked as 'failed' and
> all systems behave as expected).

Thanks. Very useful.

[...]
> linux-3.0 has mpt2sas 08.100.00.02 and linux-3.2 has 10.100.00.00

Between 3.0 and 3.2.12, the mpt2sas driver had 30 patches. That would
be an interesting test: could you try a current kernel with the
mpt2sas driver from 3.0.y? It works like this:

0. prerequisites:

apt-get install git build-essential

1. get the kernel history, if you don't already have it:

git clone \
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. fetch point releases:

cd linux
git remote add stable \
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch stable

3. configure, build, test:

git checkout origin/master
cp /boot/config-$(uname -r) .config; # current configuration
scripts/config --disable DEBUG_INFO
make localmodconfig; # optional: minimize configuration
make deb-pkg; # optionally with -j<num> for parallel build
dpkg -i ../<name of package>; # as root
reboot
... test test test ...

Hopefully it reproduces the bug. So

4. try the mpt2sas driver from 3.0.y:

cd linux
git checkout stable/linux-3.0.y -- drivers/scsi/mpt2sas
make deb-pkg; # maybe with -j4
dpkg -i ../<name of package>
reboot
... test ...
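
The patch count mentioned at the top of this message can be checked against the fetched history with something like the sketch below. The helper name count_path_commits is hypothetical, and the number it prints may differ slightly from 30 depending on how merges and the exact endpoints are treated.

```shell
# Sketch: count the commits between two revisions that touch a path.
# Usage: count_path_commits <old-rev> <new-rev> <path>
count_path_commits() {
    # rev-list walks <old>..<new>; the path after "--" limits the walk
    # to commits that changed something under that path.
    git rev-list --count "$1..$2" -- "$3"
}

# In a kernel checkout with the stable remote fetched (tags assumed):
#   count_path_commits v3.0 v3.2.12 drivers/scsi/mpt2sas
```

Replacing "rev-list --count" with "log --oneline" lists the individual patches, which helps when deciding whether swapping in the 3.0.y driver is a reasonable experiment.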

Jonathan


Archive: http://lists.debian.org/20120903053827.GA2951@mannheim-rule.local
 
09-23-2012, 08:56 AM
Jonathan Nieder

Bug#682233: mpt2sas: kernel crash under load with hanged disks

tags 682233 + upstream patch pending
quit

Hi,

George Shuklin wrote:

> I think this commit is somehow related to that problem:
>
> commit 14216561e164671ce147458653b1fea06a4ada1e
> Author: James Bottomley <JBottomley@Parallels.com>
> Date: Wed Jul 25 23:55:55 2012 +0400
>
> [SCSI] Fix 'Device not ready' issue on mpt2sas

Sounds plausible. That patch was applied upstream as v3.2.30~126, so
please test 3.2.30-1 once it is available.

If impatient before then:

0. prerequisites:

apt-get install git build-essential

1. get the kernel history, if you do not already have it:

git clone \
  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. fetch point releases:

cd linux
git remote add stable \
  git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch stable

3. configure, build, attempt to reproduce the bug:

git checkout v3.2.29
cp /boot/config-$(uname -r) .config; # current configuration
scripts/config --disable DEBUG_INFO
make localmodconfig; # optional: minimize configuration
make deb-pkg; # optionally with -j<num> for parallel build
dpkg -i ../<name of package>; # as root
reboot
... test test test ...

Hopefully it reproduces the bug. So

4. update:

cd linux
git merge stable/linux-3.2.y
make deb-pkg; # maybe with -j4
dpkg -i ../<name of package>; # as root
reboot
... test test test ...
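
Before rebuilding in step 4, it may be worth confirming that the merged tree really contains the fix. The sketch below is one way to do that; the helper names find_backport and name_commit are made up for illustration. Note that a stable backport has a different commit id than the mainline commit quoted above, so searching by subject line is usually easier than searching by hash.

```shell
# Sketch: list commits in <old>..<new> whose message contains a string.
# Usage: find_backport <old-rev> <new-rev> <subject-text>
find_backport() {
    git log --oneline --fixed-strings --grep="$3" "$1..$2"
}

# Sketch: name a commit relative to the nearest tag that contains it.
# This produces names in the same <tag>~<n> style as v3.2.30~126.
# Usage: name_commit <commit>
name_commit() {
    git describe --contains "$1"
}

# In the merged kernel checkout (subject taken from the quoted commit):
#   find_backport v3.2.29 HEAD "Fix 'Device not ready' issue on mpt2sas"
#   name_commit <id printed by the previous command>
```

An empty result from the first command would suggest the merge did not bring in the patch, in which case testing the new build says nothing about the fix.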

Thanks again for your help and patience.

Sincerely,
Jonathan


Archive: http://lists.debian.org/20120923085642.GA24835@elie.Belkin
 
