I've got a pretty normal Debian Squeeze AMD64 system with the current
kernel from Wheezy. Since 2.6.39-1 I experience this bug:
1. I plug in an external USB hard drive with an NTFS file system on
its first partition.
2. The drive gets automatically mounted using the FUSE-based NTFS
driver (ntfs-3g).
3. I right-click on the icon representing the drive on the GNOME
desktop and select "Safely Remove Drive".
4. The kernel panics, see attached screenshot.
Please note that this bug *doesn't* appear in 2.6.38-5, however it *is*
still present in 2.6.39-2. Please also note that there is *no* problem
if I first unmount the drive manually (using umount or some GUI) and
then select "Safely Remove Drive".
Please tell me if you need any more information to fix this bug. If
reproducing turns out to be difficult, I can try to git-bisect the bug
myself.
Best regards
Alexander Kurtz
06-22-2011, 02:40 AM
Ben Hutchings
Bug#631187: Kernel panics when removing external hard drive
On Tue, 2011-06-21 at 11:08 +0200, Alexander Kurtz wrote:
> Package: linux-2.6
> Version: 2.6.39-1
> Severity: serious
>
> Hi,
>
> I've got a pretty normal Debian Squeeze AMD64 system with the current
> kernel from Wheezy. Since 2.6.39-1 I experience this bug:
>
> 1. I plug in an external USB hard drive with an NTFS file system on
> its first partition.
> 2. The drive gets automatically mounted using the FUSE-based NTFS
> driver (ntfs-3g).
> 3. I right-click on the icon representing the drive on the GNOME
> desktop and select "Safely Remove Drive".
Which version of GNOME is this?
> 4. The kernel panics, see attached screenshot.
[...]
The panic message shows there was an earlier kernel warning; please can
you provide that.
Ben.
--
Ben Hutchings
I'm always amazed by the number of people who take up solipsism because
they heard someone else explain it. - E*Borg on alt.fan.pratchett
06-22-2011, 09:59 AM
Alexander Kurtz
Bug#631187: Kernel panics when removing external hard drive
On Wed, 2011-06-22 at 03:40 +0100, Ben Hutchings wrote:
> Which version of GNOME is this?
2.30.2. Apart from the newer kernel, this is a pure Squeeze system.
> The panic message shows there was an earlier kernel warning; please can
> you provide that.
Thanks to netconsole (a really great tool!) I was able to do so. The
attached kernel log starts right before I plug the drive in.
Surprisingly the kernel didn't crash the first time, but after trying
again, everything went as expected (see lines 17 and 35). Please note
that I replaced the drive's serial number.
Bug#631187: Kernel panics when removing external hard drive
Hi,
Alexander Kurtz wrote:
> On Wed, 2011-06-22 at 03:40 +0100, Ben Hutchings wrote:
>> The panic message shows there was an earlier kernel warning; please can
>> you provide that.
>
> Thanks to netconsole (a really great tool!) I was able to do so. The
> attached kernel log starts right before I plug the drive in.
> Surprisingly the kernel didn't crash the first time, but after trying
> again, everything went as expected (see lines 17 and 35).
Sorry for the long silence. Let's see:
> [ 1421.182657] sd 7:0:0:0: [sdc] Attached SCSI disk
> [ 1454.865926] WARNING! power/level is deprecated; use power/control instead
Seems harmless enough.
> [ 1478.728383] sd 8:0:0:0: [sdc] Attached SCSI disk
> [ 1491.693027] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
> [ 1491.693229] IP: [<ffffffff8118b2e3>] elv_completed_request+0x38/0x47
Disassembly, for convenience (following the hints from
Documentation/oops-tracing.txt):
| <+0>: rex je 0x6008b8 <str+56>
| <+3>: cmpl $0x1,0x44(%rsi)
| <+7>: je 0x60088d <str+13>
| <+9>: test $0x40,%al
| <+11>: je 0x6008b8 <str+56>
| <+13>: and $0x11,%eax
| <+16>: dec %eax
| <+18>: setne %al
| <+21>: and $0x1,%eax
| <+24>: add $0xfc,%rax
| <+30>: decl 0x4(%rdi,%rax,4)
| <+34>: testb $0x4,0x41(%rsi)
| <+38>: je 0x6008b8 <str+56>
| <+40>: mov (%rdx),%rax
| <+43>: cmp %ah,0x40(%rdx)
| <+46>: rex.W
| <+47>: test %rax,%rax
| <+50>: je 0x6008b8 <str+56>
| <+52>: pop %r8
| <+54>: jmpq *%rax
| <+56>: pop %rcx
| <+57>: retq
| <+58>: lea 0x80(%rsi),%rdi
So offset 0x38 is the jump in

	if ((rq->cmd_flags & REQ_SORTED) &&

As for why that involves an access to the address 0x48: well, that
is beyond my depth. rq->cmd_flags was already accessed in the check

	if (blk_account_rq(rq))

Maybe the actual cause of the fault is some different instruction and
the instruction pointer is not to be trusted (?). I suppose if I were
in this situation, I'd sprinkle block/elevator.c::elv_completed_request
with printk calls to be able to witness exactly what happens.
Sorry for the trouble, and hope that helps.
Jonathan
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110705225129.GA8701@elie
07-08-2011, 03:30 AM
Ben Hutchings
Bug#631187: Kernel panics when removing external hard drive
[...]
which was included in stable version 2.6.39.2 and our package version
2.6.39-3.
Alexander, please test the new package version.
Ben.
--
Ben Hutchings
The two most common things in the universe are hydrogen and stupidity.
07-08-2011, 04:02 AM
Jonathan Nieder
Bug#631187: Kernel panics when removing external hard drive
Hi Ben,
Ben Hutchings wrote:
> There is a byte missing between the two lines (in fact, the very byte
> which RIP points to), and you are mixing decimal and hexadecimal
> offsets.
>
> In fact RIP is pointing into the second half of this test:
>
> if ((rq->cmd_flags & REQ_SORTED) &&
> e->ops->elevator_completed_req_fn)
>
> and e->ops was NULL.
Ah, that makes sense.
> This might be fixed by:
>
> commit 0769e21bf4b5cf48878c1ca819276e80465b39e7
> Author: James Bottomley <James.Bottomley@HansenPartnership.com>
> Date: Wed May 25 15:52:14 2011 -0500
>
> Fix oops caused by queue refcounting failure
>
> commit e73e079bf128d68284efedeba1fbbc18d78610f9 upstream.
As does that. Thanks for explaining.
--
Archive: http://lists.debian.org/20110708040250.GB2559@elie
07-09-2011, 03:58 PM
Alexander Kurtz
Bug#631187: Kernel panics when removing external hard drive
On Fri, 2011-07-08 at 04:30 +0100, Ben Hutchings wrote:
> Alexander, please test the new package version.
I just tested 2.6.39-3 from sid and 3.0.0~rc6-1~experimental.1 from
experimental. Unfortunately both reliably panic when safely removing my
external hard drive. 2.6.38-5 (still) works fine. Seems like it's time
for me to do a git bisect, or do you have any other ideas?
Best regards
Alexander Kurtz
07-09-2011, 05:14 PM
Jonathan Nieder
Bug#631187: Kernel panics when removing external hard drive
> I just tested 2.6.39-3 from sid and 3.0.0~rc6-1~experimental.1 from
> experimental. Unfortunately both reliably panic when safely removing my
> external hard drive. 2.6.38-5 (still) works fine. Seems like it's time
> for me to do a git bisect, or do you have any other ideas?
I'd suggest attaching the full dmesg from 3.0.0~rc6 and any other
relevant information to https://bugzilla.kernel.org/show_bug.cgi?id=38842
first. Maybe someone upstream will have ideas.
Thanks again.
Jonathan
--
Archive: http://lists.debian.org/20110709171451.GA3341@elie
07-13-2011, 06:53 PM
Alexander Kurtz
Bug#631187: Kernel panics when removing external hard drive
On Wed, 2011-07-13 at 22:03 +1000, Linh Nguyen wrote:
> Hello Alexander,
>
> How are you? I came across your post
> http://lists.debian.org/debian-kernel/2011/06/msg00580.html detailing
> an issue similar to the one I am experiencing.
>
> Every time I unmount a portable HDD (normal USB sticks are fine), I get
> a kernel panic with the "power/level is deprecated; use power/control
> instead" error message.
>
> Despite my extensive googling, I've not been able to find a solution. I
> was wondering whether or not you have solved your issue. Cheers.
>
>
> Sincerely,
>
> L
Sorry, I've got no solution either. Since this is kind of a low-priority
bug for me, I'm fine with manually unmounting (using umount or some GUI)
my external drive before removing it. My current plan is to wait for 3.0
and then maybe do a git bisect if it's not fixed by then. However, you
should check out the Debian bug report[1], the Ubuntu bug report[2] and
the upstream bug report[3], maybe you'll find something there.