Bug#602991: kernel crash with null pointer dereference while umounting nfs
Hi,
George Barnett wrote:
> We maintain a large number of OpenVZ containers on several hosts.
> In the course of running these containers, we keep a number of NFS
> mounts which are presented into the OpenVZ containers.
>
> We currently have 3 test machines we are able to test this on. All
> are running the same image, netbooted. The Stack trace below is
> from a 2 x 12 core AMD box, although we see the exact same crash
> with the same cause on the Intel test nodes too (2 x X5650 6 core).
>
> When we stop all the containers quickly on a host, we see the following repeatable crash:
>
> [ 317.100898] CT: 10018: stopped
> [ 317.912269] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 317.916307] IP: [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
> [ 317.916307] PGD 100a5d0067 PUD 100a557067 PMD 0
> [ 317.916307] Oops: 0002 [#1] SMP
> [ 317.916307] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[...]
> [ 317.916307] Pid: 7838, comm: umount Not tainted 2.6.32-5-openvz-amd64 #1 dyomin H8DGU
> [ 317.916307] RIP: 0010:[<ffffffff812ea21e>] [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
[...]
> [ 317.916307] Call Trace:
> [ 317.916307] [<ffffffffa01b97a4>] ? rpc_wake_up_queued_task+0x12/0x29 [sunrpc]
> [ 317.916307] [<ffffffffa01b9835>] ? rpc_killall_tasks+0x7a/0x9b [sunrpc]
> [ 317.916307] [<ffffffffa0217fed>] ? nfs_umount_begin+0x34/0x3a [nfs]
> [ 317.916307] [<ffffffff81106844>] ? sys_umount+0x11b/0x2e6
> [ 317.916307] [<ffffffff812ec6a5>] ? do_page_fault+0x2e0/0x2fc
> [ 317.916307] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> [ 317.916307] Code: e9 ff 5b c3 53 48 89 fb e8 a6 a4 d6 ff 48 89 df
> f0 83 2f 01 79 05 e8 42 73 e9 ff 5b c3 53 48 89 fb e8 8d a4 d6 ff b8
> 00 00 01 00 <f0> 0f c1 03 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 13
> eb f5
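[Note on reading the trace above: the "Oops: 0002" field is the x86 page-fault error code. Below is a minimal Python sketch of decoding its low bits. The function name decode_oops_error_code is made up for illustration and is not part of any kernel tool.]

```python
def decode_oops_error_code(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code (the 'Oops: NNNN' field)."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if code & 0x1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if code & 0x2 else "read",
        # bit 2: 0 = kernel mode, 1 = user mode
        "mode": "user" if code & 0x4 else "kernel",
        # bit 3: a reserved bit was set in a page-table entry
        "reserved_bit_set": bool(code & 0x8),
        # bit 4: the fault was an instruction fetch
        "instruction_fetch": bool(code & 0x10),
    }


# The value reported in this bug, parsed as hexadecimal.
info = decode_oops_error_code(int("0002", 16))
print(info)
```

For 0002 this gives a kernel-mode write to a page that is not present. That fits a NULL lock pointer being passed to _spin_lock_bh: the marked bytes "<f0> 0f c1 03" in the Code: line appear to decode to a locked xadd on the address held in %rbx.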
Thanks for a clear report. Do you still have access to these systems?
If so, can you still reproduce this?
If this bug is still present, our best bet is probably to get help
from openvz upstream, which might involve trying a different
(alienized RHEL) or newer (3.x.y) kernel.
Sorry for the trouble,
Jonathan
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120215193927.GA23759@burratino
02-15-2012, 11:14 PM
George Barnett
Bug#602991: kernel crash with null pointer dereference while umounting nfs
Hi Jonathan,
We ended up moving to blessed OpenVZ kernels on CentOS 5.5 after hitting a few more bugs in Debian's OpenVZ kernel. As such, I no longer have any systems I could reproduce this on. Sorry I'm not able to be of help.
Cheers,
George
--
Archive: http://lists.debian.org/0328E75A-EC66-45AE-8067-B6DFD78B7CBE@atlassian.com
02-15-2012, 11:27 PM
Jonathan Nieder
Bug#602991: kernel crash with null pointer dereference while umounting nfs
tags 602991 + unreproducible
quit
George Barnett wrote:
> We ended up moving to blessed openvz kernels on Centos 5.5 after
> hitting a few more bugs in debian openvz. As such, I no longer have
> any systems I could reproduce this on. Sorry I'm not able to be of
> help.
No problem. Thanks for the update.
Ola, do the symptoms below look familiar to you? Kernel was 2.6.32-27.
> We maintain a large number of OpenVZ containers on several hosts.
> In the course of running these containers, we keep a number of NFS
> mounts which are presented into the OpenVZ containers.
>
> We currently have 3 test machines we are able to test this on. All
> are running the same image, netbooted. The Stack trace below is
> from a 2 x 12 core AMD box, although we see the exact same crash
> with the same cause on the Intel test nodes too (2 x X5650 6 core).
>
> When we stop all the containers quickly on a host, we see the following repeatable crash:
>
> [ 317.100898] CT: 10018: stopped
> [ 317.912269] BUG: unable to handle kernel NULL pointer dereference at (null)
> [ 317.916307] IP: [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
> [ 317.916307] PGD 100a5d0067 PUD 100a557067 PMD 0
> [ 317.916307] Oops: 0002 [#1] SMP
> [ 317.916307] last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
[...]
> [ 317.916307] Pid: 7838, comm: umount Not tainted 2.6.32-5-openvz-amd64 #1 dyomin H8DGU
> [ 317.916307] RIP: 0010:[<ffffffff812ea21e>] [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
[...]
> [ 317.916307] Call Trace:
> [ 317.916307] [<ffffffffa01b97a4>] ? rpc_wake_up_queued_task+0x12/0x29 [sunrpc]
> [ 317.916307] [<ffffffffa01b9835>] ? rpc_killall_tasks+0x7a/0x9b [sunrpc]
> [ 317.916307] [<ffffffffa0217fed>] ? nfs_umount_begin+0x34/0x3a [nfs]
> [ 317.916307] [<ffffffff81106844>] ? sys_umount+0x11b/0x2e6
> [ 317.916307] [<ffffffff812ec6a5>] ? do_page_fault+0x2e0/0x2fc
> [ 317.916307] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> [ 317.916307] Code: e9 ff 5b c3 53 48 89 fb e8 a6 a4 d6 ff 48 89 df
> f0 83 2f 01 79 05 e8 42 73 e9 ff 5b c3 53 48 89 fb e8 8d a4 d6 ff b8
> 00 00 01 00 <f0> 0f c1 03 0f b7 d0 c1 e8 10 39 c2 74 07 f3 90 0f b7 13
> eb f5
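[Note: the call trace quoted above can be summarized mechanically when comparing reports. Below is a minimal sketch, assuming the "[<address>] ? symbol+0xoffset/0xsize [module]" line layout seen in this report. The names TRACE_LINE and parse_call_trace are hypothetical. A "?" before a symbol marks an entry the kernel found on the stack but could not confirm as a reliable frame.]

```python
import re

# One call-trace entry: [<addr>] optional "?" symbol+0xoff/0xsize optional [module]
TRACE_LINE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+(?P<unreliable>\?[ \t]+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>\w+)\])?"
)


def parse_call_trace(text):
    """Return one dict per call-trace line found in an oops excerpt."""
    frames = []
    for line in text.splitlines():
        m = TRACE_LINE.search(line)
        if m is None:
            continue  # not a call-trace entry
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "module": m.group("module"),  # None for core kernel symbols
            "unreliable": m.group("unreliable") is not None,
        })
    return frames


# Call-trace lines copied from the report in this thread.
sample = """\
[ 317.916307] [<ffffffffa01b97a4>] ? rpc_wake_up_queued_task+0x12/0x29 [sunrpc]
[ 317.916307] [<ffffffffa01b9835>] ? rpc_killall_tasks+0x7a/0x9b [sunrpc]
[ 317.916307] [<ffffffffa0217fed>] ? nfs_umount_begin+0x34/0x3a [nfs]
[ 317.916307] [<ffffffff81106844>] ? sys_umount+0x11b/0x2e6
[ 317.916307] [<ffffffff812ec6a5>] ? do_page_fault+0x2e0/0x2fc
[ 317.916307] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
"""

frames = parse_call_trace(sample)
for f in frames:
    print(f["module"] or "kernel", f["symbol"])
```

Read from the bottom up, the listing shows the path from the umount system call through nfs_umount_begin into the sunrpc module, which is where the faulting spinlock call was made.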
--
Archive: http://lists.debian.org/20120216002659.GB29709@burratino
02-16-2012, 12:07 AM
Steven Chamberlain
Bug#602991: kernel crash with null pointer dereference while umounting nfs
On 16/02/12 00:27, Jonathan Nieder wrote:
> George Barnett wrote:
>> [ 317.916307] Pid: 7838, comm: umount Not tainted 2.6.32-5-openvz-amd64 #1 dyomin H8DGU
>> [ 317.916307] RIP: 0010:[<ffffffff812ea21e>] [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
FWIW, I found NFS to be very buggy before the 'feoktistov' version of
the OpenVZ patchset (introduced in linux-2.6 2.6.32-31); since that
version I've had no problems of this nature, and I use nfs quite heavily
between OpenVZ containers.
The 'dyomin' version mentioned above was based on 2.6.32.22 which I
believe had some NFS issues not even specific to OpenVZ, such as
kernel.org BZ#24302, and another mentioned in Debian's changelog for
2.6.32-31.
Hope that helps,
Regards,
--
Steven Chamberlain
steven@pyro.eu.org
--
Archive: http://lists.debian.org/4F3C56E8.1010300@pyro.eu.org
02-16-2012, 05:21 AM
Ola Lundqvist
Bug#602991: kernel crash with null pointer dereference while umounting nfs
Great.
I was just about to say (almost) the same thing: NFS has been rather
buggy. So the approach that Steven proposed is probably the best way to go.
// Ola
On Wed, Feb 15, 2012 at 07:13:28PM -0600, Jonathan Nieder wrote:
> Version: 2.6.32-31
>
> Steven Chamberlain wrote:
> > On 16/02/12 00:27, Jonathan Nieder wrote:
> >> George Barnett wrote:
>
> >>> [ 317.916307] Pid: 7838, comm: umount Not tainted 2.6.32-5-openvz-amd64 #1 dyomin H8DGU
> >>> [ 317.916307] RIP: 0010:[<ffffffff812ea21e>] [<ffffffff812ea21e>] _spin_lock_bh+0xe/0x25
> >
> >>> [ 317.916307] [<ffffffffa01b97a4>] ? rpc_wake_up_queued_task+0x12/0x29 [sunrpc]
> >>> [ 317.916307] [<ffffffffa01b9835>] ? rpc_killall_tasks+0x7a/0x9b [sunrpc]
> >>> [ 317.916307] [<ffffffffa0217fed>] ? nfs_umount_begin+0x34/0x3a [nfs]
> >>> [ 317.916307] [<ffffffff81106844>] ? sys_umount+0x11b/0x2e6
> >>> [ 317.916307] [<ffffffff812ec6a5>] ? do_page_fault+0x2e0/0x2fc
> >>> [ 317.916307] [<ffffffff81010c12>] ? system_call_fastpath+0x16/0x1b
> >
> > Hi,
> >
> > FWIW, I found NFS to be very buggy before the 'feoktistov' version of
> > the OpenVZ patchset (introduced in linux-2.6 2.6.32-31); since that
> > version I've had no problems of this nature, and I use nfs quite heavily
> > between OpenVZ containers.
>
> Thanks, Steven. Let's go with that. ;-)
>
--
Archive: http://lists.debian.org/20120216062150.GA7376@inguza.net