03-10-2011, 11:30 AM
Dan Tomlinson

Bug#617666: nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working

Package: nfs-kernel-server
Version: 1:1.2.2-4
Severity: grave
Justification: renders package unusable


Hi there,

Apologies if this has already been reported, but I couldn't find anything quite matching what I'm seeing.

I have a 26TB Debian squeeze fileserver providing NFS mounts to a large number of users. The system has been working flawlessly for a number of months, but twice in the last week NFS seems to have crashed. The first thing I noticed was users reporting that they were unable to access shares. Logging into the system, I see a single nfsd process taking 100% CPU with a very long run time. Restarting nfs-kernel-server has no effect. The process is unkillable (even with -9) and the system has required a reboot to get it usable again. jnettop is not showing significant network traffic, and lsof on /export/ (where all my NFS exports are located) shows no NFS access to any files.
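The symptom above (one nfsd thread pinned at 100% CPU with a long run time) could be recorded for a follow-up report with something like the sketch below. It is only an illustration, not something used in this report: it assumes a Linux /proc filesystem, assumes the kernel threads show up with the command name "nfsd", and the function names `parse_stat_line` and `nfsd_threads` are made up for this example.

```python
import os

def parse_stat_line(line):
    """Parse one /proc/<pid>/stat line into (pid, comm, state, cpu_ticks)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    head, _, tail = line.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()
    state = fields[0]            # stat field 3: R, S, D, Z, ...
    utime = int(fields[11])      # stat field 14: user-mode clock ticks
    stime = int(fields[12])      # stat field 15: kernel-mode clock ticks
    return int(pid_str), comm, state, utime + stime

def nfsd_threads(proc_root="/proc"):
    """Yield parsed stat tuples for every process whose comm is 'nfsd'.

    Assumption: nfsd kernel threads report the command name 'nfsd'.
    """
    if not os.path.isdir(proc_root):
        return
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                info = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the entry is unreadable
        if info[1] == "nfsd":
            yield info
```

Running `list(nfsd_threads())` twice a few seconds apart would show whether a single thread stays in state R while its tick count keeps climbing, which would match the behaviour described above.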

Please let me know if you need any further information. I am going to reboot the server now, so I may not be able to reproduce the problem straight away (but as it's happened twice, I am quite sure it will happen again at some point...).

Thanks in advance for your help.

Dan Tomlinson

My /etc/exports file is below:


# /etc/exports: the access control list for filesystems which may be exported
# to NFS clients. See exports(5).
#
# Example for NFSv2 and NFSv3:
# /srv/homes hostname1(no_subtree_check,rw,sync,no_subtree_check) hostname2(ro,sync,no_subtree_check)
#
# Example for NFSv4:
#

# misc shares
/export/software 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/system_tools 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/home 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)

# flychip shares
/export/flychip/archives 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/misc 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/production 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/share 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)
/export/flychip/temp 192.168.32.0/24(no_subtree_check,rw,sync,root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,root_squash,insecure)

# mickelm shares
/export/micklem/releases 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/micklem/data 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)

# logic shares
/export/logic/data 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
/export/logic/webdav 192.168.32.0/24(no_subtree_check,rw,sync,no_root_squash,insecure) 192.168.128.0/24(no_subtree_check,rw,sync,no_root_squash,insecure)
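With this many near-identical export lines, a small script can make the per-client options easier to compare. The following is a minimal sketch, assuming the simple one-line `path client(options) ...` layout used in the file above; it does not handle quoted paths, line continuations or other exports(5) features, and the helper names `parse_exports` and `mixed_squash` are hypothetical.

```python
import re

# One "client(opt,opt,...)" entry; the client part contains no whitespace.
CLIENT_RE = re.compile(r"(\S+?)\(([^)]*)\)")

def parse_exports(text):
    """Map each exported path to {client: set_of_options}."""
    table = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) < 2:
            continue                          # a path with no client list
        path, rest = parts
        clients = table.setdefault(path, {})
        for client, opts in CLIENT_RE.findall(rest):
            clients[client] = {o.strip() for o in opts.split(",") if o.strip()}
    return table

def mixed_squash(table):
    """Return paths whose clients disagree on no_root_squash."""
    mixed = []
    for path, clients in table.items():
        kinds = {"no_root_squash" in opts for opts in clients.values()}
        if len(kinds) > 1:
            mixed.append(path)
    return mixed
```

Applied to the file above, `mixed_squash(parse_exports(open("/etc/exports").read()))` would be expected to flag /export/home, where the two networks get different root-squash settings; whether that difference is intentional is not stated in the report.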



-- System Information:
Debian Release: 6.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages nfs-kernel-server depends on:
ii libblkid1 2.17.2-9 block device id library
ii libc6 2.11.2-10 Embedded GNU C Library: Shared lib
ii libcomerr2 1.41.12-2 common error description library
ii libgssapi-krb5-2 1.8.3+dfsg-4 MIT Kerberos runtime libraries - k
ii libgssglue1 0.1-4 mechanism-switch gssapi library
ii libk5crypto3 1.8.3+dfsg-4 MIT Kerberos runtime libraries - C
ii libkrb5-3 1.8.3+dfsg-4 MIT Kerberos runtime libraries
ii libnfsidmap2 0.23-2 An nfs idmapping library
ii librpcsecgss3 0.19-2 allows secure rpc communication us
ii libwrap0 7.6.q-19 Wietse Venema's TCP wrappers libra
ii lsb-base 3.2-23.2squeeze1 Linux Standard Base 3.2 init scrip
ii nfs-common 1:1.2.2-4 NFS support files common to client
ii ucf 3.0025+nmu1 Update Configuration File: preserv

nfs-kernel-server recommends no packages.

nfs-kernel-server suggests no packages.

-- no debconf information



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110310123058.28324.69854.reportbug@fileserver2.sysbiol.internal.cam.ac.uk
 
03-20-2011, 04:20 PM
Luk Claes

Bug#617666: nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working

> On 10/03/11 12:54, Debian Bug Tracking System wrote:
>
> I have some extra information about this problem - the syslog contains
> some kernel error messages related to nfs and xfs (the filesystem of the
> /export partition). I have attached the relevant log section...
>
> It could be this is a problem with xfs or even with our hardware raid
> controller. I have rebooted the machine with /export unmounted and am
> currently running xfs_repair over it to see if that picks up any problems.

Hi

I guess your xfs_repair finished by now? Did it shed some more light on
the issue or should we look more closely into the nfs code?

Cheers

Luk



Archive: http://lists.debian.org/4D86374F.5070102@debian.org
 
03-21-2011, 10:33 AM
Dan Tomlinson

Bug#617666: nfs-kernel-server: Periodic nfsd failure - single nfsd process with high CPU and no mounts working

On 20/03/11 17:20, Luk Claes wrote:

>> On 10/03/11 12:54, Debian Bug Tracking System wrote:
>>
>> I have some extra information about this problem - the syslog contains
>> some kernel error messages related to nfs and xfs (the filesystem of the
>> /export partition). I have attached the relevant log section...
>>
>> It could be this is a problem with xfs or even with our hardware raid
>> controller. I have rebooted the machine with /export unmounted and am
>> currently running xfs_repair over it to see if that picks up any problems.
>
> Hi
>
> I guess your xfs_repair finished by now? Did it shed some more light on
> the issue or should we look more closely into the nfs code?
>
> Cheers
>
> Luk


Hi Luk,

Thanks for getting back to me. My xfs_repair did finish and it found a
few errors, but I'm not sure whether they come from hard resetting the
machine or indicate a more serious hardware error. I am, however,
pretty sure that this is not a purely NFS problem - since the repair
finished, the system has crashed in a couple of different ways. Once it
dumped the kernel to the console and went completely unresponsive, and
another time the /export partition unmounted itself and wouldn't remount
(giving IO errors). In both cases there was no weird NFS process
hanging around (the mounts just became inaccessible, as you would expect
them to after such crashes).


At this point I am pretty sure that I have a hardware issue on my hands,
either with bad RAM or my raid controller. I think we can safely say
NFS is in the clear. Sorry for wasting your time!


Dan



Archive: http://lists.debian.org/4D8737A3.4010005@flymine.org
 

All times are GMT.