01-03-2008, 04:25 PM
Doug Tucker

cluster suite & gfs problem since update

I have a cluster that had been operational for some time and functioning
flawlessly until a recent yum update. The last working kernel was
2.6.9-55.0.12.ELsmp. The current kernel is 2.6.9-67ELsmp. The problem
appears to be some kind of infinite recovery loop. The service runs fine
for a few minutes, then restarts itself. What I am seeing in
/var/log/messages is:

Jan 3 11:17:47 engrfs1 clurgmgrd: [5614]: <err> nfsclient:skynet_disted is missing!
Jan 3 11:17:47 engrfs1 clurgmgrd[5614]: <notice> status on nfsclient:skynet_disted returned 1 (generic error)
Jan 3 11:17:47 engrfs1 bash: [27695]: <info> Removing export: 129.119.113.108:/mnt/disted
Jan 3 11:17:47 engrfs1 bash: [27695]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)


It does this for every client definition on the service. After it gets
to the last one, it restarts the service:

Jan 3 11:16:25 engrfs1 clurgmgrd[5614]: <notice> Stopping service disted_export
Jan 3 11:16:26 engrfs1 clurgmgrd: [5614]: <info> Removing IPv4 address 129.119.113.180 from eth0
Jan 3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Service disted_export is recovering
Jan 3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Recovering failed service disted_export

It then adds the exports and starts the service again:

Jan 3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding export: 129.119.113.108:/mnt/disted (rw)
Jan 3 11:16:36 engrfs1 clurgmgrd: [5614]: <info> Adding IPv4 address 129.119.113.180 to eth0
Jan 3 11:16:37 engrfs1 clurgmgrd[5614]: <notice> Service disted_export started

The whole cycle then starts over from the beginning, continuously. This
is a production system, and this behaviour causes the clients to hang
(of course) during each restart. Thanks much for your help!
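
For anyone trying to confirm the same loop on their own node, here is a
minimal sketch that counts how often each service is recovered and how
often each nfsclient resource is reported missing. It assumes the log
lines are unwrapped, one event per line, in the format shown above; the
function name summarize and the two patterns are illustrative only and
are not part of rgmanager.

```python
import re
from collections import Counter

# Matches lines such as:
#   "... clurgmgrd[5614]: <notice> Recovering failed service disted_export"
RECOVER_RE = re.compile(r"<notice> Recovering failed service (\S+)")
# Matches lines such as:
#   "... clurgmgrd: [5614]: <err> nfsclient:skynet_disted is missing!"
MISSING_RE = re.compile(r"<err> (nfsclient:\S+) is missing!")


def summarize(lines):
    """Count service recoveries and missing-nfsclient errors in log lines."""
    recoveries = Counter()
    missing = Counter()
    for line in lines:
        match = RECOVER_RE.search(line)
        if match:
            recoveries[match.group(1)] += 1
        match = MISSING_RE.search(line)
        if match:
            missing[match.group(1)] += 1
    return recoveries, missing


# Sample lines copied from the log excerpts in this post.
SAMPLE = [
    "Jan 3 11:17:47 engrfs1 clurgmgrd: [5614]: <err> nfsclient:skynet_disted is missing!",
    "Jan 3 11:16:36 engrfs1 clurgmgrd[5614]: <notice> Recovering failed service disted_export",
    "Jan 3 11:16:37 engrfs1 clurgmgrd[5614]: <notice> Service disted_export started",
]

recoveries, missing = summarize(SAMPLE)
for name, count in recoveries.most_common():
    print(f"{name}: recovered {count} time(s)")
for name, count in missing.most_common():
    print(f"{name}: reported missing {count} time(s)")
```

To run it against a real log, pass an open file object for
/var/log/messages to summarize() in place of SAMPLE. A recovery count
that keeps climbing while the same nfsclient names are reported missing
matches the pattern described here.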

Sincerely,

Doug Tucker




_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
01-04-2008, 03:52 PM
Doug Tucker

cluster suite & gfs problem since update

Just FYI, I figured out the problem. I had set all of the clients up
with their IP address in the "target" field, but apparently the updated
rgmanager nfsclient.sh script now checks /var/lib/nfs/etab and compares
the target against what it finds there. etab always has the *hostname*
instead of the IP, so the comparison failed and the script marked the
client as missing. Kind of a stupid way of monitoring, if you ask me; I
have no idea why they felt this was necessary. Just wanted to let anyone
who set their clients up by IP address know that the new update is going
to break them.
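
To illustrate the mismatch, here is a rough sketch that flags nfsclient
entries whose target is written as an IP address and checks whether that
exact string appears among the clients listed for the export. It does
not reproduce the actual nfsclient.sh logic. The etab line layout (path,
whitespace, then client followed by options in parentheses), the
attribute names name/target/options on the nfsclient element, and the
hostname skynet.example.com are all assumptions made for the example.

```python
import ipaddress
import xml.etree.ElementTree as ET


def etab_clients(etab_text, path):
    """Return the client names that an etab-style listing has for `path`.

    Assumes each line looks like "<path><whitespace><client>(<options>)".
    """
    clients = set()
    for line in etab_text.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2 or fields[0] != path:
            continue
        clients.add(fields[1].split("(", 1)[0])
    return clients


def is_ip_literal(target):
    """True if `target` parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(target)
        return True
    except ValueError:
        return False


def ip_targets(cluster_conf_xml):
    """Map nfsclient name -> target for targets written as IP addresses."""
    root = ET.fromstring(cluster_conf_xml)
    return {
        el.get("name"): el.get("target")
        for el in root.iter("nfsclient")
        if el.get("target") and is_ip_literal(el.get("target"))
    }


# Hypothetical sample data; the hostname below is made up for illustration.
SAMPLE_ETAB = "/mnt/disted\tskynet.example.com(rw,sync,wdelay)\n"
SAMPLE_CONF = """
<resources>
  <nfsclient name="skynet_disted" target="129.119.113.108" options="rw"/>
</resources>
"""

exported = etab_clients(SAMPLE_ETAB, "/mnt/disted")
for name, target in ip_targets(SAMPLE_CONF).items():
    if target not in exported:
        print(f"{name}: target {target} not found in etab (has {sorted(exported)})")
```

Any entry this flags is a candidate for switching its target to the
hostname form that etab records.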

