"Fabio M. Di Nitto" 02-27-2009 09:04 AM

Cluster 3.0.0 blocker list updated (action required)
 
Hi all,

http://sources.redhat.com/cluster/wiki/Cluster3Blockers

has been updated today. I am very glad to see that a lot of issues are
being addressed quickly.

With the recent changes to the corosync IPC interface, a new serious
problem has been discovered. It requires effort from everybody to be
addressed properly. AFAICT the issue has always been there (or at least
for a long time), but its effect was never seen until 2 days ago.

This is how to reproduce the problem:

# ipcs
# start cman
# ipcs
# start groupd
# ipcs (note the semaphores)
# stop groupd
# ipcs (semaphores are gone)
# start groupd
# start dlm_controld...

etc.

Basically, a full start/stop cycle of all daemons will leak shared
semaphores in the system.
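For anyone who wants to check their own daemon, the before/after comparison of the ipcs output can be sketched in a few lines of Python. This is only a rough sketch, not part of the cluster code: it assumes a Linux system where /proc/sysvipc/sem lists one semaphore set per row with the semid in the second column, and the function names are made up for illustration.

```python
def parse_semids(table_text):
    """Return the set of semaphore-set IDs found in /proc/sysvipc/sem-style text."""
    semids = set()
    for line in table_text.splitlines()[1:]:  # first row is the column header
        fields = line.split()
        if len(fields) >= 2:
            semids.add(int(fields[1]))  # assumed layout: key, semid, perms, ...
    return semids


def leaked_semids(before_text, after_text):
    """IDs present after a start/stop cycle that were not there before it."""
    return sorted(parse_semids(after_text) - parse_semids(before_text))


def read_sem_table(path="/proc/sysvipc/sem"):
    """Read the kernel's current table of System V semaphore sets."""
    with open(path) as f:
        return f.read()
```

Take one snapshot with read_sem_table() before starting the daemons and another after stopping them all, then pass both to leaked_semids(). Anything it returns was created during the cycle and never removed.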

According to limits.h, each system has only 128 semaphores available,
and cleaning them up requires manual intervention: either a reboot or
ipcrm.
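As a hedged illustration of the limit and the manual cleanup: on Linux the running kernel reports its semaphore limits in /proc/sys/kernel/sem as four numbers (SEMMSL, SEMMNS, SEMOPM, SEMMNI). My assumption is that the 128 above corresponds to SEMMNI, the maximum number of semaphore sets. The helper names below are hypothetical, and the second one only builds the ipcrm command lines without running anything.

```python
def parse_sem_limits(text):
    """Parse the four fields of /proc/sys/kernel/sem into a dict."""
    semmsl, semmns, semopm, semmni = (int(v) for v in text.split())
    return {
        "SEMMSL": semmsl,  # max semaphores per set
        "SEMMNS": semmns,  # max semaphores system-wide
        "SEMOPM": semopm,  # max operations per semop() call
        "SEMMNI": semmni,  # max number of semaphore sets (assumed to be the 128)
    }


def ipcrm_commands(semids):
    """Build one 'ipcrm -s <semid>' argument list per leaked semaphore set."""
    return [["ipcrm", "-s", str(semid)] for semid in semids]
```

Review the generated commands before running them as root, since removing a set that a running daemon still uses will break that daemon.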

Our code probably doesn't clean up all of its connections on shutdown.
The number of open connections at runtime (11) seems to be correct: we
usually have 6 daemons running (2 connections each, one to cman and one
to ccs), and I know that one of the daemons (cmannotifyd) doesn't hold a
connection to ccs unless required, so the number matches my check.
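The count above can be written out as a tiny check. This is only the arithmetic from the paragraph, with a hypothetical function name; the inputs are the 6 daemons, 2 connections each, and the one ccs connection that cmannotifyd does not hold.

```python
def expected_connections(daemons, per_daemon, not_held):
    """Connections expected at runtime, minus those a daemon does not keep open."""
    return daemons * per_daemon - not_held


print(expected_connections(6, 2, 1))  # prints 11
```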

I personally fixed libccs this morning as it was leaking connections in
some error conditions, but apparently that was not enough.

Please make sure to check your code ASAP.

Fabio

