05-13-2010, 08:49 AM
Jiaju Zhang

dlm_controld.pcmk: Fix membership change judging issue

Hi,

This is a fix for the membership change judging issue in dlm_controld.pcmk.
Currently, dlm_controld.pcmk gets the membership change information from
Pacemaker, and Pacemaker gets that information from Corosync, which is
good. But when Pacemaker itself receives the membership change info, it
first does some internal processing, such as aligning the node membership
as well as some other node info in the cluster. Until Pacemaker has
finished, it won't treat the node in question as an _active_ member.
At that very moment, dlm_controld.pcmk also learns of the membership
change and goes to read the membership info from Pacemaker. This is a
race condition: dlm_controld.pcmk reads crm_peer_id_cache before
Pacemaker has finished all the work for that membership change, that is,
before it has finished updating all the info in the cache. So if the
node in question is a joining node, it should be regarded as an "Added"
node, but with the current logic it is not!
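
To make the failure mode concrete, here is a minimal sketch of the add/remove decision, assuming it depends only on whether the node already has a comms directory entry and on what the membership check returned. The names node_action, classify_node and check_joining_node are hypothetical and are not part of dlm_controld; the real dlm_process_node() does more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical action names, used only in this sketch. */
enum node_action { ACTION_NONE, ACTION_ADD, ACTION_REMOVE };

/*
 * configured: the node already has an entry under the comms directory
 *             (the stat() call on that path succeeded).
 * active:     the membership check reported the node as a member.
 */
enum node_action classify_node(bool configured, bool active)
{
    if (configured && active)
        return ACTION_NONE;    /* already known and still a member */
    if (configured)
        return ACTION_REMOVE;  /* known, but no longer a member */
    if (active)
        return ACTION_ADD;     /* new member, not yet configured */
    return ACTION_NONE;        /* unknown and not a member: skipped */
}

/* Self-check of the joining-node case described above. */
void check_joining_node(void)
{
    /* Read after the membership info is complete: the joiner is added. */
    assert(classify_node(false, true) == ACTION_ADD);
    /* Read too early: the joiner still looks inactive and is skipped. */
    assert(classify_node(false, false) == ACTION_NONE);
}
```

In this model the joining node is not configured yet, so the only thing that decides between "Added" and "skipped" is the value of the membership check at the instant it is read.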

Because all the components ultimately get the membership info from
Corosync, IMO there is no need for dlm_controld.pcmk to wait for
Pacemaker/crmd to finish all the information processing related to a
membership change.
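
The idea can be illustrated with a toy model, assuming a node record that carries a membership state string plus some later bookkeeping. Everything here is hypothetical: struct toy_node, TOY_STATE_MEMBER and the bookkeeping_done flag are stand-ins invented for this sketch, and the exact extra conditions Pacemaker applies are not shown in this post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_STATE_MEMBER "member"   /* stand-in for CRM_NODE_MEMBER */

/* Toy stand-in for crm_node_t; only the fields this sketch needs. */
struct toy_node {
    const char *state;      /* membership state from the cluster layer */
    bool bookkeeping_done;  /* hypothetical: later per-node processing finished */
};

/* State-only check, in the spirit of the is_member() helper in the patch. */
bool toy_is_member(const struct toy_node *node)
{
    return node != NULL && node->state != NULL &&
           strcmp(node->state, TOY_STATE_MEMBER) == 0;
}

/* Stricter check that also waits for the hypothetical bookkeeping. */
bool toy_is_fully_active(const struct toy_node *node)
{
    return toy_is_member(node) && node->bookkeeping_done;
}

/* A joining node read while the bookkeeping is still in progress. */
void check_mid_update_read(void)
{
    struct toy_node joining = { TOY_STATE_MEMBER, false };

    assert(toy_is_member(&joining));        /* state-only check sees it */
    assert(!toy_is_fully_active(&joining)); /* stricter check does not yet */
    assert(!toy_is_member(NULL));           /* missing node is never a member */
}
```

A reader that only needs to know whether the node is in the membership can use the state-only check and is then unaffected by how far the later bookkeeping has progressed.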

Patch attached below; any review and comments are highly
appreciated!

Thanks,
Jiaju

Signed-off-by: Jiaju Zhang <jjzhang.linux@gmail.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Andrew Beekhof <andrew@beekhof.net>
---
group/dlm_controld/pacemaker.c | 16 ++++++++++++----
1 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/group/dlm_controld/pacemaker.c b/group/dlm_controld/pacemaker.c
index 3150a1f..9f90d48 100644
--- a/group/dlm_controld/pacemaker.c
+++ b/group/dlm_controld/pacemaker.c
@@ -81,6 +81,7 @@ int setup_cluster(void)
void update_cluster(void)
{
static uint64_t last_membership = 0;
+ ais_dispatch(ais_fd_async, NULL);
cluster_quorate = crm_have_quorum;
if(last_membership < crm_peer_seq) {
log_debug("Processing membership %llu", crm_peer_seq);
@@ -91,7 +92,6 @@ void update_cluster(void)

void process_cluster(int ci)
{
- ais_dispatch(ais_fd_async, NULL);
update_cluster();
}

@@ -102,6 +102,14 @@ void close_cluster(void) {
#include <arpa/inet.h>
#include <corosync/totem/totemip.h>

+static gboolean is_member(const crm_node_t *node)
+{
+ if(node && safe_str_eq(node->state, CRM_NODE_MEMBER))
+ return TRUE;
+
+ return FALSE;
+}
+
void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
{
int rc = 0;
@@ -119,7 +127,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
snprintf(path, PATH_MAX, "%s/%d", COMMS_DIR, node->id);

rc = stat(path, &tmp);
- is_active = crm_is_member_active(node);
+ is_active = is_member(node);

if(rc == 0 && is_active) {
/* nothing to do?
@@ -212,7 +220,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
}

log_debug("%s %sctive node %u '%s': born-on=%llu, last-seen=%llu, this-event=%llu, last-event=%llu",
- action, crm_is_member_active(value)?"a":"ina",
+ action, is_member(value)?"a":"ina",
node->id, node->uname, node->born, node->last_seen,
crm_peer_seq, (unsigned long long)*last);
}
@@ -220,7 +228,7 @@ void dlm_process_node(gpointer key, gpointer value, gpointer user_data)
int is_cluster_member(uint32_t nodeid)
{
crm_node_t *node = crm_get_peer(nodeid, NULL);
- return crm_is_member_active(node);
+ return is_member(node);
}

char *nodeid2name(int nodeid) {
 
