FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Cluster Development

 
 
LinkBack Thread Tools
 
Old 05-13-2010, 10:18 AM
Jiaju Zhang
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Thu, May 13, 2010 at 5:51 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T16:49:27, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
>
> Hi Jiaju,
>
> thanks for the patch and the technical explanation!
>
>> Because all the components get the membership info eventually from
>> Corosync, IMO, for dlm_controld.pcmk, there is no need to wait
>> Pacemaker/crmd to finish all the information processing related to
>> membership change.
>>
>> Patched attached below, any review and comments are highly
>> appreciated!
>
> I'd only like to add the user-visible impact of this - namely, when a
> previously unknown node joins a pcmk cluster (with running DLM etc),
> nodes will hang.
>
> So it is fairly important to get fixed.
>
>
> So, with my limited knowledge of the code, I think the fix is sound. I
> have just one clerical comment:
>
>> *void process_cluster(int ci)
>> *{
>> - * *ais_dispatch(ais_fd_async, NULL);
>> * * *update_cluster();
>> *}
>
> Can this function be removed? It doesn't do anything any more. Or is
> there something that should be done with the "ci" argument?

There is something that should be done with the "ci" argument. In
fact, the "ci" means one slot which holds a socket(most cases) and a
related processing function which is monitored by "poll" in
dlm_controld.
update_cluster is in different calling path to process_cluster.

Many thanks for the review
Jiaju
 
Old 05-13-2010, 04:25 PM
Andrew Beekhof
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Thu, May 13, 2010 at 11:51 AM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T16:49:27, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
>
> Hi Jiaju,
>
> thanks for the patch and the technical explanation!
>
>> Because all the components get the membership info eventually from
>> Corosync, IMO, for dlm_controld.pcmk, there is no need to wait
>> Pacemaker/crmd to finish all the information processing related to
>> membership change.
>>
>> Patched attached below, any review and comments are highly
>> appreciated!
>
> I'd only like to add the user-visible impact of this - namely, when a
> previously unknown node joins a pcmk cluster (with running DLM etc),
> nodes will hang.
>
> So it is fairly important to get fixed.
>
>
> So, with my limited knowledge of the code, I think the fix is sound. I
> have just one clerical comment:
>
>> *void process_cluster(int ci)
>> *{
>> - * *ais_dispatch(ais_fd_async, NULL);
>> * * *update_cluster();
>> *}
>
> Can this function be removed?

No, it can't.
Remove that and the membership (and crm_peer_id_cache) stops being updated.

Which makes me very suspicious of the whole patch because it clearly
wasn't tested very well.

> It doesn't do anything any more. Or is
> there something that should be done with the "ci" argument?
>
>
> Regards,
> * *Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
 
Old 05-13-2010, 08:19 PM
Andrew Beekhof
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Thu, May 13, 2010 at 8:32 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T18:25:42, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> >> *void process_cluster(int ci)
>> >> *{
>> >> - * *ais_dispatch(ais_fd_async, NULL);
>> >> * * *update_cluster();
>> >> *}
>> >
>> > Can this function be removed?
>>
>> No, it can't.
>> Remove that and the membership (and crm_peer_id_cache) stops being updated.
>>
>> Which makes me very suspicious of the whole patch because it clearly
>> wasn't tested very well.
>
> ais_dispatch() was moved to update_cluster(). That's safe, it seems.

But pointless and unrelated to the problem, so why include it?

> My question related to the detail that now process_cluster() is
> identical to update_cluster(), at initial reading suggesting the
> function might be redundant now.

process_cluster() and update_cluster() are both API entry points.


Does the behavior still occur with pacemaker 1.1.2?
 
Old 05-14-2010, 04:08 AM
Jiaju Zhang
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Fri, May 14, 2010 at 12:25 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
> On Thu, May 13, 2010 at 11:51 AM, Lars Marowsky-Bree <lmb@novell.com> wrote:
>> On 2010-05-13T16:49:27, Jiaju Zhang <jjzhang.linux@gmail.com> wrote:
>>
>> Hi Jiaju,
>>
>> thanks for the patch and the technical explanation!
>>
>>> Because all the components get the membership info eventually from
>>> Corosync, IMO, for dlm_controld.pcmk, there is no need to wait
>>> Pacemaker/crmd to finish all the information processing related to
>>> membership change.
>>>
>>> Patched attached below, any review and comments are highly
>>> appreciated!
>>
>> I'd only like to add the user-visible impact of this - namely, when a
>> previously unknown node joins a pcmk cluster (with running DLM etc),
>> nodes will hang.
>>
>> So it is fairly important to get fixed.
>>
>>
>> So, with my limited knowledge of the code, I think the fix is sound. I
>> have just one clerical comment:
>>
>>> *void process_cluster(int ci)
>>> *{
>>> - * *ais_dispatch(ais_fd_async, NULL);
>>> * * *update_cluster();
>>> *}
>>
>> Can this function be removed?
>
> No, it can't.
> Remove that and the membership (and crm_peer_id_cache) stops being updated.

Agreed.

Both update_cluster and process_cluster can't be removed. Yes, moving
the place of ais_dispatch should have nothing to do with this issue.
Oh, this is because I added it in the debugging stage and forgot to
remove it in the final version, just sending the working version to
the list, sorry, I'll posted an updated patch soon ;-)

> Which makes me very suspicious of the whole patch because it clearly
> wasn't tested very well.

I have tested in my environment and it works for me.

Thanks,
Jiaju
 
Old 05-14-2010, 09:52 AM
Andrew Beekhof
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Thu, May 13, 2010 at 10:36 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
> On 2010-05-13T22:19:58, Andrew Beekhof <andrew@beekhof.net> wrote:
>
>> > ais_dispatch() was moved to update_cluster(). That's safe, it seems.
>> But pointless and unrelated to the problem, so why include it?
>
> I took it that JJ thought this was part of the solution.

ci == ais_async_fd, if there was something to read we should have
already had a callback.

>
>> Does the behavior still occur with pacemaker 1.1.2?
>
> Yes. Which fix do you think would change this?

These two that I mentioned last week:
http://hg.clusterlabs.org/pacemaker/1.1/rev/34ae9f7b4675
http://hg.clusterlabs.org/pacemaker/1.1/rev/19fdc0a3885a

I really doubt the patch is the right fix, conceptually, the node
can't be a member until the crmd process is running and done a whole
heap of processing.
So CRM_NODE_MEMBER without crm_proc_crmd makes no sense.

More likely, as-in the patches above, the process list wasn't being
updated correctly and thats what needs to get fixed.
It just never mattered before.

>
> JJ's patch has been tested and seems to work in general as well as
> address the problem.
>
>
> Regards,
> * *Lars
>
> --
> Architect Storage/HA, OPS Engineering, Novell, Inc.
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>
>
 
Old 05-14-2010, 10:15 AM
Andrew Beekhof
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong@novell.com> wrote:
> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
>>
>> Does the behavior still occur with pacemaker 1.1.2?
>>
>
> Yes.
>
> For the record, the most minimal testcase I've managed for this
> so far is as follows (substitute "/etc/init.d/corosync start" or
> whatever for "rcopenais start" if you're not on something SUSE-based):
>
> 1) Configure corosync/openais on two nodes.
> * Do not start the cluster yet.
>
> 2) On one node:
>
> * * # rm /var/lib/heartbeat/crm/*
> * * # rcopenais start
> * * # while ! crm_mon -1 | grep -qi online; do
> * * * * echo -n "." ; sleep 5 ; done
>
> 3) Now we have one node online, configure Pacemaker:
>
> * * # cat <<CONF | crm configure
> * * primitive dlm ocfacemaker:controld
> * * primitive clvm ocf:lvm2:clvmd
> * * group g dlm clvm
> * * clone c g meta interleave="true"
> * * property stonith-enabled="false"
> * * property no-quorum-policy="ignore"
> * * commit
> * * CONF
>
> * Watch "crm_mon -r" until that clone comes online.
> * Should only take a few seconds.
>
> 4) On the other node:
>
> * * # rm /var/lib/heartbeat/crm/*
> * * # rcopenais start
>
> The first node will now either wedge up spectacularly, and/or
> dlm_recoverd and clvmd will be stuck in D state on both nodes.

Presumably each thinks the other node isn't a member?
Perhaps something like this will help:

diff -r b59c27dc114a lib/ais/plugin.c
--- a/lib/ais/plugin.c Wed May 12 10:51:56 2010 +0200
+++ b/lib/ais/plugin.c Fri May 14 12:12:33 2010 +0200
@@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
ais_notice("Respawning failed child process: %s",
pcmk_children[lpc].name);
spawn_child(&(pcmk_children[lpc]));
- } else {
- send_cluster_id();
}
+ send_cluster_id();
}
}
sched_yield ();
@@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
}
}
}
+ send_cluster_id();

return 0;
}
 
Old 05-14-2010, 11:25 AM
Jiaju Zhang
 
Default dlm_controld.pcmk: Fix membership change judging issue

On Fri, May 14, 2010 at 5:52 PM, Andrew Beekhof <andrew@beekhof.net> wrote:
> On Thu, May 13, 2010 at 10:36 PM, Lars Marowsky-Bree <lmb@novell.com> wrote:
>> On 2010-05-13T22:19:58, Andrew Beekhof <andrew@beekhof.net> wrote:
>>
>>> > ais_dispatch() was moved to update_cluster(). That's safe, it seems.
>>> But pointless and unrelated to the problem, so why include it?
>>
>> I took it that JJ thought this was part of the solution.
>
> ci == ais_async_fd, *if there was something to read we should have
> already had a callback.
>
>>
>>> Does the behavior still occur with pacemaker 1.1.2?
>>
>> Yes. Which fix do you think would change this?
>
> These two that I mentioned last week:
> *http://hg.clusterlabs.org/pacemaker/1.1/rev/34ae9f7b4675
> *http://hg.clusterlabs.org/pacemaker/1.1/rev/19fdc0a3885a

I've confirmed that my pacemaker has the above patch and it doesn't work.

>
> I really doubt the patch is the right fix, conceptually, the node
> can't be a member until the crmd process is running and done a whole
> heap of processing.
> So CRM_NODE_MEMBER without crm_proc_crmd makes no sense.
>
> More likely, as-in the patches above, the process list wasn't being
> updated correctly and thats what needs to get fixed.
> It just never mattered before.

This is the where the race condition happens. CRM_NODE_MEMBER was set
from the local node, but crm_proc_crmd was updated only when the peer
node sending the local node this info. When CRM_NODE_MEMBER was
updated but crm_proc_crmd was not updated yet, dlm_controld went to
read the crm_peer_id_cache, this bug happened.

Yes, Updating CRM_NODE_MEMBER and updating crm_proc_crmd should be an
_atomic_ operation, before that has finished, dlm_controld shoud not
read the membership info. So in fact, I have been thinking about
three solutions to this problem:

1) After updating (CRM_NODE_MEMBER && crm_proc_crmd) finished, notify
dlm_controld. But I'm not sure if we can achieve this goal by doing
some work around ais_async_fd. Maybe need to dig into the code
further.
2) maybe add another flag to do some syncing between Pacemaker and
dlm_controld, it might like what dlm_controld and fs_controld has
done, but it may need much more work and a little complicated.
3) As the patch has done, only see CRM_NODE_MEMBER but not see
crm_proc_crmd. It seems have risk?(open disscussion). But after some
consideration, I think this should be OK. Because we eventually get
the membership info from Corosync, even we haven't recevied the
crm_proc_crmd info this moment, we can receive that info at once. If
some bad thing happened and the peer node was unfortunately down, we
can still received that info from Corosync. Also, the time we set the
membership info to DLM is just the node membership info, we don't need
to care if crmd is really started at this moment. The membership info
also is udpated at many other places when we do some work about
lockspace. So it should be safe as my think.

Thanks,
Jiaju
 
Old 05-14-2010, 11:28 AM
Jiaju Zhang
 
Default dlm_controld.pcmk: Fix membership change judging issue

I'll test the patch later on. But I may not finished this testing by
today because
I have some problem to access the hardwre at this moment.

Thanks a lot ;-)
Jiaju

On Fri, May 14, 2010 at 6:15 PM, Andrew Beekhof <andrew@beekhof.net> wrote:
> On Fri, May 14, 2010 at 5:04 AM, Tim Serong <tserong@novell.com> wrote:
>> On 5/14/2010 at 06:19 AM, Andrew Beekhof <andrew@beekhof.net> wrote:
>>>
>>> Does the behavior still occur with pacemaker 1.1.2?
>>>
>>
>> Yes.
>>
>> For the record, the most minimal testcase I've managed for this
>> so far is as follows (substitute "/etc/init.d/corosync start" or
>> whatever for "rcopenais start" if you're not on something SUSE-based):
>>
>> 1) Configure corosync/openais on two nodes.
>> * Do not start the cluster yet.
>>
>> 2) On one node:
>>
>> * * # rm /var/lib/heartbeat/crm/*
>> * * # rcopenais start
>> * * # while ! crm_mon -1 | grep -qi online; do
>> * * * * echo -n "." ; sleep 5 ; done
>>
>> 3) Now we have one node online, configure Pacemaker:
>>
>> * * # cat <<CONF | crm configure
>> * * primitive dlm ocfacemaker:controld
>> * * primitive clvm ocf:lvm2:clvmd
>> * * group g dlm clvm
>> * * clone c g meta interleave="true"
>> * * property stonith-enabled="false"
>> * * property no-quorum-policy="ignore"
>> * * commit
>> * * CONF
>>
>> * Watch "crm_mon -r" until that clone comes online.
>> * Should only take a few seconds.
>>
>> 4) On the other node:
>>
>> * * # rm /var/lib/heartbeat/crm/*
>> * * # rcopenais start
>>
>> The first node will now either wedge up spectacularly, and/or
>> dlm_recoverd and clvmd will be stuck in D state on both nodes.
>
> Presumably each thinks the other node isn't a member?
> Perhaps something like this will help:
>
> diff -r b59c27dc114a lib/ais/plugin.c
> --- a/lib/ais/plugin.c *Wed May 12 10:51:56 2010 +0200
> +++ b/lib/ais/plugin.c *Fri May 14 12:12:33 2010 +0200
> @@ -498,9 +498,8 @@ static void *pcmk_wait_dispatch (void *a
> * * * * * * * * * *ais_notice("Respawning failed child process: %s",
> * * * * * * * * * * * * * * * pcmk_children[lpc].name);
> * * * * * * * * * *spawn_child(&(pcmk_children[lpc]));
> - * * * * * * * } else {
> - * * * * * * * * * send_cluster_id();
> * * * * * * * *}
> + * * * * * * * send_cluster_id();
> * * * * * *}
> * * * *}
> * * * *sched_yield ();
> @@ -661,6 +660,7 @@ int pcmk_startup(struct corosync_api_v1
> * * * * * *}
> * * * *}
> * * }
> + * *send_cluster_id();
>
> * * return 0;
> *}
>
>
 

Thread Tools




All times are GMT. The time now is 08:21 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright ©2007 - 2008, www.linux-archive.org