1349248061 receive_protocol from 2 max 1.1.1.0 run 1.1.1.1
1349248061 daemon node 2 max 1.1.1.0 run 1.1.1.0
1349248061 daemon node 2 join 1349247552 left 0 local quorum 1349248061
1349248061 receive_protocol from 2 max 1.1.1.0 run 1.1.1.1
1349248061 daemon node 2 max 1.1.1.0 run 1.1.1.1
1349248061 daemon node 2 join 1349247552 left 0 local quorum 1349248061
10-03-2012, 09:25 AM
Dietmar Maurer
fence daemon problems
> I observe strange problems with fencing when a cluster loses quorum for a
> short time.
>
> After regaining quorum, fenced reports 'wait state messages', and the whole
> cluster is blocked waiting for fenced.
Just found the following in fenced/cpg.c:
/* This is how we deal with cpg's that are partitioned and
then merge back together. When the merge happens, the
cpg on each side will see nodes from the other side being
added, and neither side will have zero started_count. So,
both sides will ignore start messages from the other side.
This causes the the domain on each side to continue waiting
for the missing start messages indefinately. To unblock
things, all nodes from one side of the former partition
need to fail. */
So the observed behavior is expected?
10-03-2012, 02:46 PM
David Teigland
fence daemon problems
On Wed, Oct 03, 2012 at 09:25:08AM +0000, Dietmar Maurer wrote:
> So the observed behavior is expected?
Yes, it's a stateful partition merge, and I think /var/log/messages should
have mentioned something about that. When a node is partitioned from the
others (e.g. network disconnected), it has to be cleanly reset before it's
allowed back. "cleanly reset" typically means rebooted. If it comes back
without being reset (e.g. network reconnected), then the others ignore it,
which is what you saw.
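The merge rule described above can be illustrated with a toy model. This is only a sketch of the idea in the quoted cpg.c comment, not the fenced implementation: the names StartMessage, accept_start and sender_newly_added are hypothetical, and the only assumption carried over from the source is that a node with a nonzero started_count still holds state from before the partition.

```python
from dataclasses import dataclass


@dataclass
class StartMessage:
    sender: int          # node id of the sender
    started_count: int   # 0 means the sender's daemon is freshly started


def accept_start(local_started_count: int, msg: StartMessage,
                 sender_newly_added: bool) -> bool:
    """Decide whether to count a start message (toy rule, not fenced's code)."""
    # If a newly added node and the local node both already have state,
    # the two sides were partitioned and merged back: ignore the message.
    stateful_merge = (sender_newly_added
                      and local_started_count > 0
                      and msg.started_count > 0)
    return not stateful_merge


# Node 2 was cut off from nodes 3 and 4, then its network came back.
# Both sides kept their state, so node 3 ignores node 2's start message.
merged_without_reset = accept_start(2, StartMessage(sender=2, started_count=2), True)

# After node 2 is rebooted, it rejoins with no state and is accepted.
rejoined_after_reset = accept_start(2, StartMessage(sender=2, started_count=0), True)

print(merged_without_reset, rejoined_after_reset)
```

In this model the ignored start message is never counted, which corresponds to the side waiting indefinitely until the other side's nodes are reset.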
10-03-2012, 04:08 PM
Dietmar Maurer
fence daemon problems
> Subject: Re: [Cluster-devel] fence daemon problems
>
> On Wed, Oct 03, 2012 at 09:25:08AM +0000, Dietmar Maurer wrote:
> > So the observed behavior is expected?
>
> Yes, it's a stateful partition merge, and I think /var/log/messages should have
> mentioned something about that.
What message should I look for?
10-03-2012, 04:12 PM
Dietmar Maurer
fence daemon problems
> Yes, it's a stateful partition merge, and I think /var/log/messages should have
> mentioned something about that. When a node is partitioned from the
> others (e.g. network disconnected), it has to be cleanly reset before it's
> allowed back. "cleanly reset" typically means rebooted. If it comes back
> without being reset (e.g. network reconnected), then the others ignore it,
> which is what you saw.
I don't really understand why 'dlm_controld' initiates fencing, although the
node does not have quorum?
I thought 'dlm_controld' should wait until the cluster is quorate before starting fence actions?
10-03-2012, 04:24 PM
David Teigland
fence daemon problems
On Wed, Oct 03, 2012 at 04:12:10PM +0000, Dietmar Maurer wrote:
> > Yes, it's a stateful partition merge, and I think /var/log/messages should have
> > mentioned something about that. When a node is partitioned from the
> > others (e.g. network disconnected), it has to be cleanly reset before it's
> > allowed back. "cleanly reset" typically means rebooted. If it comes back
> > without being reset (e.g. network reconnected), then the others ignore it,
> > which is what you saw.
> What message should I look for?
I was wrong, I was thinking about the "daemon node %d stateful merge"
messages which are debug, but should probably be changed to error.
> I don't really understand why 'dlm_controld' initiates fencing, although
> the node does not have quorum?
>
> I thought 'dlm_controld' should wait until the cluster is quorate before
> starting fence actions?
I guess you're talking about the dlm_tool ls output? The "fencing" there
means it is waiting for fenced to finish fencing before it starts dlm
recovery. fenced waits for quorum.
hp2:~# dlm_tool ls
dlm lockspaces
name rgmanager
id 0x5231f3eb
flags 0x00000004 kern_stop
change member 3 joined 1 remove 0 failed 0 seq 2,2
members 2 3 4
new change member 2 joined 0 remove 1 failed 1 seq 3,3
new status wait_messages 0 wait_condition 1 fencing
new members 3 4
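To spot this state without reading the output by eye, a small parser can list the lockspaces whose pending change is waiting on fencing. This is a minimal sketch that assumes the output looks like the sample above ("name" lines followed by a "new status" line ending in "fencing"); the function name lockspaces_waiting_for_fencing is hypothetical, and the real output may differ between versions.

```python
from typing import List, Optional

# Sample copied from the dlm_tool ls output shown above.
SAMPLE = """\
dlm lockspaces
name rgmanager
id 0x5231f3eb
flags 0x00000004 kern_stop
change member 3 joined 1 remove 0 failed 0 seq 2,2
members 2 3 4
new change member 2 joined 0 remove 1 failed 1 seq 3,3
new status wait_messages 0 wait_condition 1 fencing
new members 3 4
"""


def lockspaces_waiting_for_fencing(text: str) -> List[str]:
    """Return names of lockspaces whose 'new status' line mentions fencing."""
    waiting: List[str] = []
    current: Optional[str] = None
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "name":
            current = tokens[1]  # start of a new lockspace entry
        elif tokens[:2] == ["new", "status"] and "fencing" in tokens[2:]:
            if current is not None:
                waiting.append(current)
    return waiting


print(lockspaces_waiting_for_fencing(SAMPLE))
```

The same function could be given the captured output of a live `dlm_tool ls` run instead of the embedded sample.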
10-03-2012, 04:26 PM
Dietmar Maurer
fence daemon problems
> I guess you're talking about the dlm_tool ls output?
Yes.
> The "fencing" there
> means it is waiting for fenced to finish fencing before it starts dlm recovery.
> fenced waits for quorum.
So who actually starts fencing when the cluster is not quorate? rgmanager?
10-03-2012, 04:44 PM
David Teigland
fence daemon problems
On Wed, Oct 03, 2012 at 04:26:35PM +0000, Dietmar Maurer wrote:
> > I guess you're talking about the dlm_tool ls output?
>
> Yes.
>
> > The "fencing" there
> > means it is waiting for fenced to finish fencing before it starts dlm recovery.
> > fenced waits for quorum.
>
> So who actually starts fencing when the cluster is not quorate? rgmanager?
fenced always starts fencing, but it waits for quorum first. In other
words, if your cluster loses quorum, nothing happens, not even fencing.
The intention of that is to prevent an inquorate node/partition from
killing a quorate group of nodes that are running normally. e.g. if a 5
node cluster is partitioned into 2/3 or 1/4. You don't want the 2 or 1
node group to fence the 3 or 4 nodes that are fine.
The difficult cases, which I think you're seeing, are partitions where no
group has quorum, e.g. 2/2. In this case we do nothing, and the user has
to resolve it by resetting some of the nodes. You might be able to assign
different numbers of votes to reduce the likelihood of everyone losing
quorum.
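The partition examples above can be checked with a few lines of arithmetic. This sketch assumes a simple majority rule (a partition is quorate when it holds more than half of all votes); the helper names quorum_threshold and quorate_partitions are hypothetical, and the weighted example is only an illustration of the suggestion to assign different numbers of votes.

```python
from typing import Dict, List, Set


def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority (assumed rule)."""
    return total_votes // 2 + 1


def quorate_partitions(partitions: List[Set[int]],
                       votes: Dict[int, int]) -> List[bool]:
    """For each partition (a set of node ids), report whether it has quorum."""
    threshold = quorum_threshold(sum(votes.values()))
    return [sum(votes[n] for n in part) >= threshold for part in partitions]


# 5 nodes, one vote each, split 2/3: only the 3-node side is quorate.
one_vote_each_5 = {n: 1 for n in range(1, 6)}
print(quorate_partitions([{1, 2}, {3, 4, 5}], one_vote_each_5))

# 4 nodes, one vote each, split 2/2: nobody is quorate.
one_vote_each_4 = {n: 1 for n in range(1, 5)}
print(quorate_partitions([{1, 2}, {3, 4}], one_vote_each_4))

# Same 2/2 split, but node 1 carries two votes: its side keeps quorum.
weighted = {1: 2, 2: 1, 3: 1, 4: 1}
print(quorate_partitions([{1, 2}, {3, 4}], weighted))
```

With uneven votes an even split of nodes is no longer an even split of votes, so one side can stay quorate and fence the other.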
10-03-2012, 04:55 PM
Dietmar Maurer
fence daemon problems
> The intention of that is to prevent an inquorate node/partition from killing a
> quorate group of nodes that are running normally. e.g. if a 5 node cluster is
> partitioned into 2/3 or 1/4. You don't want the 2 or 1 node group to fence
> the 3 or 4 nodes that are fine.
Sure, I understand that.
> The difficult cases, which I think you're seeing, are partitions where no group
> has quorum, e.g. 2/2. In this case we do nothing, and the user has to resolve
> it by resetting some of the nodes
The problem with that is that those 'difficult' cases are very likely. For example,
a switch reboot results in that state if you do not have a redundant network (yes,
I know that this setup is simply wrong).
And things get worse, because it is not possible to reboot such nodes, because
rgmanager shutdown simply hangs. Is there any way to avoid that, so that it is at
least possible to reboot those nodes?
10-03-2012, 05:10 PM
David Teigland
fence daemon problems
On Wed, Oct 03, 2012 at 04:55:55PM +0000, Dietmar Maurer wrote:
> > The difficult cases, which I think you're seeing, are partitions where
> > no group has quorum, e.g. 2/2. In this case we do nothing, and the
> > user has to resolve it by resetting some of the nodes
>
> The problem with that is that those 'difficult' cases are very likely.
> For example a switch reboot results in that state if you do not have
> redundant network (yes, I know that this setup is simply wrong).
>
> And things get worse, because it is not possible to reboot such nodes,
> because rgmanager shutdown simply hangs. Is there any way to avoid that,
> so that it is at least possible to reboot those nodes?
Fabio's checkquorum script will reboot nodes that lose quorum.