fence daemon problems
On 10/03/2012 12:55 PM, Dietmar Maurer wrote:
>> The intention of that is to prevent an inquorate node/partition from killing a
>> quorate group of nodes that are running normally, e.g. if a 5-node cluster is
>> partitioned into 2/3 or 1/4. You don't want the 2- or 1-node group to fence
>> the 3 or 4 nodes that are fine.
>
> Sure, I understand that.
>
>> The difficult cases, which I think you're seeing, are partitions where no group
>> has quorum, e.g. 2/2. In this case we do nothing, and the user has to resolve
>> it by resetting some of the nodes.
>
> The problem with that is that those 'difficult' cases are very likely. For example,
> a switch reboot results in that state if you do not have a redundant network (yes,
> I know that this setup is simply wrong).
>
> And things get worse, because it is not possible to reboot such nodes:
> rgmanager shutdown simply hangs. Is there any way to avoid that, so that it is at
> least possible to reboot those nodes?

Kill rgmanager and/or 'reboot -fn'?
I thought inquorate reboots worked - please file a bugzilla.
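
For reference, the partition cases discussed in this thread (2/3, 1/4 and 2/2) can be worked through with a small sketch. It assumes one vote per node and a simple-majority quorum (more than half of all votes). The function names are made up for illustration; this is not the fence daemon's actual code.

```python
from typing import List, Sequence


def quorum_threshold(total_votes: int) -> int:
    """Smallest vote count that is a strict majority (assumes one vote per node)."""
    return total_votes // 2 + 1


def may_fence(partition_sizes: Sequence[int]) -> List[bool]:
    """For each partition, report whether it holds quorum and so may fence others.

    Hypothetical helper: models the policy described in the thread, where an
    inquorate partition must never fence a quorate one.
    """
    needed = quorum_threshold(sum(partition_sizes))
    return [size >= needed for size in partition_sizes]


if __name__ == "__main__":
    for split in [(2, 3), (1, 4), (2, 2)]:
        print(split, may_fence(split))
```

With a 2/2 split no partition reaches the threshold of 3 votes, so nobody fences and the nodes have to be reset by hand, which is the 'difficult' case described above.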