11-06-2009, 04:27 PM
David Teigland

unfence during startup

The current init.d/cman startup sequence is:

start_cman
unfence_self
start_qdiskd
wait_for_quorum
start_fenced
start_dlm_controld
start_gfs_controld
join_fence_domain

I believe the reason we put unfence between cman and qdisk was in case the
qdisk was on a fenced device. But, I'd forgotten about the more critical
case where someone runs 'service cman start' on a node after it has been
kicked out of the cluster and has been fenced (via fence_scsi). This is
not too uncommon for someone to try -- they think they can just restart
the cluster on the node without first rebooting. We go to a lot of
trouble in fenced and other daemons to recognize when someone does that
and shut things down again before getting far enough to corrupt storage.

Obviously, unfencing right at the beginning undercuts all those checks and
precautions, and could easily lead to corrupt storage. So, we need to
move unfence to just before the join_fence_domain step. Requiring a qdisk
to use a disk not subject to fencing shouldn't be too onerous?
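
A minimal sketch of the reordering proposed above, not the actual init.d/cman script. The step names are taken from the sequence listed earlier; the bodies are placeholder stubs that only print their own name (an assumption made for illustration), so the resulting order can be inspected without touching a real cluster. The `step` helper and the `start` wrapper are hypothetical.

```shell
#!/bin/sh
# Illustrative stubs only: each real function in init.d/cman does actual
# work. Here every step just prints its name so the ordering is visible.
step() { echo "$1"; }

start_cman()         { step start_cman; }
start_qdiskd()       { step start_qdiskd; }
wait_for_quorum()    { step wait_for_quorum; }
start_fenced()       { step start_fenced; }
start_dlm_controld() { step start_dlm_controld; }
start_gfs_controld() { step start_gfs_controld; }
unfence_self()       { step unfence_self; }
join_fence_domain()  { step join_fence_domain; }

# Proposed order: unfence_self moves from right after start_cman to just
# before join_fence_domain, so the daemons' checks run while the node is
# still fenced. The && chain stops at the first step that fails.
start() {
    start_cman &&
    start_qdiskd &&
    wait_for_quorum &&
    start_fenced &&
    start_dlm_controld &&
    start_gfs_controld &&
    unfence_self &&
    join_fence_domain
}
```

Calling `start` prints the eight step names in the proposed order. With this ordering a qdisk on a fenced device would not be reachable when start_qdiskd runs, which is why the qdisk would need to live on a disk not subject to fencing.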

Dave
 
11-12-2009, 04:50 PM
Lon H. Hohberger

unfence during startup

On Fri, 2009-11-06 at 11:27 -0600, David Teigland wrote:
> The current init.d/cman startup sequence is:
>
> start_cman
> unfence_self
> start_qdiskd
> wait_for_quorum
> start_fenced
> start_dlm_controld
> start_gfs_controld
> join_fence_domain
>
> I believe the reason we put unfence between cman and qdisk was in case the
> qdisk was on a fenced device. But, I'd forgotten about the more critical
> case where someone runs 'service cman start' on a node after it has been
> kicked out of the cluster and has been fenced (via fence_scsi). This is
> not too uncommon for someone to try -- they think they can just restart
> the cluster on the node without first rebooting. We go to a lot of
> trouble in fenced and other daemons to recognize when someone does that
> and shut things down again before getting far enough to corrupt storage.
>
> Obviously, unfencing right at the beginning undercuts all those checks and
> precautions, and could easily lead to corrupt storage. So, we need to
> move unfence to just before the join_fence_domain step. Requiring a qdisk
> to use a disk not subject to fencing shouldn't be too onerous?

It shouldn't matter -- it's what we require today with fence_scsi.

Alternatively, we can make qdiskd check for this sort of thing as well.
It might be more trouble than it's worth, but qdiskd already has a
'stop_cman' flag which will kill cman if qdiskd detects a critical error
(e.g. trying to rejoin a cluster...)

-- Lon
 
