Old 09-04-2008, 09:55 PM
Chris Joelly
 
Red Hat Cluster Suite / Handling of shutdown and mounting of GFS

Hello,

I'm trying to get RHCS up and running and have had some success so far.

The two-node cluster is running, but I don't know how to remove one
node the correct way. I can move the active service (just an IP
address for now) to the second node, and then want to remove the
other node from the running cluster. According to the Red Hat
documentation, cman_tool leave remove should be used for this, but
when I try it I get the error message:

root@store02:/etc/cluster# cman_tool leave remove
cman_tool: Error leaving cluster: Device or resource busy

I cannot figure out which device is busy and prevents the node from
leaving the cluster. The service (IP address) moved to the other node
correctly, as I can see with clustat ...

The only way I have found out of this situation is to restart the
whole cluster, which brings down the service(s) and results in
unnecessary fencing... Is there a known way to remove one node
without bringing down the whole cluster?
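
(A sketch of the commands that should show which cluster subsystems
still reference cman; the subcommand names are from memory of cluster
2.x and may differ slightly on Ubuntu 8.04:)

clustat             # confirm no service is still owned by this node
cman_tool status    # quorum and node state
group_tool ls       # fence domain, DLM lockspaces and GFS mount
                    # groups still listed here keep cman "busy"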

Another strange thing comes up when I try to use GFS:

I have configured DRBD on top of a hardware RAID 10 device, use LVM2
to build a cluster-aware VG, and on top of that use LVs and GFS
across the two cluster nodes.

Listing the GFS filesystems in fstab without noauto does not get them
mounted at boot by /etc/init.d/gfs-tools. I think this is due to the
order in which the sysv init scripts are started: all the RHCS
scripts are started from rcS, while drbd is started from rc2. I read
the relevant section of the Debian policy to figure out whether rcS
is meant to run before rc2, but this isn't spelled out there. So I
assume that drbd is started in rc2, after rcS, which would mean that
no filesystem on top of drbd can be mounted at boot time.
Can anybody confirm this?
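
(For the record, on Debian/Ubuntu the rcS.d scripts do run before the
scripts of the default runlevel in rc2.d. A quick sketch to check
where the relevant scripts are linked:)

find /etc/rcS.d /etc/rc2.d -name 'S*cman*' -o -name 'S*clvm*' \
     -o -name 'S*gfs*' -o -name 'S*drbd*'
# anything under /etc/rcS.d starts before anything under /etc/rc2.d,
# so a GFS mount attempted from rcS cannot see a DRBD device that is
# only brought up in rc2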

The reason I try to mount a GFS filesystem at boot time is that I
want to build cluster services on top of it, with more than one
service relying on the same filesystem. A better solution would be to
define a shared GFS filesystem resource that can be used by more than
one cluster service, with the cluster making sure the filesystem is
only mounted once.
Can this be achieved with RHCS?

Thanks for any advice ...

Ubuntu 8.04 LTS 64bit
cluster.conf attached!

<?xml version="1.0"?>
<cluster alias="store" config_version="47" name="store">
  <fence_daemon post_fail_delay="0" post_join_delay="60"/>
  <clusternodes>
    <clusternode name="store01" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="ap7922" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="store02" nodeid="2" votes="1">
      <fence>
        <method name="1">
          <device name="ap7922" port="9"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <cman expected_votes="1" two_node="1"/>
  <fencedevices>
    <fencedevice agent="fence_apc" ipaddr="192.168.2.10" login="***" name="ap7922" passwd="***"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <failoverdomain name="ip-fail2node2" ordered="1" restricted="0">
        <failoverdomainnode name="store01" priority="1"/>
      </failoverdomain>
      <failoverdomain name="ip-fail2node1" ordered="1">
        <failoverdomainnode name="store02" priority="1"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <ip address="192.168.2.20" monitor_link="1"/>
      <ip address="192.168.2.23" monitor_link="1"/>
    </resources>
    <service autostart="1" domain="ip-fail2node1" name="store02-ip" recovery="restart">
      <ip ref="192.168.2.23"/>
    </service>
    <service autostart="1" domain="ip-fail2node2" name="store01-ip" recovery="restart">
      <ip ref="192.168.2.20"/>
    </service>
  </rm>
</cluster>
 
Old 09-05-2008, 08:18 AM
Ante Karamatic
 
Red Hat Cluster Suite / Handling of shutdown and mounting of GFS

On Thu, 4 Sep 2008 23:55:47 +0200
Chris Joelly <chris-m-lists@joelly.net> wrote:

> The two-node cluster is running, but I don't know how to remove one
> node the correct way. I can move the active service (just an IP
> address for now) to the second node, and then want to remove the
> other node from the running cluster. According to the Red Hat
> documentation, cman_tool leave remove should be used for this, but
> when I try it I get the error message:
>
> root@store02:/etc/cluster# cman_tool leave remove
> cman_tool: Error leaving cluster: Device or resource busy

A two-node cluster in RHCS is a special case and people should avoid
it. I have one cluster with two nodes and I just hate it. Split brain
is very common (and, in fact, only possible) in a two-node cluster.

> The only way I have found out of this situation is to restart the
> whole cluster, which brings down the service(s) and results in
> unnecessary fencing... Is there a known way to remove one node
> without bringing down the whole cluster?

I've managed to bring one node down, but as soon as it comes back up,
the whole RHCS stack becomes unusable. A reboot helps :/

> Another strange thing comes up when I try to use GFS:
>
> I have configured DRBD on top of a hardware RAID 10 device, use LVM2
> to build a cluster-aware VG, and on top of that use LVs and GFS
> across the two cluster nodes.
>
> Listing the GFS filesystems in fstab without noauto does not get
> them mounted at boot by /etc/init.d/gfs-tools. I think this is due
> to the order in which the sysv init scripts are started: all the
> RHCS scripts are started from rcS, while drbd is started from rc2.
> I read the relevant section of the Debian policy to figure out
> whether rcS is meant to run before rc2, but this isn't spelled out
> there. So I assume that drbd is started in rc2, after rcS, which
> would mean that no filesystem on top of drbd can be mounted at boot
> time.
> Can anybody confirm this?

I also use GFS on top of DRBD, and your observations are correct. But
just starting DRBD before GFS isn't really enough: if there's a
filesystem on DRBD, the DRBD device must be primary before you try to
mount the filesystem. If that DRBD device was out of sync for a long
time, becoming primary can take a while.

This is why I don't set nodes up to bring everything up automatically
at boot. I'd rather connect to the freshly booted node, let the DRBD
resync finish, and then manually mount the filesystem and start RHCS.
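
(A rough sketch of that manual sequence; the DRBD resource name r0,
the VG/LV names and the mount point are only examples, and the init
script names may differ between installs:)

/etc/init.d/drbd start
cat /proc/drbd                 # wait until both sides show UpToDate
drbdadm primary r0             # promote only after the resync is done
vgchange -ay vg_store          # clvmd must already be running for a
                               # clustered VG
mount -t gfs /dev/vg_store/lv_data /srv/data
/etc/init.d/rgmanager start    # now let rgmanager place services here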

These things should become easier once we put upstart into use.

> The reason I try to mount a GFS filesystem at boot time is that I
> want to build cluster services on top of it, with more than one
> service relying on the same filesystem. A better solution would be
> to define a shared GFS filesystem resource that can be used by more
> than one cluster service, with the cluster making sure the
> filesystem is only mounted once.
> Can this be achieved with RHCS?

You can have the same filesystem mounted on both nodes at the same
time: DRBD in primary-primary mode with GFS on top of it.
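
(Roughly what dual-primary needs, as a sketch: the DRBD resource must
allow two primaries and the GFS filesystem must use the DLM lock
manager with one journal per node. Resource, LV and mount point names
below are examples; "store" is the cluster name from the attached
cluster.conf.)

# in drbd.conf, the resource's net section needs something like:
#     net { allow-two-primaries; }
# create the filesystem cluster-aware, one journal per node:
gfs_mkfs -p lock_dlm -t store:data -j 2 /dev/vg_store/lv_data
# then promote DRBD and mount on *both* nodes:
drbdadm primary r0
mount -t gfs /dev/vg_store/lv_data /srv/data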

 
Old 09-05-2008, 10:51 AM
Chris Joelly
 
Red Hat Cluster Suite / Handling of shutdown and mounting of GFS

On Fri, Sep 05, 2008, Ante Karamatic wrote:
> On Thu, 4 Sep 2008 23:55:47 +0200
> Chris Joelly <chris-m-lists@joelly.net> wrote:
>
> > The two-node cluster is running, but I don't know how to remove
> > one node the correct way. I can move the active service (just an
> > IP address for now) to the second node, and then want to remove
> > the other node from the running cluster. According to the Red Hat
> > documentation, cman_tool leave remove should be used for this, but
> > when I try it I get the error message:
> >
> > root@store02:/etc/cluster# cman_tool leave remove
> > cman_tool: Error leaving cluster: Device or resource busy
>
> A two-node cluster in RHCS is a special case and people should
> avoid it. I have one cluster with two nodes and I just hate it.
> Split brain is very common (and, in fact, only possible) in a
> two-node cluster.

Ack. But the error I get has nothing to do with split brain. I'm
trying to figure out which device RHCS is using so that the node
cannot be removed. The service that was hosted on store02 was
successfully moved to store01, so could this be a bug in cman_tool?
And how can I find this device? Using strace and lsof I was not able
to track it down :-/

> > The only way I have found out of this situation is to restart the
> > whole cluster, which brings down the service(s) and results in
> > unnecessary fencing... Is there a known way to remove one node
> > without bringing down the whole cluster?
>
> I've managed to bring one node down, but as soon as it comes back
> up, the whole RHCS stack becomes unusable. A reboot helps :/

Does that mean the whole RHCS stack is rather useless? Or may I
assume that RHCS on RHEL or CentOS is much better integrated and
tested than on Ubuntu Server, and is therefore worth the subscription
cost at Red Hat or the switch to CentOS?

> > Another strange thing comes up when I try to use GFS:
> >
> > I have configured DRBD on top of a hardware RAID 10 device, use
> > LVM2 to build a cluster-aware VG, and on top of that use LVs and
> > GFS across the two cluster nodes.
> >
> > Listing the GFS filesystems in fstab without noauto does not get
> > them mounted at boot by /etc/init.d/gfs-tools. I think this is due
> > to the order in which the sysv init scripts are started: all the
> > RHCS scripts are started from rcS, while drbd is started from rc2.
> > I read the relevant section of the Debian policy to figure out
> > whether rcS is meant to run before rc2, but this isn't spelled out
> > there. So I assume that drbd is started in rc2, after rcS, which
> > would mean that no filesystem on top of drbd can be mounted at
> > boot time.
> > Can anybody confirm this?
>
> I also use GFS on top of DRBD, and your observations are correct.
> But just starting DRBD before GFS isn't really enough: if there's a
> filesystem on DRBD, the DRBD device must be primary before you try
> to mount the filesystem. If that DRBD device was out of sync for a
> long time, becoming primary can take a while.
>
> This is why I don't set nodes up to bring everything up
> automatically at boot. I'd rather connect to the freshly booted
> node, let the DRBD resync finish, and then manually mount the
> filesystem and start RHCS.

Ack. I'm glad to see that my conclusions are not far from reality.

> These things should become easier once we put upstart into use.

upstart? Aha, sounds interesting... I had never heard of it before.

> > The reason I try to mount a GFS filesystem at boot time is that I
> > want to build cluster services on top of it, with more than one
> > service relying on the same filesystem. A better solution would be
> > to define a shared GFS filesystem resource that can be used by
> > more than one cluster service, with the cluster making sure the
> > filesystem is only mounted once.
> > Can this be achieved with RHCS?
>
> You can have the same filesystem mounted on both nodes at the same
> time: DRBD in primary-primary mode with GFS on top of it.

This is the way I use DRBD-LVM2-GFS on my two-node cluster. But as I
understand cluster.conf and system-config-cluster, I have to define
resources per service. If, for example, I want to create two services
that both rely on the same GFS mount and are expected to run on the
same node, I don't know how to share this GFS resource. Does the
resource manager notice that the GFS resource is already mounted from
service1 on node1 when it decides to bring up service2 on node1 as
well? Or say I set the cluster up so that each node is the failover
node for the other, and the services have a GFS resource defined that
would trigger a GFS mount which already exists on the failover node:
would that 'double' mount cause the failed-over service to fail to
start?

Chris

--
"The greatest proof that intelligent life other that humans exists in
the universe is that none of it has tried to contact us!"


 
Old 09-05-2008, 01:36 PM
Ante Karamatic
 
Red Hat Cluster Suite / Handling of shutdown and mounting of GFS

On Fri, 5 Sep 2008 12:51:42 +0200
Chris Joelly <chris-m-lists@joelly.net> wrote:

> Ack. But the error I get has nothing to do with split brain. I'm
> trying to figure out which device RHCS is using so that the node
> cannot be removed. The service that was hosted on store02 was
> successfully moved to store01, so could this be a bug in cman_tool?
> And how can I find this device? Using strace and lsof I was not able
> to track it down :-/

Moving services isn't the issue here (you could remove all services
from a node with /etc/init.d/rgmanager stop). This problem is related
to cluster membership. I don't know exactly where the problem is (I'm
just a user, not a developer).

I'll repeat it once more: having only two nodes in a cluster is the
worst possible scenario for RHCS.

> Does that mean the whole RHCS stack is rather useless? Or may I
> assume that RHCS on RHEL or CentOS is much better integrated and
> tested than on Ubuntu Server, and is therefore worth the
> subscription cost at Red Hat or the switch to CentOS?

I wouldn't use it on a two-node cluster if I didn't really have to
(but in one case I do), but it's far from useless. It's great. The
same problem exists on all distributions (FWIW, my crappy two-node
cluster is on Red Hat and all the others are on Ubuntu).

> > These things should become easier once we put upstart into use.
>
> upstart? Aha, sounds interesting... I had never heard of it before.

upstart is a replacement for the oldest part of Unix, the SysV init
scripts. Check it out:

http://upstart.ubuntu.com

Best thing since sliced bread. Really.

> This is the way I use DRBD-LVM2-GFS on my two-node cluster. But as
> I understand cluster.conf and system-config-cluster, I have to
> define resources per service. If, for example, I want to create two
> services that both rely on the same GFS mount and are expected to
> run on the same node, I don't know how to share this GFS resource.
> Does the resource manager notice that the GFS resource is already
> mounted from service1 on node1 when it decides to bring up service2
> on node1 as well? Or say I set the cluster up so that each node is
> the failover node for the other, and the services have a GFS
> resource defined that would trigger a GFS mount which already
> exists on the failover node: would that 'double' mount cause the
> failed-over service to fail to start?

Since RHCS isn't aware of DRBD, you can't really rely on it to handle
the GFS mount. This is why I don't manage GFS mounts with RHCS; I'd
rather mount GFS on both machines and let the services use it when
they need to. For example:

If I have two Apache nodes, I mount /var/www as GFS on both
(underneath that GFS is a DRBD device with both nodes in
primary-primary). As soon as the first node dies, the service is
started on the other node. RHCS doesn't manage my /var/www mount.
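
(A sketch of that layout; device and service names are examples, not
taken from the thread:)

# on both nodes, outside of rgmanager's control:
mount -t gfs /dev/vg_store/lv_www /var/www
# the cluster service then only carries the IP and the apache init
# script; relocating it by hand looks like:
clusvcadm -r www-service -m store01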


 
Old 09-05-2008, 02:52 PM
Chris Joelly
 
Red Hat Cluster Suite / Handling of shutdown and mounting of GFS

On Fri, Sep 05, 2008, Ante Karamatic wrote:
> On Fri, 5 Sep 2008 12:51:42 +0200
> Chris Joelly <chris-m-lists@joelly.net> wrote:
>
> Moving services isn't the issue here (you could remove all services
> from a node with /etc/init.d/rgmanager stop). This problem is
> related to cluster membership. I don't know exactly where the
> problem is (I'm just a user, not a developer).

Unfortunately there seem to be no developers out there reading our
posts :-) I posted to the linux-cluster list too, but no
recommendations yet. I'm quite enthusiastic about tracking down
problems, but I'm mostly used to tracking down Java-related problems,
as that is my main occupation ;-)

> I'll repeat it once more: having only two nodes in a cluster is the
> worst possible scenario for RHCS.

But then you have to use some other kind of shared storage; DRBD
won't work with more than two nodes, and that's too expensive for
this particular project ...

> I wouldn't use it on a two-node cluster if I didn't really have to
> (but in one case I do), but it's far from useless. It's great. The
> same problem exists on all distributions (FWIW, my crappy two-node
> cluster is on Red Hat and all the others are on Ubuntu).

Does this mean it's better to switch to heartbeat-managed services in
an active/passive manner, at least for a two-node setup?

> Since RHCS isn't aware of DRBD, you can't really rely on it to
> handle the GFS mount. This is why I don't manage GFS mounts with
> RHCS; I'd rather mount GFS on both machines and let the services
> use it when they need to. For example:
>
> If I have two Apache nodes, I mount /var/www as GFS on both
> (underneath that GFS is a DRBD device with both nodes in
> primary-primary). As soon as the first node dies, the service is
> started on the other node. RHCS doesn't manage my /var/www mount.

OK, so you define the services as "no autostart" in cluster.conf so
that you are able to bring up the underlying drbd-clvm-gfs stack
first.

So, to conclude: if the node a service runs on fails, the failover
node (in a two-node scenario) fences the failed node and takes over
its service. Then, when the failed node recovers, either by a reboot
or by manual intervention, you first bring up the GFS mounts and then
move the service back to the re-joined node?

Sounds reasonable... and it avoids the need for rgmanager to check
whether a 'shared' resource (GFS in this case) is already activated
by another service on the same node ...

But one question is still open, at least in my head: how do I safely
remove one node from a running cluster so that the services on the
remaining node keep running?
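
(Sketched from memory of cluster 2.x, the order that should let a
node leave cleanly looks roughly like the following; init script
names may differ on Ubuntu 8.04:)

/etc/init.d/rgmanager stop   # give up / relocate this node's services
umount -a -t gfs             # GFS mounts hold DLM lockspaces
/etc/init.d/gfs-tools stop
/etc/init.d/clvm stop        # clvmd keeps a lockspace open as well
fence_tool leave             # leave the fence domain
cman_tool leave remove       # should no longer report "busy"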

Chris


 
