12-19-2010, 11:06 PM
Daniel Bareiro

Problems with Pacemaker + Corosync after reboot

Hi all!

I'm beginning to test HA clusters with GNU/Linux, and for that I decided
to try Pacemaker + Corosync on Debian Lenny following this [1] howto.

Both packages were installed from the Backports repositories. But I am
observing that if I reboot a node after configuration, it fails to rejoin
the cluster after booting.

This is what I see in /var/log/daemon.log:

--------------------------------------------------------------------------
Dec 19 17:13:13 atlantis corosync[1508]: [pcmk ] WARN: route_ais_message: Sending message to local.crmd failed: unknown (rc=-2)
Dec 19 17:13:13 atlantis corosync[1508]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:13 atlantis corosync[1508]: [pcmk ] WARN: route_ais_message: Sending message to local.attrd failed: unknown (rc=-2)
Dec 19 17:13:13 atlantis corosync[1508]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:14 atlantis corosync[1508]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:14 atlantis corosync[1508]: [pcmk ] WARN: route_ais_message: Sending message to local.cib failed: unknown (rc=-2)
Dec 19 17:13:21 atlantis corosync[1508]: [TOTEM ] A processor failed, forming new configuration.
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] notice: pcmk_peer_update: Transitional membership event on ring 72: memb=1, new=0, lost=1
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] info: pcmk_peer_update: memb: atlantis 335544586
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] info: pcmk_peer_update: lost: daedalus 369099018
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] notice: pcmk_peer_update: Stable membership event on ring 72: memb=1, new=0, lost=0
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] info: pcmk_peer_update: MEMB: atlantis 335544586
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] info: ais_mark_unseen_peer_dead: Node daedalus was not seen in the previous transition
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] info: update_member: Node 369099018/daedalus is now: lost
Dec 19 17:13:25 atlantis corosync[1508]: [pcmk ] info: send_member_notification: Sending membership update 72 to 0 children
Dec 19 17:13:25 atlantis corosync[1508]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Dec 19 17:13:25 atlantis corosync[1508]: [MAIN ] Completed service synchronization, ready to provide service.
--------------------------------------------------------------------------
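
To see at a glance which local daemons corosync could not reach, the warnings above could be tallied with a small Python sketch like the one below. It assumes the line format shown in this excerpt (the pattern is inferred from these lines only), and the function name count_failed_targets is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpt above; it is an assumption, not a
# documented corosync log format.
FAILED_SEND = re.compile(
    r"route_ais_message: Sending message to local\.(\w+) failed"
)


def count_failed_targets(log_lines):
    """Count, per local daemon, the messages corosync failed to deliver."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_SEND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it an open file object for /var/log/daemon.log (or any list of lines) returns a Counter keyed by daemon name, such as cib, crmd or attrd.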


# ps auxf
[...]
root 1508 0.1 1.9 182624 4880 ? Ssl 15:52 0:22 /usr/sbin/corosync
root 1539 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
root 1540 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
root 1541 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
root 1542 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
root 1543 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
root 1544 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync


From what I see in the howto, the output should be something like this:


root 29980 0.0 0.8 44304 3808 ? Ssl 20:55 0:00 /usr/sbin/corosync
root 29986 0.0 2.4 10812 10812 ? SLs 20:55 0:00 \_ /usr/lib/heartbeat/stonithd
102 29987 0.0 0.8 13012 3804 ? S 20:55 0:00 \_ /usr/lib/heartbeat/cib
root 29988 0.0 0.4 5444 1800 ? S 20:55 0:00 \_ /usr/lib/heartbeat/lrmd
102 29989 0.0 0.5 12364 2368 ? S 20:55 0:00 \_ /usr/lib/heartbeat/attrd
102 29990 0.0 0.5 8604 2304 ? S 20:55 0:00 \_ /usr/lib/heartbeat/pengine
102 29991 0.0 0.6 12648 3080 ? S 20:55 0:00 \_ /usr/lib/heartbeat/crmd
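
The difference between the two listings can also be checked mechanically. The following Python sketch reports which of the child daemons named in the howto listing are absent from a given "ps auxf" output. It assumes the daemons run without arguments, as in the listing above, so that the last field of each line is the command path; the helper name missing_pacemaker_children is hypothetical.

```python
# Daemon names taken from the howto's process listing above.
EXPECTED_CHILDREN = ("stonithd", "cib", "lrmd", "attrd", "pengine", "crmd")


def missing_pacemaker_children(ps_output):
    """Return the expected daemons that do not appear in `ps auxf` output."""
    running = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Assumption: the command path is the last field (no arguments);
        # keep only the binary name after the final slash.
        running.add(fields[-1].rsplit("/", 1)[-1])
    return [name for name in EXPECTED_CHILDREN if name not in running]
```

For the listing from the rebooted node, where corosync has only forked copies of itself, this returns all six names; for the howto listing it returns an empty list.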


I also tried compiling Pacemaker using these [2] steps, but I get the
same result.


Thanks in advance for your reply.

Regards,
Daniel

[1] http://www.clusterlabs.org/wiki/Debian_Lenny_HowTo
[2] http://www.clusterlabs.org/wiki/Install#Building_from_Source
--
Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37
Powered by Debian GNU/Linux Lenny - Linux user #188.598
 
12-22-2010, 11:10 PM
Peter Beck
 

On Sun, 2010-12-19 at 21:06 -0300, Daniel Bareiro wrote:
> Hi all!
>
> I'm beginning to test HA clusters with GNU/Linux and for that I
> decided to try Pacemaker + Corosync in Debian Lenny following this [1]
> howto.
>
> Both packages were installed from the Backports repositories. But I am
> observing that if after configuration I reboot a node, it fails to
> join to the cluster after the boot.

Hi there,

I am trying the same with Squeeze (in VMs) and I see the same issue.
Sometimes it seems to work fine, but then the same issue shows up
again with just corosync. I also bought the cluster book from O'Reilly
(no idea if it is available in English [1]), but I have no clue what I
am doing wrong. I haven't found much useful documentation (besides the
links you've already mentioned).

I've heard that Pacemaker with Corosync causes a lot of issues and is
not very reliable, and that it is better to run Pacemaker with
Heartbeat. Is this true?

Regards
Peter

[1] http://www.oreilly.de/catalog/pdf_linuxhacluster2ger/index.html



--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1293063058.3531.19.camel@peanut.datentraeger.li
 
12-25-2010, 09:01 PM
Peter Beck
 

On Sun, 2010-12-19 at 21:06 -0300, Daniel Bareiro wrote:
> # ps auxf
> [...]
> root 1508 0.1 1.9 182624 4880 ? Ssl 15:52 0:22 /usr/sbin/corosync
> root 1539 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync

Hi Daniel,

have you tried killing corosync with "killall -9 corosync"
and then restarting it via "/etc/init.d/corosync start"?

This seems to bring back my nodes: if I do this, both nodes here are
back. But it does not solve the issue, since I have to do it again after
every reboot. Maybe corosync starts too early at bootup and one of the
services it depends on is not ready at that time?
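
As a stopgap, the two commands above could be wrapped in a small Python sketch like the one below. It is only the manual workaround in script form, not a fix for the boot-time problem, and it would need to run as root. The function name restart_corosync and the injectable run parameter are illustrative choices, added so the sequence can be tried without touching a real node.

```python
import subprocess

# Commands exactly as given in the message above.
KILL_CMD = ["killall", "-9", "corosync"]
START_CMD = ["/etc/init.d/corosync", "start"]


def restart_corosync(run=subprocess.call):
    """Kill all corosync processes, then start corosync via its init script.

    `run` takes a command list and returns its exit status; it defaults to
    subprocess.call. The exit status of the start command is returned.
    """
    # The exit status of killall is ignored: it is non-zero when no
    # corosync process was running, which is harmless here.
    run(KILL_CMD)
    return run(START_CMD)
```

A return value of 0 only means the init script reported success; whether the node actually rejoined still has to be checked in the log or the process list.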

Best Regards
Peter



--
Archive: http://lists.debian.org/1293314479.2491.10.camel@peanut.datentraeger.li
 
12-29-2010, 02:43 PM
Peter Beck
 

On Sun, 2010-12-19 at 21:06 -0300, Daniel Bareiro wrote:

> # ps auxf
> [...]
> root 1508 0.1 1.9 182624 4880 ? Ssl 15:52 0:22 /usr/sbin/corosync
> root 1539 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
> root 1540 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
> root 1541 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
> root 1542 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
> root 1543 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync
> root 1544 0.0 1.2 168144 3240 ? S 15:52 0:00 \_ /usr/sbin/corosync

Hi Daniel,

Stefan Voelkel just filed a bug report for this issue:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=608269

Regards
Peter


--
Archive: http://lists.debian.org/1293637381.26493.9.camel@peanut.datentraeger.li
 
