Old 10-13-2011, 02:41 PM
Steven Whitehouse
 
cluster4 gfs_controld

Hi,

On Thu, 2011-10-13 at 10:20 -0400, David Teigland wrote:
> Here's the outline of my plan to remove/replace the essential bits of
> gfs_controld in cluster4. I expect it'll go away entirely, but there
> could be one or two minor things it would still handle on the side.
>
> kernel dlm/gfs2 will continue to be operable with either
> . cluster3 dlm_controld/gfs_controld combination, or
> . cluster4 dlm_controld only
>
> Two main things from gfs_controld need replacing:
>
> 1. jid allocation, first mounter
>
> cluster3
> . both from gfs_controld
>
> cluster4
> . jid from dlm-kernel "slots" which will be assigned similarly
What is the actual algorithm used to assign these slots?

> . first mounter using a dlm lock in lock_dlm
>
That sounds good to me. The thing we need to resolve is how we get
from one to the other. We may have to introduce a new name for the lock
protocol to avoid people accidentally using both schemes in the same
cluster.
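
For illustration, a minimal sketch of what that first-mounter check
might look like against the kernel dlm_lock() API (the resource name,
the completion-based wait and the error handling are assumptions, not
code from gfs2/lock_dlm):

#include <linux/completion.h>
#include <linux/dlm.h>

/* Sketch only: decide "first mounter" by racing for an EX lock. */
static struct completion first_done;

static void first_ast(void *arg)
{
        complete(&first_done);
}

static int check_first_mounter(dlm_lockspace_t *ls, struct dlm_lksb *lksb,
                               int *first)
{
        int error;

        init_completion(&first_done);

        /* Ask for EX without queueing; exactly one mounter can win. */
        error = dlm_lock(ls, DLM_LOCK_EX, lksb, DLM_LKF_NOQUEUE,
                         "first_mount", 11, 0, first_ast, lksb, NULL);
        if (error)
                return error;
        wait_for_completion(&first_done);

        if (!lksb->sb_status) {
                *first = 1;     /* we hold EX: we are the first mounter */
                return 0;
        }
        if (lksb->sb_status != -EAGAIN)
                return lksb->sb_status;

        /* Someone beat us to it; take the lock shared instead. */
        error = dlm_lock(ls, DLM_LOCK_PR, lksb, 0,
                         "first_mount", 11, 0, first_ast, lksb, NULL);
        if (error)
                return error;
        wait_for_completion(&first_done);
        *first = 0;
        return lksb->sb_status;
}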

> 2. recovery coordination, failure notification
>
> cluster3
> . coordination of dlm-kernel/gfs-kernel recovery is done
> indirectly in userspace between dlm_controld/gfs_controld,
> which then toggle sysfs files.
> . write("sysfs block", 0) -> block_store(1)
> write("sysfs recover", jid) -> recover_store(jid)
> write("sysfs block", 1) -> block_store(0)
>
> cluster4
> . coordination of dlm-kernel/gfs-kernel recovery is done
> directly in kernel using callbacks from dlm-kernel to gfs-kernel.
> . gdlm_mount(struct gfs2_sbd *sdp, const char *table, int *first, int *jid)
> calls dlm_recover_register(dlm, &jid, &recover_callbacks)
Can we not just pass the extra functions to dlm_create_lockspace? That
seems a bit simpler than adding an extra function just to register the
callbacks,

Steve.

> . gdlm_recover_prep() -> block_store(1)
> gdlm_recover_slot(jid) -> recover_store(jid)
> gdlm_recover_done() -> block_store(0)
>
> cluster3 dlm/gfs recovery
> . dlm_controld sees nodedown (libcpg)
> . gfs_controld sees nodedown (libcpg)
> . dlm_controld stops dlm-kernel (sysfs control 0)
> . gfs_controld stops gfs-kernel (sysfs block 1)
> . dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)
> . gfs_controld waits for dlm_controld kernel stop (libdlmcontrol)
> . dlm_controld syncs state among all nodes (libcpg)
> . gfs_controld syncs state among all nodes (libcpg)
> . dlm_controld starts dlm-kernel recovery (sysfs control 1)
> . gfs_controld starts gfs-kernel recovery (sysfs recover jid)
> . gfs_controld starts gfs-kernel (sysfs block 0)
>
> cluster4 dlm/gfs recovery
> . dlm_controld sees nodedown (libcpg)
> . dlm_controld stops dlm-kernel (sysfs control 0)
> . dlm-kernel stops gfs-kernel (callback block 1)
> . dlm_controld syncs state among all nodes (libcpg)
> . dlm_controld starts dlm-kernel recovery (sysfs control 1)
> . dlm-kernel starts gfs-kernel recovery (callback recover jid)
> . dlm-kernel starts gfs-kernel (callback block 0)
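
Read literally, the callback mapping above suggests a registration
structure along these lines (a sketch only; the struct layout and the
names follow this outline, not any existing header):

#include <linux/dlm.h>

/* Sketch of the dlm-kernel -> gfs-kernel recovery callbacks implied
 * by the outline above. */
struct recover_callbacks {
        void (*recover_prep)(void *arg);          /* "block 1": stop gfs */
        void (*recover_slot)(void *arg, int jid); /* recover one journal */
        void (*recover_done)(void *arg);          /* "block 0": resume */
};

/* lock_dlm would wire these to the same actions the sysfs writes
 * trigger today: */
static void gdlm_recover_prep(void *arg)          { /* block_store(1) */ }
static void gdlm_recover_slot(void *arg, int jid) { /* recover_store(jid) */ }
static void gdlm_recover_done(void *arg)          { /* block_store(0) */ }

static const struct recover_callbacks gdlm_callbacks = {
        .recover_prep = gdlm_recover_prep,
        .recover_slot = gdlm_recover_slot,
        .recover_done = gdlm_recover_done,
};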
>
 
Old 10-13-2011, 03:02 PM
Masatake YAMATO
 
cluster4 gfs_controld

Just a question.
I'd be happy if you could give me a hint.

> ...
> cluster3 dlm/gfs recovery
> . dlm_controld sees nodedown (libcpg)
> . gfs_controld sees nodedown (libcpg)
> . dlm_controld stops dlm-kernel (sysfs control 0)
> . gfs_controld stops gfs-kernel (sysfs block 1)
> . dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)
> ...

"dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)"

Is this true?
I'd like to know in which source file of which package this is
implemented. Which should I inspect, dlm_controld or libdlmcontrol?

(I'm working on Wireshark to handle the communication between
dlm_controld and gfs_controld.)

Masatake YAMATO
 
Old 10-13-2011, 03:30 PM
David Teigland
 
cluster4 gfs_controld

On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > cluster4
> > . jid from dlm-kernel "slots" which will be assigned similarly
> What is the actual algorithm used to assign these slots?

The same as picking jids: lowest unused id starting with 0. As for
implementation, I'll add it to the current dlm recovery messages.
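
In other words, the chosen id is the smallest non-negative integer not
already held. As a trivial sketch of that rule (illustrative only,
assuming a bitmap of ids currently in use):

#include <linux/bitops.h>

/* Sketch: the "lowest unused id starting with 0" rule. */
static int lowest_unused_id(const unsigned long *used, unsigned int max_ids)
{
        unsigned int id = find_first_zero_bit(used, max_ids);

        return id < max_ids ? id : -1;  /* -1: every id is taken */
}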

(Frankly, I'd really like to just set jid to nodeid-1. Any support for
that? It would obviously add a slight requirement on picking nodeids,
which 99.9% of people already meet.)

> > . first mounter using a dlm lock in lock_dlm
> >
> That sounds good to me. The thing we need to resolve is how do we get
> from one to the other. We may have to introduce a new name for the lock
> protocol to avoid people accidentally using both schemes in the same
> cluster.

Compatibility rests on the fact that the new dlm kernel features will only
work when the cluster4 dlm_controld is used.

dlm_controld.v3 running: dlm_recover_register() returns an error, and
everything falls back to working as it does now, with gfs_controld.v3 etc.

dlm_controld.v4 running: dlm_recover_register() works, lock_dlm sets jid
and first. (gfs_controld.v3 will fail to even run with dlm_controld.v4,
and other combinations of v3/v4 daemons will also fail to run together.)
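
In sketch form, mount-time registration would try the new interface and
quietly fall back (dlm_recover_register and gdlm_mount follow the
outline; sd_lockspace, sd_first_lksb and the exact error handling are
invented for illustration):

static int gdlm_mount(struct gfs2_sbd *sdp, const char *table,
                      int *first, int *jid)
{
        int error;

        error = dlm_recover_register(sdp->sd_lockspace, jid,
                                     &gdlm_callbacks);
        if (error) {
                /* dlm_controld.v3: no slot support in dlm-kernel.
                 * Leave jid and first for gfs_controld.v3 to set,
                 * exactly as things work today. */
                return 0;
        }

        /* dlm_controld.v4: jid was filled in from the dlm "slot";
         * settle first mounter with a dlm lock as described earlier. */
        return check_first_mounter(sdp->sd_lockspace, &sdp->sd_first_lksb,
                                   first);
}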

> > cluster4
> > . coordination of dlm-kernel/gfs-kernel recovery is done
> > directly in kernel using callbacks from dlm-kernel to gfs-kernel.
> > . gdlm_mount(struct gfs2_sbd *sdp, const char *table, int *first, int *jid)
> > calls dlm_recover_register(dlm, &jid, &recover_callbacks)
> Can we not just pass the extra functions to dlm_create_lockspace? That
> seems a bit simpler than adding an extra function just to register the
> callbacks,

Yes we could; I may do that. Returning the error mentioned above becomes
less direct. I'd have to overload the jid arg, or add another to indicate
the callbacks are enabled.
 
Old 10-13-2011, 03:33 PM
David Teigland
 
cluster4 gfs_controld

On Fri, Oct 14, 2011 at 12:02:27AM +0900, Masatake YAMATO wrote:
> Just a question.
> I'd be happy if you could give me a hint.
>
> > ...
> > cluster3 dlm/gfs recovery
> > . dlm_controld sees nodedown (libcpg)
> > . gfs_controld sees nodedown (libcpg)
> > . dlm_controld stops dlm-kernel (sysfs control 0)
> > . gfs_controld stops gfs-kernel (sysfs block 1)
> > . dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)
> > ...
>
> "dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)"
>
> Is this true?
> I'd like to know in which source file of which package this is
> implemented. Which should I inspect, dlm_controld or libdlmcontrol?

The function is check_fs_done()

http://git.fedorahosted.org/git?p=cluster.git;a=blob;f=group/dlm_controld/cpg.c;h=9b0d22333be540a733f2f74db4acc577c82b6026;hb=RHEL6#l636
 
Old 10-13-2011, 04:16 PM
Steven Whitehouse
 
cluster4 gfs_controld

Hi,

On Thu, 2011-10-13 at 11:30 -0400, David Teigland wrote:
> On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > > cluster4
> > > . jid from dlm-kernel "slots" which will be assigned similarly
> > What is the actual algorithm used to assign these slots?
>
> The same as picking jids: lowest unused id starting with 0. As for
> implementation, I'll add it to the current dlm recovery messages.
>
Yes, but the current implementation uses corosync to enforce ordering of
events, so I'm wondering how the dlm will do that after the change.

> (Frankly, I'd really like to just set jid to nodeid-1. Any support for
> that? It would obviously add a slight requirement on picking nodeids,
> which 99.9% of people already meet.)
>
The problem is that if you have a cluster with lots of nodes, but where
each fs is only mounted by a small number of them, we'd have to insist
on always creating as many journals as there are nodes in the cluster.

> > > . first mounter using a dlm lock in lock_dlm
> > >
> > That sounds good to me. The thing we need to resolve is how do we get
> > from one to the other. We may have to introduce a new name for the lock
> > protocol to avoid people accidentally using both schemes in the same
> > cluster.
>
> Compatibility rests on the fact that the new dlm kernel features will only
> work when the cluster4 dlm_controld is used.
>
> dlm_controld.v3 running: dlm_recover_register() returns an error, and
> everything falls back to working as it does now, with gfs_controld.v3 etc.
>
> dlm_controld.v4 running: dlm_recover_register() works, lock_dlm sets jid
> and first. (gfs_controld.v3 will fail to even run with dlm_controld.v4,
> and other combinations of v3/v4 daemons will also fail to run together.)
>
> > > cluster4
> > > . coordination of dlm-kernel/gfs-kernel recovery is done
> > > directly in kernel using callbacks from dlm-kernel to gfs-kernel.
> > > . gdlm_mount(struct gfs2_sbd *sdp, const char *table, int *first, int *jid)
> > > calls dlm_recover_register(dlm, &jid, &recover_callbacks)
> > Can we not just pass the extra functions to dlm_create_lockspace? That
> > seems a bit simpler than adding an extra function just to register the
> > callbacks,
>
> Yes we could; I may do that. Returning the error mentioned above becomes
> less direct. I'd have to overload the jid arg, or add another to indicate
> the callbacks are enabled.
>
Another alternative is just to add a member to the recover_callbacks
structure: a function taking first and jid as arguments, which the dlm
can call to pass the info to gfs2.

That way dlm users who don't care about that would just leave those
functions NULL, for example,

Steve.
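
Concretely, that suggestion might look something like this (a sketch
extending the callback structure sketched earlier; every name here is
illustrative):

/* Fold jid/first delivery into the callbacks structure handed to
 * dlm_create_lockspace, so plain dlm users just leave pointers NULL. */
struct recover_callbacks {
        void (*recover_prep)(void *arg);
        void (*recover_slot)(void *arg, int jid);
        void (*recover_done)(void *arg);
        /* Called once at mount time with the assigned slot; left NULL
         * by dlm users that don't care about slots at all. */
        void (*recover_assign)(void *arg, int first, int jid);
};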
 
Old 10-13-2011, 04:17 PM
Steven Whitehouse
 
cluster4 gfs_controld

Hi,

On Fri, 2011-10-14 at 00:02 +0900, Masatake YAMATO wrote:
> Just a question.
> I'd be happy if you could give me a hint.
>
> > ...
> > cluster3 dlm/gfs recovery
> > . dlm_controld sees nodedown (libcpg)
> > . gfs_controld sees nodedown (libcpg)
> > . dlm_controld stops dlm-kernel (sysfs control 0)
> > . gfs_controld stops gfs-kernel (sysfs block 1)
> > . dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)
> > ...
>
> "dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)"
>
> Is this true?
> I'd like to know in which source file of which package this is
> implemented. Which should I inspect, dlm_controld or libdlmcontrol?
>
> (I'm working on Wireshark to handle the communication between
> dlm_controld and gfs_controld.)
>
> Masatake YAMATO
>

The communication between gfs_controld and dlm_controld is done via a
unix socket, so you won't see that "on the wire",

Steve.
 
Old 10-13-2011, 04:49 PM
David Teigland
 
cluster4 gfs_controld

On Thu, Oct 13, 2011 at 05:16:29PM +0100, Steven Whitehouse wrote:
> Hi,
>
> On Thu, 2011-10-13 at 11:30 -0400, David Teigland wrote:
> > On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > > > cluster4
> > > > . jid from dlm-kernel "slots" which will be assigned similarly
> > > What is the actual algorithm used to assign these slots?
> >
> > The same as picking jids: lowest unused id starting with 0. As for
> > implementation, I'll add it to the current dlm recovery messages.
> >
> Yes, but the current implementation uses corosync to enforce ordering of
> events, so I'm wondering how the dlm will do that after the change.

One node picks it and then tells everyone. In this case it is the low
nodeid coordinating recovery, and the new data is added to the status
messages. We still rely on agreed ordering of different recovery events,
which is still based on libcpg.

In cluster3 everyone picks the same values independently based on the
ordered events/messages, just because that's how gfs_controld decides
everything.
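
As a cartoon of that flow (purely illustrative; struct member and the
plumbing are invented, the real carrier being the existing dlm status
messages):

/* Sketch: the low nodeid assigns slots to members that lack one,
 * using the lowest-unused-id rule, then ships the table to everyone. */
struct member {
        int nodeid;
        int slot;       /* -1 if not yet assigned */
};

static void assign_slots(struct member *members, int count,
                         unsigned long *used, unsigned int max_ids)
{
        int i, id;

        for (i = 0; i < count; i++) {
                if (members[i].slot >= 0)
                        continue;
                id = lowest_unused_id(used, max_ids);
                if (id < 0)
                        break;          /* no free slots left */
                members[i].slot = id;
                set_bit(id, used);
        }
        /* ...then include the nodeid->slot table in the next status
         * message so every node learns the same assignment. */
}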


> (Frankly, I'd really like to just set jid to nodeid-1. Any support for
> that? It would obviously add a slight requirement on picking nodeids,
> which 99.9% of people already meet.)
> >
> The problem is that if you have a cluster with lots of nodes, but where
> each fs is only mounted by a small number of them, we'd have to insist
> on always creating as many journals as there are nodes in the cluster.

yeah

> > Yes we could; I may do that. Returning the error mentioned above becomes
> > less direct. I'd have to overload the jid arg, or add another to indicate
> > the callbacks are enabled.
> >
> Another alternative is just to add a member to the recover_callbacks
> structure: a function taking first and jid as arguments, which the dlm
> can call to pass the info to gfs2.
>
> That way dlm users who don't care about that would just leave those
> functions NULL, for example,

The simplest variation should become evident once I start writing it;
it's hard to predict.
 
Old 10-13-2011, 07:00 PM
Masatake YAMATO
 
cluster4 gfs_controld

Thank you very much.

Masatake YAMATO

> On Fri, Oct 14, 2011 at 12:02:27AM +0900, Masatake YAMATO wrote:
>> Just a question.
>> I'd be happy if you could give me a hint.
>>
>> > ...
>> > cluster3 dlm/gfs recovery
>> > . dlm_controld sees nodedown (libcpg)
>> > . gfs_controld sees nodedown (libcpg)
>> > . dlm_controld stops dlm-kernel (sysfs control 0)
>> > . gfs_controld stops gfs-kernel (sysfs block 1)
>> > . dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)
>> > ...
>>
>> "dlm_controld waits for gfs_controld kernel stop (libdlmcontrol)"
>>
>> Is this true?
>> I'd like to know in which source file of which package this is
>> implemented. Which should I inspect, dlm_controld or libdlmcontrol?
>
> The function is check_fs_done()
>
> http://git.fedorahosted.org/git?p=cluster.git;a=blob;f=group/dlm_controld/cpg.c;h=9b0d22333be540a733f2f74db4acc577c82b6026;hb=RHEL6#l636
>
 
Old 10-13-2011, 08:30 PM
Lon Hohberger
 
cluster4 gfs_controld

On 10/13/2011 11:30 AM, David Teigland wrote:
> On Thu, Oct 13, 2011 at 03:41:31PM +0100, Steven Whitehouse wrote:
> > > cluster4
> > > . jid from dlm-kernel "slots" which will be assigned similarly
> > What is the actual algorithm used to assign these slots?
>
> The same as picking jids: lowest unused id starting with 0. As for
> implementation, I'll add it to the current dlm recovery messages.
>
> (Frankly, I'd really like to just set jid to nodeid-1. Any support for
> that? It would obviously add a slight requirement on picking nodeids,
> which 99.9% of people already meet.)


While this is simple, I don't think it is the best idea.

It would only work efficiently if the cluster stack used small,
consecutive whole numbers for nodeids, rather than arbitrary integers
(as native corosync does).

-- Lon
 