12-01-2008, 04:31 PM
David Teigland

gfs uevent and sysfs changes

Here are the compatibility aspects to the recent ideas about changes to
the user/kernel interface between gfs (1 & 2) and gfs_controld.

. gfs_controld can remove id from hostdata string in mount options

- no compat issues AFAICT

. getting rid of "id" sysfs file from lock_dlm

- new gfs_controld old gfs-kernel
old kernel provides both "block" and "id" sysfs files
new daemon looks for "block" instead of "id" in sysfs

- old gfs_controld new gfs-kernel
old daemon looks for "id" sysfs file
new kernel needs to provide "id" as well as "block" sysfs files

Once everyone is using the new daemon, we can remove the "id" sysfs
file from the kernel.

. uevent strings to replace recover_done/recover_status sysfs files

- new gfs_controld old gfs-kernel
old kernel has recover sysfs files, and no new uevent strings
new daemon needs to look for either sysfs files or uevent strings

- old gfs_controld new gfs-kernel
old daemon looks for recover sysfs files, not new uevent strings
new kernel needs to provide both sysfs files and uevent strings

Once everyone is using new kernel and new daemon, we can remove
the recover sysfs files from kernel, and daemon can stop looking for
recover sysfs files.
 
12-02-2008, 01:02 PM
Steven Whitehouse

Hi,

On Mon, 2008-12-01 at 11:31 -0600, David Teigland wrote:
> Here are the compatibility aspects to the recent ideas about changes to
> the user/kernel interface between gfs (1 & 2) and gfs_controld.
>
> . gfs_controld can remove id from hostdata string in mount options
>
> - no compat issues AFAICT
>
> . getting rid of "id" sysfs file from lock_dlm
>
> - new gfs_controld old gfs-kernel
> old kernel provides both "block" and "id" sysfs files
> new daemon looks for "block" instead of "id" in sysfs
>
> - old gfs_controld new gfs-kernel
> old daemon looks for "id" sysfs file
> new kernel needs to provide "id" as well as "block" sysfs files
>
> Once everyone is using the new daemon, we can remove the "id" sysfs
> file from the kernel.
>
> . uevent strings to replace recover_done/recover_status sysfs files
>
> - new gfs_controld old gfs-kernel
> old kernel has recover sysfs files, and no new uevent strings
> new daemon needs to look for either sysfs files or uevent strings
>
> - old gfs_controld new gfs-kernel
> old daemon looks for recover sysfs files, not new uevent strings
> new kernel needs to provide both sysfs files and uevent strings
>
> Once everyone is using new kernel and new daemon, we can remove
> the recover sysfs files from kernel, and daemon can stop looking for
> recover sysfs files.
>
>
So notwithstanding the fact that I've still not sorted out a proper
build environment for groupd, I have created a new patch based on the
above, which I think should address all the issues.

I've tested the decode_uevent() function separately and it appears to
work well. For the sysfs files that we intend to eventually replace
with the new uevent variables (which apply to the "change" message only),
I've introduced a convention where "-1" means that we don't know the value
because the uevent didn't tell us. In that case we fall back to reading
sysfs. It seems to work nicely since all the existing valid values
of those variables are positive integers.
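The "-1 means unknown, then fall back to sysfs" convention described above can be sketched in isolation as follows. This is only an illustration, not code from the patch: `resolve_value`, `sysfs_reader_fn`, `fake_sysfs_reader` and `UEVENT_UNKNOWN` are hypothetical names, and the reader callback stands in for the daemon's real sysfs access.

```c
#include <assert.h>

#define UEVENT_UNKNOWN (-1)	/* sentinel: the uevent did not carry this value */

/* Hypothetical stand-in for the daemon's sysfs reader: returns 0 and
   stores the value on success, a negative number on failure. */
typedef int (*sysfs_reader_fn)(const char *attr, int *val);

/* Return 0 with *val holding a usable value, or a negative error.
   The sysfs file is consulted only when the uevent left *val unknown. */
static int resolve_value(int *val, const char *attr, sysfs_reader_fn read_fn)
{
	assert(val && attr && read_fn);

	if (*val >= 0)
		return 0;		/* the uevent already supplied it */

	return read_fn(attr, val);	/* older kernel: fall back to sysfs */
}

/* Fake reader used only for demonstration; it always reports 3. */
static int fake_sysfs_reader(const char *attr, int *val)
{
	(void)attr;
	*val = 3;
	return 0;
}
```

With this shape, a caller that parsed `JID=` out of the uevent passes the parsed number straight through, while a caller that saw no such variable passes `UEVENT_UNKNOWN` and gets whatever the sysfs file reports.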

I think I've addressed all the points above, and this version seems a
bit cleaner than the previous one, although there is still scope to do a
bit more cleaning up at a later date.

Steve.

diff --git a/group/gfs_controld/cpg-new.c b/group/gfs_controld/cpg-new.c
index 74806f6..4014081 100644
--- a/group/gfs_controld/cpg-new.c
+++ b/group/gfs_controld/cpg-new.c
@@ -2078,30 +2078,33 @@ static void apply_changes(struct mountgroup *mg)
and then process the uevent/ipc upon receiving the message for it, so
that it can be processed in the same order by all nodes. */

-void process_recovery_uevent(char *table)
+void process_recovery_uevent(char *name, int jid, int recover_status,
+ int first_done)
{
struct mountgroup *mg;
struct journal *j;
- char *name = strstr(table, ":") + 1;
- int jid, recover_status, first_done;
int rv;

mg = find_mg(name);
if (!mg) {
- log_error("recovery_uevent mg not found %s", table);
+ log_error("recovery_uevent mg not found %s", name);
return;
}

- rv = read_sysfs_int(mg, "recover_done", &jid);
- if (rv < 0) {
- log_error("recovery_uevent recover_done read %d", rv);
- return;
+ if (jid < 0) {
+ rv = read_sysfs_int(mg, "recover_done", &jid);
+ if (rv < 0) {
+ log_error("recovery_uevent recover_done read %d", rv);
+ return;
+ }
}

- rv = read_sysfs_int(mg, "recover_status", &recover_status);
- if (rv < 0) {
- log_error("recovery_uevent recover_status read %d", rv);
- return;
+ if (recover_status < 0) {
+ rv = read_sysfs_int(mg, "recover_status", &recover_status);
+ if (rv < 0) {
+ log_error("recovery_uevent recover_status read %d", rv);
+ return;
+ }
}

if (!mg->first_recovery_needed) {
@@ -2162,10 +2165,12 @@ void process_recovery_uevent(char *table)
if (mg->first_done_uevent)
return;

- rv = read_sysfs_int(mg, "first_done", &first_done);
- if (rv < 0) {
- log_error("recovery_uevent first_done read %d", rv);
- return;
+ if (first_done < 0) {
+ rv = read_sysfs_int(mg, "first_done", &first_done);
+ if (rv < 0) {
+ log_error("recovery_uevent first_done read %d", rv);
+ return;
+ }
}

if (first_done) {
@@ -2678,12 +2683,11 @@ static void leave_mountgroup(struct mountgroup *mg, int mnterr)
log_error("cpg_leave error %d", error);
}

-void do_leave(char *table, int mnterr)
+void do_leave(char *name, int mnterr)
{
struct mountgroup *mg;
- char *name = strstr(table, ":") + 1;

- log_debug("do_leave %s mnterr %d", table, mnterr);
+ log_debug("do_leave %s mnterr %d", name, mnterr);

mg = find_mg(name);
if (!mg) {
diff --git a/group/gfs_controld/cpg-old.c b/group/gfs_controld/cpg-old.c
index 192a403..067fb85 100644
--- a/group/gfs_controld/cpg-old.c
+++ b/group/gfs_controld/cpg-old.c
@@ -1563,13 +1563,15 @@ static void recover_journals(struct mountgroup *mg)
these and wait for gfs to be finished with all at which point it calls
others_may_mount() and first_done is set. */

-static int kernel_recovery_done_first(struct mountgroup *mg)
+static int kernel_recovery_done_first(struct mountgroup *mg, int first_done)
{
- int rv, first_done;
+ int rv;

- rv = read_sysfs_int(mg, "first_done", &first_done);
- if (rv < 0)
- return rv;
+ if (first_done < 0) {
+ rv = read_sysfs_int(mg, "first_done", &first_done);
+ if (rv < 0)
+ return rv;
+ }

log_group(mg, "kernel_recovery_done_first first_done %d", first_done);

@@ -1604,26 +1606,27 @@ static int need_kernel_recovery_done(struct mountgroup *mg)
remain blocked until an rw node mounts, and the next mounter must
be rw. */

-int process_recovery_uevent_old(char *table)
+int process_recovery_uevent_old(char *name, int jid_done, int status, int first)
{
struct mountgroup *mg;
struct mg_member *memb;
- char *name = strstr(table, ":") + 1;
char *ss;
- int rv, jid_done, status, found = 0;
+ int rv, found = 0;

mg = find_mg(name);
if (!mg) {
- log_error("recovery_done: unknown mount group %s", table);
+ log_error("recovery_done: unknown mount group %s", name);
return -1;
}

if (mg->first_mounter && !mg->first_mounter_done)
- return kernel_recovery_done_first(mg);
+ return kernel_recovery_done_first(mg, first);

- rv = read_sysfs_int(mg, "recover_done", &jid_done);
- if (rv < 0)
- return rv;
+ if (jid_done < 0) {
+ rv = read_sysfs_int(mg, "recover_done", &jid_done);
+ if (rv < 0)
+ return rv;
+ }

list_for_each_entry(memb, &mg->members_gone, list) {
if (memb->jid == jid_done) {
@@ -1646,12 +1649,14 @@ int process_recovery_uevent_old(char *table)
return 0;
}

- rv = read_sysfs_int(mg, "recover_status", &status);
- if (rv < 0) {
- log_group(mg, "recovery_done jid %d nodeid %d sysfs error %d",
- memb->jid, memb->nodeid, rv);
- memb->local_recovery_status = RS_NOFS;
- goto out;
+ if (status < 0) {
+ rv = read_sysfs_int(mg, "recover_status", &status);
+ if (rv < 0) {
+ log_group(mg, "recovery_done jid %d nodeid %d sysfs error %d",
+ memb->jid, memb->nodeid, rv);
+ memb->local_recovery_status = RS_NOFS;
+ goto out;
+ }
}

switch (status) {
@@ -1724,12 +1729,11 @@ static void leave_mountgroup(struct mountgroup *mg, int mnterr)
group_leave(gh, mg->name);
}

-void do_leave_old(char *table, int mnterr)
+void do_leave_old(char *name, int mnterr)
{
struct mountgroup *mg;
- char *name = strstr(table, ":") + 1;

- log_debug("do_leave_old %s mnterr %d", table, mnterr);
+ log_debug("do_leave_old %s mnterr %d", name, mnterr);

list_for_each_entry(mg, &withdrawn_mounts, list) {
if (strcmp(mg->name, name))
diff --git a/group/gfs_controld/gfs_daemon.h b/group/gfs_controld/gfs_daemon.h
index 3beeb0e..c93d9cc 100644
--- a/group/gfs_controld/gfs_daemon.h
+++ b/group/gfs_controld/gfs_daemon.h
@@ -225,10 +225,10 @@ void process_cpg(int ci);
int setup_dlmcontrol(void);
void process_dlmcontrol(int ci);
int set_protocol(void);
-void process_recovery_uevent(char *table);
+void process_recovery_uevent(char *name, int jid, int status, int first);
void process_mountgroups(void);
int gfs_join_mountgroup(struct mountgroup *mg);
-void do_leave(char *table, int mnterr);
+void do_leave(char *name, int mnterr);
void gfs_mount_done(struct mountgroup *mg);
void send_remount(struct mountgroup *mg, struct gfsc_mount_args *ma);
void send_withdraw(struct mountgroup *mg);
@@ -245,13 +245,12 @@ void close_cpg_old(void);
void process_cpg_old(int ci);

int gfs_join_mountgroup_old(struct mountgroup *mg, struct gfsc_mount_args *ma);
-void do_leave_old(char *table, int mnterr);
+void do_leave_old(char *name, int mnterr);
int send_group_message_old(struct mountgroup *mg, int len, char *buf);
void save_message_old(struct mountgroup *mg, char *buf, int len, int from,
int type);
void send_withdraw_old(struct mountgroup *mg);
-int process_recovery_uevent_old(char *table);
-void ping_kernel_mount_old(char *table);
+int process_recovery_uevent_old(char *name, int jid, int status, int first);
void send_remount_old(struct mountgroup *mg, struct gfsc_mount_args *ma);
void send_mount_status_old(struct mountgroup *mg);
int do_stop(struct mountgroup *mg);
diff --git a/group/gfs_controld/main.c b/group/gfs_controld/main.c
index a2d8ed9..3ad2232 100644
--- a/group/gfs_controld/main.c
+++ b/group/gfs_controld/main.c
@@ -7,6 +7,7 @@

#define LOCKFILE_NAME "/var/run/gfs_controld.pid"
#define CLIENT_NALLOC 32
+#define UEVENT_BUF_SIZE 4096

static int client_maxi;
static int client_size;
@@ -22,7 +23,7 @@ struct client {
struct mountgroup *mg;
};

-static void do_withdraw(char *table);
+static void do_withdraw(char *name);

int do_read(int fd, void *buf, size_t count)
{
@@ -198,62 +199,100 @@ struct mountgroup *find_mg_id(uint32_t id)
return NULL;
}

-#define MAXARGS 8
-
-static char *get_args(char *buf, int *argc, char **argv, char sep, int want)
-{
- char *p = buf, *rp = NULL;
- int i;
-
- argv[0] = p;
-
- for (i = 1; i < MAXARGS; i++) {
- p = strchr(buf, sep);
- if (!p)
- break;
- *p = '\0';
-
- if (want == i) {
- rp = p + 1;
- break;
- }
-
- argv[i] = p + 1;
- buf = p + 1;
- }
- *argc = i;
-
- /* we ended by hitting \0, return the point following that */
- if (!rp)
- rp = strchr(buf, '\0') + 1;
-
- return rp;
-}
-
-static void ping_kernel_mount(char *table)
+static void ping_kernel_mount(char *name)
{
struct mountgroup *mg;
- char *name = strstr(table, ":") + 1;
int rv, val;

mg = find_mg(name);
if (!mg)
return;

- rv = read_sysfs_int(mg, "id", &val);
+ rv = read_sysfs_int(mg, "block", &val);

log_group(mg, "ping_kernel_mount %d", rv);
}

-static void process_uevent(int ci)
+enum {
+ Env_ACTION = 0,
+ Env_SUBSYSTEM,
+ Env_LOCKPROTO,
+ Env_LOCKTABLE,
+ Env_DEVPATH,
+ Env_RECOVERY,
+ Env_FIRSTMOUNT,
+ Env_JID,
+ Env_Last, /* Flag for end of vars */
+};
+
+static const char *uevent_vars[] = {
+ [Env_ACTION] = "ACTION=",
+ [Env_SUBSYSTEM] = "SUBSYSTEM=",
+ [Env_LOCKPROTO] = "LOCKPROTO=",
+ [Env_LOCKTABLE] = "LOCKTABLE=",
+ [Env_DEVPATH] = "DEVPATH=/fs/gfs",
+ [Env_RECOVERY] = "RECOVERY=",
+ [Env_FIRSTMOUNT] = "FIRSTMOUNT=",
+ [Env_JID] = "JID=",
+};
+
+/*
+ * Parses a uevent message for the interesting bits. It requires a list
+ * of variables to look for, and an equally long list of pointers into
+ * which to write the results.
+ */
+static void decode_uevent(const char *buf, unsigned len, const char *vars[],
+ unsigned nvars, const char *vals[])
+{
+ const char *ptr;
+ unsigned i;
+ int slen, vlen;
+
+ memset(vals, 0, sizeof(const char *) * nvars);
+
+ while (len > 0) {
+ ptr = buf;
+ slen = strlen(ptr);
+ buf += slen;
+ len -= slen;
+ buf++; len--;
+ for (i = 0; i < nvars; i++) {
+ vlen = strlen(vars[i]);
+ if (vlen > slen)
+ continue;
+ if (memcmp(vars[i], ptr, vlen) != 0)
+ continue;
+ vals[i] = ptr + vlen;
+ break;
+ }
+ }
+}
+
+static char *uevent_fsname(const char *vars[])
{
- char buf[MAXLINE];
- char *argv[MAXARGS], *act, *sys;
- int rv, argc = 0;
- int lock_module = 0;
+ char *name = NULL;
+ if (vars[Env_LOCKTABLE])
+ name = strchr(vars[Env_LOCKTABLE], ':');
+ /* When all kernels are converted, we can dispense with the following
+ * grotty bit. This is for backward compatibility only.
+ */
+ if (!name && vars[Env_DEVPATH]) {
+ name = strchr(vars[Env_DEVPATH], ':');
+ if (name) {
+ char *end = strstr(name, "/lock_dlm");
+ if (end)
+ *end = 0;
+ }
+ }
+ return (name && name[0]) ? name + 1 : NULL;
+}

- memset(buf, 0, sizeof(buf));
- memset(argv, 0, sizeof(char *) * MAXARGS);
+static void process_uevent(int ci)
+{
+ char buf[UEVENT_BUF_SIZE];
+ const char *uevent_vals[Env_Last];
+ char *fsname;
+ int rv;

retry_recv:
rv = recv(client[ci].fd, &buf, sizeof(buf), 0);
@@ -264,68 +303,52 @@ static void process_uevent(int ci)
log_error("uevent recv error %d errno %d", rv, errno);
return;
}
-
- /* first we get the uevent for removing lock module kobject:
- "remove@/fs/gfs/bull:x/lock_module"
- second is the uevent for removing gfs kobject:
- "remove@/fs/gfs/bull:x"
- */
-
- if (!strstr(buf, "gfs"))
+ buf[rv] = 0;
+ decode_uevent(buf, rv, uevent_vars, Env_Last, uevent_vals);
+ if (!uevent_vals[Env_DEVPATH] || !uevent_vals[Env_ACTION] ||
+ !uevent_vals[Env_SUBSYSTEM])
return;
-
- /* if an fs is named "gfs", it results in dlm uevents
- like "remove@/kernel/dlm/gfs" */
-
- if (strstr(buf, "kernel/dlm"))
- return;
-
+ fsname = uevent_fsname(uevent_vals);
log_debug("uevent: %s", buf);
+ log_debug("kernel: %s %s", uevent_vals[Env_ACTION], fsname);

- if (strstr(buf, "lock_module"))
- lock_module = 1;
-
- get_args(buf, &argc, argv, '/', 4);
- if (argc != 4)
- log_error("uevent message has %d args", argc);
- act = argv[0];
- sys = argv[2];
-
- log_debug("kernel: %s %s", act, argv[3]);
+ if (!fsname)
+ return;

- if (!strcmp(act, "remove@")) {
+ if (!strcmp(uevent_vals[Env_ACTION], "remove")) {
/* We want to trigger the leave at the very end of the kernel's
unmount process, i.e. at the end of put_super(), so we do the
leave when the second uevent (from the gfs kobj) arrives. */

- if (lock_module)
+ if (strcmp(uevent_vals[Env_SUBSYSTEM], "lock_dlm") == 0)
return;
-
if (group_mode == GROUP_LIBGROUP)
- do_leave_old(argv[3], 0);
+ do_leave_old(fsname, 0);
else
- do_leave(argv[3], 0);
-
- } else if (!strcmp(act, "change@")) {
- if (!lock_module)
- return;
-
+ do_leave(fsname, 0);
+
+ } else if (!strcmp(uevent_vals[Env_ACTION], "change")) {
+ int jid, status = -1, first = -1;
+ if (!uevent_vals[Env_JID] ||
+ (sscanf(uevent_vals[Env_JID], "%d", &jid) != 1))
+ jid = -1;
+ if (uevent_vals[Env_RECOVERY]) {
+ if (strcmp(uevent_vals[Env_RECOVERY], "Done") == 0)
+ status = LM_RD_SUCCESS;
+ if (strcmp(uevent_vals[Env_RECOVERY], "Failed") == 0)
+ status = LM_RD_GAVEUP;
+ }
+ if (uevent_vals[Env_FIRSTMOUNT] &&
+ (strcmp(uevent_vals[Env_FIRSTMOUNT], "Done") == 0))
+ first = 1;
if (group_mode == GROUP_LIBGROUP)
- process_recovery_uevent_old(argv[3]);
+ process_recovery_uevent_old(fsname, jid, status, first);
else
- process_recovery_uevent(argv[3]);
-
- } else if (!strcmp(act, "offline@")) {
- if (!lock_module)
- return;
-
- do_withdraw(argv[3]);
-
+ process_recovery_uevent(fsname, jid, status, first);
+ } else if (!strcmp(uevent_vals[Env_ACTION], "offline")) {
+ do_withdraw(fsname);
} else {
- if (!lock_module)
- return;
-
- ping_kernel_mount(argv[3]);
+ ping_kernel_mount(fsname);
}
}

@@ -736,10 +759,9 @@ static void do_join(int ci, struct gfsc_mount_args *ma)
and when it's been removed from the group, it tells the locally withdrawing
gfs to clear out locks. */

-static void do_withdraw(char *table)
+static void do_withdraw(char *name)
{
struct mountgroup *mg;
- char *name = strstr(table, ":") + 1;
int rv;

log_debug("withdraw: %s", name);
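The parsing idea in the patch above, walking a buffer of NUL-terminated "KEY=value" strings and picking out the interesting ones, can be sketched on its own. The helpers below are hypothetical (`uevent_lookup` and `locktable_fsname` do not appear in the patch), and they look up one key at a time rather than filling a table as decode_uevent() does. The sketch also checks the result of each search for NULL before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find a string starting with "KEY=" in a buffer of
   NUL-terminated strings and return a pointer to the text after the
   prefix, or NULL if no such string is present. */
static const char *uevent_lookup(const char *buf, size_t len, const char *key)
{
	size_t klen, off = 0;

	assert(buf && key);
	klen = strlen(key);

	while (off < len) {
		const char *s = buf + off;
		const char *nul = memchr(s, '\0', len - off);
		size_t slen;

		if (!nul)
			break;		/* ignore an unterminated trailing fragment */
		slen = (size_t)(nul - s);
		if (slen >= klen && memcmp(s, key, klen) == 0)
			return s + klen;
		off += slen + 1;	/* step over the string and its NUL */
	}
	return NULL;
}

/* Return the part of a lock table name ("cluster:fsname") after the
   colon, or NULL if there is no colon or nothing follows it. */
static const char *locktable_fsname(const char *table)
{
	const char *colon = table ? strchr(table, ':') : NULL;

	return (colon && colon[1]) ? colon + 1 : NULL;
}
```

For a message carrying `LOCKTABLE=bull:x`, looking up `"LOCKTABLE="` and passing the result to `locktable_fsname()` would yield `"x"`; a variable that an older kernel does not send simply comes back as NULL, which is the cue to fall back to sysfs.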
 
12-04-2008, 05:32 PM
"david m. richter"

On Mon, Dec 1, 2008 at 12:31 PM, David Teigland <teigland@redhat.com> wrote:
> Here are the compatibility aspects to the recent ideas about changes to
> the user/kernel interface between gfs (1 & 2) and gfs_controld.
>
> . gfs_controld can remove id from hostdata string in mount options

hi david,

I know I'm a peripheral consumer of the cluster suite, but I thought
I'd chime in and say that I am currently using the "id" as passed into
the kernel in the hostdata string (I believe by mount.gfs2?) in my
pNFS work. does the above "gfs_controld can remove id from hostdata
string" comment refer to something orthogonal, or would it affect what
gets stored in the superblock's hostdata at mount time?

..hm, sorry, I don't have the code right in front of me, but is that
"id" in the hostdata string the same thing as the mountgroup id? if
so, then my above worry about the hostdata string is moot, because if
gfs_controld still has that info I can just make a downcall.

thanks,

d
.

>
> - no compat issues AFAICT
>
> . getting rid of "id" sysfs file from lock_dlm
>
> - new gfs_controld old gfs-kernel
> old kernel provides both "block" and "id" sysfs files
> new daemon looks for "block" instead of "id" in sysfs
>
> - old gfs_controld new gfs-kernel
> old daemon looks for "id" sysfs file
> new kernel needs to provide "id" as well as "block" sysfs files
>
> Once everyone is using the new daemon, we can remove the "id" sysfs
> file from the kernel.
>
> . uevent strings to replace recover_done/recover_status sysfs files
>
> - new gfs_controld old gfs-kernel
> old kernel has recover sysfs files, and no new uevent strings
> new daemon needs to look for either sysfs files or uevent strings
>
> - old gfs_controld new gfs-kernel
> old daemon looks for recover sysfs files, not new uevent strings
> new kernel needs to provide both sysfs files and uevent strings
>
> Once everyone is using new kernel and new daemon, we can remove
> the recover sysfs files from kernel, and daemon can stop looking for
> recover sysfs files.
>
>
>
 
12-04-2008, 08:07 PM
David Teigland

On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
> On Mon, Dec 1, 2008 at 12:31 PM, David Teigland <teigland@redhat.com> wrote:
> > Here are the compatibility aspects to the recent ideas about changes to
> > the user/kernel interface between gfs (1 & 2) and gfs_controld.
> >
> > . gfs_controld can remove id from hostdata string in mount options
>
> hi david,
>
> I know I'm a peripheral consumer of the cluster suite, but I thought
> I'd chime in and say that I am currently using the "id" as passed into
> the kernel in the hostdata string (I believe by mount.gfs2?) in my
> pNFS work. does the above "gfs_controld can remove id from hostdata
> string" comment refer to something orthogonal, or would it affect what
> gets stored in the superblock's hostdata at mount time?

yes

> ..hm, sorry, I don't have the code right in front of me, but is that
> "id" in the hostdata string the same thing as the mountgroup id? if
> so, then my above worry about the hostdata string is moot, because if
> gfs_controld still has that info I can just make a downcall.

Yes, it's created in gfs_controld, and passed to mount.gfs via the
hostdata string which is then passed into the kernel during mount(2).

Previously, gfs-kernel (lock_dlm actually) would pass this id back up to
gfs_controld within the plock op structures. This was because plock ops
for all gfs fs's were funnelled to gfs_controld through a single misc
device. gfs_controld would match the op to a particular fs using the id.

The dlm does this now, using the lockspace id.

Dave
 
12-04-2008, 08:59 PM
"david m. richter"

On Thu, Dec 4, 2008 at 4:07 PM, David Teigland <teigland@redhat.com> wrote:
> On Thu, Dec 04, 2008 at 01:32:31PM -0500, david m. richter wrote:
>> On Mon, Dec 1, 2008 at 12:31 PM, David Teigland <teigland@redhat.com> wrote:
>> > Here are the compatibility aspects to the recent ideas about changes to
>> > the user/kernel interface between gfs (1 & 2) and gfs_controld.
>> >
>> > . gfs_controld can remove id from hostdata string in mount options
>>
>> hi david,
>>
>> I know I'm a peripheral consumer of the cluster suite, but I thought
>> I'd chime in and say that I am currently using the "id" as passed into
>> the kernel in the hostdata string (I believe by mount.gfs2?) in my
>> pNFS work. does the above "gfs_controld can remove id from hostdata
>> string" comment refer to something orthogonal, or would it affect what
>> gets stored in the superblock's hostdata at mount time?
>
> yes
>
>> ..hm, sorry, I don't have the code right in front of me, but is that
>> "id" in the hostdata string the same thing as the mountgroup id? if
>> so, then my above worry about the hostdata string is moot, because if
>> gfs_controld still has that info I can just make a downcall.
>
> Yes, it's created in gfs_controld, and passed to mount.gfs via the
> hostdata string which is then passed into the kernel during mount(2).

ah, so just to make sure i'm with you here: (1) gfs_controld is
generating this "id"-which-is-the-mountgroup-id, and (2) gfs_kernel
will no longer receive this in the hostdata string, so (3) i can just
rip out my in-kernel hostdata-parsing gunk and instead send in the
mountgroup id on my own (i have my own up/downcall channel)? if i've
got it right, then everything's a cinch and i'll shut up

say, one tangential question (i won't be offended if you skip it -
heh): is there a particular reason that you folks went with the uevent
mechanism for doing upcalls? i'm just curious, given the
seeming-complexity and possible overhead of using the whole layered
netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
(don't let the "rpc" fool you; it's a barebones, dead-simple pipe).
-- and no, i'm not selling anything; my boss was asking for a list
of differences between rpc_pipefs and uevents and the best i could
come up with is the former's bidirectional. Trond mentioned the
netlink overhead and i wondered if that was actually a significant
factor or just lost in the noise in most cases.

thanks again,

d
.

> Previously, gfs-kernel (lock_dlm actually) would pass this id back up to
> gfs_controld within the plock op structures. This was because plock ops
> for all gfs fs's were funnelled to gfs_controld through a single misc
> device. gfs_controld would match the op to a particular fs using the id.
>
> The dlm does this now, using the lockspace id.
>
> Dave
>
>
 
12-04-2008, 09:38 PM
David Teigland

On Thu, Dec 04, 2008 at 04:59:23PM -0500, david m. richter wrote:
> ah, so just to make sure i'm with you here: (1) gfs_controld is
> generating this "id"-which-is-the-mountgroup-id, and (2) gfs_kernel
> will no longer receive this in the hostdata string, so (3) i can just
> rip out my in-kernel hostdata-parsing gunk and instead send in the
> mountgroup id on my own (i have my own up/downcall channel)? if i've
> got it right, then everything's a cinch and i'll shut up

Yep. Generally, the best way to uniquely identify and refer to a gfs
filesystem is using the fsname string (specified during mkfs with -t and
saved in the superblock). But, sometimes it's just a lot easier to have a
numerical identifier instead. I expect this is why you're using the id,
and it's why we were using it for communicating about plocks.

In cluster1 and cluster2 the cluster infrastructure dynamically selected a
unique id when needed, and it never worked great. In cluster3 the id is
just a crc of the fsname string.

Now that I think about this a bit more, there may be a reason to keep the
id in the string. There was some interest on linux-kernel about better
using the statfs fsid field, and this id is what gfs should be putting
there.

> say, one tangential question (i won't be offended if you skip it -
> heh): is there a particular reason that you folks went with the uevent
> mechanism for doing upcalls? i'm just curious, given the
> seeming-complexity and possible overhead of using the whole layered
> netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
> (don't let the "rpc" fool you; it's a barebones, dead-simple pipe).
> -- and no, i'm not selling anything; my boss was asking for a list
> of differences between rpc_pipefs and uevents and the best i could
> come up with is the former's bidirectional. Trond mentioned the
> netlink overhead and i wondered if that was actually a significant
> factor or just lost in the noise in most cases.

The uevents looked pretty simple when I was initially designing how the
kernel/user interactions would work, and they fit well with sysfs files
which I was using too. I don't think the overhead of using uevents is too
bad. Sysfs files and uevents definitely don't work great if you need any
kind of sophisticated bi-directional interface.

Dave
 
12-05-2008, 08:51 AM
Steven Whitehouse

Hi,

On Thu, 2008-12-04 at 16:38 -0600, David Teigland wrote:
> On Thu, Dec 04, 2008 at 04:59:23PM -0500, david m. richter wrote:
> > ah, so just to make sure i'm with you here: (1) gfs_controld is
> > generating this "id"-which-is-the-mountgroup-id, and (2) gfs_kernel
> > will no longer receive this in the hostdata string, so (3) i can just
> > rip out my in-kernel hostdata-parsing gunk and instead send in the
> > mountgroup id on my own (i have my own up/downcall channel)? if i've
> > got it right, then everything's a cinch and i'll shut up
>
> Yep. Generally, the best way to uniquely identify and refer to a gfs
> filesystem is using the fsname string (specified during mkfs with -t and
> saved in the superblock). But, sometimes it's just a lot easier to have a
> numerical identifier instead. I expect this is why you're using the id,
> and it's why we were using it for communicating about plocks.
>
> In cluster1 and cluster2 the cluster infrastructure dynamically selected a
> unique id when needed, and it never worked great. In cluster3 the id is
> just a crc of the fsname string.
>
> Now that I think about this a bit more, there may be a reason to keep the
> id in the string. There was some interest on linux-kernel about better
> using the statfs fsid field, and this id is what gfs should be putting
> there.
>
In that case gfs2 should be able to generate the id itself from the
fsname and it still doesn't need it passed in, even if it continues to
expose the id in sysfs.

Perhaps better still, it should be possible for David to generate the id
directly if he really needs it from the fsname.

Since we also have a UUID now, for recently created filesystems, it
might be worth exposing that via sysfs and/or uevents too.

> > say, one tangential question (i won't be offended if you skip it -
> > heh): is there a particular reason that you folks went with the uevent
> > mechanism for doing upcalls? i'm just curious, given the
> > seeming-complexity and possible overhead of using the whole layered
> > netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
> > (don't let the "rpc" fool you; it's a barebones, dead-simple pipe).
> > -- and no, i'm not selling anything; my boss was asking for a list
> > of differences between rpc_pipefs and uevents and the best i could
> > come up with is the former's bidirectional. Trond mentioned the
> > netlink overhead and i wondered if that was actually a significant
> > factor or just lost in the noise in most cases.
>
> The uevents looked pretty simple when I was initially designing how the
> kernel/user interactions would work, and they fit well with sysfs files
> which I was using too. I don't think the overhead of using uevents is too
> bad. Sysfs files and uevents definitely don't work great if you need any
> kind of sophisticated bi-directional interface.
>
> Dave
>
I think uevents are a reasonable choice as they are easy enough to parse
that it could be done by scripts, etc and easy to extend as well. We do
intend to use netlink in the future (bz #337691) for quota messages, but
in that case we would be using an existing standard for sending those
messages.

Netlink can be extended fairly easily, but you do have to be careful
when designing the message format. I've not come across rpc_pipefs
before, so I can't comment on that yet. I don't think we need to worry
about overhead on sending the messages (if you have so much recovery
message traffic that it's a problem, you probably have bigger things to
worry about!), and I don't see that netlink should have any more
overhead than any other method of sending messages.

Steve.
 
12-05-2008, 01:52 PM
David Teigland

On Fri, Dec 05, 2008 at 09:51:45AM +0000, Steven Whitehouse wrote:
> In that case gfs2 should be able to generate the id itself from the
> fsname and it still doesn't need it passed in, even if it continues to
> expose the id in sysfs.
>
> Perhaps better still, it should be possible for David to generate the id
> directly if he really needs it from the fsname.

It's not actually a crc of the fsname, but a crc of the cpg name
gfs_controld creates for the mountgroup, which is "gfs:mount:<fsname>".
Also, we may at some point want to allow that generated id to be overridden
by one that's set explicitly.
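As a rough illustration of deriving a numeric id from the cpg name rather than from the bare fsname, here is a minimal sketch. It assumes the common reflected CRC-32 (polynomial 0xEDB88320); the thread does not say which CRC variant, seed, or width gfs_controld actually uses, so the values this produces should not be expected to match real mountgroup ids. `crc32_ieee` and `mountgroup_id_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Assumption: the
   thread does not state which CRC variant gfs_controld uses. */
static uint32_t crc32_ieee(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Build "gfs:mount:<fsname>" and return its CRC. Returns 0 if the name
   does not fit the buffer; a real implementation would need a separate
   error path, since 0 could also be a legitimate CRC value. */
static uint32_t mountgroup_id_sketch(const char *fsname)
{
	char cpgname[256];
	int n;

	assert(fsname);
	n = snprintf(cpgname, sizeof(cpgname), "gfs:mount:%s", fsname);
	if (n < 0 || (size_t)n >= sizeof(cpgname))
		return 0;
	return crc32_ieee(cpgname, (size_t)n);
}
```

The point of the sketch is only that the id is a pure function of the name string, so any party that knows the fsname and the naming convention could recompute it without having it passed in.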

> worry about!), and I don't see that netlink should have any more
> overhead than any other method of sending messages.

netlink is painful compared to uevents, look at dlm_controld/netlink.c
which uses the "generic netlink" interface to transfer a data structure
from the kernel to userspace. A library would help, but there didn't seem
to be a de facto netlink lib when I needed it, maybe that's changed.
 
12-05-2008, 02:03 PM
David Teigland

On Fri, Dec 05, 2008 at 08:52:58AM -0600, David Teigland wrote:
> On Fri, Dec 05, 2008 at 09:51:45AM +0000, Steven Whitehouse wrote:
> > In that case gfs2 should be able to generate the id itself from the
> > fsname and it still doesn't need it passed in, even if it continues to
> > expose the id in sysfs.
> >
> > Perhaps better still, it should be possible for David to generate the id
> > directly if he really needs it from the fsname.
>
> It's not actually a crc of the fsname, but a crc of the cpg name
> gfs_controld creates for the mountgroup, which is "gfs:mount:<fsname>".
> Also, we may at some point want to allow that generated id to be overridden
> by one that's set explicitly.

The fact that this id comes from gfs_controld, and becomes available only
during mount, makes me think it's not well suited to be the statfs fsid.
GFS should probably do its own thing for statfs (like a hash of just the
fsname) instead of depending on gfs_controld for it. With nolock the
daemons won't be there, and we'd still want the same fsid to be produced.
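One way to picture "a hash of just the fsname" feeding a statfs-style fsid is the sketch below. The hash is 64-bit FNV-1a, chosen here purely for illustration; the thread does not propose a particular hash, and `fsid_sketch`, `fnv1a64` and `fsid_from_fsname` are hypothetical names. The two-word layout mirrors the pair of integers that the statfs f_fsid field holds on Linux.

```c
#include <assert.h>
#include <stdint.h>

/* Two 32-bit words, mirroring the shape of the statfs f_fsid field. */
struct fsid_sketch {
	uint32_t val[2];
};

/* 64-bit FNV-1a over a NUL-terminated string (illustrative choice). */
static uint64_t fnv1a64(const char *s)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	assert(s);
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/* Derive an fsid from the fsname alone, with no daemon involved. */
static struct fsid_sketch fsid_from_fsname(const char *fsname)
{
	uint64_t h = fnv1a64(fsname);
	struct fsid_sketch id = {
		{ (uint32_t)(h & 0xffffffffu), (uint32_t)(h >> 32) }
	};

	return id;
}
```

Because nothing here depends on gfs_controld, the same fsname would map to the same fsid under lock_nolock as well, which is the property the post asks for.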
 
12-05-2008, 04:31 PM
"david m. richter"

On Thu, Dec 4, 2008 at 5:38 PM, David Teigland <teigland@redhat.com> wrote:
> On Thu, Dec 04, 2008 at 04:59:23PM -0500, david m. richter wrote:
>> ah, so just to make sure i'm with you here: (1) gfs_controld is
>> generating this "id"-which-is-the-mountgroup-id, and (2) gfs_kernel
>> will no longer receive this in the hostdata string, so (3) i can just
>> rip out my in-kernel hostdata-parsing gunk and instead send in the
>> mountgroup id on my own (i have my own up/downcall channel)? if i've
>> got it right, then everything's a cinch and i'll shut up
>
> Yep. Generally, the best way to uniquely identify and refer to a gfs
> filesystem is using the fsname string (specified during mkfs with -t and
> saved in the superblock). But, sometimes it's just a lot easier to have a
> numerical identifier instead. I expect this is why you're using the id,
> and it's why we were using it for communicating about plocks.

yes, the numerical id gets used a lot in my pNFS stuff, where the
kernel needs to make upcalls, of which some then get relayed over
multicast -- so, I've just been stashing that in the superblock.
thanks for clearing up my questions.


> In cluster1 and cluster2 the cluster infrastructure dynamically selected a
> unique id when needed, and it never worked great. In cluster3 the id is
> just a crc of the fsname string.
>
> Now that I think about this a bit more, there may be a reason to keep the
> id in the string. There was some interest on linux-kernel about better
> using the statfs fsid field, and this id is what gfs should be putting
> there.

interesting; that'd be cool. i've been meaning to look at statfs more
often in my stuff anyway.


>> say, one tangential question (i won't be offended if you skip it -
>> heh): is there a particular reason that you folks went with the uevent
>> mechanism for doing upcalls? i'm just curious, given the
>> seeming-complexity and possible overhead of using the whole layered
>> netlink apparatus vs. something like Trond Myklebust's rpc_pipefs
>> (don't let the "rpc" fool you; it's a barebones, dead-simple pipe).
>> -- and no, i'm not selling anything; my boss was asking for a list
>> of differences between rpc_pipefs and uevents and the best i could
>> come up with is the former's bidirectional. Trond mentioned the
>> netlink overhead and i wondered if that was actually a significant
>> factor or just lost in the noise in most cases.
>
> The uevents looked pretty simple when I was initially designing how the
> kernel/user interactions would work, and they fit well with sysfs files
> which I was using too. I don't think the overhead of using uevents is too
> bad. Sysfs files and uevents definitely don't work great if you need any
> kind of sophisticated bi-directional interface.

great, thanks -- always good to get folks' anecdotal advice and keep
it in my toolbag for later.

cheers,

d
.

>
> Dave
>
>
 

All times are GMT.