FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Redhat > Cluster Development

 
 
LinkBack Thread Tools
 
Old 01-15-2008, 09:48 PM
Wendy Cheng
 
Default NLM failover unlock commands

Revised version of the patch:

* based on comment from Neil Brown, use sscanf() to parse IP address (a
cool idea imo).
* the "ret" inside nlm_traverse_files() is now the file count that can't
be unlocked.

* other minor changes from latest round of review.

Thanks to all for the review !

-- Wendy
 
Old 01-16-2008, 03:19 AM
Neil Brown
 
Default NLM failover unlock commands

On Tuesday January 15, wcheng@redhat.com wrote:
> Revised version of the patch:
>
> * based on comment from Neil Brown, use sscanf() to parse IP address (a
> cool idea imo).
> * the "ret" inside nlm_traverse_files() is now the file count that can't
> be unlocked.
> * other minor changes from latest round of review.
>
> Thanks to all for the review !
>
> -- Wendy
>
> Two new NFSD procfs files are added:
> /proc/fs/nfsd/unlock_ip
> /proc/fs/nfsd/unlock_filesystem
>
> They are intended to allow admin or user mode script to release NLM locks
> based on either a path name or a server in-bound ip address (ipv4 for now)
> as;
>
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
>
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> Signed-off-by: Lon Hohberger <lhh@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-By: NeilBrown <neilb@suse.de>

Thanks.

NeilBrown
 
Old 01-17-2008, 02:10 PM
"J. Bruce Fields"
 
Default NLM failover unlock commands

On Tue, Jan 15, 2008 at 05:48:00PM -0500, Wendy Cheng wrote:
> Revised version of the patch:
>
> * based on comment from Neil Brown, use sscanf() to parse IP address (a
> cool idea imo).
> * the "ret" inside nlm_traverse_files() is now the file count that can't
> be unlocked.
> * other minor changes from latest round of review.

Thanks!

Remind me: why do we need both per-ip and per-filesystem methods? In
practice, I assume that we'll always do *both*?

We're migrating clients by moving a server ip address from one node to
another. And I assume we're permitting at most one node to export each
filesystem at a time. So it *should* be the case that the set of locks
held on the filesystem(s) that are moving are the same as the set of
locks held by the virtual ip that is moving.

But presumably in some scenarios clients can get confused, and we need
to ensure that stale locks are not left behind?

We've discussed this before, but we should get the answer into comments
in the code (or on the patches).

--b.

>
> Thanks to all for the review !
>
> -- Wendy
>

> Two new NFSD procfs files are added:
> /proc/fs/nfsd/unlock_ip
> /proc/fs/nfsd/unlock_filesystem
>
> They are intended to allow admin or user mode script to release NLM locks
> based on either a path name or a server in-bound ip address (ipv4 for now)
> as;
>
> shell> echo 10.1.1.2 > /proc/fs/nfsd/unlock_ip
> shell> echo /mnt/sfs1 > /proc/fs/nfsd/unlock_filesystem
>
> Signed-off-by: S. Wendy Cheng <wcheng@redhat.com>
> Signed-off-by: Lon Hohberger <lhh@redhat.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> fs/lockd/svcsubs.c | 66 +++++++++++++++++++++++++++++++++++++++-----
> fs/nfsd/nfsctl.c | 65 +++++++++++++++++++++++++++++++++++++++++++
> include/linux/lockd/lockd.h | 7 ++++
> 3 files changed, 131 insertions(+), 7 deletions(-)
>
> --- linux-o/fs/nfsd/nfsctl.c 2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/nfsd/nfsctl.c 2008-01-15 11:30:19.000000000 -0500
> @@ -22,6 +22,7 @@
> #include <linux/seq_file.h>
> #include <linux/pagemap.h>
> #include <linux/init.h>
> +#include <linux/inet.h>
> #include <linux/string.h>
> #include <linux/smp_lock.h>
> #include <linux/ctype.h>
> @@ -35,6 +36,7 @@
> #include <linux/nfsd/cache.h>
> #include <linux/nfsd/xdr.h>
> #include <linux/nfsd/syscall.h>
> +#include <linux/lockd/lockd.h>
>
> #include <asm/uaccess.h>
>
> @@ -52,6 +54,8 @@ enum {
> NFSD_Getfs,
> NFSD_List,
> NFSD_Fh,
> + NFSD_FO_UnlockIP,
> + NFSD_FO_UnlockFS,
> NFSD_Threads,
> NFSD_Pool_Threads,
> NFSD_Versions,
> @@ -88,6 +92,9 @@ static ssize_t write_leasetime(struct fi
> static ssize_t write_recoverydir(struct file *file, char *buf, size_t size);
> #endif
>
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size);
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size);
> +
> static ssize_t (*write_op[])(struct file *, char *, size_t) = {
> [NFSD_Svc] = write_svc,
> [NFSD_Add] = write_add,
> @@ -97,6 +104,8 @@ static ssize_t (*write_op[])(struct file
> [NFSD_Getfd] = write_getfd,
> [NFSD_Getfs] = write_getfs,
> [NFSD_Fh] = write_filehandle,
> + [NFSD_FO_UnlockIP] = failover_unlock_ip,
> + [NFSD_FO_UnlockFS] = failover_unlock_fs,
> [NFSD_Threads] = write_threads,
> [NFSD_Pool_Threads] = write_pool_threads,
> [NFSD_Versions] = write_versions,
> @@ -288,6 +297,58 @@ static ssize_t write_getfd(struct file *
> return err;
> }
>
> +static ssize_t failover_unlock_ip(struct file *file, char *buf, size_t size)
> +{
> + __be32 server_ip;
> + char *fo_path, c;
> + int b1, b2, b3, b4;
> +
> + /* sanity check */
> + if (size == 0)
> + return -EINVAL;
> +
> + if (buf[size-1] != '
')
> + return -EINVAL;
> +
> + fo_path = buf;
> + if (qword_get(&buf, fo_path, size) < 0)
> + return -EINVAL;
> +
> + /* get ipv4 address */
> + if (sscanf(fo_path, "%u.%u.%u.%u%c", &b1, &b2, &b3, &b4, &c) != 4)
> + return -EINVAL;
> + server_ip = htonl((((((b1<<8)|b2)<<8)|b3)<<8)|b4);
> +
> + return nlmsvc_failover_ip(server_ip);
> +}
> +
> +static ssize_t failover_unlock_fs(struct file *file, char *buf, size_t size)
> +{
> + struct nameidata nd;
> + char *fo_path;
> + int error;
> +
> + /* sanity check */
> + if (size == 0)
> + return -EINVAL;
> +
> + if (buf[size-1] != '
')
> + return -EINVAL;
> +
> + fo_path = buf;
> + if (qword_get(&buf, fo_path, size) < 0)
> + return -EINVAL;
> +
> + error = path_lookup(fo_path, 0, &nd);
> + if (error)
> + return error;
> +
> + error = nlmsvc_failover_path(&nd);
> +
> + path_release(&nd);
> + return error;
> +}
> +
> static ssize_t write_filehandle(struct file *file, char *buf, size_t size)
> {
> /* request is:
> @@ -646,6 +707,10 @@ static int nfsd_fill_super(struct super_
> [NFSD_Getfd] = {".getfd", &transaction_ops, S_IWUSR|S_IRUSR},
> [NFSD_Getfs] = {".getfs", &transaction_ops, S_IWUSR|S_IRUSR},
> [NFSD_List] = {"exports", &exports_operations, S_IRUGO},
> + [NFSD_FO_UnlockIP] = {"unlock_ip",
> + &transaction_ops, S_IWUSR|S_IRUSR},
> + [NFSD_FO_UnlockFS] = {"unlock_filesystem",
> + &transaction_ops, S_IWUSR|S_IRUSR},
> [NFSD_Fh] = {"filehandle", &transaction_ops, S_IWUSR|S_IRUSR},
> [NFSD_Threads] = {"threads", &transaction_ops, S_IWUSR|S_IRUSR},
> [NFSD_Pool_Threads] = {"pool_threads", &transaction_ops, S_IWUSR|S_IRUSR},
> --- linux-o/fs/lockd/svcsubs.c 2008-01-04 10:01:08.000000000 -0500
> +++ linux/fs/lockd/svcsubs.c 2008-01-15 11:16:48.000000000 -0500
> @@ -18,6 +18,8 @@
> #include <linux/lockd/lockd.h>
> #include <linux/lockd/share.h>
> #include <linux/lockd/sm_inter.h>
> +#include <linux/module.h>
> +#include <linux/mount.h>
>
> #define NLMDBG_FACILITY NLMDBG_SVCSUBS
>
> @@ -87,7 +89,7 @@ nlm_lookup_file(struct svc_rqst *rqstp,
> unsigned int hash;
> __be32 nfserr;
>
> - nlm_debug_print_fh("nlm_file_lookup", f);
> + nlm_debug_print_fh("nlm_lookup_file", f);
>
> hash = file_hash(f);
>
> @@ -123,6 +125,9 @@ nlm_lookup_file(struct svc_rqst *rqstp,
>
> hlist_add_head(&file->f_list, &nlm_files[hash]);
>
> + /* fill in f_iaddr for nlm lock failover */
> + file->f_iaddr = rqstp->rq_daddr;
> +
> found:
> dprintk("lockd: found file %p (count %d)
", file, file->f_count);
> *result = file;
> @@ -194,6 +199,12 @@ again:
> return 0;
> }
>
> +static int
> +nlmsvc_always_match(struct nlm_host *dummy1, struct nlm_host *dummy2)
> +{
> + return 1;
> +}
> +
> /*
> * Inspect a single file
> */
> @@ -230,7 +241,8 @@ nlm_file_inuse(struct nlm_file *file)
> * Loop over all files in the file table.
> */
> static int
> -nlm_traverse_files(struct nlm_host *host, nlm_host_match_fn_t match)
> +nlm_traverse_files(void *data, nlm_host_match_fn_t match,
> + int (*is_failover_file)(void *data, struct nlm_file *file))
> {
> struct hlist_node *pos, *next;
> struct nlm_file *file;
> @@ -244,8 +256,17 @@ nlm_traverse_files(struct nlm_host *host
>
> /* Traverse locks, blocks and shares of this file
> * and update file->f_locks count */
> - if (nlm_inspect_file(host, file, match))
> - ret = 1;
> +
> + if (likely(is_failover_file == NULL) ||
> + is_failover_file(data, file)) {
> + /*
> + * Note that nlm_inspect_file updates f_locks
> + * and ret is the number of files that can't
> + * be unlocked.
> + */
> + ret += nlm_inspect_file(data, file, match);
> + } else
> + file->f_locks = nlm_file_inuse(file);
>
> mutex_lock(&nlm_file_mutex);
> file->f_count--;
> @@ -337,7 +358,7 @@ void
> nlmsvc_mark_resources(void)
> {
> dprintk("lockd: nlmsvc_mark_resources
");
> - nlm_traverse_files(NULL, nlmsvc_mark_host);
> + nlm_traverse_files(NULL, nlmsvc_mark_host, NULL);
> }
>
> /*
> @@ -348,7 +369,7 @@ nlmsvc_free_host_resources(struct nlm_ho
> {
> dprintk("lockd: nlmsvc_free_host_resources
");
>
> - if (nlm_traverse_files(host, nlmsvc_same_host)) {
> + if (nlm_traverse_files(host, nlmsvc_same_host, NULL)) {
> printk(KERN_WARNING
> "lockd: couldn't remove all locks held by %s
",
> host->h_name);
> @@ -368,5 +389,36 @@ nlmsvc_invalidate_all(void)
> * turn, which is about as inefficient as it gets.
> * Now we just do it once in nlm_traverse_files.
> */
> - nlm_traverse_files(NULL, nlmsvc_is_client);
> + nlm_traverse_files(NULL, nlmsvc_is_client, NULL);
> +}
> +
> +static int
> +nlmsvc_failover_file_ok_path(void *datap, struct nlm_file *file)
> +{
> + struct nameidata *nd = datap;
> + return nd->mnt == file->f_file->f_path.mnt;
> +}
> +
> +int
> +nlmsvc_failover_path(struct nameidata *nd)
> +{
> + return nlm_traverse_files(nd, nlmsvc_always_match,
> + nlmsvc_failover_file_ok_path);
> +}
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_path);
> +
> +static int
> +nlmsvc_failover_file_ok_ip(void *datap, struct nlm_file *file)
> +{
> + __be32 *server_addr = datap;
> +
> + return file->f_iaddr.addr.s_addr == *server_addr;
> +}
> +
> +int
> +nlmsvc_failover_ip(__be32 server_addr)
> +{
> + return nlm_traverse_files(&server_addr, nlmsvc_always_match,
> + nlmsvc_failover_file_ok_ip);
> }
> +EXPORT_SYMBOL_GPL(nlmsvc_failover_ip);
> --- linux-o/include/linux/lockd/lockd.h 2008-01-04 10:01:08.000000000 -0500
> +++ linux/include/linux/lockd/lockd.h 2008-01-15 11:13:04.000000000 -0500
> @@ -113,6 +113,7 @@ struct nlm_file {
> unsigned int f_locks; /* guesstimate # of locks */
> unsigned int f_count; /* reference count */
> struct mutex f_mutex; /* avoid concurrent access */
> + union svc_addr_u f_iaddr; /* server ip for failover */
> };
>
> /*
> @@ -214,6 +215,12 @@ void nlmsvc_mark_resources(void);
> void nlmsvc_free_host_resources(struct nlm_host *);
> void nlmsvc_invalidate_all(void);
>
> +/*
> + * Cluster failover support
> + */
> +int nlmsvc_failover_path(struct nameidata *nd);
> +int nlmsvc_failover_ip(__be32 server_addr);
> +
> static __inline__ struct inode *
> nlmsvc_file_inode(struct nlm_file *file)
> {
 
Old 01-17-2008, 02:48 PM
Wendy Cheng
 
Default NLM failover unlock commands

J. Bruce Fields wrote:

Remind me: why do we need both per-ip and per-filesystem methods? In
practice, I assume that we'll always do *both*?



Failover normally is done via virtual IP address - so per-ip base method
should be the core routine. However, for non-cluster filesystem such as
ext3/4, changing server also implies umount. If there are clients not
following rule and obtaining locks via different ip interfaces, umount
would fail that ends up aborting the failover process. That's the place
we need the per-filesystem method.


ServerA:
1. Tear down the IP address
2. Unexport the path
3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
4. If unmount required,
write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
5. Signal peer to begin take-over.

Sometime ago we were looking at "export name" as the core method (so
per-filesystem method is a subset of that). Unfortunately, the prototype
efforts showed the code would be too intrusive (if filesystem sub-tree
is exported).

We're migrating clients by moving a server ip address from one node to
another. And I assume we're permitting at most one node to export each
filesystem at a time. So it *should* be the case that the set of locks
held on the filesystem(s) that are moving are the same as the set of
locks held by the virtual ip that is moving.



This is true for non-cluster filesystem. But a cluster filesystem can be
exported from multiple servers.

But presumably in some scenarios clients can get confused, and we need
to ensure that stale locks are not left behind?



Yes.


We've discussed this before, but we should get the answer into comments
in the code (or on the patches).


ok, working on it. or should we add something into linux/Documentation
to describe the overall logic ?


-- Wendy
 
Old 01-17-2008, 03:08 PM
Wendy Cheng
 
Default NLM failover unlock commands

Add a more detailed description into the top of the patch itself. I'm
working on the resume patch now - it will include an overall write-up in
the Documentation directory.


-- Wendy
 
Old 01-17-2008, 03:10 PM
Wendy Cheng
 
Default NLM failover unlock commands

Wendy Cheng wrote:
Add a more detailed description into the top of the patch itself. I'm
working on the resume patch now - it will include an overall write-up
in the Documentation directory.


Sorry, forgot to attach the patch. Here it is ... Wendy
 
Old 01-17-2008, 03:14 PM
"J. Bruce Fields"
 
Default NLM failover unlock commands

On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Remind me: why do we need both per-ip and per-filesystem methods? In
>> practice, I assume that we'll always do *both*?
>>
>
> Failover normally is done via virtual IP address - so per-ip base method
> should be the core routine. However, for non-cluster filesystem such as
> ext3/4, changing server also implies umount. If there are clients not
> following rule and obtaining locks via different ip interfaces, umount
> would fail that ends up aborting the failover process. That's the place
> we need the per-filesystem method.
>
> ServerA:
> 1. Tear down the IP address
> 2. Unexport the path
> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
> 4. If unmount required,
> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
> 5. Signal peer to begin take-over.
>
> Sometime ago we were looking at "export name" as the core method (so
> per-filesystem method is a subset of that). Unfortunately, the prototype
> efforts showed the code would be too intrusive (if filesystem sub-tree
> is exported).
>> We're migrating clients by moving a server ip address from one node to
>> another. And I assume we're permitting at most one node to export each
>> filesystem at a time. So it *should* be the case that the set of locks
>> held on the filesystem(s) that are moving are the same as the set of
>> locks held by the virtual ip that is moving.
>>
>
> This is true for non-cluster filesystem. But a cluster filesystem can be
> exported from multiple servers.
>> But presumably in some scenarios clients can get confused, and we need
>> to ensure that stale locks are not left behind?
>>
>
> Yes.
>
>> We've discussed this before, but we should get the answer into comments
>> in the code (or on the patches).
>>
>>
> ok, working on it. or should we add something into linux/Documentation
> to describe the overall logic ?

Yeah, sounds good. Maybe under Documentation/filesystems? And it might
also be helpful to leave a reference to it in the code, e.g., in
nfsctl.c:

/*
* The following are used for failover; see
* Documentation/filesystems/nfsd-failover.txt for details.
*/

--b.
 
Old 01-17-2008, 03:17 PM
Wendy Cheng
 
Default NLM failover unlock commands

J. Bruce Fields wrote:

Yeah, sounds good. Maybe under Documentation/filesystems? And it might
also be helpful to leave a reference to it in the code, e.g., in
nfsctl.c:

/*
* The following are used for failover; see
* Documentation/filesystems/nfsd-failover.txt for details.
*/

--b.



ok, looks good ... but let's "fix" this on the resume patch ? or you
want it on the unlock patch *now* ?


-- Wendy
 
Old 01-17-2008, 03:21 PM
"J. Bruce Fields"
 
Default NLM failover unlock commands

On Thu, Jan 17, 2008 at 11:17:08AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Yeah, sounds good. Maybe under Documentation/filesystems? And it might
>> also be helpful to leave a reference to it in the code, e.g., in
>> nfsctl.c:
>>
>> /*
>> * The following are used for failover; see
>> * Documentation/filesystems/nfsd-failover.txt for details.
>> */
>>
>> --b.
>>
>
> ok, looks good ... but let's "fix" this on the resume patch ? or you
> want it on the unlock patch *now* ?

It's not urgent.

--b.
 
Old 01-17-2008, 03:31 PM
"J. Bruce Fields"
 
Default NLM failover unlock commands

On Thu, Jan 17, 2008 at 10:48:56AM -0500, Wendy Cheng wrote:
> J. Bruce Fields wrote:
>> Remind me: why do we need both per-ip and per-filesystem methods? In
>> practice, I assume that we'll always do *both*?
>>
>
> Failover normally is done via virtual IP address - so per-ip base method
> should be the core routine. However, for non-cluster filesystem such as
> ext3/4, changing server also implies umount. If there are clients not
> following rule and obtaining locks via different ip interfaces, umount
> would fail that ends up aborting the failover process. That's the place
> we need the per-filesystem method.
>
> ServerA:
> 1. Tear down the IP address
> 2. Unexport the path
> 3. Write IP to /proc/fs/nfsd/unlock_ip to unlock files
> 4. If unmount required,
> write path name to /proc/fs/nfsd/unlock_filesystem, then unmount.
> 5. Signal peer to begin take-over.
>
> Sometime ago we were looking at "export name" as the core method (so
> per-filesystem method is a subset of that). Unfortunately, the prototype
> efforts showed the code would be too intrusive (if filesystem sub-tree
> is exported).
>> We're migrating clients by moving a server ip address from one node to
>> another. And I assume we're permitting at most one node to export each
>> filesystem at a time. So it *should* be the case that the set of locks
>> held on the filesystem(s) that are moving are the same as the set of
>> locks held by the virtual ip that is moving.
>>
>
> This is true for non-cluster filesystem. But a cluster filesystem can be
> exported from multiple servers.

But that last sentence:

it *should* be the case that the set of locks held on the
filesystem(s) that are moving are the same as the set of locks
held by the virtual ip that is moving.

is still true in the cluster filesystem case, right?

--b.
 

Thread Tools




All times are GMT. The time now is 10:16 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org