08-20-2012, 04:39 PM
Jonathan Nieder

Bug#685407: ext4 dir_index + nfs duplicate cookies problem with large dovecot maildirs

Hi,

Brian Paul Kroth wrote:

> - multiple ext4 filesystems with dir_index enabled for maildir account storage
> - serves these over nfs (v3 or v4, doesn't matter) to server clients
> running dovecot to serve imap to real clients
> - "ls -l | wc -l" or readdir() in directories with many entries
> (I've seen it with as few as ~7000) results in the "duplicate
> cookie" or "too many levels of symbolic links" error being
> returned and the directory listing will be cut short.

Some patches intended to address this on the server side were
merged during the 3.4 merge window[*]. Could you try 3.5.2 or
newer from experimental and let us know how it goes?

Curious,
Jonathan
[*] http://thread.gmane.org/gmane.linux.file-systems/60157


--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120820163913.GB168@mannheim-rule.local
 
08-20-2012, 07:20 PM
Brian Kroth

Jonathan Nieder <jrnieder@gmail.com> 2012-08-20 09:39:

> Hi,
>
> Brian Paul Kroth wrote:
>
>> - multiple ext4 filesystems with dir_index enabled for maildir account storage
>> - serves these over nfs (v3 or v4, doesn't matter) to server clients
>> running dovecot to serve imap to real clients
>> - "ls -l | wc -l" or readdir() in directories with many entries
>> (I've seen it with as few as ~7000) results in the "duplicate
>> cookie" or "too many levels of symbolic links" error being
>> returned and the directory listing will be cut short.
>
> Some patches intended to address this on the server side were
> merged during the 3.4 merge window[*]. Could you try 3.5.2 or
> newer from experimental and let us know how it goes?
>
> Curious,
> Jonathan
> [*] http://thread.gmane.org/gmane.linux.file-systems/60157


linux-image-3.5-trunk-amd64 (3.5-1~experimental.1) on the server does
indeed appear to fix this. NFSv3 and NFSv4 tested from the following
client kernels:

- linux-image-3.2.0-0.bpo.2-amd64 (3.2.20-1~bpo60+1)
- linux-image-2.6.39-bpo.2-amd64 (2.6.39-3~bpo60+1)
- linux-image-2.6.32-5-amd64 (2.6.32-45)

Attached is a simple script I used to create a test dir for this. The
ext4 volume was 3G.
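
(The attached script is not reproduced in this archive. The following is only a minimal sketch of an equivalent test in C, not the script that was attached. It assumes hypothetical file names of the form msg.<n>; populate_dir() and count_entries() are illustrative helpers. The idea is to fill a directory on the ext4 export with many empty files, then list it from an NFS client and check whether readdir() stops early with an error such as ELOOP, which is what "too many levels of symbolic links" corresponds to.)

```c
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Create `count` empty files named msg.<n> inside `dir`.
 * The naming scheme is hypothetical; real maildir names are longer.
 * Returns 0 on success, -1 on the first failure. */
int populate_dir(const char *dir, int count)
{
    char path[4096];

    for (int i = 0; i < count; i++) {
        int len = snprintf(path, sizeof(path), "%s/msg.%08d", dir, i);
        if (len < 0 || (size_t)len >= sizeof(path))
            return -1;
        int fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600);
        if (fd < 0)
            return -1;
        close(fd);
    }
    return 0;
}

/* Count directory entries (excluding "." and "..") with readdir().
 * Returns -1 with errno set if the listing is cut short by an error. */
long count_entries(const char *dir)
{
    DIR *d = opendir(dir);
    long n = 0;

    if (!d)
        return -1;
    for (;;) {
        errno = 0; /* readdir() returns NULL both at EOF and on error */
        struct dirent *de = readdir(d);
        if (!de)
            break;
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    }
    int saved = errno;
    closedir(d);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return n;
}
```

Run populate_dir() on the server-side ext4 directory with a count well above the ~7000 mentioned in the report, then call count_entries() on the same directory through the NFS mount and compare the result with the count used.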


Any chance this will be backported to squeeze as well?

Thanks,
Brian
 
08-21-2012, 12:33 AM
Jonathan Nieder

reassign 685407 src:linux 3.2.20-1
fixed 685407 linux/3.5-1~experimental.1
tags 685407 + upstream patch moreinfo
quit

Hi again,

Brian Kroth wrote:

> linux-image-3.5-trunk-amd64 (3.5-1~experimental.1) on the server
> does indeed appear to fix this.
[...]
> Attached is a simple script I used to create a test dir for this.
> The ext4 volume was 3G.

Very nice to hear.

Please test the attached patches together against a 3.2.y kernel,
for example using the following instructions.

0. prerequisites:

apt-get install git build-essential

1. get the kernel history, if you don't already have it:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. fetch point releases:

cd linux
git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
git fetch stable

3. configure, build, test:

git checkout stable/linux-3.2.y
cp /boot/config-$(uname -r) .config; # current configuration
scripts/config --disable DEBUG_INFO
make localmodconfig; # optional: minimize configuration
make deb-pkg; # optionally with -j<num> for parallel build
dpkg -i ../<name of package>; # as root
reboot
... test test test ...

Hopefully it reproduces the bug. So:

4. try the patches:

cd linux
git am -3sc $(ls -1 /path/to/patches/0*)
make deb-pkg; # maybe with -j4
dpkg -i ../<name of package>; # as root
reboot
... test test test ...

If it works, someone on the kernel team will probably apply the
patches to wheezy (meaning they would automatically flow into
squeeze-backports as well).

(An alternative set of instructions is at [1].)

Hope that helps,
Jonathan

[1] http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official
or the corresponding page in the debian-kernel-handbook package


Archive: http://lists.debian.org/20120821003305.GA1177@mannheim-rule.local
 
08-21-2012, 12:34 AM
Jonathan Nieder

Jonathan Nieder wrote:
> Brian Kroth wrote:

>> linux-image-3.5-trunk-amd64 (3.5-1~experimental.1) on the server
>> does indeed appear to fix this.
[...]
> Very nice to hear.
>
> Please test the attached patches together against a 3.2.y kernel,
> for example using the following instructions.

Actually attached this time. Sorry for the confusion.
From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Date: Tue, 13 Mar 2012 22:51:38 -0400
Subject: fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash

commit 6a8a13e03861c0ab83ab07d573ca793cff0e5d00 upstream.

Those flags are supposed to be set by NFS readdir() to tell ext3/ext4
to 32bit (NFSv2) or 64bit hash values (offsets) in seekdir().

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
include/linux/fs.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 29b6353..fb7ce74 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -92,6 +92,10 @@ struct inodes_stat_t {
/* File is opened using open(.., 3, ..) and is writeable only for ioctls
(specialy hack for floppy.c) */
#define FMODE_WRITE_IOCTL ((__force fmode_t)0x100)
+/* 32bit hashes as llseek() offset (for directories) */
+#define FMODE_32BITHASH ((__force fmode_t)0x200)
+/* 64bit hashes as llseek() offset (for directories) */
+#define FMODE_64BITHASH ((__force fmode_t)0x400)

/*
* Don't update ctime and mtime.
--
1.7.10.4

From: Fan Yong <yong.fan@whamcloud.com>
Date: Sun, 18 Mar 2012 22:44:40 -0400
Subject: ext4: return 32/64-bit dir name hash according to usage type

commit d1f5273e9adb40724a85272f248f210dc4ce919a upstream.

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir(). However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext4 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions. This still needs
integration on the NFS side.

Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
(blame me if something is not correct)

Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
fs/ext4/dir.c | 214 ++++++++++++++++++++++++++++++++++++++++++++------------
fs/ext4/ext4.h | 6 +-
fs/ext4/hash.c | 4 +-
3 files changed, 176 insertions(+), 48 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 164c560..689d1b1 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};

-static int ext4_readdir(struct file *, void *, filldir_t);
static int ext4_dx_readdir(struct file *filp,
void *dirent, filldir_t filldir);
-static int ext4_release_dir(struct inode *inode,
- struct file *filp);
-
-const struct file_operations ext4_dir_operations = {
- .llseek = ext4_llseek,
- .read = generic_read_dir,
- .readdir = ext4_readdir, /* we take BKL. needed?*/
- .unlocked_ioctl = ext4_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = ext4_compat_ioctl,
-#endif
- .fsync = ext4_sync_file,
- .release = ext4_release_dir,
-};
-

static unsigned char get_dtype(struct super_block *sb, int filetype)
{
@@ -60,6 +44,26 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
return (ext4_filetype_table[filetype]);
}

+/**
+ * Check if the given dir-inode refers to an htree-indexed directory
+ * (or a directory which chould potentially get coverted to use htree
+ * indexing).
+ *
+ * Return 1 if it is a dx dir, 0 if not
+ */
+static int is_dx_dir(struct inode *inode)
+{
+ struct super_block *sb = inode->i_sb;
+
+ if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_COMPAT_DIR_INDEX) &&
+ ((ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) ||
+ ((inode->i_size >> sb->s_blocksize_bits) == 1)))
+ return 1;
+
+ return 0;
+}
+
/*
* Return 0 if the directory entry is OK, and 1 if there is a problem
*
@@ -115,18 +119,13 @@ static int ext4_readdir(struct file *filp,
unsigned int offset;
int i, stored;
struct ext4_dir_entry_2 *de;
- struct super_block *sb;
int err;
struct inode *inode = filp->f_path.dentry->d_inode;
+ struct super_block *sb = inode->i_sb;
int ret = 0;
int dir_has_error = 0;

- sb = inode->i_sb;
-
- if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
- EXT4_FEATURE_COMPAT_DIR_INDEX) &&
- ((ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) ||
- ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
+ if (is_dx_dir(inode)) {
err = ext4_dx_readdir(filp, dirent, filldir);
if (err != ERR_BAD_DX_DIR) {
ret = err;
@@ -254,22 +253,134 @@ out:
return ret;
}

+static inline int is_32bit_api(void)
+{
+#ifdef CONFIG_COMPAT
+ return is_compat_task();
+#else
+ return (BITS_PER_LONG == 32);
+#endif
+}
+
/*
* These functions convert from the major/minor hash to an f_pos
- * value.
+ * value for dx directories
*
- * Currently we only use major hash numer. This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call. Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie. Sigh.
+ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
+ * FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
+ * directly on both 32-bit and 64-bit nodes, under such case, neither
+ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
*/
-#define hash2pos(major, minor) (major >> 1)
-#define pos2maj_hash(pos) ((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos) (0)
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return major >> 1;
+ else
+ return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return (pos << 1) & 0xffffffff;
+ else
+ return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return 0;
+ else
+ return pos & 0xffffffff;
+}
+
+/*
+ * Return 32- or 64-bit end-of-file for dx directories
+ */
+static inline loff_t ext4_get_htree_eof(struct file *filp)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return EXT4_HTREE_EOF_32BIT;
+ else
+ return EXT4_HTREE_EOF_64BIT;
+}
+
+
+/*
+ * ext4_dir_llseek() based on generic_file_llseek() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
+ *
+ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
+ * will be invalid once the directory was converted into a dx directory
+ */
+loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ loff_t ret = -EINVAL;
+ int dx_dir = is_dx_dir(inode);
+
+ mutex_lock(&inode->i_mutex);
+
+ /* NOTE: relative offsets with dx directories might not work
+ * as expected, as it is difficult to figure out the
+ * correct offset between dx hashes */
+
+ switch (origin) {
+ case SEEK_END:
+ if (unlikely(offset > 0))
+ goto out_err; /* not supported for directories */
+
+ /* so only negative offsets are left, does that have a
+ * meaning for directories at all? */
+ if (dx_dir)
+ offset += ext4_get_htree_eof(file);
+ else
+ offset += inode->i_size;
+ break;
+ case SEEK_CUR:
+ /*
+ * Here we special-case the lseek(fd, 0, SEEK_CUR)
+ * position-querying operation. Avoid rewriting the "same"
+ * f_pos value back to the file because a concurrent read(),
+ * write() or lseek() might have altered it
+ */
+ if (offset == 0) {
+ offset = file->f_pos;
+ goto out_ok;
+ }
+
+ offset += file->f_pos;
+ break;
+ }
+
+ if (unlikely(offset < 0))
+ goto out_err;
+
+ if (!dx_dir) {
+ if (offset > inode->i_sb->s_maxbytes)
+ goto out_err;
+ } else if (offset > ext4_get_htree_eof(file))
+ goto out_err;
+
+ /* Special lock needed here? */
+ if (offset != file->f_pos) {
+ file->f_pos = offset;
+ file->f_version = 0;
+ }
+
+out_ok:
+ ret = offset;
+out_err:
+ mutex_unlock(&inode->i_mutex);
+
+ return ret;
+}

/*
* This structure holds the nodes of the red-black tree used to store
@@ -330,15 +441,16 @@ static void free_rb_tree_fname(struct rb_root *root)
}


-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
+ loff_t pos)
{
struct dir_private_info *p;

p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
if (!p)
return NULL;
- p->curr_hash = pos2maj_hash(pos);
- p->curr_minor_hash = pos2min_hash(pos);
+ p->curr_hash = pos2maj_hash(filp, pos);
+ p->curr_minor_hash = pos2min_hash(filp, pos);
return p;
}

@@ -429,7 +541,7 @@ static int call_filldir(struct file *filp, void *dirent,
"null fname?!?\n");
return 0;
}
- curr_pos = hash2pos(fname->hash, fname->minor_hash);
+ curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
while (fname) {
error = filldir(dirent, fname->name,
fname->name_len, curr_pos,
@@ -454,13 +566,13 @@ static int ext4_dx_readdir(struct file *filp,
int ret;

if (!info) {
- info = ext4_htree_create_dir_info(filp->f_pos);
+ info = ext4_htree_create_dir_info(filp, filp->f_pos);
if (!info)
return -ENOMEM;
filp->private_data = info;
}

- if (filp->f_pos == EXT4_HTREE_EOF)
+ if (filp->f_pos == ext4_get_htree_eof(filp))
return 0; /* EOF */

/* Some one has messed with f_pos; reset the world */
@@ -468,8 +580,8 @@ static int ext4_dx_readdir(struct file *filp,
free_rb_tree_fname(&info->root);
info->curr_node = NULL;
info->extra_fname = NULL;
- info->curr_hash = pos2maj_hash(filp->f_pos);
- info->curr_minor_hash = pos2min_hash(filp->f_pos);
+ info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+ info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
}

/*
@@ -501,7 +613,7 @@ static int ext4_dx_readdir(struct file *filp,
if (ret < 0)
return ret;
if (ret == 0) {
- filp->f_pos = EXT4_HTREE_EOF;
+ filp->f_pos = ext4_get_htree_eof(filp);
break;
}
info->curr_node = rb_first(&info->root);
@@ -521,7 +633,7 @@ static int ext4_dx_readdir(struct file *filp,
info->curr_minor_hash = fname->minor_hash;
} else {
if (info->next_hash == ~0) {
- filp->f_pos = EXT4_HTREE_EOF;
+ filp->f_pos = ext4_get_htree_eof(filp);
break;
}
info->curr_hash = info->next_hash;
@@ -540,3 +652,15 @@ static int ext4_release_dir(struct inode *inode, struct file *filp)

return 0;
}
+
+const struct file_operations ext4_dir_operations = {
+ .llseek = ext4_dir_llseek,
+ .read = generic_read_dir,
+ .readdir = ext4_readdir,
+ .unlocked_ioctl = ext4_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ext4_compat_ioctl,
+#endif
+ .fsync = ext4_sync_file,
+ .release = ext4_release_dir,
+};
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 8cb184c..2ac1eef 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1597,7 +1597,11 @@ struct dx_hash_info
u32 *seed;
};

-#define EXT4_HTREE_EOF 0x7fffffff
+
+/* 32 and 64 bit signed EOF for dx directories */
+#define EXT4_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
+#define EXT4_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)
+

/*
* Control parameters used by ext4_htree_next_block
diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
index ac8f168..fa8e491 100644
--- a/fs/ext4/hash.c
+++ b/fs/ext4/hash.c
@@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
return -1;
}
hash = hash & ~1;
- if (hash == (EXT4_HTREE_EOF << 1))
- hash = (EXT4_HTREE_EOF-1) << 1;
+ if (hash == (EXT4_HTREE_EOF_32BIT << 1))
+ hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
hinfo->hash = hash;
hinfo->minor_hash = minor_hash;
return 0;
--
1.7.10.4
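
(Illustration, not part of the patch series: a userspace sketch of the position encoding introduced by the ext4 patch above. The kernel helpers take a struct file and inspect f_mode; here a plain use_32bit flag stands in for that check, which is an assumption made to keep the example self-contained. It shows why two names that share a major hash get the same cookie in 32-bit mode but distinct cookies in 64-bit mode.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Encode a (major, minor) hash pair as a directory position.
 * 32-bit mode keeps only the major hash; 64-bit mode packs both. */
int64_t hash2pos(bool use_32bit, uint32_t major, uint32_t minor)
{
    if (use_32bit)
        return major >> 1;
    return (int64_t)(((uint64_t)(major >> 1) << 32) | (uint64_t)minor);
}

/* Recover the major hash from a directory position. */
uint32_t pos2maj_hash(bool use_32bit, int64_t pos)
{
    if (use_32bit)
        return (uint32_t)(((uint64_t)pos << 1) & 0xffffffffu);
    return (uint32_t)((((uint64_t)pos >> 32) << 1) & 0xffffffffu);
}

/* Recover the minor hash; it is not representable in 32-bit mode. */
uint32_t pos2min_hash(bool use_32bit, int64_t pos)
{
    if (use_32bit)
        return 0;
    return (uint32_t)((uint64_t)pos & 0xffffffffu);
}
```

Because the major hash always has its low bit cleared and is shifted right by one before being stored, the packed 64-bit position stays non-negative, so it remains a valid signed loff_t.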

From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Date: Sun, 18 Mar 2012 22:44:49 -0400
Subject: nfsd: rename 'int access' to 'int may_flags' in nfsd_open()

commit 999448a8c0202d8c41711c92385323520644527b upstream.

Just rename this variable, as the next patch will add a flag and
'access' as variable name would not be correct any more.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
fs/nfsd/vfs.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 5c3cd82..b395c61 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -726,12 +726,13 @@ static int nfsd_open_break_lease(struct inode *inode, int access)

/*
* Open an existing file or directory.
- * The access argument indicates the type of open (read/write/lock)
+ * The may_flags argument indicates the type of open (read/write/lock)
+ * and additional flags.
* N.B. After this call fhp needs an fh_put
*/
__be32
nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
- int access, struct file **filp)
+ int may_flags, struct file **filp)
{
struct dentry *dentry;
struct inode *inode;
@@ -746,7 +747,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
* and (hopefully) checked permission - so allow OWNER_OVERRIDE
* in case a chmod has now revoked permission.
*/
- err = fh_verify(rqstp, fhp, type, access | NFSD_MAY_OWNER_OVERRIDE);
+ err = fh_verify(rqstp, fhp, type, may_flags | NFSD_MAY_OWNER_OVERRIDE);
if (err)
goto out;

@@ -757,7 +758,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
* or any access when mandatory locking enabled
*/
err = nfserr_perm;
- if (IS_APPEND(inode) && (access & NFSD_MAY_WRITE))
+ if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE))
goto out;
/*
* We must ignore files (but only files) which might have mandatory
@@ -770,12 +771,12 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
if (!inode->i_fop)
goto out;

- host_err = nfsd_open_break_lease(inode, access);
+ host_err = nfsd_open_break_lease(inode, may_flags);
if (host_err) /* NOMEM or WOULDBLOCK */
goto out_nfserr;

- if (access & NFSD_MAY_WRITE) {
- if (access & NFSD_MAY_READ)
+ if (may_flags & NFSD_MAY_WRITE) {
+ if (may_flags & NFSD_MAY_READ)
flags = O_RDWR|O_LARGEFILE;
else
flags = O_WRONLY|O_LARGEFILE;
@@ -785,7 +786,8 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
if (IS_ERR(*filp))
host_err = PTR_ERR(*filp);
else
- host_err = ima_file_check(*filp, access);
+ host_err = ima_file_check(*filp, may_flags);
+
out_nfserr:
err = nfserrno(host_err);
out:
--
1.7.10.4

From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Date: Sun, 18 Mar 2012 22:44:50 -0400
Subject: nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)

commit 06effdbb49af5f6c7d20affaec74603914acc768 upstream.

Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
the NFS version. NFSv2 gets 32-bit hashes only.

NOTE: This patch got rather complex as Christoph asked to set the
filp->f_mode flag in the open call or immediatly after dentry_open()
in nfsd_open() to avoid races.
Personally I still do not see a reason for that and in my opinion
FMODE_32BITHASH/FMODE_64BITHASH flags could be set nfsd_readdir(), as it
follows directly after nfsd_open() without a chance of races.

Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
fs/nfsd/vfs.c | 15 +++++++++++++--
fs/nfsd/vfs.h | 2 ++
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index b395c61..959039e 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -785,9 +785,15 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
flags, current_cred());
if (IS_ERR(*filp))
host_err = PTR_ERR(*filp);
- else
+ else {
host_err = ima_file_check(*filp, may_flags);

+ if (may_flags & NFSD_MAY_64BIT_COOKIE)
+ (*filp)->f_mode |= FMODE_64BITHASH;
+ else
+ (*filp)->f_mode |= FMODE_32BITHASH;
+ }
+
out_nfserr:
err = nfserrno(host_err);
out:
@@ -2011,8 +2017,13 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
__be32 err;
struct file *file;
loff_t offset = *offsetp;
+ int may_flags = NFSD_MAY_READ;

- err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
+ /* NFSv2 only supports 32 bit cookies */
+ if (rqstp->rq_vers > 2)
+ may_flags |= NFSD_MAY_64BIT_COOKIE;
+
+ err = nfsd_open(rqstp, fhp, S_IFDIR, may_flags, &file);
if (err)
goto out;

diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 3f54ad0..85d4d42 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -27,6 +27,8 @@
#define NFSD_MAY_BYPASS_GSS 0x400
#define NFSD_MAY_READ_IF_EXEC 0x800

+#define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
+
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)

--
1.7.10.4
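
(Illustration, not part of the patch series: a userspace sketch of how the hash width is chosen across the nfsd and ext3/ext4 patches. The FMODE_* and NFSD_MAY_64BIT_COOKIE values are copied from the patches above; the NFSD_MAY_READ value and the function names are assumptions for the example, and is_32bit_api is passed in as a parameter instead of being derived from the calling task.)

```c
#include <stdbool.h>

/* Values as they appear in the patches above. */
#define FMODE_32BITHASH       0x200u
#define FMODE_64BITHASH       0x400u
#define NFSD_MAY_64BIT_COOKIE 0x1000

/* Assumed value, for illustration only. */
#define NFSD_MAY_READ         0x004

/* nfsd_readdir(): NFSv2 only supports 32-bit cookies. */
int readdir_may_flags(int nfs_version)
{
    int may_flags = NFSD_MAY_READ;

    if (nfs_version > 2)
        may_flags |= NFSD_MAY_64BIT_COOKIE;
    return may_flags;
}

/* nfsd_open(): translate the request flag into the f_mode bit. */
unsigned int hash_mode_for_open(int may_flags)
{
    return (may_flags & NFSD_MAY_64BIT_COOKIE) ? FMODE_64BITHASH
                                               : FMODE_32BITHASH;
}

/* ext3/ext4: an explicit flag wins; with neither flag set (a local,
 * non-NFS open) the decision falls back to the caller's ABI width. */
bool uses_32bit_hash(unsigned int f_mode, bool is_32bit_api)
{
    return (f_mode & FMODE_32BITHASH) ||
           (!(f_mode & FMODE_64BITHASH) && is_32bit_api);
}
```

So an NFSv3 or NFSv4 readdir always gets 64-bit cookies, regardless of whether the server itself is a 32-bit or 64-bit system.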

From: Eric Sandeen <sandeen@redhat.com>
Date: Thu, 26 Apr 2012 13:10:39 -0500
Subject: ext3: return 32/64-bit dir name hash according to usage type

commit d7dab39b6e16d5eea78ed3c705d2a2d0772b4f06 upstream.

This is based on commit d1f5273e9adb40724a85272f248f210dc4ce919a
ext4: return 32/64-bit dir name hash according to usage type
by Fan Yong <yong.fan@whamcloud.com>

Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
and telldir(). However, this causes problems if there are 32-bit hash
collisions, since the NFSv2 server can get stuck resending the same
entries from the directory repeatedly.

Allow ext3 to return a full 64-bit hash (both major and minor) for
telldir to decrease the chance of hash collisions.

This patch does implement a new ext3_dir_llseek op, because with 64-bit
hashes, nfs will attempt to seek to a hash "offset" which is much
larger than ext3's s_maxbytes. So for dx dirs, we call
generic_file_llseek_size() with the appropriate max hash value as the
maximum seekable size. Otherwise we just pass through to
generic_file_llseek().

Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Patch-updated-by: Eric Sandeen <sandeen@redhat.com>
(blame us if something is not correct)

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
fs/ext3/dir.c | 167 ++++++++++++++++++++++++++++++++++-------------
fs/ext3/hash.c | 4 +-
include/linux/ext3_fs.h | 6 +-
3 files changed, 129 insertions(+), 48 deletions(-)

diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
index 34f0a07..3268697 100644
--- a/fs/ext3/dir.c
+++ b/fs/ext3/dir.c
@@ -25,6 +25,7 @@
#include <linux/jbd.h>
#include <linux/ext3_fs.h>
#include <linux/buffer_head.h>
+#include <linux/compat.h>
#include <linux/slab.h>
#include <linux/rbtree.h>

@@ -32,24 +33,8 @@ static unsigned char ext3_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
};

-static int ext3_readdir(struct file *, void *, filldir_t);
static int ext3_dx_readdir(struct file * filp,
void * dirent, filldir_t filldir);
-static int ext3_release_dir (struct inode * inode,
- struct file * filp);
-
-const struct file_operations ext3_dir_operations = {
- .llseek = generic_file_llseek,
- .read = generic_read_dir,
- .readdir = ext3_readdir, /* we take BKL. needed?*/
- .unlocked_ioctl = ext3_ioctl,
-#ifdef CONFIG_COMPAT
- .compat_ioctl = ext3_compat_ioctl,
-#endif
- .fsync = ext3_sync_file, /* BKL held */
- .release = ext3_release_dir,
-};
-

static unsigned char get_dtype(struct super_block *sb, int filetype)
{
@@ -60,6 +45,25 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
return (ext3_filetype_table[filetype]);
}

+/**
+ * Check if the given dir-inode refers to an htree-indexed directory
+ * (or a directory which chould potentially get coverted to use htree
+ * indexing).
+ *
+ * Return 1 if it is a dx dir, 0 if not
+ */
+static int is_dx_dir(struct inode *inode)
+{
+ struct super_block *sb = inode->i_sb;
+
+ if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
+ EXT3_FEATURE_COMPAT_DIR_INDEX) &&
+ ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
+ ((inode->i_size >> sb->s_blocksize_bits) == 1)))
+ return 1;
+
+ return 0;
+}

int ext3_check_dir_entry (const char * function, struct inode * dir,
struct ext3_dir_entry_2 * de,
@@ -99,18 +103,13 @@ static int ext3_readdir(struct file * filp,
unsigned long offset;
int i, stored;
struct ext3_dir_entry_2 *de;
- struct super_block *sb;
int err;
struct inode *inode = filp->f_path.dentry->d_inode;
+ struct super_block *sb = inode->i_sb;
int ret = 0;
int dir_has_error = 0;

- sb = inode->i_sb;
-
- if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
- EXT3_FEATURE_COMPAT_DIR_INDEX) &&
- ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
- ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
+ if (is_dx_dir(inode)) {
err = ext3_dx_readdir(filp, dirent, filldir);
if (err != ERR_BAD_DX_DIR) {
ret = err;
@@ -232,22 +231,87 @@ out:
return ret;
}

+static inline int is_32bit_api(void)
+{
+#ifdef CONFIG_COMPAT
+ return is_compat_task();
+#else
+ return (BITS_PER_LONG == 32);
+#endif
+}
+
/*
* These functions convert from the major/minor hash to an f_pos
- * value.
+ * value for dx directories
*
- * Currently we only use major hash numer. This is unfortunate, but
- * on 32-bit machines, the same VFS interface is used for lseek and
- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
- * lseek/telldir/seekdir will blow out spectacularly, and from within
- * the ext2 low-level routine, we don't know if we're being called by
- * a 64-bit version of the system call or the 32-bit version of the
- * system call. Worse yet, NFSv2 only allows for a 32-bit readdir
- * cookie. Sigh.
+ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
+ * FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
+ * directly on both 32-bit and 64-bit nodes, under such case, neither
+ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
*/
-#define hash2pos(major, minor) (major >> 1)
-#define pos2maj_hash(pos) ((pos << 1) & 0xffffffff)
-#define pos2min_hash(pos) (0)
+static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return major >> 1;
+ else
+ return ((__u64)(major >> 1) << 32) | (__u64)minor;
+}
+
+static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return (pos << 1) & 0xffffffff;
+ else
+ return ((pos >> 32) << 1) & 0xffffffff;
+}
+
+static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return 0;
+ else
+ return pos & 0xffffffff;
+}
+
+/*
+ * Return 32- or 64-bit end-of-file for dx directories
+ */
+static inline loff_t ext3_get_htree_eof(struct file *filp)
+{
+ if ((filp->f_mode & FMODE_32BITHASH) ||
+ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
+ return EXT3_HTREE_EOF_32BIT;
+ else
+ return EXT3_HTREE_EOF_64BIT;
+}
+
+
+/*
+ * ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
+ * non-htree and htree directories, where the "offset" is in terms
+ * of the filename hash value instead of the byte offset.
+ *
+ * Because we may return a 64-bit hash that is well beyond s_maxbytes,
+ * we need to pass the max hash as the maximum allowable offset in
+ * the htree directory case.
+ *
+ * NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
+ * will be invalid once the directory was converted into a dx directory
+ */
+loff_t ext3_dir_llseek(struct file *file, loff_t offset, int origin)
+{
+ struct inode *inode = file->f_mapping->host;
+ int dx_dir = is_dx_dir(inode);
+
+ if (likely(dx_dir))
+ return generic_file_llseek_size(file, offset, origin,
+ ext3_get_htree_eof(file));
+ else
+ return generic_file_llseek(file, offset, origin);
+}

/*
* This structure holds the nodes of the red-black tree used to store
@@ -308,15 +372,16 @@ static void free_rb_tree_fname(struct rb_root *root)
}


-static struct dir_private_info *ext3_htree_create_dir_info(loff_t pos)
+static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
+ loff_t pos)
{
struct dir_private_info *p;

p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
if (!p)
return NULL;
- p->curr_hash = pos2maj_hash(pos);
- p->curr_minor_hash = pos2min_hash(pos);
+ p->curr_hash = pos2maj_hash(filp, pos);
+ p->curr_minor_hash = pos2min_hash(filp, pos);
return p;
}

@@ -406,7 +471,7 @@ static int call_filldir(struct file * filp, void * dirent,
printk("call_filldir: called with null fname?!?\n");
return 0;
}
- curr_pos = hash2pos(fname->hash, fname->minor_hash);
+ curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
while (fname) {
error = filldir(dirent, fname->name,
fname->name_len, curr_pos,
@@ -431,13 +496,13 @@ static int ext3_dx_readdir(struct file * filp,
int ret;

if (!info) {
- info = ext3_htree_create_dir_info(filp->f_pos);
+ info = ext3_htree_create_dir_info(filp, filp->f_pos);
if (!info)
return -ENOMEM;
filp->private_data = info;
}

- if (filp->f_pos == EXT3_HTREE_EOF)
+ if (filp->f_pos == ext3_get_htree_eof(filp))
return 0; /* EOF */

/* Some one has messed with f_pos; reset the world */
@@ -445,8 +510,8 @@ static int ext3_dx_readdir(struct file * filp,
free_rb_tree_fname(&info->root);
info->curr_node = NULL;
info->extra_fname = NULL;
- info->curr_hash = pos2maj_hash(filp->f_pos);
- info->curr_minor_hash = pos2min_hash(filp->f_pos);
+ info->curr_hash = pos2maj_hash(filp, filp->f_pos);
+ info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
}

/*
@@ -478,7 +543,7 @@ static int ext3_dx_readdir(struct file * filp,
if (ret < 0)
return ret;
if (ret == 0) {
- filp->f_pos = EXT3_HTREE_EOF;
+ filp->f_pos = ext3_get_htree_eof(filp);
break;
}
info->curr_node = rb_first(&info->root);
@@ -498,7 +563,7 @@ static int ext3_dx_readdir(struct file * filp,
info->curr_minor_hash = fname->minor_hash;
} else {
if (info->next_hash == ~0) {
- filp->f_pos = EXT3_HTREE_EOF;
+ filp->f_pos = ext3_get_htree_eof(filp);
break;
}
info->curr_hash = info->next_hash;
@@ -517,3 +582,15 @@ static int ext3_release_dir (struct inode * inode, struct file * filp)

return 0;
}
+
+const struct file_operations ext3_dir_operations = {
+ .llseek = ext3_dir_llseek,
+ .read = generic_read_dir,
+ .readdir = ext3_readdir,
+ .unlocked_ioctl = ext3_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = ext3_compat_ioctl,
+#endif
+ .fsync = ext3_sync_file,
+ .release = ext3_release_dir,
+};
diff --git a/fs/ext3/hash.c b/fs/ext3/hash.c
index 7d215b4..d4d3ade 100644
--- a/fs/ext3/hash.c
+++ b/fs/ext3/hash.c
@@ -200,8 +200,8 @@ int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
return -1;
}
hash = hash & ~1;
- if (hash == (EXT3_HTREE_EOF << 1))
- hash = (EXT3_HTREE_EOF-1) << 1;
+ if (hash == (EXT3_HTREE_EOF_32BIT << 1))
+ hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
hinfo->hash = hash;
hinfo->minor_hash = minor_hash;
return 0;
diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
index dec9911..d59ab12 100644
--- a/include/linux/ext3_fs.h
+++ b/include/linux/ext3_fs.h
@@ -781,7 +781,11 @@ struct dx_hash_info
u32 *seed;
};

-#define EXT3_HTREE_EOF 0x7fffffff
+
+/* 32 and 64 bit signed EOF for dx directories */
+#define EXT3_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
+#define EXT3_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)
+

/*
* Control parameters used by ext3_htree_next_block
--
1.7.10.4
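
The fs/ext3/hash.c hunk above keeps a name's major hash from ever mapping to the end-of-directory position. A minimal userspace sketch of that clamp follows. The helper name clamp_major_hash() is hypothetical, and fixed-width types stand in for the kernel's 1UL/1ULL constants; it is an illustration of the idea, not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as EXT3_HTREE_EOF_32BIT / EXT3_HTREE_EOF_64BIT in the patch,
 * written with fixed-width types so the sketch does not depend on the
 * width of unsigned long. */
#define HTREE_EOF_32BIT ((UINT32_C(1) << (32 - 1)) - 1)  /* 0x7fffffff */
#define HTREE_EOF_64BIT ((UINT64_C(1) << (64 - 1)) - 1)  /* 0x7fffffffffffffff */

/* Hypothetical helper mirroring the tail of ext3fs_dirhash(): clear the low
 * bit, then move a hash that would land on the EOF position one step down. */
static uint32_t clamp_major_hash(uint32_t hash)
{
    hash &= ~UINT32_C(1);
    if (hash == (HTREE_EOF_32BIT << 1))
        hash = (HTREE_EOF_32BIT - 1) << 1;

    /* The 32-bit position is (hash >> 1); it must never equal the EOF marker. */
    assert((hash >> 1) != HTREE_EOF_32BIT);
    return hash;
}
```

Without the clamp, a name hashing to 0xfffffffe would be reported at position 0x7fffffff, which readdir treats as end of directory.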
 
Old 08-21-2012, 08:03 PM
Brian Kroth
 
Default Bug#685407: ext4 dir_index + nfs duplicate cookies problem with large dovecot maildirs

Jonathan Nieder <jrnieder@gmail.com> 2012-08-20 17:33:
> reassign 685407 src:linux 3.2.20-1
> fixed 685407 linux/3.5-1~experimental.1
> tags 685407 + upstream patch moreinfo
> quit
>
> Hi again,
>
> Brian Kroth wrote:
>
>> linux-image-3.5-trunk-amd64 (3.5-1~experimental.1) on the server
>> does indeed appear to fix this.
> [...]
>> Attached is a simple script I used to create a test dir for this.
>> The ext4 volume was 3G.
>
> Very nice to hear.
>
> Please test the attached patches together against a 3.2.y kernel,
> for example using the following instructions.
>
> 0. prerequisites:
>
> apt-get install git build-essential
>
> 1. get the kernel history, if you don't already have it:
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> 2. fetch point releases:
>
> cd linux
> git remote add stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
> git fetch stable
>
> 3. configure, build, test:
>
> git checkout stable/linux-3.2.y

3.2.28

> cp /boot/config-$(uname -r) .config; # current configuration
> scripts/config --disable DEBUG_INFO
> make localmodconfig; # optional: minimize configuration
> make deb-pkg; # optionally with -j<num> for parallel build
> dpkg -i ../<name of package>; # as root
> reboot
> ... test test test ...
>
> Hopefully it reproduces the bug. So:

It does reproduce the bug.

> 4. try the patches:
>
> cd linux
> git am -3sc $(ls -1 /path/to/patches/0*)
> make deb-pkg; # maybe with -j4
> dpkg -i ../<name of package>; # as root
> reboot
> ... test test test ...

It does *not* reproduce the bug.

Looks to have worked.

> If it works, someone on the kernel team will probably apply the
> patches to wheezy (meaning they would automatically flow into
> squeeze-backports as well).

Good news.

Let me know if you need anything else.

Thanks for the help and good instructions,
Brian

> (An alternative set of instructions is at [1].)
>
> Hope that helps,
> Jonathan
>
> [1] http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official
> or the corresponding page in the debian-kernel-handbook package
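
The "test test test" steps in this thread come down to listing a large directory on the NFS client and checking that the listing is complete. The sketch below is one way to do that check in C; it is not the test script mentioned in the thread, and count_dir_entries() is a hypothetical helper name. It relies on the fact that an NFS client which detects a readdir cookie loop fails with ELOOP, which is the "too many levels of symbolic links" message from the bug report.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Count the entries readdir() returns for `path`, skipping "." and "..".
 * Returns the count, or -1 with errno set if opendir() or readdir() fails. */
static long count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    long count = 0;
    for (;;) {
        errno = 0;  /* readdir() returns NULL for both EOF and error */
        struct dirent *ent = readdir(dir);
        if (!ent)
            break;
        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        count++;
    }

    int saved = errno;  /* closedir() may overwrite errno */
    closedir(dir);
    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return count;
}
```

On an affected setup one would expect either a return of -1 with errno == ELOOP, or a count lower than the number of files known to be in the maildir.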
 
Old 09-03-2012, 07:05 AM
Jonathan Nieder
 
Default Bug#685407: ext4 dir_index + nfs duplicate cookies problem with large dovecot maildirs

Hi kernel team,

Brian Kroth wrote:

> 3.2.28
[...]
> It does reproduce the bug.
>
> Jonathan Nieder <jrnieder@gmail.com> 2012-08-20 17:33:

>> 4. try the patches:
>>
>> cd linux
>> git am -3sc $(ls -1 /path/to/patches/0*)
[...]
> It does *not* reproduce the bug.
>
> Looks to have worked.

Please consider the attached patch for the sid branch of the packaging
repo. It applies the five aforementioned patches from upstream:

6a8a13e03861 fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
d1f5273e9adb ext4: return 32/64-bit dir name hash according to usage type
999448a8c020 nfsd: rename 'int access' to 'int may_flags' in nfsd_open
06effdbb49af nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
d7dab39b6e16 ext3: return 32/64-bit dir name hash according to usage type

which make NFSv3/4 use 64-bit hashes as readdir cookies instead of
crippling itself for the sake of NFSv2 which only supports 32-bit
cookies. The most interesting of these (patches #2 and #5) are
unfortunately a bit too big for the letter of the upstream stable
rules, but the patches are straightforward, make sense, and are well
tested.

Thoughts welcome, as usual.

Sincerely,
Jonathan
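
To make the difference between 32-bit and 64-bit readdir cookies concrete, here is a minimal userspace sketch of the position encoding that the ext4 and ext3 patches (#2 and #5) introduce. The function names and the `wide` flag are illustrative stand-ins: in the kernel the choice is made from filp->f_mode and is_32bit_api(), not from a boolean argument.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for hash2pos(): a narrow cookie carries only the major hash,
 * a wide cookie carries the major hash in the upper half and the minor
 * hash in the lower half. */
static int64_t hash_to_pos(bool wide, uint32_t major, uint32_t minor)
{
    if (!wide)
        return major >> 1;
    return (int64_t)(((uint64_t)(major >> 1) << 32) | (uint64_t)minor);
}

/* Stand-in for pos2maj_hash(): recover the major hash from a cookie. */
static uint32_t pos_to_major(bool wide, int64_t pos)
{
    uint64_t p = (uint64_t)pos;

    if (!wide)
        return (uint32_t)((p << 1) & 0xffffffffu);
    return (uint32_t)(((p >> 32) << 1) & 0xffffffffu);
}

/* Stand-in for pos2min_hash(): a narrow cookie has no minor hash. */
static uint32_t pos_to_minor(bool wide, int64_t pos)
{
    return wide ? (uint32_t)((uint64_t)pos & 0xffffffffu) : 0;
}
```

Two names whose major hashes collide get the same narrow cookie, so the server cannot tell them apart when a client resumes a listing. With wide cookies they differ whenever their minor hashes differ.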
Index: debian/patches/bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch
===================================================================
--- debian/patches/bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch (revision 0)
+++ debian/patches/bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch (revision 0)
@@ -0,0 +1,77 @@
+From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Date: Sun, 18 Mar 2012 22:44:50 -0400
+Subject: nfsd: vfs_llseek() with 32 or 64 bit offsets (hashes)
+
+commit 06effdbb49af5f6c7d20affaec74603914acc768 upstream.
+
+Use 32-bit or 64-bit llseek() hashes for directory offsets depending on
+the NFS version. NFSv2 gets 32-bit hashes only.
+
+NOTE: This patch got rather complex as Christoph asked to set the
+filp->f_mode flag in the open call or immediatly after dentry_open()
+in nfsd_open() to avoid races.
+Personally I still do not see a reason for that and in my opinion
+FMODE_32BITHASH/FMODE_64BITHASH flags could be set nfsd_readdir(), as it
+follows directly after nfsd_open() without a chance of races.
+
+Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
+Acked-by: J. Bruce Fields <bfields@redhat.com>
+Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
+---
+ fs/nfsd/vfs.c | 15 +++++++++++++--
+ fs/nfsd/vfs.h | 2 ++
+ 2 files changed, 15 insertions(+), 2 deletions(-)
+
+diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
+index b395c61..959039e 100644
+--- a/fs/nfsd/vfs.c
++++ b/fs/nfsd/vfs.c
+@@ -785,9 +785,15 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+ flags, current_cred());
+ if (IS_ERR(*filp))
+ host_err = PTR_ERR(*filp);
+- else
++ else {
+ host_err = ima_file_check(*filp, may_flags);
+
++ if (may_flags & NFSD_MAY_64BIT_COOKIE)
++ (*filp)->f_mode |= FMODE_64BITHASH;
++ else
++ (*filp)->f_mode |= FMODE_32BITHASH;
++ }
++
+ out_nfserr:
+ err = nfserrno(host_err);
+ out:
+@@ -2011,8 +2017,13 @@ nfsd_readdir(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t *offsetp,
+ __be32 err;
+ struct file *file;
+ loff_t offset = *offsetp;
++ int may_flags = NFSD_MAY_READ;
+
+- err = nfsd_open(rqstp, fhp, S_IFDIR, NFSD_MAY_READ, &file);
++ /* NFSv2 only supports 32 bit cookies */
++ if (rqstp->rq_vers > 2)
++ may_flags |= NFSD_MAY_64BIT_COOKIE;
++
++ err = nfsd_open(rqstp, fhp, S_IFDIR, may_flags, &file);
+ if (err)
+ goto out;
+
+diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
+index 3f54ad0..85d4d42 100644
+--- a/fs/nfsd/vfs.h
++++ b/fs/nfsd/vfs.h
+@@ -27,6 +27,8 @@
+ #define NFSD_MAY_BYPASS_GSS 0x400
+ #define NFSD_MAY_READ_IF_EXEC 0x800
+
++#define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
++
+ #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
+ #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
+
+--
+1.7.10.4
+
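
The nfsd change above can be summarised as a two-step mapping: NFS protocol version to may_flags, then may_flags to an f_mode hash flag. A minimal sketch, assuming NFSD_MAY_READ is 0x004 (that value is not shown in the patch); the other constants are copied from the patches in this series, and both function names are hypothetical.

```c
#define NFSD_MAY_READ         0x004   /* assumed value, not shown in the patch */
#define NFSD_MAY_64BIT_COOKIE 0x1000  /* from the fs/nfsd/vfs.h hunk */
#define FMODE_32BITHASH       0x200u  /* from the include/linux/fs.h patch */
#define FMODE_64BITHASH       0x400u  /* from the include/linux/fs.h patch */

/* What nfsd_readdir() passes to nfsd_open(): NFSv2 only supports 32-bit
 * cookies, so only newer protocol versions ask for 64-bit ones. */
static int readdir_may_flags(unsigned int nfs_version)
{
    int may_flags = NFSD_MAY_READ;

    if (nfs_version > 2)
        may_flags |= NFSD_MAY_64BIT_COOKIE;
    return may_flags;
}

/* What nfsd_open() ORs into filp->f_mode after a successful open. */
static unsigned int hash_mode_for(int may_flags)
{
    return (may_flags & NFSD_MAY_64BIT_COOKIE) ? FMODE_64BITHASH
                                               : FMODE_32BITHASH;
}
```

Note that nfsd always sets exactly one of the two flags, so the filesystem never has to fall back to guessing from the calling task.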
Index: debian/patches/bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch
===================================================================
--- debian/patches/bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch (revision 0)
+++ debian/patches/bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch (revision 0)
@@ -0,0 +1,348 @@
+From: Eric Sandeen <sandeen@redhat.com>
+Date: Thu, 26 Apr 2012 13:10:39 -0500
+Subject: ext3: return 32/64-bit dir name hash according to usage type
+
+commit d7dab39b6e16d5eea78ed3c705d2a2d0772b4f06 upstream.
+
+This is based on commit d1f5273e9adb40724a85272f248f210dc4ce919a
+ext4: return 32/64-bit dir name hash according to usage type
+by Fan Yong <yong.fan@whamcloud.com>
+
+Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
+to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
+and telldir(). However, this causes problems if there are 32-bit hash
+collisions, since the NFSv2 server can get stuck resending the same
+entries from the directory repeatedly.
+
+Allow ext3 to return a full 64-bit hash (both major and minor) for
+telldir to decrease the chance of hash collisions.
+
+This patch does implement a new ext3_dir_llseek op, because with 64-bit
+hashes, nfs will attempt to seek to a hash "offset" which is much
+larger than ext3's s_maxbytes. So for dx dirs, we call
+generic_file_llseek_size() with the appropriate max hash value as the
+maximum seekable size. Otherwise we just pass through to
+generic_file_llseek().
+
+Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Patch-updated-by: Eric Sandeen <sandeen@redhat.com>
+(blame us if something is not correct)
+
+Signed-off-by: Eric Sandeen <sandeen@redhat.com>
+Signed-off-by: Jan Kara <jack@suse.cz>
+Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
+---
+ fs/ext3/dir.c | 167 ++++++++++++++++++++++++++++++++++-------------
+ fs/ext3/hash.c | 4 +-
+ include/linux/ext3_fs.h | 6 +-
+ 3 files changed, 129 insertions(+), 48 deletions(-)
+
+diff --git a/fs/ext3/dir.c b/fs/ext3/dir.c
+index 34f0a07..3268697 100644
+--- a/fs/ext3/dir.c
++++ b/fs/ext3/dir.c
+@@ -25,6 +25,7 @@
+ #include <linux/jbd.h>
+ #include <linux/ext3_fs.h>
+ #include <linux/buffer_head.h>
++#include <linux/compat.h>
+ #include <linux/slab.h>
+ #include <linux/rbtree.h>
+
+@@ -32,24 +33,8 @@ static unsigned char ext3_filetype_table[] = {
+ DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
+ };
+
+-static int ext3_readdir(struct file *, void *, filldir_t);
+ static int ext3_dx_readdir(struct file * filp,
+ void * dirent, filldir_t filldir);
+-static int ext3_release_dir (struct inode * inode,
+- struct file * filp);
+-
+-const struct file_operations ext3_dir_operations = {
+- .llseek = generic_file_llseek,
+- .read = generic_read_dir,
+- .readdir = ext3_readdir, /* we take BKL. needed?*/
+- .unlocked_ioctl = ext3_ioctl,
+-#ifdef CONFIG_COMPAT
+- .compat_ioctl = ext3_compat_ioctl,
+-#endif
+- .fsync = ext3_sync_file, /* BKL held */
+- .release = ext3_release_dir,
+-};
+-
+
+ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ {
+@@ -60,6 +45,25 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ return (ext3_filetype_table[filetype]);
+ }
+
++/**
++ * Check if the given dir-inode refers to an htree-indexed directory
++ * (or a directory which chould potentially get coverted to use htree
++ * indexing).
++ *
++ * Return 1 if it is a dx dir, 0 if not
++ */
++static int is_dx_dir(struct inode *inode)
++{
++ struct super_block *sb = inode->i_sb;
++
++ if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
++ EXT3_FEATURE_COMPAT_DIR_INDEX) &&
++ ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
++ ((inode->i_size >> sb->s_blocksize_bits) == 1)))
++ return 1;
++
++ return 0;
++}
+
+ int ext3_check_dir_entry (const char * function, struct inode * dir,
+ struct ext3_dir_entry_2 * de,
+@@ -99,18 +103,13 @@ static int ext3_readdir(struct file * filp,
+ unsigned long offset;
+ int i, stored;
+ struct ext3_dir_entry_2 *de;
+- struct super_block *sb;
+ int err;
+ struct inode *inode = filp->f_path.dentry->d_inode;
++ struct super_block *sb = inode->i_sb;
+ int ret = 0;
+ int dir_has_error = 0;
+
+- sb = inode->i_sb;
+-
+- if (EXT3_HAS_COMPAT_FEATURE(inode->i_sb,
+- EXT3_FEATURE_COMPAT_DIR_INDEX) &&
+- ((EXT3_I(inode)->i_flags & EXT3_INDEX_FL) ||
+- ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
++ if (is_dx_dir(inode)) {
+ err = ext3_dx_readdir(filp, dirent, filldir);
+ if (err != ERR_BAD_DX_DIR) {
+ ret = err;
+@@ -232,22 +231,87 @@ out:
+ return ret;
+ }
+
++static inline int is_32bit_api(void)
++{
++#ifdef CONFIG_COMPAT
++ return is_compat_task();
++#else
++ return (BITS_PER_LONG == 32);
++#endif
++}
++
+ /*
+ * These functions convert from the major/minor hash to an f_pos
+- * value.
++ * value for dx directories
+ *
+- * Currently we only use major hash numer. This is unfortunate, but
+- * on 32-bit machines, the same VFS interface is used for lseek and
+- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
+- * lseek/telldir/seekdir will blow out spectacularly, and from within
+- * the ext2 low-level routine, we don't know if we're being called by
+- * a 64-bit version of the system call or the 32-bit version of the
+- * system call. Worse yet, NFSv2 only allows for a 32-bit readdir
+- * cookie. Sigh.
++ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
++ * FMODE_64BITHASH explicitly. On the other hand, we allow ext3 to be mounted
++ * directly on both 32-bit and 64-bit nodes, under such case, neither
++ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
+ */
+-#define hash2pos(major, minor) (major >> 1)
+-#define pos2maj_hash(pos) ((pos << 1) & 0xffffffff)
+-#define pos2min_hash(pos) (0)
++static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return major >> 1;
++ else
++ return ((__u64)(major >> 1) << 32) | (__u64)minor;
++}
++
++static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return (pos << 1) & 0xffffffff;
++ else
++ return ((pos >> 32) << 1) & 0xffffffff;
++}
++
++static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return 0;
++ else
++ return pos & 0xffffffff;
++}
++
++/*
++ * Return 32- or 64-bit end-of-file for dx directories
++ */
++static inline loff_t ext3_get_htree_eof(struct file *filp)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return EXT3_HTREE_EOF_32BIT;
++ else
++ return EXT3_HTREE_EOF_64BIT;
++}
++
++
++/*
++ * ext3_dir_llseek() calls generic_file_llseek[_size]() to handle both
++ * non-htree and htree directories, where the "offset" is in terms
++ * of the filename hash value instead of the byte offset.
++ *
++ * Because we may return a 64-bit hash that is well beyond s_maxbytes,
++ * we need to pass the max hash as the maximum allowable offset in
++ * the htree directory case.
++ *
++ * NOTE: offsets obtained *before* ext3_set_inode_flag(dir, EXT3_INODE_INDEX)
++ * will be invalid once the directory was converted into a dx directory
++ */
++loff_t ext3_dir_llseek(struct file *file, loff_t offset, int origin)
++{
++ struct inode *inode = file->f_mapping->host;
++ int dx_dir = is_dx_dir(inode);
++
++ if (likely(dx_dir))
++ return generic_file_llseek_size(file, offset, origin,
++ ext3_get_htree_eof(file));
++ else
++ return generic_file_llseek(file, offset, origin);
++}
+
+ /*
+ * This structure holds the nodes of the red-black tree used to store
+@@ -308,15 +372,16 @@ static void free_rb_tree_fname(struct rb_root *root)
+ }
+
+
+-static struct dir_private_info *ext3_htree_create_dir_info(loff_t pos)
++static struct dir_private_info *ext3_htree_create_dir_info(struct file *filp,
++ loff_t pos)
+ {
+ struct dir_private_info *p;
+
+ p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
+ if (!p)
+ return NULL;
+- p->curr_hash = pos2maj_hash(pos);
+- p->curr_minor_hash = pos2min_hash(pos);
++ p->curr_hash = pos2maj_hash(filp, pos);
++ p->curr_minor_hash = pos2min_hash(filp, pos);
+ return p;
+ }
+
+@@ -406,7 +471,7 @@ static int call_filldir(struct file * filp, void * dirent,
+ printk("call_filldir: called with null fname?!?\n");
+ return 0;
+ }
+- curr_pos = hash2pos(fname->hash, fname->minor_hash);
++ curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
+ while (fname) {
+ error = filldir(dirent, fname->name,
+ fname->name_len, curr_pos,
+@@ -431,13 +496,13 @@ static int ext3_dx_readdir(struct file * filp,
+ int ret;
+
+ if (!info) {
+- info = ext3_htree_create_dir_info(filp->f_pos);
++ info = ext3_htree_create_dir_info(filp, filp->f_pos);
+ if (!info)
+ return -ENOMEM;
+ filp->private_data = info;
+ }
+
+- if (filp->f_pos == EXT3_HTREE_EOF)
++ if (filp->f_pos == ext3_get_htree_eof(filp))
+ return 0; /* EOF */
+
+ /* Some one has messed with f_pos; reset the world */
+@@ -445,8 +510,8 @@ static int ext3_dx_readdir(struct file * filp,
+ free_rb_tree_fname(&info->root);
+ info->curr_node = NULL;
+ info->extra_fname = NULL;
+- info->curr_hash = pos2maj_hash(filp->f_pos);
+- info->curr_minor_hash = pos2min_hash(filp->f_pos);
++ info->curr_hash = pos2maj_hash(filp, filp->f_pos);
++ info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
+ }
+
+ /*
+@@ -478,7 +543,7 @@ static int ext3_dx_readdir(struct file * filp,
+ if (ret < 0)
+ return ret;
+ if (ret == 0) {
+- filp->f_pos = EXT3_HTREE_EOF;
++ filp->f_pos = ext3_get_htree_eof(filp);
+ break;
+ }
+ info->curr_node = rb_first(&info->root);
+@@ -498,7 +563,7 @@ static int ext3_dx_readdir(struct file * filp,
+ info->curr_minor_hash = fname->minor_hash;
+ } else {
+ if (info->next_hash == ~0) {
+- filp->f_pos = EXT3_HTREE_EOF;
++ filp->f_pos = ext3_get_htree_eof(filp);
+ break;
+ }
+ info->curr_hash = info->next_hash;
+@@ -517,3 +582,15 @@ static int ext3_release_dir (struct inode * inode, struct file * filp)
+
+ return 0;
+ }
++
++const struct file_operations ext3_dir_operations = {
++ .llseek = ext3_dir_llseek,
++ .read = generic_read_dir,
++ .readdir = ext3_readdir,
++ .unlocked_ioctl = ext3_ioctl,
++#ifdef CONFIG_COMPAT
++ .compat_ioctl = ext3_compat_ioctl,
++#endif
++ .fsync = ext3_sync_file,
++ .release = ext3_release_dir,
++};
+diff --git a/fs/ext3/hash.c b/fs/ext3/hash.c
+index 7d215b4..d4d3ade 100644
+--- a/fs/ext3/hash.c
++++ b/fs/ext3/hash.c
+@@ -200,8 +200,8 @@ int ext3fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
+ return -1;
+ }
+ hash = hash & ~1;
+- if (hash == (EXT3_HTREE_EOF << 1))
+- hash = (EXT3_HTREE_EOF-1) << 1;
++ if (hash == (EXT3_HTREE_EOF_32BIT << 1))
++ hash = (EXT3_HTREE_EOF_32BIT - 1) << 1;
+ hinfo->hash = hash;
+ hinfo->minor_hash = minor_hash;
+ return 0;
+diff --git a/include/linux/ext3_fs.h b/include/linux/ext3_fs.h
+index dec9911..d59ab12 100644
+--- a/include/linux/ext3_fs.h
++++ b/include/linux/ext3_fs.h
+@@ -781,7 +781,11 @@ struct dx_hash_info
+ u32 *seed;
+ };
+
+-#define EXT3_HTREE_EOF 0x7fffffff
++
++/* 32 and 64 bit signed EOF for dx directories */
++#define EXT3_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
++#define EXT3_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)
++
+
+ /*
+ * Control parameters used by ext3_htree_next_block
+--
+1.7.10.4
+
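
The same condition is repeated in hash2pos(), pos2maj_hash(), pos2min_hash() and ext3_get_htree_eof() above. The sketch below pulls it into one place to show the decision order. The function names are hypothetical, and `caller_is_32bit` stands in for the kernel's is_32bit_api(), which is only consulted when no upper layer has set either flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define FMODE_32BITHASH 0x200u  /* values from the include/linux/fs.h patch */
#define FMODE_64BITHASH 0x400u

/* An explicit FMODE_32BITHASH wins; otherwise fall back to the caller's
 * word size unless FMODE_64BITHASH was requested explicitly. */
static bool use_32bit_hash(unsigned int f_mode, bool caller_is_32bit)
{
    return (f_mode & FMODE_32BITHASH) ||
           (!(f_mode & FMODE_64BITHASH) && caller_is_32bit);
}

/* End-of-directory marker matching the chosen hash width. */
static int64_t htree_eof(unsigned int f_mode, bool caller_is_32bit)
{
    return use_32bit_hash(f_mode, caller_is_32bit) ? INT32_MAX : INT64_MAX;
}
```

A local 64-bit process therefore gets 64-bit positions with no flag set, while a compat (32-bit) process keeps the old 32-bit behaviour.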
Index: debian/patches/bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch
===================================================================
--- debian/patches/bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch (revision 0)
+++ debian/patches/bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch (revision 0)
@@ -0,0 +1,379 @@
+From: Fan Yong <yong.fan@whamcloud.com>
+Date: Sun, 18 Mar 2012 22:44:40 -0400
+Subject: ext4: return 32/64-bit dir name hash according to usage type
+
+commit d1f5273e9adb40724a85272f248f210dc4ce919a upstream.
+
+Traditionally ext2/3/4 has returned a 32-bit hash value from llseek()
+to appease NFSv2, which can only handle a 32-bit cookie for seekdir()
+and telldir(). However, this causes problems if there are 32-bit hash
+collisions, since the NFSv2 server can get stuck resending the same
+entries from the directory repeatedly.
+
+Allow ext4 to return a full 64-bit hash (both major and minor) for
+telldir to decrease the chance of hash collisions. This still needs
+integration on the NFS side.
+
+Patch-updated-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+(blame me if something is not correct)
+
+Signed-off-by: Fan Yong <yong.fan@whamcloud.com>
+Signed-off-by: Andreas Dilger <adilger@whamcloud.com>
+Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
+Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
+---
+ fs/ext4/dir.c | 214 ++++++++++++++++++++++++++++++++++++++++++++------------
+ fs/ext4/ext4.h | 6 +-
+ fs/ext4/hash.c | 4 +-
+ 3 files changed, 176 insertions(+), 48 deletions(-)
+
+diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
+index 164c560..689d1b1 100644
+--- a/fs/ext4/dir.c
++++ b/fs/ext4/dir.c
+@@ -32,24 +32,8 @@ static unsigned char ext4_filetype_table[] = {
+ DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
+ };
+
+-static int ext4_readdir(struct file *, void *, filldir_t);
+ static int ext4_dx_readdir(struct file *filp,
+ void *dirent, filldir_t filldir);
+-static int ext4_release_dir(struct inode *inode,
+- struct file *filp);
+-
+-const struct file_operations ext4_dir_operations = {
+- .llseek = ext4_llseek,
+- .read = generic_read_dir,
+- .readdir = ext4_readdir, /* we take BKL. needed?*/
+- .unlocked_ioctl = ext4_ioctl,
+-#ifdef CONFIG_COMPAT
+- .compat_ioctl = ext4_compat_ioctl,
+-#endif
+- .fsync = ext4_sync_file,
+- .release = ext4_release_dir,
+-};
+-
+
+ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ {
+@@ -60,6 +44,26 @@ static unsigned char get_dtype(struct super_block *sb, int filetype)
+ return (ext4_filetype_table[filetype]);
+ }
+
++/**
++ * Check if the given dir-inode refers to an htree-indexed directory
++ * (or a directory which chould potentially get coverted to use htree
++ * indexing).
++ *
++ * Return 1 if it is a dx dir, 0 if not
++ */
++static int is_dx_dir(struct inode *inode)
++{
++ struct super_block *sb = inode->i_sb;
++
++ if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
++ EXT4_FEATURE_COMPAT_DIR_INDEX) &&
++ ((ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) ||
++ ((inode->i_size >> sb->s_blocksize_bits) == 1)))
++ return 1;
++
++ return 0;
++}
++
+ /*
+ * Return 0 if the directory entry is OK, and 1 if there is a problem
+ *
+@@ -115,18 +119,13 @@ static int ext4_readdir(struct file *filp,
+ unsigned int offset;
+ int i, stored;
+ struct ext4_dir_entry_2 *de;
+- struct super_block *sb;
+ int err;
+ struct inode *inode = filp->f_path.dentry->d_inode;
++ struct super_block *sb = inode->i_sb;
+ int ret = 0;
+ int dir_has_error = 0;
+
+- sb = inode->i_sb;
+-
+- if (EXT4_HAS_COMPAT_FEATURE(inode->i_sb,
+- EXT4_FEATURE_COMPAT_DIR_INDEX) &&
+- ((ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) ||
+- ((inode->i_size >> sb->s_blocksize_bits) == 1))) {
++ if (is_dx_dir(inode)) {
+ err = ext4_dx_readdir(filp, dirent, filldir);
+ if (err != ERR_BAD_DX_DIR) {
+ ret = err;
+@@ -254,22 +253,134 @@ out:
+ return ret;
+ }
+
++static inline int is_32bit_api(void)
++{
++#ifdef CONFIG_COMPAT
++ return is_compat_task();
++#else
++ return (BITS_PER_LONG == 32);
++#endif
++}
++
+ /*
+ * These functions convert from the major/minor hash to an f_pos
+- * value.
++ * value for dx directories
+ *
+- * Currently we only use major hash numer. This is unfortunate, but
+- * on 32-bit machines, the same VFS interface is used for lseek and
+- * llseek, so if we use the 64 bit offset, then the 32-bit versions of
+- * lseek/telldir/seekdir will blow out spectacularly, and from within
+- * the ext2 low-level routine, we don't know if we're being called by
+- * a 64-bit version of the system call or the 32-bit version of the
+- * system call. Worse yet, NFSv2 only allows for a 32-bit readdir
+- * cookie. Sigh.
++ * Upper layer (for example NFS) should specify FMODE_32BITHASH or
++ * FMODE_64BITHASH explicitly. On the other hand, we allow ext4 to be mounted
++ * directly on both 32-bit and 64-bit nodes, under such case, neither
++ * FMODE_32BITHASH nor FMODE_64BITHASH is specified.
+ */
+-#define hash2pos(major, minor) (major >> 1)
+-#define pos2maj_hash(pos) ((pos << 1) & 0xffffffff)
+-#define pos2min_hash(pos) (0)
++static inline loff_t hash2pos(struct file *filp, __u32 major, __u32 minor)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return major >> 1;
++ else
++ return ((__u64)(major >> 1) << 32) | (__u64)minor;
++}
++
++static inline __u32 pos2maj_hash(struct file *filp, loff_t pos)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return (pos << 1) & 0xffffffff;
++ else
++ return ((pos >> 32) << 1) & 0xffffffff;
++}
++
++static inline __u32 pos2min_hash(struct file *filp, loff_t pos)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return 0;
++ else
++ return pos & 0xffffffff;
++}
++
++/*
++ * Return 32- or 64-bit end-of-file for dx directories
++ */
++static inline loff_t ext4_get_htree_eof(struct file *filp)
++{
++ if ((filp->f_mode & FMODE_32BITHASH) ||
++ (!(filp->f_mode & FMODE_64BITHASH) && is_32bit_api()))
++ return EXT4_HTREE_EOF_32BIT;
++ else
++ return EXT4_HTREE_EOF_64BIT;
++}
++
++
++/*
++ * ext4_dir_llseek() based on generic_file_llseek() to handle both
++ * non-htree and htree directories, where the "offset" is in terms
++ * of the filename hash value instead of the byte offset.
++ *
++ * NOTE: offsets obtained *before* ext4_set_inode_flag(dir, EXT4_INODE_INDEX)
++ * will be invalid once the directory was converted into a dx directory
++ */
++loff_t ext4_dir_llseek(struct file *file, loff_t offset, int origin)
++{
++ struct inode *inode = file->f_mapping->host;
++ loff_t ret = -EINVAL;
++ int dx_dir = is_dx_dir(inode);
++
++ mutex_lock(&inode->i_mutex);
++
++ /* NOTE: relative offsets with dx directories might not work
++ * as expected, as it is difficult to figure out the
++ * correct offset between dx hashes */
++
++ switch (origin) {
++ case SEEK_END:
++ if (unlikely(offset > 0))
++ goto out_err; /* not supported for directories */
++
++ /* so only negative offsets are left, does that have a
++ * meaning for directories at all? */
++ if (dx_dir)
++ offset += ext4_get_htree_eof(file);
++ else
++ offset += inode->i_size;
++ break;
++ case SEEK_CUR:
++ /*
++ * Here we special-case the lseek(fd, 0, SEEK_CUR)
++ * position-querying operation. Avoid rewriting the "same"
++ * f_pos value back to the file because a concurrent read(),
++ * write() or lseek() might have altered it
++ */
++ if (offset == 0) {
++ offset = file->f_pos;
++ goto out_ok;
++ }
++
++ offset += file->f_pos;
++ break;
++ }
++
++ if (unlikely(offset < 0))
++ goto out_err;
++
++ if (!dx_dir) {
++ if (offset > inode->i_sb->s_maxbytes)
++ goto out_err;
++ } else if (offset > ext4_get_htree_eof(file))
++ goto out_err;
++
++ /* Special lock needed here? */
++ if (offset != file->f_pos) {
++ file->f_pos = offset;
++ file->f_version = 0;
++ }
++
++out_ok:
++ ret = offset;
++out_err:
++ mutex_unlock(&inode->i_mutex);
++
++ return ret;
++}
+
+ /*
+ * This structure holds the nodes of the red-black tree used to store
+@@ -330,15 +441,16 @@ static void free_rb_tree_fname(struct rb_root *root)
+ }
+
+
+-static struct dir_private_info *ext4_htree_create_dir_info(loff_t pos)
++static struct dir_private_info *ext4_htree_create_dir_info(struct file *filp,
++ loff_t pos)
+ {
+ struct dir_private_info *p;
+
+ p = kzalloc(sizeof(struct dir_private_info), GFP_KERNEL);
+ if (!p)
+ return NULL;
+- p->curr_hash = pos2maj_hash(pos);
+- p->curr_minor_hash = pos2min_hash(pos);
++ p->curr_hash = pos2maj_hash(filp, pos);
++ p->curr_minor_hash = pos2min_hash(filp, pos);
+ return p;
+ }
+
+@@ -429,7 +541,7 @@ static int call_filldir(struct file *filp, void *dirent,
+ "null fname?!?\n");
+ return 0;
+ }
+- curr_pos = hash2pos(fname->hash, fname->minor_hash);
++ curr_pos = hash2pos(filp, fname->hash, fname->minor_hash);
+ while (fname) {
+ error = filldir(dirent, fname->name,
+ fname->name_len, curr_pos,
+@@ -454,13 +566,13 @@ static int ext4_dx_readdir(struct file *filp,
+ int ret;
+
+ if (!info) {
+- info = ext4_htree_create_dir_info(filp->f_pos);
++ info = ext4_htree_create_dir_info(filp, filp->f_pos);
+ if (!info)
+ return -ENOMEM;
+ filp->private_data = info;
+ }
+
+- if (filp->f_pos == EXT4_HTREE_EOF)
++ if (filp->f_pos == ext4_get_htree_eof(filp))
+ return 0; /* EOF */
+
+ /* Some one has messed with f_pos; reset the world */
+@@ -468,8 +580,8 @@ static int ext4_dx_readdir(struct file *filp,
+ free_rb_tree_fname(&info->root);
+ info->curr_node = NULL;
+ info->extra_fname = NULL;
+- info->curr_hash = pos2maj_hash(filp->f_pos);
+- info->curr_minor_hash = pos2min_hash(filp->f_pos);
++ info->curr_hash = pos2maj_hash(filp, filp->f_pos);
++ info->curr_minor_hash = pos2min_hash(filp, filp->f_pos);
+ }
+
+ /*
+@@ -501,7 +613,7 @@ static int ext4_dx_readdir(struct file *filp,
+ if (ret < 0)
+ return ret;
+ if (ret == 0) {
+- filp->f_pos = EXT4_HTREE_EOF;
++ filp->f_pos = ext4_get_htree_eof(filp);
+ break;
+ }
+ info->curr_node = rb_first(&info->root);
+@@ -521,7 +633,7 @@ static int ext4_dx_readdir(struct file *filp,
+ info->curr_minor_hash = fname->minor_hash;
+ } else {
+ if (info->next_hash == ~0) {
+- filp->f_pos = EXT4_HTREE_EOF;
++ filp->f_pos = ext4_get_htree_eof(filp);
+ break;
+ }
+ info->curr_hash = info->next_hash;
+@@ -540,3 +652,15 @@ static int ext4_release_dir(struct inode *inode, struct file *filp)
+
+ return 0;
+ }
++
++const struct file_operations ext4_dir_operations = {
++ .llseek = ext4_dir_llseek,
++ .read = generic_read_dir,
++ .readdir = ext4_readdir,
++ .unlocked_ioctl = ext4_ioctl,
++#ifdef CONFIG_COMPAT
++ .compat_ioctl = ext4_compat_ioctl,
++#endif
++ .fsync = ext4_sync_file,
++ .release = ext4_release_dir,
++};
+diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
+index 8cb184c..2ac1eef 100644
+--- a/fs/ext4/ext4.h
++++ b/fs/ext4/ext4.h
+@@ -1597,7 +1597,11 @@ struct dx_hash_info
+ u32 *seed;
+ };
+
+-#define EXT4_HTREE_EOF 0x7fffffff
++
++/* 32 and 64 bit signed EOF for dx directories */
++#define EXT4_HTREE_EOF_32BIT ((1UL << (32 - 1)) - 1)
++#define EXT4_HTREE_EOF_64BIT ((1ULL << (64 - 1)) - 1)
++
+
+ /*
+ * Control parameters used by ext4_htree_next_block
+diff --git a/fs/ext4/hash.c b/fs/ext4/hash.c
+index ac8f168..fa8e491 100644
+--- a/fs/ext4/hash.c
++++ b/fs/ext4/hash.c
+@@ -200,8 +200,8 @@ int ext4fs_dirhash(const char *name, int len, struct dx_hash_info *hinfo)
+ return -1;
+ }
+ hash = hash & ~1;
+- if (hash == (EXT4_HTREE_EOF << 1))
+- hash = (EXT4_HTREE_EOF-1) << 1;
++ if (hash == (EXT4_HTREE_EOF_32BIT << 1))
++ hash = (EXT4_HTREE_EOF_32BIT - 1) << 1;
+ hinfo->hash = hash;
+ hinfo->minor_hash = minor_hash;
+ return 0;
+--
+1.7.10.4
+
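
The bounds logic of ext4_dir_llseek() above is easier to follow without the locking. Below is a simplified userspace model, offered as a sketch only: struct dir_model and dir_llseek() are hypothetical names, the i_mutex and f_version handling is left out, and like the patch it does no overflow checking on SEEK_CUR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

struct dir_model {
    bool    dx_dir;      /* htree-indexed directory? */
    int64_t htree_eof;   /* 32- or 64-bit EOF marker for dx directories */
    int64_t i_size;      /* byte size, used for non-dx directories */
    int64_t s_maxbytes;  /* filesystem file size limit */
    int64_t f_pos;       /* current position */
};

/* Returns the new position, or -EINVAL if the request is out of range. */
static int64_t dir_llseek(struct dir_model *d, int64_t offset, int whence)
{
    switch (whence) {
    case SEEK_END:
        if (offset > 0)
            return -EINVAL;  /* seeking past the end is not supported */
        offset += d->dx_dir ? d->htree_eof : d->i_size;
        break;
    case SEEK_CUR:
        if (offset == 0)
            return d->f_pos; /* position query, f_pos left untouched */
        offset += d->f_pos;
        break;
    }

    if (offset < 0)
        return -EINVAL;
    /* dx directories are bounded by the hash EOF, others by s_maxbytes */
    if (offset > (d->dx_dir ? d->htree_eof : d->s_maxbytes))
        return -EINVAL;

    d->f_pos = offset;
    return offset;
}
```

The point of the dx branch is that a 64-bit hash cookie is usually far larger than s_maxbytes, so checking it against the file size limit would reject valid cookies.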
Index: debian/patches/bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch
===================================================================
--- debian/patches/bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch (revision 0)
+++ debian/patches/bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch (revision 0)
@@ -0,0 +1,34 @@
+From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Date: Tue, 13 Mar 2012 22:51:38 -0400
+Subject: fs: add new FMODE flags: FMODE_32bithash and FMODE_64bithash
+
+commit 6a8a13e03861c0ab83ab07d573ca793cff0e5d00 upstream.
+
+Those flags are supposed to be set by NFS readdir() to tell ext3/ext4
+to 32bit (NFSv2) or 64bit hash values (offsets) in seekdir().
+
+Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
+Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
+---
+ include/linux/fs.h | 4 ++++
+ 1 file changed, 4 insertions(+)
+
+diff --git a/include/linux/fs.h b/include/linux/fs.h
+index 29b6353..fb7ce74 100644
+--- a/include/linux/fs.h
++++ b/include/linux/fs.h
+@@ -92,6 +92,10 @@ struct inodes_stat_t {
+ /* File is opened using open(.., 3, ..) and is writeable only for ioctls
+ (specialy hack for floppy.c) */
+ #define FMODE_WRITE_IOCTL ((__force fmode_t)0x100)
++/* 32bit hashes as llseek() offset (for directories) */
++#define FMODE_32BITHASH ((__force fmode_t)0x200)
++/* 64bit hashes as llseek() offset (for directories) */
++#define FMODE_64BITHASH ((__force fmode_t)0x400)
+
+ /*
+ * Don't update ctime and mtime.
+--
+1.7.10.4
+
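
To make the purpose of the two new flags concrete, here is a userspace sketch of how a filesystem could turn a (major, minor) hash pair into a seek offset depending on which flag the opener set. The flag values are the ones from the fs.h hunk above; sketch_hash2pos(), the SKETCH_ names and the caller_is_32bit fallback parameter are my own assumptions for illustration and are not taken from the patch text shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from the fs.h hunk; names prefixed to mark them
 * as part of this sketch rather than the kernel headers. */
#define SKETCH_FMODE_32BITHASH 0x200u
#define SKETCH_FMODE_64BITHASH 0x400u

/* Hypothetical helper: build a directory offset from a hash pair.
 * - 32-bit mode: only the major hash is used, shifted right by one so
 *   the result stays within a signed 32-bit offset.
 * - 64-bit mode: major hash in the upper half, minor hash in the lower.
 * caller_is_32bit stands in for whatever default applies when neither
 * flag is set (an assumption of this sketch). */
static int64_t sketch_hash2pos(unsigned int f_mode, bool caller_is_32bit,
			       uint32_t major, uint32_t minor)
{
	if ((f_mode & SKETCH_FMODE_32BITHASH) ||
	    (!(f_mode & SKETCH_FMODE_64BITHASH) && caller_is_32bit))
		return (int64_t)(major >> 1);

	return (int64_t)(((uint64_t)(major >> 1) << 32) | (uint64_t)minor);
}
```

With the 64-bit form, two entries whose major hashes collide still get distinct offsets as long as their minor hashes differ, which is the property the NFSv3/v4 readdir cookies need in large directories.
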
Index: debian/patches/bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch
===================================================================
--- debian/patches/bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch (revision 0)
+++ debian/patches/bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch (revision 0)
@@ -0,0 +1,84 @@
+From: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Date: Sun, 18 Mar 2012 22:44:49 -0400
+Subject: nfsd: rename 'int access' to 'int may_flags' in nfsd_open()
+
+commit 999448a8c0202d8c41711c92385323520644527b upstream.
+
+Just rename this variable, as the next patch will add a flag and
+'access' as variable name would not be correct any more.
+
+Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
+Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
+Acked-by: J. Bruce Fields <bfields@redhat.com>
+Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
+---
+ fs/nfsd/vfs.c | 18 ++++++++++--------
+ 1 file changed, 10 insertions(+), 8 deletions(-)
+
+diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
+index 5c3cd82..b395c61 100644
+--- a/fs/nfsd/vfs.c
++++ b/fs/nfsd/vfs.c
+@@ -726,12 +726,13 @@ static int nfsd_open_break_lease(struct inode *inode, int access)
+
+ /*
+ * Open an existing file or directory.
+- * The access argument indicates the type of open (read/write/lock)
++ * The may_flags argument indicates the type of open (read/write/lock)
++ * and additional flags.
+ * N.B. After this call fhp needs an fh_put
+ */
+ __be32
+ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+- int access, struct file **filp)
++ int may_flags, struct file **filp)
+ {
+ struct dentry *dentry;
+ struct inode *inode;
+@@ -746,7 +747,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+ * and (hopefully) checked permission - so allow OWNER_OVERRIDE
+ * in case a chmod has now revoked permission.
+ */
+- err = fh_verify(rqstp, fhp, type, access | NFSD_MAY_OWNER_OVERRIDE);
++ err = fh_verify(rqstp, fhp, type, may_flags | NFSD_MAY_OWNER_OVERRIDE);
+ if (err)
+ goto out;
+
+@@ -757,7 +758,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+ * or any access when mandatory locking enabled
+ */
+ err = nfserr_perm;
+- if (IS_APPEND(inode) && (access & NFSD_MAY_WRITE))
++ if (IS_APPEND(inode) && (may_flags & NFSD_MAY_WRITE))
+ goto out;
+ /*
+ * We must ignore files (but only files) which might have mandatory
+@@ -770,12 +771,12 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+ if (!inode->i_fop)
+ goto out;
+
+- host_err = nfsd_open_break_lease(inode, access);
++ host_err = nfsd_open_break_lease(inode, may_flags);
+ if (host_err) /* NOMEM or WOULDBLOCK */
+ goto out_nfserr;
+
+- if (access & NFSD_MAY_WRITE) {
+- if (access & NFSD_MAY_READ)
++ if (may_flags & NFSD_MAY_WRITE) {
++ if (may_flags & NFSD_MAY_READ)
+ flags = O_RDWR|O_LARGEFILE;
+ else
+ flags = O_WRONLY|O_LARGEFILE;
+@@ -785,7 +786,8 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, int type,
+ if (IS_ERR(*filp))
+ host_err = PTR_ERR(*filp);
+ else
+- host_err = ima_file_check(*filp, access);
++ host_err = ima_file_check(*filp, may_flags);
++
+ out_nfserr:
+ err = nfserrno(host_err);
+ out:
+--
+1.7.10.4
+
Index: debian/patches/series
===================================================================
--- debian/patches/series (revision 19368)
+++ debian/patches/series (working copy)
@@ -393,3 +393,10 @@
bugfix/all/usb-Add-quirk-detection-based-on-interface-informati.patch
bugfix/all/usb-Add-USB_QUIRK_RESET_RESUME-for-all-Logitech-UVC-.patch
bugfix/alpha/alpha-use-large-data-model.diff
+
+# 64-bit NFS readdir cookies on ext3/ext4 with dir_index
+bugfix/all/fs-add-new-FMODE-flags-FMODE_32bithash-and-FMODE_64b.patch
+bugfix/all/ext4-return-32-64-bit-dir-name-hash-according-to-usa.patch
+bugfix/all/nfsd-rename-int-access-to-int-may_flags-in-nfsd_open.patch
+bugfix/all/nfsd-vfs_llseek-with-32-or-64-bit-offsets-hashes.patch
+bugfix/all/ext3-return-32-64-bit-dir-name-hash-according-to-usa.patch
Index: debian/changelog
===================================================================
--- debian/changelog (revision 19368)
+++ debian/changelog (working copy)
@@ -115,6 +115,10 @@
* Make xen-linux-system meta-packages depend on xen-system. This allows
automatic updates. (closes: #681637)

+ [ Jonathan Nieder ]
+ * ext3, ext4: dir_index: Return 64-bit readdir cookies for NFSv3 and 4
+ (Closes: #685407)
+
-- Ben Hutchings <ben@decadent.org.uk> Tue, 24 Jul 2012 02:20:37 +0100

linux (3.2.23-1) unstable; urgency=low
 
