Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Jonathan Nieder <jrnieder@gmail.com> 2012-07-20 11:25:
merge 682116 682007 quit Hi, Ben Hutchings <ben@decadent.org.uk> 2012-07-19 13:32: On Thu, 2012-07-19 at 13:37 +0200, Bastian Blank wrote: We don't support proprietary stuff. Please remove and try again. To be clear, Bastian is referring to the proprietary kernel module (nvidia). I think this stance is too aggressive. Testing without the modules we do not support can certainly help, but in cases like this where the proprietary module is not likely to be related, I'd rather hear about problems earlier than have submitters wait until they have time to reproduce without. Luckily this has been reproduced without the nvidia module, so merging. Rhaoul writes: This is reproducable using "grep -r abc *" inside a directory with 9541 files (no sym- or hardlinks, no block or character special files) in 1524 directories (PHP MODX installation) Brian, is it reproducible for you, too? Thanks for the test. Unfortunately, I'm on my way out of town for a couple of days and didn't have time to look at this closely. I'll get back to you with some proper results on this sometime next week. What is the newest kernel you tried that did not have this problem? I don't recall having this level of problems with 3.2.1-2~bpo60+1, but maybe it was just less frequent and I didn't notice it. Cheers, Brian |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
tags 682007 + upstream patch moreinfo
quit Hi, Brian Kroth wrote: > kernel BUG at /build/buildd-linux_3.2.20-1~bpo60+1-amd64-tQMw4f/linux-3.2.20/fs/buffer.c:3088! This is BUG_ON(!PagePrivate(page)); in static int drop_buffers(struct page *page, struct buffer_head **buffers_to_free) { struct buffer_head *head = page_buffers(page); I suspect it's the same underlying problem, but maybe not. Please test the attached patches, for example following the instructions below: 0. prerequisites: apt-get install git build-essential 1. get the kernel history, if you don't already have it: git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2. configure, build, test: cd linux git fetch origin git checkout origin/master cp /boot/config-$(uname -r) .config; # current configuration scripts/config --disable DEBUG_INFO make localmodconfig; # optional: minimize configuration make deb-pkg; # optionally with -j<num> for parallel build dpkg -i ../<name of package>; # as root reboot ... test test test ... Hopefully it reproduces the bug. So 3. try the patches: cd linux git am -3sc $(ls -1 /path/to/patches/[01]*) make deb-pkg; # maybe with -j4 dpkg -i ../<name of package>; # as root reboot ... test test test ... Thanks and hope that helps, Jonathan > Call Trace: > [<ffffffff811299ec>] ? try_to_free_buffers+0x3a/0xa6 > [<ffffffff810f977c>] ? migrate_pages+0x20b/0x3a1 > [<ffffffff810f2195>] ? compaction_register_node+0x12/0x12 > [<ffffffff810f2c60>] ? compact_zone+0x738/0x747 > [<ffffffff811bb647>] ? cpumask_next_and+0x2b/0x37 > [<ffffffff810f2edd>] ? try_to_compact_pages+0x12b/0x196 > [<ffffffff810c35c7>] ? __alloc_pages_direct_compact+0xa2/0x15c > [<ffffffff810c3d14>] ? __alloc_pages_nodemask+0x693/0x731 > [<ffffffff810f0196>] ? alloc_pages_vma+0x101/0x11d > [<ffffffff810fca38>] ? do_huge_pmd_anonymous_page+0x9e/0x2e8 > [<ffffffff813684ae>] ? do_page_fault+0x327/0x34c > [<ffffffff810dfb1f>] ? vma_merge+0x219/0x387 > [<ffffffff8100d6b0>] ? __switch_to+0x1c9/0x2b1 > [<ffffffff8103ed4a>] ? pick_next_task_fair+0xfc/0x10e > [<ffffffff81073f6c>] ? sys_futex+0x12f/0x147 > [<ffffffff813658b5>] ? page_fault+0x25/0x30 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:16:50 +0000 Subject: CacheFiles: Fix the marking of cached pages Under some circumstances CacheFiles defers the marking of pages with PG_fscache so that it can take advantage of pagevecs to reduce the number of calls to fscache_mark_pages_cached() and the netfs's hook to keep track of this. There are, however, two problems with this: (1) It can lead to the PG_fscache mark being applied _after_ the page is set PG_uptodate and unlocked (by the call to fscache_end_io()). (2) CacheFiles's ref on the page is dropped immediately following fscache_end_io() - and so may not still be held when the mark is applied. This can lead to the page being passed back to the allocator before the mark is applied. Fix this by, where appropriate, marking the page before calling fscache_end_io() and releasing the page. This means that we can't take advantage of pagevecs and have to make a separate call for each page to the marking routines. The symptoms of this are Bad Page state errors cropping up under memory pressure, for example: BUG: Bad page state in process tar pfn:002da page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447 page flags: 0x1000(private_2) Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064 Call Trace: [<ffffffff8109583c>] ? dump_page+0xb9/0xbe [<ffffffff81095916>] bad_page+0xd5/0xea [<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a [<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662 [<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b As can be seen, PG_private_2 (== PG_fscache) is set in the page flags. Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was set appropriately showed that sometimes it wasn't. This led to the discovery that sometimes the page has apparently been reclaimed by the time the marker got to see it. Reported-by: M. Stevens <m@tippett.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/cachefiles/rdwr.c | 34 ++++++++---------------- fs/fscache/page.c | 59 +++++++++++++++++++++++++---------------- include/linux/fscache-cache.h | 3 +++ include/linux/fscache.h | 12 ++++----- 4 files changed, 56 insertions(+), 52 deletions(-) diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 0e3c0924cc3a..0186fc110162 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -176,9 +176,8 @@ static void cachefiles_read_copier(struct fscache_operation *_op) recheck: if (PageUptodate(monitor->back_page)) { copy_highpage(monitor->netfs_page, monitor->back_page); - - pagevec_add(&pagevec, monitor->netfs_page); - fscache_mark_pages_cached(monitor->op, &pagevec); + fscache_mark_page_cached(monitor->op, + monitor->netfs_page); error = 0; } else if (!PageError(monitor->back_page)) { /* the page has probably been truncated */ @@ -335,8 +334,7 @@ backing_page_already_present: backing_page_already_uptodate: _debug("- uptodate"); - pagevec_add(pagevec, netpage); - fscache_mark_pages_cached(op, pagevec); + fscache_mark_page_cached(op, netpage); copy_highpage(netpage, backpage); fscache_end_io(op, netpage, 0); @@ -448,8 +446,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, &pagevec); } else if (cachefiles_has_space(cache, 0, 1) == 0) { /* there's space in the cache we can use */ - pagevec_add(&pagevec, page); - fscache_mark_pages_cached(op, &pagevec); + fscache_mark_page_cached(op, page); ret = -ENODATA; } else { ret = -ENOBUFS; @@ -465,8 +462,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, */ static int cachefiles_read_backing_file(struct cachefiles_object *object, struct fscache_retrieval *op, - struct list_head *list, - struct pagevec *mark_pvec) + struct list_head *list) { struct cachefiles_one_read *monitor = NULL; struct address_space *bmapping = object->backer->d_inode->i_mapping; @@ -626,13 +622,13 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, page_cache_release(backpage); backpage = NULL; - if (!pagevec_add(mark_pvec, netpage)) - fscache_mark_pages_cached(op, mark_pvec); + fscache_mark_page_cached(op, netpage); page_cache_get(netpage); if (!pagevec_add(&lru_pvec, netpage)) __pagevec_lru_add_file(&lru_pvec); + /* the netpage is unlocked and marked up to date here */ fscache_end_io(op, netpage, 0); page_cache_release(netpage); netpage = NULL; @@ -775,15 +771,11 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, /* submit the apparently valid pages to the backing fs to be read from * disk */ if (nrbackpages > 0) { - ret2 = cachefiles_read_backing_file(object, op, &backpages, - &pagevec); + ret2 = cachefiles_read_backing_file(object, op, &backpages); if (ret2 == -ENOMEM || ret2 == -EINTR) ret = ret2; } - if (pagevec_count(&pagevec) > 0) - fscache_mark_pages_cached(op, &pagevec); - _leave(" = %d [nr=%u%s]", ret, *nr_pages, list_empty(pages) ? " empty" : ""); return ret; @@ -806,7 +798,6 @@ int cachefiles_allocate_page(struct fscache_retrieval *op, { struct cachefiles_object *object; struct cachefiles_cache *cache; - struct pagevec pagevec; int ret; object = container_of(op->op.object, @@ -817,13 +808,10 @@ int cachefiles_allocate_page(struct fscache_retrieval *op, _enter("%p,{%lx},", object, page->index); ret = cachefiles_has_space(cache, 0, 1); - if (ret == 0) { - pagevec_init(&pagevec, 0); - pagevec_add(&pagevec, page); - fscache_mark_pages_cached(op, &pagevec); - } else { + if (ret == 0) + fscache_mark_page_cached(op, page); + else ret = -ENOBUFS; - } _leave(" = %d", ret); return ret; diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 3f7a59bfa7ad..d7c663cfc923 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -915,6 +915,40 @@ done: EXPORT_SYMBOL(__fscache_uncache_page); /** + * fscache_mark_page_cached - Mark a page as being cached + * @op: The retrieval op pages are being marked for + * @page: The page to be marked + * + * Mark a netfs page as being cached. After this is called, the netfs + * must call fscache_uncache_page() to remove the mark. + */ +void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page) +{ + struct fscache_cookie *cookie = op->op.object->cookie; + +#ifdef CONFIG_FSCACHE_STATS + atomic_inc(&fscache_n_marks); +#endif + + _debug("- mark %p{%lx}", page, page->index); + if (TestSetPageFsCache(page)) { + static bool once_only; + if (!once_only) { + once_only = true; + printk(KERN_WARNING "FS-Cache:" + " Cookie type %s marked page %lx" + " multiple times ", + cookie->def->name, page->index); + } + } + + if (cookie->def->mark_page_cached) + cookie->def->mark_page_cached(cookie->netfs_data, + op->mapping, page); +} +EXPORT_SYMBOL(fscache_mark_page_cached); + +/** * fscache_mark_pages_cached - Mark pages as being cached * @op: The retrieval op pages are being marked for * @pagevec: The pages to be marked @@ -925,32 +959,11 @@ EXPORT_SYMBOL(__fscache_uncache_page); void fscache_mark_pages_cached(struct fscache_retrieval *op, struct pagevec *pagevec) { - struct fscache_cookie *cookie = op->op.object->cookie; unsigned long loop; -#ifdef CONFIG_FSCACHE_STATS - atomic_add(pagevec->nr, &fscache_n_marks); -#endif + for (loop = 0; loop < pagevec->nr; loop++) + fscache_mark_page_cached(op, pagevec->pages[loop]); - for (loop = 0; loop < pagevec->nr; loop++) { - struct page *page = pagevec->pages[loop]; - - _debug("- mark %p{%lx}", page, page->index); - if (TestSetPageFsCache(page)) { - static bool once_only; - if (!once_only) { - once_only = true; - printk(KERN_WARNING "FS-Cache:" - " Cookie type %s marked page %lx" - " multiple times ", - cookie->def->name, page->index); - } - } - } - - if (cookie->def->mark_pages_cached) - cookie->def->mark_pages_cached(cookie->netfs_data, - op->mapping, pagevec); pagevec_reinit(pagevec); } EXPORT_SYMBOL(fscache_mark_pages_cached); diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index ce31408b1e47..9879183b55d8 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -504,6 +504,9 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache); extern void fscache_io_error(struct fscache_cache *cache); +extern void fscache_mark_page_cached(struct fscache_retrieval *op, + struct page *page); + extern void fscache_mark_pages_cached(struct fscache_retrieval *op, struct pagevec *pagevec); diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 9ec20dec3353..f4b6353543bf 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -135,14 +135,14 @@ struct fscache_cookie_def { */ void (*put_context)(void *cookie_netfs_data, void *context); - /* indicate pages that now have cache metadata retained - * - this function should mark the specified pages as now being cached - * - the pages will have been marked with PG_fscache before this is + /* indicate page that now have cache metadata retained + * - this function should mark the specified page as now being cached + * - the page will have been marked with PG_fscache before this is * called, so this is optional */ - void (*mark_pages_cached)(void *cookie_netfs_data, - struct address_space *mapping, - struct pagevec *cached_pvec); + void (*mark_page_cached)(void *cookie_netfs_data, + struct address_space *mapping, + struct page *page); /* indicate the cookie is no longer cached * - this function is called when the backing store currently caching -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:01 +0000 Subject: CacheFiles: Downgrade the requirements passed to the allocator Downgrade the requirements passed to the allocator in the gfp flags parameter. FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to store an object or a page in the cache. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/cachefiles/interface.c | 8 ++++---- fs/cachefiles/internal.h | 2 ++ fs/cachefiles/key.c | 2 +- fs/cachefiles/rdwr.c | 18 ++++++++++-------- fs/cachefiles/xattr.c | 2 +- fs/fscache/page.c | 2 +- 6 files changed, 19 insertions(+), 15 deletions(-) diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 67bef6d01484..9bff0f878cfd 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -41,12 +41,12 @@ static struct fscache_object *cachefiles_alloc_object( _enter("{%s},%p,", cache->cache.identifier, cookie); - lookup_data = kmalloc(sizeof(*lookup_data), GFP_KERNEL); + lookup_data = kmalloc(sizeof(*lookup_data), cachefiles_gfp); if (!lookup_data) goto nomem_lookup_data; /* create a new object record and a temporary leaf image */ - object = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL); + object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp); if (!object) goto nomem_object; @@ -63,7 +63,7 @@ static struct fscache_object *cachefiles_alloc_object( * - stick the length on the front and leave space on the back for the * encoder */ - buffer = kmalloc((2 + 512) + 3, GFP_KERNEL); + buffer = kmalloc((2 + 512) + 3, cachefiles_gfp); if (!buffer) goto nomem_buffer; @@ -219,7 +219,7 @@ static void cachefiles_update_object(struct fscache_object *_object) return; } - auxdata = kmalloc(2 + 512 + 3, GFP_KERNEL); + auxdata = kmalloc(2 + 512 + 3, cachefiles_gfp); if (!auxdata) { _leave(" [nomem]"); return; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index bd6bc1bde2d7..49382519907a 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -23,6 +23,8 @@ extern unsigned cachefiles_debug; #define CACHEFILES_DEBUG_KLEAVE 2 #define CACHEFILES_DEBUG_KDEBUG 4 +#define cachefiles_gfp (__GFP_WAIT | __GFP_NORETRY | __GFP_NOMEMALLOC) + /* * node records */ diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 81b8b2b3a674..33b58c60f2d1 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -78,7 +78,7 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type) _debug("max: %d", max); - key = kmalloc(max, GFP_KERNEL); + key = kmalloc(max, cachefiles_gfp); if (!key) return NULL; diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 0186fc110162..26a6f526ebb7 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -238,7 +238,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, _debug("read back %p{%lu,%d}", netpage, netpage->index, page_count(netpage)); - monitor = kzalloc(sizeof(*monitor), GFP_KERNEL); + monitor = kzalloc(sizeof(*monitor), cachefiles_gfp); if (!monitor) goto nomem; @@ -257,13 +257,14 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, goto backing_page_already_present; if (!newpage) { - newpage = page_cache_alloc_cold(bmapping); + newpage = __page_cache_alloc(cachefiles_gfp | + __GFP_COLD); if (!newpage) goto nomem_monitor; } ret = add_to_page_cache(newpage, bmapping, - netpage->index, GFP_KERNEL); + netpage->index, cachefiles_gfp); if (ret == 0) goto installed_new_backing_page; if (ret != -EEXIST) @@ -481,7 +482,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, netpage, netpage->index, page_count(netpage)); if (!monitor) { - monitor = kzalloc(sizeof(*monitor), GFP_KERNEL); + monitor = kzalloc(sizeof(*monitor), cachefiles_gfp); if (!monitor) goto nomem; @@ -496,13 +497,14 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, goto backing_page_already_present; if (!newpage) { - newpage = page_cache_alloc_cold(bmapping); + newpage = __page_cache_alloc(cachefiles_gfp | + __GFP_COLD); if (!newpage) goto nomem; } ret = add_to_page_cache(newpage, bmapping, - netpage->index, GFP_KERNEL); + netpage->index, cachefiles_gfp); if (ret == 0) goto installed_new_backing_page; if (ret != -EEXIST) @@ -532,7 +534,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, _debug("- monitor add"); ret = add_to_page_cache(netpage, op->mapping, netpage->index, - GFP_KERNEL); + cachefiles_gfp); if (ret < 0) { if (ret == -EEXIST) { page_cache_release(netpage); @@ -608,7 +610,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, _debug("- uptodate"); ret = add_to_page_cache(netpage, op->mapping, netpage->index, - GFP_KERNEL); + cachefiles_gfp); if (ret < 0) { if (ret == -EEXIST) { page_cache_release(netpage); diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c index e18b183b47e1..73b46288b54b 100644 --- a/fs/cachefiles/xattr.c +++ b/fs/cachefiles/xattr.c @@ -174,7 +174,7 @@ int cachefiles_check_object_xattr(struct cachefiles_object *object, ASSERT(dentry); ASSERT(dentry->d_inode); - auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL); + auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, cachefiles_gfp); if (!auxbuf) { _leave(" = -ENOMEM"); return -ENOMEM; diff --git a/fs/fscache/page.c b/fs/fscache/page.c index d7c663cfc923..248a12e22532 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -759,7 +759,7 @@ int __fscache_write_page(struct fscache_cookie *cookie, fscache_stat(&fscache_n_stores); - op = kzalloc(sizeof(*op), GFP_NOIO); + op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY); if (!op) goto nomem; -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:11 +0000 Subject: FS-Cache: Check that there are no read ops when cookie relinquished Check that the netfs isn't trying to relinquish a cookie that still has read operations in progress upon it. If there are, then give log a warning and BUG. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/fscache/cookie.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 990535071a8a..0666996adf80 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -452,6 +452,14 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire) _debug("RELEASE OBJ%x", object->debug_id); + if (atomic_read(&object->n_reads)) { + spin_unlock(&cookie->lock); + printk(KERN_ERR "FS-Cache:" + " Cookie '%s' still has %d outstanding reads ", + cookie->def->name, atomic_read(&object->n_reads)); + BUG(); + } + /* detach each cache object from the object cookie */ spin_lock(&object->lock); hlist_del_init(&object->cookie_link); -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:21 +0000 Subject: CacheFiles: Make some debugging statements conditional Downgrade some debugging statements to not unconditionally print stuff, but rather be conditional on the appropriate module parameter setting. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/cachefiles/rdwr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 26a6f526ebb7..885f1b26cc16 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -77,25 +77,25 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, struct page *backpage = monitor->back_page, *backpage2; int ret; - kenter("{ino=%lx},{%lx,%lx}", + _enter("{ino=%lx},{%lx,%lx}", object->backer->d_inode->i_ino, backpage->index, backpage->flags); /* skip if the page was truncated away completely */ if (backpage->mapping != bmapping) { - kleave(" = -ENODATA [mapping]"); + _leave(" = -ENODATA [mapping]"); return -ENODATA; } backpage2 = find_get_page(bmapping, backpage->index); if (!backpage2) { - kleave(" = -ENODATA [gone]"); + _leave(" = -ENODATA [gone]"); return -ENODATA; } if (backpage != backpage2) { put_page(backpage2); - kleave(" = -ENODATA [different]"); + _leave(" = -ENODATA [different]"); return -ENODATA; } @@ -114,7 +114,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, if (PageUptodate(backpage)) goto unlock_discard; - kdebug("reissue read"); + _debug("reissue read"); ret = bmapping->a_ops->readpage(NULL, backpage); if (ret < 0) goto unlock_discard; @@ -129,7 +129,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, } /* it'll reappear on the todo list */ - kleave(" = -EINPROGRESS"); + _leave(" = -EINPROGRESS"); return -EINPROGRESS; unlock_discard: @@ -137,7 +137,7 @@ unlock_discard: spin_lock_irq(&object->work_lock); list_del(&monitor->op_link); spin_unlock_irq(&object->work_lock); - kleave(" = %d", ret); + _leave(" = %d", ret); return ret; } -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:31 +0000 Subject: FS-Cache: Make cookie relinquishment wait for outstanding reads Make fscache_relinquish_cookie() log a warning and wait if there are any outstanding reads left on the cookie it was given. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/fscache/cookie.c | 18 ++++++++++++++---- fs/fscache/operation.c | 10 ++++++++-- include/linux/fscache-cache.h | 1 + 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 0666996adf80..66be9eccede0 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -442,22 +442,32 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire) event = retire ? FSCACHE_OBJECT_EV_RETIRE : FSCACHE_OBJECT_EV_RELEASE; +try_again: spin_lock(&cookie->lock); /* break links with all the active objects */ while (!hlist_empty(&cookie->backing_objects)) { + int n_reads; object = hlist_entry(cookie->backing_objects.first, struct fscache_object, cookie_link); _debug("RELEASE OBJ%x", object->debug_id); - if (atomic_read(&object->n_reads)) { + set_bit(FSCACHE_COOKIE_WAITING_ON_READS, &cookie->flags); + n_reads = atomic_read(&object->n_reads); + if (n_reads) { + int n_ops = object->n_ops; + int n_in_progress = object->n_in_progress; spin_unlock(&cookie->lock); printk(KERN_ERR "FS-Cache:" - " Cookie '%s' still has %d outstanding reads ", - cookie->def->name, atomic_read(&object->n_reads)); - BUG(); + " Cookie '%s' still has %d outstanding reads (%d,%d) ", + cookie->def->name, + n_reads, n_ops, n_in_progress); + wait_on_bit(&cookie->flags, FSCACHE_COOKIE_WAITING_ON_READS, + fscache_wait_bit, TASK_UNINTERRUPTIBLE); + printk("Wait finished "); + goto try_again; } /* detach each cache object from the object cookie */ diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c index 30afdfa7aec7..c857ab824d6e 100644 --- a/fs/fscache/operation.c +++ b/fs/fscache/operation.c @@ -340,8 +340,14 @@ void fscache_put_operation(struct fscache_operation *op) object = op->object; - if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags)) - atomic_dec(&object->n_reads); + if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags)) { + if (atomic_dec_and_test(&object->n_reads)) { + clear_bit(FSCACHE_COOKIE_WAITING_ON_READS, + &object->cookie->flags); + wake_up_bit(&object->cookie->flags, + FSCACHE_COOKIE_WAITING_ON_READS); + } + } /* now... we may get called with the object spinlock held, so we * complete the cleanup here only if we can immediately acquire the diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 9879183b55d8..e3d6d939d959 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -301,6 +301,7 @@ struct fscache_cookie { #define FSCACHE_COOKIE_PENDING_FILL 3 /* T if pending initial fill on object */ #define FSCACHE_COOKIE_FILLING 4 /* T if filling object incrementally */ #define FSCACHE_COOKIE_UNAVAILABLE 5 /* T if cookie is unavailable (error, etc) */ +#define FSCACHE_COOKIE_WAITING_ON_READS 6 /* T if cookie is waiting on reads */ }; extern struct fscache_cookie fscache_fsdef_index; -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:41 +0000 Subject: FS-Cache: Fix operation state management and accounting Fix the state management of internal fscache operations and the accounting of what operations are in what states. This is done by: (1) Give struct fscache_operation a enum variable that directly represents the state it's currently in, rather than spreading this knowledge over a bunch of flags, who's processing the operation at the moment and whether it is queued or not. This makes it easier to write assertions to check the state at various points and to prevent invalid state transitions. (2) Add an 'operation complete' state and supply a function to indicate the completion of an operation (fscache_op_complete()) and make things call it. The final call to fscache_put_operation() can then check that an op in the appropriate state (complete or cancelled). (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better govern the state of an object: (a) The ->n_ops is now the number of extant operations on the object and is now decremented by fscache_put_operation() only. (b) The ->n_in_progress is simply the number of objects that have been taken off of the object's pending queue for the purposes of being run. This is decremented by fscache_op_complete() only. (c) The ->n_exclusive is the number of exclusive ops that have been submitted and queued or are in progress. It is decremented by fscache_op_complete() and by fscache_cancel_op(). fscache_put_operation() and fscache_operation_gc() now no longer try to clean up ->n_exclusive and ->n_in_progress. That was leading to double decrements against fscache_cancel_op(). fscache_cancel_op() now no longer decrements ->n_ops. That was leading to double decrements against fscache_put_operation(). fscache_submit_exclusive_op() now decides whether it has to queue an op based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter will persist in being true even after all preceding operations have been cancelled or completed. Furthermore, if an object is active and there are runnable ops against it, there must be at least one op running. (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and provide a function to record completion of the pages as they complete. When n_pages reaches 0, the operation is deemed to be complete and fscache_op_complete() is called. Add calls to fscache_retrieval_complete() anywhere we've finished with a page we've been given to read or allocate for. This includes places where we just return pages to the netfs for reading from the server and where accessing the cache fails and we discard the proposed netfs page. The bugs in the unfixed state management manifest themselves as oopses like the following where the operation completion gets out of sync with return of the cookie by the netfs. This is possible because the cache unlocks and returns all the netfs pages before recording its completion - which means that there's nothing to stop the netfs discarding them and returning the cookie. FS-Cache: Cookie 'NFS.fh' still has outstanding reads ------------[ cut here ]------------ kernel BUG at fs/fscache/cookie.c:519! invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache] RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282 RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000 RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000 R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98 R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370 FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040) Stack: ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91 Call Trace: [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs] [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs] [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs] [<ffffffff810d8d47>] evict+0xa1/0x15c [<ffffffff810d8e2e>] dispose_list+0x2c/0x38 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b [<ffffffff810c56b7>] prune_super+0xd5/0x140 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595 [<ffffffff8103e009>] ? process_timeout+0xb/0xb [<ffffffff8109dba3>] kswapd+0x270/0x289 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595 [<ffffffff8104bf7a>] kthread+0x7f/0x87 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Documentation/filesystems/caching/backend-api.txt | 26 +++++- Documentation/filesystems/caching/operations.txt | 2 +- fs/cachefiles/rdwr.c | 31 +++++-- fs/fscache/object.c | 2 - fs/fscache/operation.c | 91 +++++++++++++-------- fs/fscache/page.c | 25 +++++- include/linux/fscache-cache.h | 37 +++++++-- 7 files changed, 164 insertions(+), 50 deletions(-) diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt index 382d52cdaf2d..f4769b9399df 100644 --- a/Documentation/filesystems/caching/backend-api.txt +++ b/Documentation/filesystems/caching/backend-api.txt @@ -419,7 +419,10 @@ performed on the denizens of the cache. These are held in a structure of type: If an I/O error occurs, fscache_io_error() should be called and -ENOBUFS returned if possible or fscache_end_io() called with a suitable error - code.. + code. + + fscache_put_retrieval() should be called after a page or pages are dealt + with. This will complete the operation when all pages are dealt with. (*) Request pages be read from cache [mandatory]: @@ -526,6 +529,27 @@ FS-Cache provides some utilities that a cache backend may make use of: error value should be 0 if successful and an error otherwise. + (*) Record that one or more pages being retrieved or allocated have been dealt + with: + + void fscache_retrieval_complete(struct fscache_retrieval *op, + int n_pages); + + This is called to record the fact that one or more pages have been dealt + with and are no longer the concern of this operation. When the number of + pages remaining in the operation reaches 0, the operation will be + completed. + + + (*) Record operation completion: + + void fscache_op_complete(struct fscache_operation *op); + + This is called to record the completion of an operation. This deducts + this operation from the parent object's run state, potentially permitting + one or more pending operations to start running. + + (*) Set highest store limit: void fscache_set_store_limit(struct fscache_object *object, diff --git a/Documentation/filesystems/caching/operations.txt b/Documentation/filesystems/caching/operations.txt index b6b070c57cbf..bee2a5f93d60 100644 --- a/Documentation/filesystems/caching/operations.txt +++ b/Documentation/filesystems/caching/operations.txt @@ -174,7 +174,7 @@ Operations are used through the following procedure: necessary (the object might have died whilst the thread was waiting). When it has finished doing its processing, it should call - fscache_put_operation() on it. + fscache_op_complete() and fscache_put_operation() on it. (4) The operation holds an effective lock upon the object, preventing other exclusive ops conflicting until it is released. The operation can be diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 885f1b26cc16..31cda81656e5 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -197,6 +197,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op) fscache_end_io(op, monitor->netfs_page, error); page_cache_release(monitor->netfs_page); + fscache_retrieval_complete(op, 1); fscache_put_retrieval(op); kfree(monitor); @@ -339,6 +340,7 @@ backing_page_already_uptodate: copy_highpage(netpage, backpage); fscache_end_io(op, netpage, 0); + fscache_retrieval_complete(op, 1); success: _debug("success"); @@ -360,6 +362,7 @@ read_error: goto out; io_error: cachefiles_io_error_obj(object, "Page read error on backing file"); + fscache_retrieval_complete(op, 1); ret = -ENOBUFS; goto out; @@ -369,6 +372,7 @@ nomem_monitor: fscache_put_retrieval(monitor->op); kfree(monitor); nomem: + fscache_retrieval_complete(op, 1); _leave(" = -ENOMEM"); return -ENOMEM; } @@ -407,7 +411,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, _enter("{%p},{%lx},,,", object, page->index); if (!object->backer) - return -ENOBUFS; + goto enobufs; inode = object->backer->d_inode; ASSERT(S_ISREG(inode->i_mode)); @@ -416,7 +420,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, /* calculate the shift required to use bmap */ if (inode->i_sb->s_blocksize > PAGE_SIZE) - return -ENOBUFS; + goto enobufs; shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits; @@ -448,13 +452,19 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, } else if (cachefiles_has_space(cache, 0, 1) == 0) { /* there's space in the cache we can use */ fscache_mark_page_cached(op, page); + fscache_retrieval_complete(op, 1); ret = -ENODATA; } else { - ret = -ENOBUFS; + goto enobufs; } _leave(" = %d", ret); return ret; + +enobufs: + fscache_retrieval_complete(op, 1); + _leave(" = -ENOBUFS"); + return -ENOBUFS; } /* @@ -632,6 +642,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, /* the netpage is unlocked and marked up to date here */ fscache_end_io(op, netpage, 0); + fscache_retrieval_complete(op, 1); page_cache_release(netpage); netpage = NULL; continue; @@ -659,6 +670,7 @@ out: list_for_each_entry_safe(netpage, _n, list, lru) { list_del(&netpage->lru); page_cache_release(netpage); + fscache_retrieval_complete(op, 1); } _leave(" = %d", ret); @@ -707,7 +719,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, *nr_pages); if (!object->backer) - return -ENOBUFS; + goto all_enobufs; space = 1; if (cachefiles_has_space(cache, 0, *nr_pages) < 0) @@ -720,7 +732,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, /* calculate the shift required to use bmap */ if (inode->i_sb->s_blocksize > PAGE_SIZE) - return -ENOBUFS; + goto all_enobufs; shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits; @@ -760,7 +772,10 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, nrbackpages++; } else if (space && pagevec_add(&pagevec, page) == 0) { fscache_mark_pages_cached(op, &pagevec); + fscache_retrieval_complete(op, 1); ret = -ENODATA; + } else { + fscache_retrieval_complete(op, 1); } } @@ -781,6 +796,10 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, _leave(" = %d [nr=%u%s]", ret, *nr_pages, list_empty(pages) ? " empty" : ""); return ret; + +all_enobufs: + fscache_retrieval_complete(op, *nr_pages); + return -ENOBUFS; } /* @@ -815,6 +834,7 @@ int cachefiles_allocate_page(struct fscache_retrieval *op, else ret = -ENOBUFS; + fscache_retrieval_complete(op, 1); _leave(" = %d", ret); return ret; } @@ -864,6 +884,7 @@ int cachefiles_allocate_pages(struct fscache_retrieval *op, ret = -ENOBUFS; } + fscache_retrieval_complete(op, *nr_pages); _leave(" = %d", ret); return ret; } diff --git a/fs/fscache/object.c b/fs/fscache/object.c index b6b897c550ac..773bc798a416 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -587,8 +587,6 @@ static void fscache_object_available(struct fscache_object *object) if (object->n_in_progress == 0) { if (object->n_ops > 0) { ASSERTCMP(object->n_ops, >=, object->n_obj_ops); - ASSERTIF(object->n_ops > object->n_obj_ops, - !list_empty(&object->pending_ops)); fscache_start_operations(object); } else { ASSERT(list_empty(&object->pending_ops)); diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c index c857ab824d6e..748f9553c2cb 100644 --- a/fs/fscache/operation.c +++ b/fs/fscache/operation.c @@ -37,6 +37,7 @@ void fscache_enqueue_operation(struct fscache_operation *op) ASSERT(op->processor != NULL); ASSERTCMP(op->object->state, >=, FSCACHE_OBJECT_AVAILABLE); ASSERTCMP(atomic_read(&op->usage), >, 0); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS); fscache_stat(&fscache_n_op_enqueue); switch (op->flags & FSCACHE_OP_TYPE) { @@ -64,6 +65,9 @@ EXPORT_SYMBOL(fscache_enqueue_operation); static void fscache_run_op(struct fscache_object *object, struct fscache_operation *op) { + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_PENDING); + + op->state = FSCACHE_OP_ST_IN_PROGRESS; object->n_in_progress++; if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags)) wake_up_bit(&op->flags, FSCACHE_OP_WAITING); @@ -80,22 +84,23 @@ static void fscache_run_op(struct fscache_object *object, int fscache_submit_exclusive_op(struct fscache_object *object, struct fscache_operation *op) { - int ret; - _enter("{OBJ%x OP%x},", object->debug_id, op->debug_id); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED); + ASSERTCMP(atomic_read(&op->usage), >, 0); + spin_lock(&object->lock); ASSERTCMP(object->n_ops, >=, object->n_in_progress); ASSERTCMP(object->n_ops, >=, object->n_exclusive); ASSERT(list_empty(&op->pend_link)); - ret = -ENOBUFS; + op->state = FSCACHE_OP_ST_PENDING; if (fscache_object_is_active(object)) { op->object = object; object->n_ops++; object->n_exclusive++; /* reads and writes must wait */ - if (object->n_ops > 1) { + if (object->n_in_progress > 0) { atomic_inc(&op->usage); list_add_tail(&op->pend_link, &object->pending_ops); fscache_stat(&fscache_n_op_pend); @@ -111,7 +116,6 @@ int fscache_submit_exclusive_op(struct fscache_object *object, /* need to issue a new write op after this */ clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags); - ret = 0; } else if (object->state == FSCACHE_OBJECT_CREATING) { op->object = object; object->n_ops++; @@ -119,14 +123,13 @@ int fscache_submit_exclusive_op(struct fscache_object *object, atomic_inc(&op->usage); list_add_tail(&op->pend_link, &object->pending_ops); fscache_stat(&fscache_n_op_pend); - ret = 0; } else { /* not allowed to submit ops in any other state */ BUG(); } spin_unlock(&object->lock); - return ret; + return 0; } /* @@ -186,6 +189,7 @@ int fscache_submit_op(struct fscache_object *object, _enter("{OBJ%x OP%x},{%u}", object->debug_id, op->debug_id, atomic_read(&op->usage)); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED); ASSERTCMP(atomic_read(&op->usage), >, 0); spin_lock(&object->lock); @@ -196,6 +200,7 @@ int fscache_submit_op(struct fscache_object *object, ostate = object->state; smp_rmb(); + op->state = FSCACHE_OP_ST_PENDING; if (fscache_object_is_active(object)) { op->object = object; object->n_ops++; @@ -225,12 +230,15 @@ int fscache_submit_op(struct fscache_object *object, object->state == FSCACHE_OBJECT_LC_DYING || object->state == FSCACHE_OBJECT_WITHDRAWING) { fscache_stat(&fscache_n_op_rejected); + op->state = FSCACHE_OP_ST_CANCELLED; ret = -ENOBUFS; } else if (!test_bit(FSCACHE_IOERROR, &object->cache->flags)) { fscache_report_unexpected_submission(object, op, ostate); ASSERT(!fscache_object_is_active(object)); + op->state = FSCACHE_OP_ST_CANCELLED; ret = -ENOBUFS; } else { + op->state = FSCACHE_OP_ST_CANCELLED; ret = -ENOBUFS; } @@ -290,13 +298,18 @@ int fscache_cancel_op(struct fscache_operation *op) _enter("OBJ%x OP%x}", op->object->debug_id, op->debug_id); + ASSERTCMP(op->state, >=, FSCACHE_OP_ST_PENDING); + ASSERTCMP(op->state, !=, FSCACHE_OP_ST_CANCELLED); + ASSERTCMP(atomic_read(&op->usage), >, 0); + spin_lock(&object->lock); ret = -EBUSY; - if (!list_empty(&op->pend_link)) { + if (op->state == FSCACHE_OP_ST_PENDING) { + ASSERT(!list_empty(&op->pend_link)); fscache_stat(&fscache_n_op_cancelled); list_del_init(&op->pend_link); - object->n_ops--; + op->state = FSCACHE_OP_ST_CANCELLED; if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) object->n_exclusive--; if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags)) @@ -311,6 +324,37 @@ int fscache_cancel_op(struct fscache_operation *op) } /* + * Record the completion of an in-progress operation. + */ +void fscache_op_complete(struct fscache_operation *op) +{ + struct fscache_object *object = op->object; + + _enter("OBJ%x", object->debug_id); + + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS); + ASSERTCMP(object->n_in_progress, >, 0); + ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags), + object->n_exclusive, >, 0); + ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags), + object->n_in_progress, ==, 1); + + spin_lock(&object->lock); + + op->state = FSCACHE_OP_ST_COMPLETE; + + if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) + object->n_exclusive--; + object->n_in_progress--; + if (object->n_in_progress == 0) + fscache_start_operations(object); + + spin_unlock(&object->lock); + _leave(""); +} +EXPORT_SYMBOL(fscache_op_complete); + +/* * release an operation * - queues pending ops if this is the last in-progress op */ @@ -328,8 +372,9 @@ void fscache_put_operation(struct fscache_operation *op) return; _debug("PUT OP"); - if (test_and_set_bit(FSCACHE_OP_DEAD, &op->flags)) - BUG(); + ASSERTIFCMP(op->state != FSCACHE_OP_ST_COMPLETE, + op->state, ==, FSCACHE_OP_ST_CANCELLED); + op->state = FSCACHE_OP_ST_DEAD; fscache_stat(&fscache_n_op_release); @@ -365,16 +410,6 @@ void fscache_put_operation(struct fscache_operation *op) return; } - if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) { - ASSERTCMP(object->n_exclusive, >, 0); - object->n_exclusive--; - } - - ASSERTCMP(object->n_in_progress, >, 0); - object->n_in_progress--; - if (object->n_in_progress == 0) - fscache_start_operations(object); - ASSERTCMP(object->n_ops, >, 0); object->n_ops--; if (object->n_ops == 0) @@ -413,23 +448,14 @@ void fscache_operation_gc(struct work_struct *work) spin_unlock(&cache->op_gc_list_lock); object = op->object; + spin_lock(&object->lock); _debug("GC DEFERRED REL OBJ%x OP%x", object->debug_id, op->debug_id); fscache_stat(&fscache_n_op_gc); ASSERTCMP(atomic_read(&op->usage), ==, 0); - - spin_lock(&object->lock); - if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) { - ASSERTCMP(object->n_exclusive, >, 0); - object->n_exclusive--; - } - - ASSERTCMP(object->n_in_progress, >, 0); - object->n_in_progress--; - if (object->n_in_progress == 0) - fscache_start_operations(object); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_DEAD); ASSERTCMP(object->n_ops, >, 0); object->n_ops--; @@ -437,6 +463,7 @@ void fscache_operation_gc(struct work_struct *work) fscache_raise_event(object, FSCACHE_OBJECT_EV_CLEARED); spin_unlock(&object->lock); + kfree(op); } while (count++ < 20); diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 248a12e22532..b38b13d2a555 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -162,6 +162,7 @@ static void fscache_attr_changed_op(struct fscache_operation *op) fscache_abort_object(object); } + fscache_op_complete(op); _leave(""); } @@ -223,6 +224,8 @@ static void fscache_release_retrieval_op(struct fscache_operation *_op) _enter("{OP%x}", op->op.debug_id); + ASSERTCMP(op->n_pages, ==, 0); + fscache_hist(fscache_retrieval_histogram, op->start_time); if (op->context) fscache_put_context(op->op.object->cookie, op->context); @@ -320,6 +323,11 @@ static int fscache_wait_for_retrieval_activation(struct fscache_object *object, _debug("<<< GO"); check_if_dead: + if (op->op.state == FSCACHE_OP_ST_CANCELLED) { + fscache_stat(stat_object_dead); + _leave(" = -ENOBUFS [cancelled]"); + return -ENOBUFS; + } if (unlikely(fscache_object_is_dead(object))) { fscache_stat(stat_object_dead); return -ENOBUFS; @@ -364,6 +372,7 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, _leave(" = -ENOMEM"); return -ENOMEM; } + op->n_pages = 1; spin_lock(&cookie->lock); @@ -375,10 +384,10 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, ASSERTCMP(object->state, >, FSCACHE_OBJECT_LOOKING_UP); atomic_inc(&object->n_reads); - set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); + __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); if (fscache_submit_op(object, &op->op) < 0) - goto nobufs_unlock; + goto nobufs_unlock_dec; spin_unlock(&cookie->lock); fscache_stat(&fscache_n_retrieval_ops); @@ -425,6 +434,8 @@ error: _leave(" = %d", ret); return ret; +nobufs_unlock_dec: + atomic_dec(&object->n_reads); nobufs_unlock: spin_unlock(&cookie->lock); kfree(op); @@ -482,6 +493,7 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie, op = fscache_alloc_retrieval(mapping, end_io_func, context); if (!op) return -ENOMEM; + op->n_pages = *nr_pages; spin_lock(&cookie->lock); @@ -491,10 +503,10 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie, struct fscache_object, cookie_link); atomic_inc(&object->n_reads); - set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); + __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); if (fscache_submit_op(object, &op->op) < 0) - goto nobufs_unlock; + goto nobufs_unlock_dec; spin_unlock(&cookie->lock); fscache_stat(&fscache_n_retrieval_ops); @@ -541,6 +553,8 @@ error: _leave(" = %d", ret); return ret; +nobufs_unlock_dec: + atomic_dec(&object->n_reads); nobufs_unlock: spin_unlock(&cookie->lock); kfree(op); @@ -583,6 +597,7 @@ int __fscache_alloc_page(struct fscache_cookie *cookie, op = fscache_alloc_retrieval(page->mapping, NULL, NULL); if (!op) return -ENOMEM; + op->n_pages = 1; spin_lock(&cookie->lock); @@ -696,6 +711,7 @@ static void fscache_write_op(struct fscache_operation *_op) fscache_end_page_write(object, page); if (ret < 0) { fscache_abort_object(object); + fscache_op_complete(&op->op); } else { fscache_enqueue_operation(&op->op); } @@ -710,6 +726,7 @@ superseded: spin_unlock(&cookie->stores_lock); clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags); spin_unlock(&object->lock); + fscache_op_complete(&op->op); _leave(""); } diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index e3d6d939d959..f5facd1d333f 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -75,6 +75,16 @@ extern wait_queue_head_t fscache_cache_cleared_wq; typedef void (*fscache_operation_release_t)(struct fscache_operation *op); typedef void (*fscache_operation_processor_t)(struct fscache_operation *op); +enum fscache_operation_state { + FSCACHE_OP_ST_BLANK, /* Op is not yet submitted */ + FSCACHE_OP_ST_INITIALISED, /* Op is initialised */ + FSCACHE_OP_ST_PENDING, /* Op is blocked from running */ + FSCACHE_OP_ST_IN_PROGRESS, /* Op is in progress */ + FSCACHE_OP_ST_COMPLETE, /* Op is complete */ + FSCACHE_OP_ST_CANCELLED, /* Op has been cancelled */ + FSCACHE_OP_ST_DEAD /* Op is now dead */ +}; + struct fscache_operation { struct work_struct work; /* record for async ops */ struct list_head pend_link; /* link in object->pending_ops */ @@ -86,10 +96,10 @@ struct fscache_operation { #define FSCACHE_OP_MYTHREAD 0x0002 /* - processing is done be issuing thread, not pool */ #define FSCACHE_OP_WAITING 4 /* cleared when op is woken */ #define FSCACHE_OP_EXCLUSIVE 5 /* exclusive op, other ops must wait */ -#define FSCACHE_OP_DEAD 6 /* op is now dead */ -#define FSCACHE_OP_DEC_READ_CNT 7 /* decrement object->n_reads on destruction */ -#define FSCACHE_OP_KEEP_FLAGS 0xc0 /* flags to keep when repurposing an op */ +#define FSCACHE_OP_DEC_READ_CNT 6 /* decrement object->n_reads on destruction */ +#define FSCACHE_OP_KEEP_FLAGS 0x0070 /* flags to keep when repurposing an op */ + enum fscache_operation_state state; atomic_t usage; unsigned debug_id; /* debugging ID */ @@ -106,6 +116,7 @@ extern atomic_t fscache_op_debug_id; extern void fscache_op_work_func(struct work_struct *work); extern void fscache_enqueue_operation(struct fscache_operation *); +extern void fscache_op_complete(struct fscache_operation *); extern void fscache_put_operation(struct fscache_operation *); /** @@ -122,6 +133,7 @@ static inline void fscache_operation_init(struct fscache_operation *op, { INIT_WORK(&op->work, fscache_op_work_func); atomic_set(&op->usage, 1); + op->state = FSCACHE_OP_ST_INITIALISED; op->debug_id = atomic_inc_return(&fscache_op_debug_id); op->processor = processor; op->release = release; @@ -138,6 +150,7 @@ struct fscache_retrieval { void *context; /* netfs read context (pinned) */ struct list_head to_do; /* list of things to be done by the backend */ unsigned long start_time; /* time at which retrieval started */ + unsigned n_pages; /* number of pages to be retrieved */ }; typedef int (*fscache_page_retrieval_func_t)(struct fscache_retrieval *op, @@ -174,8 +187,22 @@ static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op) } /** + * fscache_retrieval_complete - Record (partial) completion of a retrieval + * @op: The retrieval operation affected + * @n_pages: The number of pages to account for + */ +static inline void fscache_retrieval_complete(struct fscache_retrieval *op, + int n_pages) +{ + op->n_pages -= n_pages; + if (op->n_pages <= 0) + fscache_op_complete(&op->op); +} + +/** * fscache_put_retrieval - Drop a reference to a retrieval operation * @op: The retrieval operation affected + * @n_pages: The number of pages to account for * * Drop a reference to a retrieval operation. */ @@ -333,10 +360,10 @@ struct fscache_object { int debug_id; /* debugging ID */ int n_children; /* number of child objects */ - int n_ops; /* number of ops outstanding on object */ + int n_ops; /* number of extant ops on object */ int n_obj_ops; /* number of object ops outstanding on object */ int n_in_progress; /* number of ops in progress */ - int n_exclusive; /* number of exclusive ops queued */ + int n_exclusive; /* number of exclusive ops queued or in progress */ atomic_t n_reads; /* number of read ops in progress */ spinlock_t lock; /* state and operations lock */ -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:52 +0000 Subject: FS-Cache: Provide proper invalidation Provide a proper invalidation method rather than relying on the netfs retiring the cookie it has and getting a new one. The problem with this is that isn't easy for the netfs to make sure that it has completed/cancelled all its outstanding storage and retrieval operations on the cookie it is retiring. Instead, have the cache provide an invalidation method that will cancel or wait for all currently outstanding operations before invalidating the cache, and will cause new operations to queue up behind that. Whilst invalidation is in progress, some requests will be rejected until the cache can stack a barrier on the operation queue to cause new operations to be deferred behind it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Documentation/filesystems/caching/backend-api.txt | 12 ++++ Documentation/filesystems/caching/netfs-api.txt | 46 ++++++++++--- Documentation/filesystems/caching/object.txt | 23 ++++--- fs/fscache/cookie.c | 60 +++++++++++++++++ fs/fscache/internal.h | 10 +++ fs/fscache/object.c | 72 +++++++++++++++++++++ fs/fscache/operation.c | 32 +++++++++ fs/fscache/page.c | 51 +++++++++++++++ fs/fscache/stats.c | 11 +++- include/linux/fscache-cache.h | 8 ++- include/linux/fscache.h | 38 +++++++++++ 11 files changed, 345 insertions(+), 18 deletions(-) diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt index f4769b9399df..d78bab9622c6 100644 --- a/Documentation/filesystems/caching/backend-api.txt +++ b/Documentation/filesystems/caching/backend-api.txt @@ -308,6 +308,18 @@ performed on the denizens of the cache. These are held in a structure of type: obtained by calling object->cookie->def->get_aux()/get_attr(). + (*) Invalidate data object [mandatory]: + + int (*invalidate_object)(struct fscache_operation *op) + + This is called to invalidate a data object (as pointed to by op->object). + All the data stored for this object should be discarded and an + attr_changed operation should be performed. The caller will follow up + with an object update operation. + + fscache_op_complete() must be called on op before returning. + + (*) Discard object [mandatory]: void (*drop_object)(struct fscache_object *object) diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt index 7cc6bf2871eb..97e6c0ecc5ef 100644 --- a/Documentation/filesystems/caching/netfs-api.txt +++ b/Documentation/filesystems/caching/netfs-api.txt @@ -35,8 +35,9 @@ This document contains the following sections: (12) Index and data file update (13) Miscellaneous cookie operations (14) Cookie unregistration - (15) Index and data file invalidation - (16) FS-Cache specific page flags. + (15) Index invalidation + (16) Data file invalidation + (17) FS-Cache specific page flags. ============================= @@ -767,13 +768,42 @@ the cookies for "child" indices, objects and pages have been relinquished first. -================================ -INDEX AND DATA FILE INVALIDATION -================================ +================== +INDEX INVALIDATION +================== -There is no direct way to invalidate an index subtree or a data file. To do -this, the caller should relinquish and retire the cookie they have, and then -acquire a new one. +There is no direct way to invalidate an index subtree. To do this, the caller +should relinquish and retire the cookie they have, and then acquire a new one. + + +====================== +DATA FILE INVALIDATION +====================== + +Sometimes it will be necessary to invalidate an object that contains data. +Typically this will be necessary when the server tells the netfs of a foreign +change - at which point the netfs has to throw away all the state it had for an +inode and reload from the server. + +To indicate that a cache object should be invalidated, the following function +can be called: + + void fscache_invalidate(struct fscache_cookie *cookie); + +This can be called with spinlocks held as it defers the work to a thread pool. +All extant storage, retrieval and attribute change ops at this point are +cancelled and discarded. Some future operations will be rejected until the +cache has had a chance to insert a barrier in the operations queue. After +that, operations will be queued again behind the invalidation operation. + +The invalidation operation will perform an attribute change operation and an +auxiliary data update operation as it is very likely these will have changed. + +Using the following function, the netfs can wait for the invalidation operation +to have reached a point at which it can start submitting ordinary operations +once again: + + void fscache_wait_on_invalidate(struct fscache_cookie *cookie); =========================== diff --git a/Documentation/filesystems/caching/object.txt b/Documentation/filesystems/caching/object.txt index 58313348da87..100ff41127e4 100644 --- a/Documentation/filesystems/caching/object.txt +++ b/Documentation/filesystems/caching/object.txt @@ -216,7 +216,14 @@ servicing netfs requests: The normal running state. In this state, requests the netfs makes will be passed on to the cache. - (6) State FSCACHE_OBJECT_UPDATING. + (6) State FSCACHE_OBJECT_INVALIDATING. + + The object is undergoing invalidation. When the state comes here, it + discards all pending read, write and attribute change operations as it is + going to clear out the cache entirely and reinitialise it. It will then + continue to the FSCACHE_OBJECT_UPDATING state. + + (7) State FSCACHE_OBJECT_UPDATING. The state machine comes here to update the object in the cache from the netfs's records. This involves updating the auxiliary data that is used @@ -225,13 +232,13 @@ servicing netfs requests: And there are terminal states in which an object cleans itself up, deallocates memory and potentially deletes stuff from disk: - (7) State FSCACHE_OBJECT_LC_DYING. + (8) State FSCACHE_OBJECT_LC_DYING. The object comes here if it is dying because of a lookup or creation error. This would be due to a disk error or system error of some sort. Temporary data is cleaned up, and the parent is released. - (8) State FSCACHE_OBJECT_DYING. + (9) State FSCACHE_OBJECT_DYING. The object comes here if it is dying due to an error, because its parent cookie has been relinquished by the netfs or because the cache is being @@ -241,27 +248,27 @@ memory and potentially deletes stuff from disk: can destroy themselves. This object waits for all its children to go away before advancing to the next state. - (9) State FSCACHE_OBJECT_ABORT_INIT. +(10) State FSCACHE_OBJECT_ABORT_INIT. The object comes to this state if it was waiting on its parent in FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself so that the parent may proceed from the FSCACHE_OBJECT_DYING state. -(10) State FSCACHE_OBJECT_RELEASING. -(11) State FSCACHE_OBJECT_RECYCLING. +(11) State FSCACHE_OBJECT_RELEASING. +(12) State FSCACHE_OBJECT_RECYCLING. The object comes to one of these two states when dying once it is rid of all its children, if it is dying because the netfs relinquished its cookie. In the first state, the cached data is expected to persist, and in the second it will be deleted. -(12) State FSCACHE_OBJECT_WITHDRAWING. +(13) State FSCACHE_OBJECT_WITHDRAWING. The object transits to this state if the cache decides it wants to withdraw the object from service, perhaps to make space, but also due to error or just because the whole cache is being withdrawn. -(13) State FSCACHE_OBJECT_DEAD. +(14) State FSCACHE_OBJECT_DEAD. The object transits to this state when the in-memory object record is ready to be deleted. The object processor shouldn't ever see an object in diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 66be9eccede0..8dcb114758e3 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -370,6 +370,66 @@ cant_attach_object: } /* + * Invalidate an object. Callable with spinlocks held. + */ +void __fscache_invalidate(struct fscache_cookie *cookie) +{ + struct fscache_object *object; + + _enter("{%s}", cookie->def->name); + + fscache_stat(&fscache_n_invalidates); + + /* Only permit invalidation of data files. Invalidating an index will + * require the caller to release all its attachments to the tree rooted + * there, and if it's doing that, it may as well just retire the + * cookie. + */ + ASSERTCMP(cookie->def->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE); + + /* We will be updating the cookie too. */ + BUG_ON(!cookie->def->get_aux); + + /* If there's an object, we tell the object state machine to handle the + * invalidation on our behalf, otherwise there's nothing to do. + */ + if (!hlist_empty(&cookie->backing_objects)) { + spin_lock(&cookie->lock); + + if (!hlist_empty(&cookie->backing_objects) && + !test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, + &cookie->flags)) { + object = hlist_entry(cookie->backing_objects.first, + struct fscache_object, + cookie_link); + if (object->state < FSCACHE_OBJECT_DYING) + fscache_raise_event( + object, FSCACHE_OBJECT_EV_INVALIDATE); + } + + spin_unlock(&cookie->lock); + } + + _leave(""); +} +EXPORT_SYMBOL(__fscache_invalidate); + +/* + * Wait for object invalidation to complete. + */ +void __fscache_wait_on_invalidate(struct fscache_cookie *cookie) +{ + _enter("%p", cookie); + + wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING, + fscache_wait_bit_interruptible, + TASK_UNINTERRUPTIBLE); + + _leave(""); +} +EXPORT_SYMBOL(__fscache_wait_on_invalidate); + +/* * update the index entries backing a cookie */ void __fscache_update_cookie(struct fscache_cookie *cookie) diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h index f6aad48d38a8..c81179303930 100644 --- a/fs/fscache/internal.h +++ b/fs/fscache/internal.h @@ -122,11 +122,17 @@ extern int fscache_submit_exclusive_op(struct fscache_object *, extern int fscache_submit_op(struct fscache_object *, struct fscache_operation *); extern int fscache_cancel_op(struct fscache_operation *); +extern void fscache_cancel_all_ops(struct fscache_object *); extern void fscache_abort_object(struct fscache_object *); extern void fscache_start_operations(struct fscache_object *); extern void fscache_operation_gc(struct work_struct *); /* + * page.c + */ +extern void fscache_invalidate_writes(struct fscache_cookie *); + +/* * proc.c */ #ifdef CONFIG_PROC_FS @@ -205,6 +211,9 @@ extern atomic_t fscache_n_acquires_ok; extern atomic_t fscache_n_acquires_nobufs; extern atomic_t fscache_n_acquires_oom; +extern atomic_t fscache_n_invalidates; +extern atomic_t fscache_n_invalidates_run; + extern atomic_t fscache_n_updates; extern atomic_t fscache_n_updates_null; extern atomic_t fscache_n_updates_run; @@ -237,6 +246,7 @@ extern atomic_t fscache_n_cop_alloc_object; extern atomic_t fscache_n_cop_lookup_object; extern atomic_t fscache_n_cop_lookup_complete; extern atomic_t fscache_n_cop_grab_object; +extern atomic_t fscache_n_cop_invalidate_object; extern atomic_t fscache_n_cop_update_object; extern atomic_t fscache_n_cop_drop_object; extern atomic_t |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Jonathan Nieder <jrnieder@gmail.com> 2012-07-20 11:25:
merge 682116 682007 quit Hi, Ben Hutchings <ben@decadent.org.uk> 2012-07-19 13:32: On Thu, 2012-07-19 at 13:37 +0200, Bastian Blank wrote: We don't support proprietary stuff. Please remove and try again. To be clear, Bastian is referring to the proprietary kernel module (nvidia). I think this stance is too aggressive. Testing without the modules we do not support can certainly help, but in cases like this where the proprietary module is not likely to be related, I'd rather hear about problems earlier than have submitters wait until they have time to reproduce without. Luckily this has been reproduced without the nvidia module, so merging. Rhaoul writes: This is reproducable using "grep -r abc *" inside a directory with 9541 files (no sym- or hardlinks, no block or character special files) in 1524 directories (PHP MODX installation) I downloaded the couple of files from that site [1] and unzipped them to hopefully create a similar test setup. I had to make two copies of it to get that many files/dirs. Right now I'm running this to see what happens: # for i in {1..100}; do grep -r abc /cae/apps/data/testapp-1/tmp* > /dev/null; done So far nothing much, but I just started. Some other points for comparison: - does the cache need to be fresh? I have a cron job that does this from time to time (about once a month with some random splay between machines) on these machines anyways (basically stop cachefilesd && rm -rf the_cache_dir_contents && start cachefilesd) - anything else in particular about the test I should look for? I've also noticed that I can usually get cachefilesd to spin to 100% cpu if I do something like this: # grep pattern /home/logs/some_multi_gb_large_readonly_logfile I recall seeing patches for large file support, but wasn't sure on their status. Anyways, that's digressing, so I'll leave that as a separate item for later. Thanks, Brian [1] http://modx.com/download/ |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Jonathan Nieder <jrnieder@gmail.com> 2012-07-21 12:04:
tags 682007 + upstream patch moreinfo quit Hi, Brian Kroth wrote: kernel BUG at /build/buildd-linux_3.2.20-1~bpo60+1-amd64-tQMw4f/linux-3.2.20/fs/buffer.c:3088! This is BUG_ON(!PagePrivate(page)); in static int drop_buffers(struct page *page, struct buffer_head **buffers_to_free) { struct buffer_head *head = page_buffers(page); I suspect it's the same underlying problem, but maybe not. Please test the attached patches, for example following the instructions below: 0. prerequisites: apt-get install git build-essential 1. get the kernel history, if you don't already have it: git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git I take it then that this is a patch against the latest greatest kernel, not against the source for the backports kernel I'm currently running? 2. configure, build, test: cd linux git fetch origin git checkout origin/master cp /boot/config-$(uname -r) .config; # current configuration scripts/config --disable DEBUG_INFO make localmodconfig; # optional: minimize configuration make deb-pkg; # optionally with -j<num> for parallel build dpkg -i ../<name of package>; # as root reboot ... test test test ... Hopefully it reproduces the bug. So Oh, I see you want to compare two nearly identical kernels (both fairly recent) to isolate if just the patches are helpful rather than some mix between the two. 3. try the patches: cd linux git am -3sc $(ls -1 /path/to/patches/[01]*) make deb-pkg; # maybe with -j4 dpkg -i ../<name of package>; # as root reboot ... test test test ... Thanks and hope that helps, Jonathan I can try and build/install this on one of our machines (I'd prefer not to push it everywhere yet), but without a reliably reproducible (on-demand that is) test case I'm not sure what it will show except that it perhaps doesn't introduce further blatant bugs. Anyways, I'll wait on the results of my previous test first to see if we have a reliable test case from it before moving forward. At this point the grep -r abc ... test is just hitting the cache over and over again, so it's not showing a whole lot. One other thing I'd tried before was something like this run a couple of times in a row (hmm, I suppose I could try them in parallel too): find /fsc_mounted_nfs -type f -exec cat {} > /dev/null ; A couple of them paniced, but with inconsistent messages, so I had left them out for now. Perhaps that's worth another shot ... Thanks, Brian |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Hi again,
In July, 2012, Brian Kroth wrote: > Jonathan Nieder <jrnieder@gmail.com> 2012-07-21 12:04: >> Please test the attached patches, for example following the instructions >> below: [...] > Anyways, I'll wait on the results of my previous test first to see if we > have a reliable test case from it before moving forward. > > At this point the grep -r abc ... test is just hitting the cache over and > over again, so it's not showing a whole lot. > > One other thing I'd tried before was something like this run a couple of > times in a row (hmm, I suppose I could try them in parallel too): > > find /fsc_mounted_nfs -type f -exec cat {} > /dev/null ; > > A couple of them paniced, but with inconsistent messages, so I had left them > out for now. Perhaps that's worth another shot ... So, how did it go? Did some test case prove reliable? Any other new observations? Thanks, Jonathan -- To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org Archive: http://lists.debian.org/20120929222202.GA12884@elie.Belkin |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Jonathan Nieder <jrnieder@gmail.com> 2012-09-29 15:22:
Hi again, In July, 2012, Brian Kroth wrote: Jonathan Nieder <jrnieder@gmail.com> 2012-07-21 12:04: Please test the attached patches, for example following the instructions below: [...] Anyways, I'll wait on the results of my previous test first to see if we have a reliable test case from it before moving forward. At this point the grep -r abc ... test is just hitting the cache over and over again, so it's not showing a whole lot. One other thing I'd tried before was something like this run a couple of times in a row (hmm, I suppose I could try them in parallel too): find /fsc_mounted_nfs -type f -exec cat {} > /dev/null ; A couple of them paniced, but with inconsistent messages, so I had left them out for now. Perhaps that's worth another shot ... So, how did it go? Did some test case prove reliable? Any other new observations? Sorry, the labs went into their dormant period and all of my test cases ran away for the rest of the summer (the find cmd didn't seem to trigger the __fscache problem), so I hadn't moved any further on this. Now that they're back, I'm definitely seeing it again (about 20 different machines in two days last week), so I've started the process of hunting down a trigger cause again. I'll let you know if I find something. Thanks, Brian |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Brian Kroth wrote:
> Sorry, the labs went into their dormant period and all of my test cases ran > away for the rest of the summer (the find cmd didn't seem to trigger the > __fscache problem), so I hadn't moved any further on this. > > Now that they're back, I'm definitely seeing it again (about 20 different > machines in two days last week), so I've started the process of hunting down > a trigger cause again. I'll let you know if I find something. Thanks for the update. The human test cases can work fine for vetting a fix. I'd also be interested to hear whether the series I sent was completely borked, so I'd recommend trying on a test machine for a day or two before putting such a patched kernel into production, though. Jonathan -- To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org Archive: http://lists.debian.org/20121001082554.GA7957@elie.Belkin |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Jonathan Nieder <jrnieder@gmail.com> 2012-10-01 01:25:
Brian Kroth wrote: Sorry, the labs went into their dormant period and all of my test cases ran away for the rest of the summer (the find cmd didn't seem to trigger the __fscache problem), so I hadn't moved any further on this. Now that they're back, I'm definitely seeing it again (about 20 different machines in two days last week), so I've started the process of hunting down a trigger cause again. I'll let you know if I find something. Thanks for the update. The human test cases can work fine for vetting a fix. I'd also be interested to hear whether the series I sent was completely borked, so I'd recommend trying on a test machine for a day or two before putting such a patched kernel into production, though. Jonathan Once again very sorry for the delay :( I forgot to disable the DEBUG_INFO and kept filling up my build VMs disk during compile. Then realized I had grabbed the 3.7 rc code, which these patches don't apply against. "git checkout remotes/stable/linux-3.2.y" (results in head c74a5e1fe4d0672936c8fb63d7484dfeaa30669c and 3.2.28), seemed to fix that. Here's what I got when applying the patches. # git am -3sc $(ls -1 ../patches/fscache_read/[01]*.patch) Applying: CacheFiles: Fix the marking of cached pages Applying: CacheFiles: Downgrade the requirements passed to the allocator Applying: FS-Cache: Check that there are no read ops when cookie relinquished Applying: CacheFiles: Make some debugging statements conditional Applying: FS-Cache: Make cookie relinquishment wait for outstanding reads Applying: FS-Cache: Fix operation state management and accounting Applying: FS-Cache: Provide proper invalidation Applying: VFS: Make more complete truncate operation available to CacheFiles Applying: CacheFiles: Implement invalidation Applying: NFS: Use FS-Cache invalidation Using index info to reconstruct a base tree... Falling back to patching base and 3-way merge... Auto-merging fs/nfs/fscache.h Auto-merging fs/nfs/inode.c Auto-merging fs/nfs/nfs4proc.c Applying: CacheFiles: Add missing retrieval completions Applying: FS-Cache: Convert the object event ID #defines into an enum Applying: FS-Cache: Initialise the object event mask with the calculated mask Applying: FS-Cache: Don't mask off the object event mask when printing it Applying: FS-Cache: Limit the number of I/O error reports for a cache Applying: FS-Cache: Exclusive op submission can BUG if there's been an I/O error Applying: NFS: nfs_migrate_page() does not wait for FS-Cache to finish with a page Anyways, I just started running that on a machine, so I'll let you know if I noticed anything there first before I think about pushing it to further places. Thanks, Brian |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Brian Kroth wrote:
> Then realized I had grabbed the 3.7 rc code, which these > patches don't apply against. "git checkout remotes/stable/linux-3.2.y" > (results in head c74a5e1fe4d0672936c8fb63d7484dfeaa30669c and 3.2.28), > seemed to fix that. There were no fundamentally conflicting changes in mainline, so here's a rebased series. But whatever you are testing already should be just as informative. Thanks, Jonathan From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:16:50 +0000 Subject: CacheFiles: Fix the marking of cached pages Under some circumstances CacheFiles defers the marking of pages with PG_fscache so that it can take advantage of pagevecs to reduce the number of calls to fscache_mark_pages_cached() and the netfs's hook to keep track of this. There are, however, two problems with this: (1) It can lead to the PG_fscache mark being applied _after_ the page is set PG_uptodate and unlocked (by the call to fscache_end_io()). (2) CacheFiles's ref on the page is dropped immediately following fscache_end_io() - and so may not still be held when the mark is applied. This can lead to the page being passed back to the allocator before the mark is applied. Fix this by, where appropriate, marking the page before calling fscache_end_io() and releasing the page. This means that we can't take advantage of pagevecs and have to make a separate call for each page to the marking routines. The symptoms of this are Bad Page state errors cropping up under memory pressure, for example: BUG: Bad page state in process tar pfn:002da page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447 page flags: 0x1000(private_2) Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064 Call Trace: [<ffffffff8109583c>] ? dump_page+0xb9/0xbe [<ffffffff81095916>] bad_page+0xd5/0xea [<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a [<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662 [<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b As can be seen, PG_private_2 (== PG_fscache) is set in the page flags. Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was set appropriately showed that sometimes it wasn't. This led to the discovery that sometimes the page has apparently been reclaimed by the time the marker got to see it. Reported-by: M. Stevens <m@tippett.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/cachefiles/rdwr.c | 34 ++++++++---------------- fs/fscache/page.c | 59 +++++++++++++++++++++++++---------------- include/linux/fscache-cache.h | 3 +++ include/linux/fscache.h | 12 ++++----- 4 files changed, 56 insertions(+), 52 deletions(-) diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 0e3c0924cc3a..0186fc110162 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -176,9 +176,8 @@ static void cachefiles_read_copier(struct fscache_operation *_op) recheck: if (PageUptodate(monitor->back_page)) { copy_highpage(monitor->netfs_page, monitor->back_page); - - pagevec_add(&pagevec, monitor->netfs_page); - fscache_mark_pages_cached(monitor->op, &pagevec); + fscache_mark_page_cached(monitor->op, + monitor->netfs_page); error = 0; } else if (!PageError(monitor->back_page)) { /* the page has probably been truncated */ @@ -335,8 +334,7 @@ backing_page_already_present: backing_page_already_uptodate: _debug("- uptodate"); - pagevec_add(pagevec, netpage); - fscache_mark_pages_cached(op, pagevec); + fscache_mark_page_cached(op, netpage); copy_highpage(netpage, backpage); fscache_end_io(op, netpage, 0); @@ -448,8 +446,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, &pagevec); } else if (cachefiles_has_space(cache, 0, 1) == 0) { /* there's space in the cache we can use */ - pagevec_add(&pagevec, page); - fscache_mark_pages_cached(op, &pagevec); + fscache_mark_page_cached(op, page); ret = -ENODATA; } else { ret = -ENOBUFS; @@ -465,8 +462,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, */ static int cachefiles_read_backing_file(struct cachefiles_object *object, struct fscache_retrieval *op, - struct list_head *list, - struct pagevec *mark_pvec) + struct list_head *list) { struct cachefiles_one_read *monitor = NULL; struct address_space *bmapping = object->backer->d_inode->i_mapping; @@ -626,13 +622,13 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, page_cache_release(backpage); backpage = NULL; - if (!pagevec_add(mark_pvec, netpage)) - fscache_mark_pages_cached(op, mark_pvec); + fscache_mark_page_cached(op, netpage); page_cache_get(netpage); if (!pagevec_add(&lru_pvec, netpage)) __pagevec_lru_add_file(&lru_pvec); + /* the netpage is unlocked and marked up to date here */ fscache_end_io(op, netpage, 0); page_cache_release(netpage); netpage = NULL; @@ -775,15 +771,11 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, /* submit the apparently valid pages to the backing fs to be read from * disk */ if (nrbackpages > 0) { - ret2 = cachefiles_read_backing_file(object, op, &backpages, - &pagevec); + ret2 = cachefiles_read_backing_file(object, op, &backpages); if (ret2 == -ENOMEM || ret2 == -EINTR) ret = ret2; } - if (pagevec_count(&pagevec) > 0) - fscache_mark_pages_cached(op, &pagevec); - _leave(" = %d [nr=%u%s]", ret, *nr_pages, list_empty(pages) ? " empty" : ""); return ret; @@ -806,7 +798,6 @@ int cachefiles_allocate_page(struct fscache_retrieval *op, { struct cachefiles_object *object; struct cachefiles_cache *cache; - struct pagevec pagevec; int ret; object = container_of(op->op.object, @@ -817,13 +808,10 @@ int cachefiles_allocate_page(struct fscache_retrieval *op, _enter("%p,{%lx},", object, page->index); ret = cachefiles_has_space(cache, 0, 1); - if (ret == 0) { - pagevec_init(&pagevec, 0); - pagevec_add(&pagevec, page); - fscache_mark_pages_cached(op, &pagevec); - } else { + if (ret == 0) + fscache_mark_page_cached(op, page); + else ret = -ENOBUFS; - } _leave(" = %d", ret); return ret; diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 3f7a59bfa7ad..d7c663cfc923 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -915,6 +915,40 @@ done: EXPORT_SYMBOL(__fscache_uncache_page); /** + * fscache_mark_page_cached - Mark a page as being cached + * @op: The retrieval op pages are being marked for + * @page: The page to be marked + * + * Mark a netfs page as being cached. After this is called, the netfs + * must call fscache_uncache_page() to remove the mark. + */ +void fscache_mark_page_cached(struct fscache_retrieval *op, struct page *page) +{ + struct fscache_cookie *cookie = op->op.object->cookie; + +#ifdef CONFIG_FSCACHE_STATS + atomic_inc(&fscache_n_marks); +#endif + + _debug("- mark %p{%lx}", page, page->index); + if (TestSetPageFsCache(page)) { + static bool once_only; + if (!once_only) { + once_only = true; + printk(KERN_WARNING "FS-Cache:" + " Cookie type %s marked page %lx" + " multiple times ", + cookie->def->name, page->index); + } + } + + if (cookie->def->mark_page_cached) + cookie->def->mark_page_cached(cookie->netfs_data, + op->mapping, page); +} +EXPORT_SYMBOL(fscache_mark_page_cached); + +/** * fscache_mark_pages_cached - Mark pages as being cached * @op: The retrieval op pages are being marked for * @pagevec: The pages to be marked @@ -925,32 +959,11 @@ EXPORT_SYMBOL(__fscache_uncache_page); void fscache_mark_pages_cached(struct fscache_retrieval *op, struct pagevec *pagevec) { - struct fscache_cookie *cookie = op->op.object->cookie; unsigned long loop; -#ifdef CONFIG_FSCACHE_STATS - atomic_add(pagevec->nr, &fscache_n_marks); -#endif + for (loop = 0; loop < pagevec->nr; loop++) + fscache_mark_page_cached(op, pagevec->pages[loop]); - for (loop = 0; loop < pagevec->nr; loop++) { - struct page *page = pagevec->pages[loop]; - - _debug("- mark %p{%lx}", page, page->index); - if (TestSetPageFsCache(page)) { - static bool once_only; - if (!once_only) { - once_only = true; - printk(KERN_WARNING "FS-Cache:" - " Cookie type %s marked page %lx" - " multiple times ", - cookie->def->name, page->index); - } - } - } - - if (cookie->def->mark_pages_cached) - cookie->def->mark_pages_cached(cookie->netfs_data, - op->mapping, pagevec); pagevec_reinit(pagevec); } EXPORT_SYMBOL(fscache_mark_pages_cached); diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index ce31408b1e47..9879183b55d8 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -504,6 +504,9 @@ extern void fscache_withdraw_cache(struct fscache_cache *cache); extern void fscache_io_error(struct fscache_cache *cache); +extern void fscache_mark_page_cached(struct fscache_retrieval *op, + struct page *page); + extern void fscache_mark_pages_cached(struct fscache_retrieval *op, struct pagevec *pagevec); diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 9ec20dec3353..f4b6353543bf 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -135,14 +135,14 @@ struct fscache_cookie_def { */ void (*put_context)(void *cookie_netfs_data, void *context); - /* indicate pages that now have cache metadata retained - * - this function should mark the specified pages as now being cached - * - the pages will have been marked with PG_fscache before this is + /* indicate page that now have cache metadata retained + * - this function should mark the specified page as now being cached + * - the page will have been marked with PG_fscache before this is * called, so this is optional */ - void (*mark_pages_cached)(void *cookie_netfs_data, - struct address_space *mapping, - struct pagevec *cached_pvec); + void (*mark_page_cached)(void *cookie_netfs_data, + struct address_space *mapping, + struct page *page); /* indicate the cookie is no longer cached * - this function is called when the backing store currently caching -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:01 +0000 Subject: CacheFiles: Downgrade the requirements passed to the allocator Downgrade the requirements passed to the allocator in the gfp flags parameter. FS-Cache/CacheFiles can handle OOM conditions simply by aborting the attempt to store an object or a page in the cache. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/cachefiles/interface.c | 8 ++++---- fs/cachefiles/internal.h | 2 ++ fs/cachefiles/key.c | 2 +- fs/cachefiles/rdwr.c | 18 ++++++++++-------- fs/cachefiles/xattr.c | 2 +- fs/fscache/page.c | 2 +- 6 files changed, 19 insertions(+), 15 deletions(-) diff --git a/fs/cachefiles/interface.c b/fs/cachefiles/interface.c index 67bef6d01484..9bff0f878cfd 100644 --- a/fs/cachefiles/interface.c +++ b/fs/cachefiles/interface.c @@ -41,12 +41,12 @@ static struct fscache_object *cachefiles_alloc_object( _enter("{%s},%p,", cache->cache.identifier, cookie); - lookup_data = kmalloc(sizeof(*lookup_data), GFP_KERNEL); + lookup_data = kmalloc(sizeof(*lookup_data), cachefiles_gfp); if (!lookup_data) goto nomem_lookup_data; /* create a new object record and a temporary leaf image */ - object = kmem_cache_alloc(cachefiles_object_jar, GFP_KERNEL); + object = kmem_cache_alloc(cachefiles_object_jar, cachefiles_gfp); if (!object) goto nomem_object; @@ -63,7 +63,7 @@ static struct fscache_object *cachefiles_alloc_object( * - stick the length on the front and leave space on the back for the * encoder */ - buffer = kmalloc((2 + 512) + 3, GFP_KERNEL); + buffer = kmalloc((2 + 512) + 3, cachefiles_gfp); if (!buffer) goto nomem_buffer; @@ -219,7 +219,7 @@ static void cachefiles_update_object(struct fscache_object *_object) return; } - auxdata = kmalloc(2 + 512 + 3, GFP_KERNEL); + auxdata = kmalloc(2 + 512 + 3, cachefiles_gfp); if (!auxdata) { _leave(" [nomem]"); return; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index bd6bc1bde2d7..49382519907a 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -23,6 +23,8 @@ extern unsigned cachefiles_debug; #define CACHEFILES_DEBUG_KLEAVE 2 #define CACHEFILES_DEBUG_KDEBUG 4 +#define cachefiles_gfp (__GFP_WAIT | __GFP_NORETRY | __GFP_NOMEMALLOC) + /* * node records */ diff --git a/fs/cachefiles/key.c b/fs/cachefiles/key.c index 81b8b2b3a674..33b58c60f2d1 100644 --- a/fs/cachefiles/key.c +++ b/fs/cachefiles/key.c @@ -78,7 +78,7 @@ char *cachefiles_cook_key(const u8 *raw, int keylen, uint8_t type) _debug("max: %d", max); - key = kmalloc(max, GFP_KERNEL); + key = kmalloc(max, cachefiles_gfp); if (!key) return NULL; diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 0186fc110162..26a6f526ebb7 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -238,7 +238,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, _debug("read back %p{%lu,%d}", netpage, netpage->index, page_count(netpage)); - monitor = kzalloc(sizeof(*monitor), GFP_KERNEL); + monitor = kzalloc(sizeof(*monitor), cachefiles_gfp); if (!monitor) goto nomem; @@ -257,13 +257,14 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, goto backing_page_already_present; if (!newpage) { - newpage = page_cache_alloc_cold(bmapping); + newpage = __page_cache_alloc(cachefiles_gfp | + __GFP_COLD); if (!newpage) goto nomem_monitor; } ret = add_to_page_cache(newpage, bmapping, - netpage->index, GFP_KERNEL); + netpage->index, cachefiles_gfp); if (ret == 0) goto installed_new_backing_page; if (ret != -EEXIST) @@ -481,7 +482,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, netpage, netpage->index, page_count(netpage)); if (!monitor) { - monitor = kzalloc(sizeof(*monitor), GFP_KERNEL); + monitor = kzalloc(sizeof(*monitor), cachefiles_gfp); if (!monitor) goto nomem; @@ -496,13 +497,14 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, goto backing_page_already_present; if (!newpage) { - newpage = page_cache_alloc_cold(bmapping); + newpage = __page_cache_alloc(cachefiles_gfp | + __GFP_COLD); if (!newpage) goto nomem; } ret = add_to_page_cache(newpage, bmapping, - netpage->index, GFP_KERNEL); + netpage->index, cachefiles_gfp); if (ret == 0) goto installed_new_backing_page; if (ret != -EEXIST) @@ -532,7 +534,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, _debug("- monitor add"); ret = add_to_page_cache(netpage, op->mapping, netpage->index, - GFP_KERNEL); + cachefiles_gfp); if (ret < 0) { if (ret == -EEXIST) { page_cache_release(netpage); @@ -608,7 +610,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, _debug("- uptodate"); ret = add_to_page_cache(netpage, op->mapping, netpage->index, - GFP_KERNEL); + cachefiles_gfp); if (ret < 0) { if (ret == -EEXIST) { page_cache_release(netpage); diff --git a/fs/cachefiles/xattr.c b/fs/cachefiles/xattr.c index e18b183b47e1..73b46288b54b 100644 --- a/fs/cachefiles/xattr.c +++ b/fs/cachefiles/xattr.c @@ -174,7 +174,7 @@ int cachefiles_check_object_xattr(struct cachefiles_object *object, ASSERT(dentry); ASSERT(dentry->d_inode); - auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, GFP_KERNEL); + auxbuf = kmalloc(sizeof(struct cachefiles_xattr) + 512, cachefiles_gfp); if (!auxbuf) { _leave(" = -ENOMEM"); return -ENOMEM; diff --git a/fs/fscache/page.c b/fs/fscache/page.c index d7c663cfc923..248a12e22532 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -759,7 +759,7 @@ int __fscache_write_page(struct fscache_cookie *cookie, fscache_stat(&fscache_n_stores); - op = kzalloc(sizeof(*op), GFP_NOIO); + op = kzalloc(sizeof(*op), GFP_NOIO | __GFP_NOMEMALLOC | __GFP_NORETRY); if (!op) goto nomem; -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:11 +0000 Subject: FS-Cache: Check that there are no read ops when cookie relinquished Check that the netfs isn't trying to relinquish a cookie that still has read operations in progress upon it. If there are, then give log a warning and BUG. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/fscache/cookie.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 990535071a8a..0666996adf80 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -452,6 +452,14 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire) _debug("RELEASE OBJ%x", object->debug_id); + if (atomic_read(&object->n_reads)) { + spin_unlock(&cookie->lock); + printk(KERN_ERR "FS-Cache:" + " Cookie '%s' still has %d outstanding reads ", + cookie->def->name, atomic_read(&object->n_reads)); + BUG(); + } + /* detach each cache object from the object cookie */ spin_lock(&object->lock); hlist_del_init(&object->cookie_link); -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:21 +0000 Subject: CacheFiles: Make some debugging statements conditional Downgrade some debugging statements to not unconditionally print stuff, but rather be conditional on the appropriate module parameter setting. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/cachefiles/rdwr.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 26a6f526ebb7..885f1b26cc16 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -77,25 +77,25 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, struct page *backpage = monitor->back_page, *backpage2; int ret; - kenter("{ino=%lx},{%lx,%lx}", + _enter("{ino=%lx},{%lx,%lx}", object->backer->d_inode->i_ino, backpage->index, backpage->flags); /* skip if the page was truncated away completely */ if (backpage->mapping != bmapping) { - kleave(" = -ENODATA [mapping]"); + _leave(" = -ENODATA [mapping]"); return -ENODATA; } backpage2 = find_get_page(bmapping, backpage->index); if (!backpage2) { - kleave(" = -ENODATA [gone]"); + _leave(" = -ENODATA [gone]"); return -ENODATA; } if (backpage != backpage2) { put_page(backpage2); - kleave(" = -ENODATA [different]"); + _leave(" = -ENODATA [different]"); return -ENODATA; } @@ -114,7 +114,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, if (PageUptodate(backpage)) goto unlock_discard; - kdebug("reissue read"); + _debug("reissue read"); ret = bmapping->a_ops->readpage(NULL, backpage); if (ret < 0) goto unlock_discard; @@ -129,7 +129,7 @@ static int cachefiles_read_reissue(struct cachefiles_object *object, } /* it'll reappear on the todo list */ - kleave(" = -EINPROGRESS"); + _leave(" = -EINPROGRESS"); return -EINPROGRESS; unlock_discard: @@ -137,7 +137,7 @@ unlock_discard: spin_lock_irq(&object->work_lock); list_del(&monitor->op_link); spin_unlock_irq(&object->work_lock); - kleave(" = %d", ret); + _leave(" = %d", ret); return ret; } -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:31 +0000 Subject: FS-Cache: Make cookie relinquishment wait for outstanding reads Make fscache_relinquish_cookie() log a warning and wait if there are any outstanding reads left on the cookie it was given. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- fs/fscache/cookie.c | 18 ++++++++++++++---- fs/fscache/operation.c | 10 ++++++++-- include/linux/fscache-cache.h | 1 + 3 files changed, 23 insertions(+), 6 deletions(-) diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 0666996adf80..66be9eccede0 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -442,22 +442,32 @@ void __fscache_relinquish_cookie(struct fscache_cookie *cookie, int retire) event = retire ? FSCACHE_OBJECT_EV_RETIRE : FSCACHE_OBJECT_EV_RELEASE; +try_again: spin_lock(&cookie->lock); /* break links with all the active objects */ while (!hlist_empty(&cookie->backing_objects)) { + int n_reads; object = hlist_entry(cookie->backing_objects.first, struct fscache_object, cookie_link); _debug("RELEASE OBJ%x", object->debug_id); - if (atomic_read(&object->n_reads)) { + set_bit(FSCACHE_COOKIE_WAITING_ON_READS, &cookie->flags); + n_reads = atomic_read(&object->n_reads); + if (n_reads) { + int n_ops = object->n_ops; + int n_in_progress = object->n_in_progress; spin_unlock(&cookie->lock); printk(KERN_ERR "FS-Cache:" - " Cookie '%s' still has %d outstanding reads ", - cookie->def->name, atomic_read(&object->n_reads)); - BUG(); + " Cookie '%s' still has %d outstanding reads (%d,%d) ", + cookie->def->name, + n_reads, n_ops, n_in_progress); + wait_on_bit(&cookie->flags, FSCACHE_COOKIE_WAITING_ON_READS, + fscache_wait_bit, TASK_UNINTERRUPTIBLE); + printk("Wait finished "); + goto try_again; } /* detach each cache object from the object cookie */ diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c index 30afdfa7aec7..c857ab824d6e 100644 --- a/fs/fscache/operation.c +++ b/fs/fscache/operation.c @@ -340,8 +340,14 @@ void fscache_put_operation(struct fscache_operation *op) object = op->object; - if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags)) - atomic_dec(&object->n_reads); + if (test_bit(FSCACHE_OP_DEC_READ_CNT, &op->flags)) { + if (atomic_dec_and_test(&object->n_reads)) { + clear_bit(FSCACHE_COOKIE_WAITING_ON_READS, + &object->cookie->flags); + wake_up_bit(&object->cookie->flags, + FSCACHE_COOKIE_WAITING_ON_READS); + } + } /* now... we may get called with the object spinlock held, so we * complete the cleanup here only if we can immediately acquire the diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index 9879183b55d8..e3d6d939d959 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -301,6 +301,7 @@ struct fscache_cookie { #define FSCACHE_COOKIE_PENDING_FILL 3 /* T if pending initial fill on object */ #define FSCACHE_COOKIE_FILLING 4 /* T if filling object incrementally */ #define FSCACHE_COOKIE_UNAVAILABLE 5 /* T if cookie is unavailable (error, etc) */ +#define FSCACHE_COOKIE_WAITING_ON_READS 6 /* T if cookie is waiting on reads */ }; extern struct fscache_cookie fscache_fsdef_index; -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:41 +0000 Subject: FS-Cache: Fix operation state management and accounting Fix the state management of internal fscache operations and the accounting of what operations are in what states. This is done by: (1) Give struct fscache_operation a enum variable that directly represents the state it's currently in, rather than spreading this knowledge over a bunch of flags, who's processing the operation at the moment and whether it is queued or not. This makes it easier to write assertions to check the state at various points and to prevent invalid state transitions. (2) Add an 'operation complete' state and supply a function to indicate the completion of an operation (fscache_op_complete()) and make things call it. The final call to fscache_put_operation() can then check that an op in the appropriate state (complete or cancelled). (3) Adjust the use of object->n_ops, ->n_in_progress, ->n_exclusive to better govern the state of an object: (a) The ->n_ops is now the number of extant operations on the object and is now decremented by fscache_put_operation() only. (b) The ->n_in_progress is simply the number of objects that have been taken off of the object's pending queue for the purposes of being run. This is decremented by fscache_op_complete() only. (c) The ->n_exclusive is the number of exclusive ops that have been submitted and queued or are in progress. It is decremented by fscache_op_complete() and by fscache_cancel_op(). fscache_put_operation() and fscache_operation_gc() now no longer try to clean up ->n_exclusive and ->n_in_progress. That was leading to double decrements against fscache_cancel_op(). fscache_cancel_op() now no longer decrements ->n_ops. That was leading to double decrements against fscache_put_operation(). fscache_submit_exclusive_op() now decides whether it has to queue an op based on ->n_in_progress being > 0 rather than ->n_ops > 0 as the latter will persist in being true even after all preceding operations have been cancelled or completed. Furthermore, if an object is active and there are runnable ops against it, there must be at least one op running. (4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and provide a function to record completion of the pages as they complete. When n_pages reaches 0, the operation is deemed to be complete and fscache_op_complete() is called. Add calls to fscache_retrieval_complete() anywhere we've finished with a page we've been given to read or allocate for. This includes places where we just return pages to the netfs for reading from the server and where accessing the cache fails and we discard the proposed netfs page. The bugs in the unfixed state management manifest themselves as oopses like the following where the operation completion gets out of sync with return of the cookie by the netfs. This is possible because the cache unlocks and returns all the netfs pages before recording its completion - which means that there's nothing to stop the netfs discarding them and returning the cookie. FS-Cache: Cookie 'NFS.fh' still has outstanding reads ------------[ cut here ]------------ kernel BUG at fs/fscache/cookie.c:519! invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache] RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282 RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000 RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000 R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98 R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370 FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040) Stack: ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91 Call Trace: [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs] [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs] [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs] [<ffffffff810d8d47>] evict+0xa1/0x15c [<ffffffff810d8e2e>] dispose_list+0x2c/0x38 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b [<ffffffff810c56b7>] prune_super+0xd5/0x140 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595 [<ffffffff8103e009>] ? process_timeout+0xb/0xb [<ffffffff8109dba3>] kswapd+0x270/0x289 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595 [<ffffffff8104bf7a>] kthread+0x7f/0x87 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Documentation/filesystems/caching/backend-api.txt | 26 +++++- Documentation/filesystems/caching/operations.txt | 2 +- fs/cachefiles/rdwr.c | 31 +++++-- fs/fscache/object.c | 2 - fs/fscache/operation.c | 91 +++++++++++++-------- fs/fscache/page.c | 25 +++++- include/linux/fscache-cache.h | 37 +++++++-- 7 files changed, 164 insertions(+), 50 deletions(-) diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt index 382d52cdaf2d..f4769b9399df 100644 --- a/Documentation/filesystems/caching/backend-api.txt +++ b/Documentation/filesystems/caching/backend-api.txt @@ -419,7 +419,10 @@ performed on the denizens of the cache. These are held in a structure of type: If an I/O error occurs, fscache_io_error() should be called and -ENOBUFS returned if possible or fscache_end_io() called with a suitable error - code.. + code. + + fscache_put_retrieval() should be called after a page or pages are dealt + with. This will complete the operation when all pages are dealt with. (*) Request pages be read from cache [mandatory]: @@ -526,6 +529,27 @@ FS-Cache provides some utilities that a cache backend may make use of: error value should be 0 if successful and an error otherwise. + (*) Record that one or more pages being retrieved or allocated have been dealt + with: + + void fscache_retrieval_complete(struct fscache_retrieval *op, + int n_pages); + + This is called to record the fact that one or more pages have been dealt + with and are no longer the concern of this operation. When the number of + pages remaining in the operation reaches 0, the operation will be + completed. + + + (*) Record operation completion: + + void fscache_op_complete(struct fscache_operation *op); + + This is called to record the completion of an operation. This deducts + this operation from the parent object's run state, potentially permitting + one or more pending operations to start running. + + (*) Set highest store limit: void fscache_set_store_limit(struct fscache_object *object, diff --git a/Documentation/filesystems/caching/operations.txt b/Documentation/filesystems/caching/operations.txt index b6b070c57cbf..bee2a5f93d60 100644 --- a/Documentation/filesystems/caching/operations.txt +++ b/Documentation/filesystems/caching/operations.txt @@ -174,7 +174,7 @@ Operations are used through the following procedure: necessary (the object might have died whilst the thread was waiting). When it has finished doing its processing, it should call - fscache_put_operation() on it. + fscache_op_complete() and fscache_put_operation() on it. (4) The operation holds an effective lock upon the object, preventing other exclusive ops conflicting until it is released. The operation can be diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 885f1b26cc16..31cda81656e5 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -197,6 +197,7 @@ static void cachefiles_read_copier(struct fscache_operation *_op) fscache_end_io(op, monitor->netfs_page, error); page_cache_release(monitor->netfs_page); + fscache_retrieval_complete(op, 1); fscache_put_retrieval(op); kfree(monitor); @@ -339,6 +340,7 @@ backing_page_already_uptodate: copy_highpage(netpage, backpage); fscache_end_io(op, netpage, 0); + fscache_retrieval_complete(op, 1); success: _debug("success"); @@ -360,6 +362,7 @@ read_error: goto out; io_error: cachefiles_io_error_obj(object, "Page read error on backing file"); + fscache_retrieval_complete(op, 1); ret = -ENOBUFS; goto out; @@ -369,6 +372,7 @@ nomem_monitor: fscache_put_retrieval(monitor->op); kfree(monitor); nomem: + fscache_retrieval_complete(op, 1); _leave(" = -ENOMEM"); return -ENOMEM; } @@ -407,7 +411,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, _enter("{%p},{%lx},,,", object, page->index); if (!object->backer) - return -ENOBUFS; + goto enobufs; inode = object->backer->d_inode; ASSERT(S_ISREG(inode->i_mode)); @@ -416,7 +420,7 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, /* calculate the shift required to use bmap */ if (inode->i_sb->s_blocksize > PAGE_SIZE) - return -ENOBUFS; + goto enobufs; shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits; @@ -448,13 +452,19 @@ int cachefiles_read_or_alloc_page(struct fscache_retrieval *op, } else if (cachefiles_has_space(cache, 0, 1) == 0) { /* there's space in the cache we can use */ fscache_mark_page_cached(op, page); + fscache_retrieval_complete(op, 1); ret = -ENODATA; } else { - ret = -ENOBUFS; + goto enobufs; } _leave(" = %d", ret); return ret; + +enobufs: + fscache_retrieval_complete(op, 1); + _leave(" = -ENOBUFS"); + return -ENOBUFS; } /* @@ -632,6 +642,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, /* the netpage is unlocked and marked up to date here */ fscache_end_io(op, netpage, 0); + fscache_retrieval_complete(op, 1); page_cache_release(netpage); netpage = NULL; continue; @@ -659,6 +670,7 @@ out: list_for_each_entry_safe(netpage, _n, list, lru) { list_del(&netpage->lru); page_cache_release(netpage); + fscache_retrieval_complete(op, 1); } _leave(" = %d", ret); @@ -707,7 +719,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, *nr_pages); if (!object->backer) - return -ENOBUFS; + goto all_enobufs; space = 1; if (cachefiles_has_space(cache, 0, *nr_pages) < 0) @@ -720,7 +732,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, /* calculate the shift required to use bmap */ if (inode->i_sb->s_blocksize > PAGE_SIZE) - return -ENOBUFS; + goto all_enobufs; shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits; @@ -760,7 +772,10 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, nrbackpages++; } else if (space && pagevec_add(&pagevec, page) == 0) { fscache_mark_pages_cached(op, &pagevec); + fscache_retrieval_complete(op, 1); ret = -ENODATA; + } else { + fscache_retrieval_complete(op, 1); } } @@ -781,6 +796,10 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op, _leave(" = %d [nr=%u%s]", ret, *nr_pages, list_empty(pages) ? " empty" : ""); return ret; + +all_enobufs: + fscache_retrieval_complete(op, *nr_pages); + return -ENOBUFS; } /* @@ -815,6 +834,7 @@ int cachefiles_allocate_page(struct fscache_retrieval *op, else ret = -ENOBUFS; + fscache_retrieval_complete(op, 1); _leave(" = %d", ret); return ret; } @@ -864,6 +884,7 @@ int cachefiles_allocate_pages(struct fscache_retrieval *op, ret = -ENOBUFS; } + fscache_retrieval_complete(op, *nr_pages); _leave(" = %d", ret); return ret; } diff --git a/fs/fscache/object.c b/fs/fscache/object.c index b6b897c550ac..773bc798a416 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -587,8 +587,6 @@ static void fscache_object_available(struct fscache_object *object) if (object->n_in_progress == 0) { if (object->n_ops > 0) { ASSERTCMP(object->n_ops, >=, object->n_obj_ops); - ASSERTIF(object->n_ops > object->n_obj_ops, - !list_empty(&object->pending_ops)); fscache_start_operations(object); } else { ASSERT(list_empty(&object->pending_ops)); diff --git a/fs/fscache/operation.c b/fs/fscache/operation.c index c857ab824d6e..748f9553c2cb 100644 --- a/fs/fscache/operation.c +++ b/fs/fscache/operation.c @@ -37,6 +37,7 @@ void fscache_enqueue_operation(struct fscache_operation *op) ASSERT(op->processor != NULL); ASSERTCMP(op->object->state, >=, FSCACHE_OBJECT_AVAILABLE); ASSERTCMP(atomic_read(&op->usage), >, 0); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS); fscache_stat(&fscache_n_op_enqueue); switch (op->flags & FSCACHE_OP_TYPE) { @@ -64,6 +65,9 @@ EXPORT_SYMBOL(fscache_enqueue_operation); static void fscache_run_op(struct fscache_object *object, struct fscache_operation *op) { + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_PENDING); + + op->state = FSCACHE_OP_ST_IN_PROGRESS; object->n_in_progress++; if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags)) wake_up_bit(&op->flags, FSCACHE_OP_WAITING); @@ -80,22 +84,23 @@ static void fscache_run_op(struct fscache_object *object, int fscache_submit_exclusive_op(struct fscache_object *object, struct fscache_operation *op) { - int ret; - _enter("{OBJ%x OP%x},", object->debug_id, op->debug_id); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED); + ASSERTCMP(atomic_read(&op->usage), >, 0); + spin_lock(&object->lock); ASSERTCMP(object->n_ops, >=, object->n_in_progress); ASSERTCMP(object->n_ops, >=, object->n_exclusive); ASSERT(list_empty(&op->pend_link)); - ret = -ENOBUFS; + op->state = FSCACHE_OP_ST_PENDING; if (fscache_object_is_active(object)) { op->object = object; object->n_ops++; object->n_exclusive++; /* reads and writes must wait */ - if (object->n_ops > 1) { + if (object->n_in_progress > 0) { atomic_inc(&op->usage); list_add_tail(&op->pend_link, &object->pending_ops); fscache_stat(&fscache_n_op_pend); @@ -111,7 +116,6 @@ int fscache_submit_exclusive_op(struct fscache_object *object, /* need to issue a new write op after this */ clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags); - ret = 0; } else if (object->state == FSCACHE_OBJECT_CREATING) { op->object = object; object->n_ops++; @@ -119,14 +123,13 @@ int fscache_submit_exclusive_op(struct fscache_object *object, atomic_inc(&op->usage); list_add_tail(&op->pend_link, &object->pending_ops); fscache_stat(&fscache_n_op_pend); - ret = 0; } else { /* not allowed to submit ops in any other state */ BUG(); } spin_unlock(&object->lock); - return ret; + return 0; } /* @@ -186,6 +189,7 @@ int fscache_submit_op(struct fscache_object *object, _enter("{OBJ%x OP%x},{%u}", object->debug_id, op->debug_id, atomic_read(&op->usage)); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_INITIALISED); ASSERTCMP(atomic_read(&op->usage), >, 0); spin_lock(&object->lock); @@ -196,6 +200,7 @@ int fscache_submit_op(struct fscache_object *object, ostate = object->state; smp_rmb(); + op->state = FSCACHE_OP_ST_PENDING; if (fscache_object_is_active(object)) { op->object = object; object->n_ops++; @@ -225,12 +230,15 @@ int fscache_submit_op(struct fscache_object *object, object->state == FSCACHE_OBJECT_LC_DYING || object->state == FSCACHE_OBJECT_WITHDRAWING) { fscache_stat(&fscache_n_op_rejected); + op->state = FSCACHE_OP_ST_CANCELLED; ret = -ENOBUFS; } else if (!test_bit(FSCACHE_IOERROR, &object->cache->flags)) { fscache_report_unexpected_submission(object, op, ostate); ASSERT(!fscache_object_is_active(object)); + op->state = FSCACHE_OP_ST_CANCELLED; ret = -ENOBUFS; } else { + op->state = FSCACHE_OP_ST_CANCELLED; ret = -ENOBUFS; } @@ -290,13 +298,18 @@ int fscache_cancel_op(struct fscache_operation *op) _enter("OBJ%x OP%x}", op->object->debug_id, op->debug_id); + ASSERTCMP(op->state, >=, FSCACHE_OP_ST_PENDING); + ASSERTCMP(op->state, !=, FSCACHE_OP_ST_CANCELLED); + ASSERTCMP(atomic_read(&op->usage), >, 0); + spin_lock(&object->lock); ret = -EBUSY; - if (!list_empty(&op->pend_link)) { + if (op->state == FSCACHE_OP_ST_PENDING) { + ASSERT(!list_empty(&op->pend_link)); fscache_stat(&fscache_n_op_cancelled); list_del_init(&op->pend_link); - object->n_ops--; + op->state = FSCACHE_OP_ST_CANCELLED; if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) object->n_exclusive--; if (test_and_clear_bit(FSCACHE_OP_WAITING, &op->flags)) @@ -311,6 +324,37 @@ int fscache_cancel_op(struct fscache_operation *op) } /* + * Record the completion of an in-progress operation. + */ +void fscache_op_complete(struct fscache_operation *op) +{ + struct fscache_object *object = op->object; + + _enter("OBJ%x", object->debug_id); + + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_IN_PROGRESS); + ASSERTCMP(object->n_in_progress, >, 0); + ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags), + object->n_exclusive, >, 0); + ASSERTIFCMP(test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags), + object->n_in_progress, ==, 1); + + spin_lock(&object->lock); + + op->state = FSCACHE_OP_ST_COMPLETE; + + if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) + object->n_exclusive--; + object->n_in_progress--; + if (object->n_in_progress == 0) + fscache_start_operations(object); + + spin_unlock(&object->lock); + _leave(""); +} +EXPORT_SYMBOL(fscache_op_complete); + +/* * release an operation * - queues pending ops if this is the last in-progress op */ @@ -328,8 +372,9 @@ void fscache_put_operation(struct fscache_operation *op) return; _debug("PUT OP"); - if (test_and_set_bit(FSCACHE_OP_DEAD, &op->flags)) - BUG(); + ASSERTIFCMP(op->state != FSCACHE_OP_ST_COMPLETE, + op->state, ==, FSCACHE_OP_ST_CANCELLED); + op->state = FSCACHE_OP_ST_DEAD; fscache_stat(&fscache_n_op_release); @@ -365,16 +410,6 @@ void fscache_put_operation(struct fscache_operation *op) return; } - if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) { - ASSERTCMP(object->n_exclusive, >, 0); - object->n_exclusive--; - } - - ASSERTCMP(object->n_in_progress, >, 0); - object->n_in_progress--; - if (object->n_in_progress == 0) - fscache_start_operations(object); - ASSERTCMP(object->n_ops, >, 0); object->n_ops--; if (object->n_ops == 0) @@ -413,23 +448,14 @@ void fscache_operation_gc(struct work_struct *work) spin_unlock(&cache->op_gc_list_lock); object = op->object; + spin_lock(&object->lock); _debug("GC DEFERRED REL OBJ%x OP%x", object->debug_id, op->debug_id); fscache_stat(&fscache_n_op_gc); ASSERTCMP(atomic_read(&op->usage), ==, 0); - - spin_lock(&object->lock); - if (test_bit(FSCACHE_OP_EXCLUSIVE, &op->flags)) { - ASSERTCMP(object->n_exclusive, >, 0); - object->n_exclusive--; - } - - ASSERTCMP(object->n_in_progress, >, 0); - object->n_in_progress--; - if (object->n_in_progress == 0) - fscache_start_operations(object); + ASSERTCMP(op->state, ==, FSCACHE_OP_ST_DEAD); ASSERTCMP(object->n_ops, >, 0); object->n_ops--; @@ -437,6 +463,7 @@ void fscache_operation_gc(struct work_struct *work) fscache_raise_event(object, FSCACHE_OBJECT_EV_CLEARED); spin_unlock(&object->lock); + kfree(op); } while (count++ < 20); diff --git a/fs/fscache/page.c b/fs/fscache/page.c index 248a12e22532..b38b13d2a555 100644 --- a/fs/fscache/page.c +++ b/fs/fscache/page.c @@ -162,6 +162,7 @@ static void fscache_attr_changed_op(struct fscache_operation *op) fscache_abort_object(object); } + fscache_op_complete(op); _leave(""); } @@ -223,6 +224,8 @@ static void fscache_release_retrieval_op(struct fscache_operation *_op) _enter("{OP%x}", op->op.debug_id); + ASSERTCMP(op->n_pages, ==, 0); + fscache_hist(fscache_retrieval_histogram, op->start_time); if (op->context) fscache_put_context(op->op.object->cookie, op->context); @@ -320,6 +323,11 @@ static int fscache_wait_for_retrieval_activation(struct fscache_object *object, _debug("<<< GO"); check_if_dead: + if (op->op.state == FSCACHE_OP_ST_CANCELLED) { + fscache_stat(stat_object_dead); + _leave(" = -ENOBUFS [cancelled]"); + return -ENOBUFS; + } if (unlikely(fscache_object_is_dead(object))) { fscache_stat(stat_object_dead); return -ENOBUFS; @@ -364,6 +372,7 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, _leave(" = -ENOMEM"); return -ENOMEM; } + op->n_pages = 1; spin_lock(&cookie->lock); @@ -375,10 +384,10 @@ int __fscache_read_or_alloc_page(struct fscache_cookie *cookie, ASSERTCMP(object->state, >, FSCACHE_OBJECT_LOOKING_UP); atomic_inc(&object->n_reads); - set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); + __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); if (fscache_submit_op(object, &op->op) < 0) - goto nobufs_unlock; + goto nobufs_unlock_dec; spin_unlock(&cookie->lock); fscache_stat(&fscache_n_retrieval_ops); @@ -425,6 +434,8 @@ error: _leave(" = %d", ret); return ret; +nobufs_unlock_dec: + atomic_dec(&object->n_reads); nobufs_unlock: spin_unlock(&cookie->lock); kfree(op); @@ -482,6 +493,7 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie, op = fscache_alloc_retrieval(mapping, end_io_func, context); if (!op) return -ENOMEM; + op->n_pages = *nr_pages; spin_lock(&cookie->lock); @@ -491,10 +503,10 @@ int __fscache_read_or_alloc_pages(struct fscache_cookie *cookie, struct fscache_object, cookie_link); atomic_inc(&object->n_reads); - set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); + __set_bit(FSCACHE_OP_DEC_READ_CNT, &op->op.flags); if (fscache_submit_op(object, &op->op) < 0) - goto nobufs_unlock; + goto nobufs_unlock_dec; spin_unlock(&cookie->lock); fscache_stat(&fscache_n_retrieval_ops); @@ -541,6 +553,8 @@ error: _leave(" = %d", ret); return ret; +nobufs_unlock_dec: + atomic_dec(&object->n_reads); nobufs_unlock: spin_unlock(&cookie->lock); kfree(op); @@ -583,6 +597,7 @@ int __fscache_alloc_page(struct fscache_cookie *cookie, op = fscache_alloc_retrieval(page->mapping, NULL, NULL); if (!op) return -ENOMEM; + op->n_pages = 1; spin_lock(&cookie->lock); @@ -696,6 +711,7 @@ static void fscache_write_op(struct fscache_operation *_op) fscache_end_page_write(object, page); if (ret < 0) { fscache_abort_object(object); + fscache_op_complete(&op->op); } else { fscache_enqueue_operation(&op->op); } @@ -710,6 +726,7 @@ superseded: spin_unlock(&cookie->stores_lock); clear_bit(FSCACHE_OBJECT_PENDING_WRITE, &object->flags); spin_unlock(&object->lock); + fscache_op_complete(&op->op); _leave(""); } diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h index e3d6d939d959..f5facd1d333f 100644 --- a/include/linux/fscache-cache.h +++ b/include/linux/fscache-cache.h @@ -75,6 +75,16 @@ extern wait_queue_head_t fscache_cache_cleared_wq; typedef void (*fscache_operation_release_t)(struct fscache_operation *op); typedef void (*fscache_operation_processor_t)(struct fscache_operation *op); +enum fscache_operation_state { + FSCACHE_OP_ST_BLANK, /* Op is not yet submitted */ + FSCACHE_OP_ST_INITIALISED, /* Op is initialised */ + FSCACHE_OP_ST_PENDING, /* Op is blocked from running */ + FSCACHE_OP_ST_IN_PROGRESS, /* Op is in progress */ + FSCACHE_OP_ST_COMPLETE, /* Op is complete */ + FSCACHE_OP_ST_CANCELLED, /* Op has been cancelled */ + FSCACHE_OP_ST_DEAD /* Op is now dead */ +}; + struct fscache_operation { struct work_struct work; /* record for async ops */ struct list_head pend_link; /* link in object->pending_ops */ @@ -86,10 +96,10 @@ struct fscache_operation { #define FSCACHE_OP_MYTHREAD 0x0002 /* - processing is done be issuing thread, not pool */ #define FSCACHE_OP_WAITING 4 /* cleared when op is woken */ #define FSCACHE_OP_EXCLUSIVE 5 /* exclusive op, other ops must wait */ -#define FSCACHE_OP_DEAD 6 /* op is now dead */ -#define FSCACHE_OP_DEC_READ_CNT 7 /* decrement object->n_reads on destruction */ -#define FSCACHE_OP_KEEP_FLAGS 0xc0 /* flags to keep when repurposing an op */ +#define FSCACHE_OP_DEC_READ_CNT 6 /* decrement object->n_reads on destruction */ +#define FSCACHE_OP_KEEP_FLAGS 0x0070 /* flags to keep when repurposing an op */ + enum fscache_operation_state state; atomic_t usage; unsigned debug_id; /* debugging ID */ @@ -106,6 +116,7 @@ extern atomic_t fscache_op_debug_id; extern void fscache_op_work_func(struct work_struct *work); extern void fscache_enqueue_operation(struct fscache_operation *); +extern void fscache_op_complete(struct fscache_operation *); extern void fscache_put_operation(struct fscache_operation *); /** @@ -122,6 +133,7 @@ static inline void fscache_operation_init(struct fscache_operation *op, { INIT_WORK(&op->work, fscache_op_work_func); atomic_set(&op->usage, 1); + op->state = FSCACHE_OP_ST_INITIALISED; op->debug_id = atomic_inc_return(&fscache_op_debug_id); op->processor = processor; op->release = release; @@ -138,6 +150,7 @@ struct fscache_retrieval { void *context; /* netfs read context (pinned) */ struct list_head to_do; /* list of things to be done by the backend */ unsigned long start_time; /* time at which retrieval started */ + unsigned n_pages; /* number of pages to be retrieved */ }; typedef int (*fscache_page_retrieval_func_t)(struct fscache_retrieval *op, @@ -174,8 +187,22 @@ static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op) } /** + * fscache_retrieval_complete - Record (partial) completion of a retrieval + * @op: The retrieval operation affected + * @n_pages: The number of pages to account for + */ +static inline void fscache_retrieval_complete(struct fscache_retrieval *op, + int n_pages) +{ + op->n_pages -= n_pages; + if (op->n_pages <= 0) + fscache_op_complete(&op->op); +} + +/** * fscache_put_retrieval - Drop a reference to a retrieval operation * @op: The retrieval operation affected + * @n_pages: The number of pages to account for * * Drop a reference to a retrieval operation. */ @@ -333,10 +360,10 @@ struct fscache_object { int debug_id; /* debugging ID */ int n_children; /* number of child objects */ - int n_ops; /* number of ops outstanding on object */ + int n_ops; /* number of extant ops on object */ int n_obj_ops; /* number of object ops outstanding on object */ int n_in_progress; /* number of ops in progress */ - int n_exclusive; /* number of exclusive ops queued */ + int n_exclusive; /* number of exclusive ops queued or in progress */ atomic_t n_reads; /* number of read ops in progress */ spinlock_t lock; /* state and operations lock */ -- 1.7.10.4 From: David Howells <dhowells@redhat.com> Date: Wed, 8 Feb 2012 21:17:52 +0000 Subject: FS-Cache: Provide proper invalidation Provide a proper invalidation method rather than relying on the netfs retiring the cookie it has and getting a new one. The problem with this is that isn't easy for the netfs to make sure that it has completed/cancelled all its outstanding storage and retrieval operations on the cookie it is retiring. Instead, have the cache provide an invalidation method that will cancel or wait for all currently outstanding operations before invalidating the cache, and will cause new operations to queue up behind that. Whilst invalidation is in progress, some requests will be rejected until the cache can stack a barrier on the operation queue to cause new operations to be deferred behind it. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> --- Documentation/filesystems/caching/backend-api.txt | 12 ++++ Documentation/filesystems/caching/netfs-api.txt | 46 ++++++++++--- Documentation/filesystems/caching/object.txt | 23 ++++--- fs/fscache/cookie.c | 60 +++++++++++++++++ fs/fscache/internal.h | 10 +++ fs/fscache/object.c | 72 +++++++++++++++++++++ fs/fscache/operation.c | 32 +++++++++ fs/fscache/page.c | 51 +++++++++++++++ fs/fscache/stats.c | 11 +++- include/linux/fscache-cache.h | 8 ++- include/linux/fscache.h | 38 +++++++++++ 11 files changed, 345 insertions(+), 18 deletions(-) diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt index f4769b9399df..d78bab9622c6 100644 --- a/Documentation/filesystems/caching/backend-api.txt +++ b/Documentation/filesystems/caching/backend-api.txt @@ -308,6 +308,18 @@ performed on the denizens of the cache. These are held in a structure of type: obtained by calling object->cookie->def->get_aux()/get_attr(). + (*) Invalidate data object [mandatory]: + + int (*invalidate_object)(struct fscache_operation *op) + + This is called to invalidate a data object (as pointed to by op->object). + All the data stored for this object should be discarded and an + attr_changed operation should be performed. The caller will follow up + with an object update operation. + + fscache_op_complete() must be called on op before returning. + + (*) Discard object [mandatory]: void (*drop_object)(struct fscache_object *object) diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt index 7cc6bf2871eb..97e6c0ecc5ef 100644 --- a/Documentation/filesystems/caching/netfs-api.txt +++ b/Documentation/filesystems/caching/netfs-api.txt @@ -35,8 +35,9 @@ This document contains the following sections: (12) Index and data file update (13) Miscellaneous cookie operations (14) Cookie unregistration - (15) Index and data file invalidation - (16) FS-Cache specific page flags. + (15) Index invalidation + (16) Data file invalidation + (17) FS-Cache specific page flags. ============================= @@ -767,13 +768,42 @@ the cookies for "child" indices, objects and pages have been relinquished first. -================================ -INDEX AND DATA FILE INVALIDATION -================================ +================== +INDEX INVALIDATION +================== -There is no direct way to invalidate an index subtree or a data file. To do -this, the caller should relinquish and retire the cookie they have, and then -acquire a new one. +There is no direct way to invalidate an index subtree. To do this, the caller +should relinquish and retire the cookie they have, and then acquire a new one. + + +====================== +DATA FILE INVALIDATION +====================== + +Sometimes it will be necessary to invalidate an object that contains data. +Typically this will be necessary when the server tells the netfs of a foreign +change - at which point the netfs has to throw away all the state it had for an +inode and reload from the server. + +To indicate that a cache object should be invalidated, the following function +can be called: + + void fscache_invalidate(struct fscache_cookie *cookie); + +This can be called with spinlocks held as it defers the work to a thread pool. +All extant storage, retrieval and attribute change ops at this point are +cancelled and discarded. Some future operations will be rejected until the +cache has had a chance to insert a barrier in the operations queue. After +that, operations will be queued again behind the invalidation operation. + +The invalidation operation will perform an attribute change operation and an +auxiliary data update operation as it is very likely these will have changed. + +Using the following function, the netfs can wait for the invalidation operation +to have reached a point at which it can start submitting ordinary operations +once again: + + void fscache_wait_on_invalidate(struct fscache_cookie *cookie); =========================== diff --git a/Documentation/filesystems/caching/object.txt b/Documentation/filesystems/caching/object.txt index 58313348da87..100ff41127e4 100644 --- a/Documentation/filesystems/caching/object.txt +++ b/Documentation/filesystems/caching/object.txt @@ -216,7 +216,14 @@ servicing netfs requests: The normal running state. In this state, requests the netfs makes will be passed on to the cache. - (6) State FSCACHE_OBJECT_UPDATING. + (6) State FSCACHE_OBJECT_INVALIDATING. + + The object is undergoing invalidation. When the state comes here, it + discards all pending read, write and attribute change operations as it is + going to clear out the cache entirely and reinitialise it. It will then + continue to the FSCACHE_OBJECT_UPDATING state. + + (7) State FSCACHE_OBJECT_UPDATING. The state machine comes here to update the object in the cache from the netfs's records. This involves updating the auxiliary data that is used @@ -225,13 +232,13 @@ servicing netfs requests: And there are terminal states in which an object cleans itself up, deallocates memory and potentially deletes stuff from disk: - (7) State FSCACHE_OBJECT_LC_DYING. + (8) State FSCACHE_OBJECT_LC_DYING. The object comes here if it is dying because of a lookup or creation error. This would be due to a disk error or system error of some sort. Temporary data is cleaned up, and the parent is released. - (8) State FSCACHE_OBJECT_DYING. + (9) State FSCACHE_OBJECT_DYING. The object comes here if it is dying due to an error, because its parent cookie has been relinquished by the netfs or because the cache is being @@ -241,27 +248,27 @@ memory and potentially deletes stuff from disk: can destroy themselves. This object waits for all its children to go away before advancing to the next state. - (9) State FSCACHE_OBJECT_ABORT_INIT. +(10) State FSCACHE_OBJECT_ABORT_INIT. The object comes to this state if it was waiting on its parent in FSCACHE_OBJECT_INIT, but its parent died. The object will destroy itself so that the parent may proceed from the FSCACHE_OBJECT_DYING state. -(10) State FSCACHE_OBJECT_RELEASING. -(11) State FSCACHE_OBJECT_RECYCLING. +(11) State FSCACHE_OBJECT_RELEASING. +(12) State FSCACHE_OBJECT_RECYCLING. The object comes to one of these two states when dying once it is rid of all its children, if it is dying because the netfs relinquished its cookie. In the first state, the cached data is expected to persist, and in the second it will be deleted. -(12) State FSCACHE_OBJECT_WITHDRAWING. +(13) State FSCACHE_OBJECT_WITHDRAWING. The object transits to this state if the cache decides it wants to withdraw the object from service, perhaps to make space, but also due to error or just because the whole cache is being withdrawn. -(13) State FSCACHE_OBJECT_DEAD. +(14) State FSCACHE_OBJECT_DEAD. The object transits to this state when the in-memory object record is ready to be deleted. The object processor shouldn't ever see an object in diff --git a/fs/fscache/cookie.c b/fs/fscache/cookie.c index 66be9eccede0..8dcb114758e3 100644 --- a/fs/fscache/cookie.c +++ b/fs/fscache/cookie.c @@ -370,6 +370,66 @@ cant_attach_object: } /* + * Invalidate an object. Callable with spinlocks held. + */ +void __fscache_invalidate(struct fscache_cookie *cookie) +{ + struct fscache_object *object; + + _enter("{%s}", cookie->def->name); + + fscache_stat(&fscache_n_invalidates); + + /* Only permit invalidation of data files. Invalidating an index will + * require the caller to release all its attachments to the tree rooted + * there, and if it's doing that, it may as well just retire the + * cookie. + */ + ASSERTCMP(cookie->def->type, ==, FSCACHE_COOKIE_TYPE_DATAFILE); + + /* We will be updating the cookie too. */ + BUG_ON(!cookie->def->get_aux); + + /* If there's an object, we tell the object state machine to handle the + * invalidation on our behalf, otherwise there's nothing to do. + */ + if (!hlist_empty(&cookie->backing_objects)) { + spin_lock(&cookie->lock); + + if (!hlist_empty(&cookie->backing_objects) && + !test_and_set_bit(FSCACHE_COOKIE_INVALIDATING, + &cookie->flags)) { + object = hlist_entry(cookie->backing_objects.first, + struct fscache_object, + cookie_link); + if (object->state < FSCACHE_OBJECT_DYING) + fscache_raise_event( + object, FSCACHE_OBJECT_EV_INVALIDATE); + } + + spin_unlock(&cookie->lock); + } + + _leave(""); +} +EXPORT_SYMBOL(__fscache_invalidate); + +/* + * Wait for object invalidation to complete. + */ +void __fscache_wait_on_invalidate(struct fscache_cookie *cookie) +{ + _enter("%p", cookie); + + wait_on_bit(&cookie->flags, FSCACHE_COOKIE_INVALIDATING, + fscache_wait_bit_interruptible, + TASK_UNINTERRUPTIBLE); + + _leave(""); +} +EXPORT_SYMBOL(__fscache_wait_on_invalidate); + +/* * update the index entries backing a cookie */ void __fscache_update_cookie(struct fscache_cookie *cookie) diff --git a/fs/fscache/internal.h b/fs/fscache/internal.h index f6aad48d38a8..c81179303930 100644 --- a/fs/fscache/internal.h +++ b/fs/fscache/internal.h @@ -122,11 +122,17 @@ extern int fscache_submit_exclusive_op(struct fscache_object *, extern int fscache_submit_op(struct fscache_object *, struct fscache_operation *); extern int fscache_cancel_op(struct fscache_operation *); +extern void fscache_cancel_all_ops(struct fscache_object *); extern void fscache_abort_object(struct fscache_object *); extern void fscache_start_operations(struct fscache_object *); extern void fscache_operation_gc(struct work_struct *); /* + * page.c + */ +extern void fscache_invalidate_writes(struct fscache_cookie *); + +/* * proc.c */ #ifdef CONFIG_PROC_FS @@ -205,6 +211,9 @@ extern atomic_t fscache_n_acquires_ok; extern atomic_t fscache_n_acquires_nobufs; extern atomic_t fscache_n_acquires_oom; +extern atomic_t fscache_n_invalidates; +extern atomic_t fscache_n_invalidates_run; + extern atomic_t fscache_n_updates; extern atomic_t fscache_n_updates_null; extern atomic_t fscache_n_updates_run; @@ -237,6 +246,7 @@ extern atomic_t fscache_n_cop_alloc_object; extern atomic_t fscache_n_cop_lookup_object; extern atomic_t fscache_n_cop_lookup_complete; extern atomic_t fscache_n_cop_grab_object; +extern atomic_t fscache_n_cop_invalidate_object; extern atomic_t fscache_n_cop_update_object; extern atomic_t fscache_n_cop_drop_object; extern atomic_t fscache_n_cop_put_object; diff --git a/fs/fscache/object.c b/fs/fscache/object.c index 773bc798a416..80b549141ea6 100644 --- a/fs/fscache/object.c +++ b/fs/fscache/object.c @@ -14,6 +14,7 @@ #define FSCACHE_DEBUG_LEVEL COOKIE #include <linux/module.h> +#include <linux/slab.h> #include "internal.h" const char *fscache_object_states[FSCACHE_OBJECT__NSTATES] = { @@ -22,6 +23,7 @@ const char *fscache_object_states[FSCACHE_OBJECT__NSTATES] = { [FSCACHE_OBJECT_CREATING] = "OBJECT_CREATING", [FSCACHE_OBJECT_AVAILABLE] = "OBJECT_AVAILABLE", [FSCACHE_OBJECT_ACTIVE] = "OBJECT_ACTIVE", + [FSCACHE_OBJECT_INVALIDATING] = "OBJECT_INVALIDATING", [FSCACHE_OBJECT_UPDATING] = "OBJECT_UPDATING", [FSCACHE_OBJECT_DYING] = "OBJECT_DYING", [FSCACHE_OBJECT_LC_DYING] = "OBJECT_LC_DYING", @@ -39,6 +41,7 @@ const char fscache_object_states_short[FSCACHE_OBJECT__NSTATES][5] = { [FSCACHE_OBJECT_CREATING] = "CRTN", [FSCACHE_OBJECT_AVAILABLE] = "AVBL", [FSCACHE_OBJECT_ACTIVE] = "ACTV", + [FSCACHE_OBJECT_INVALIDATING] = "INVL", [FSCACHE_OBJECT_UPDATING] = "UPDT", [FSCACHE_OBJECT_DYING] = "DYNG", [FSCACHE_OBJECT_LC_DYING] = "LCDY", @@ -54,6 +57,7 @@ static void fscache_put_object(struct fscache_object *); static void fscache_initialise_object(struct fscache_object *); static void fscache_lookup_object(struct fscache_object *); static void fscache_object_available(struct fscache_object *); +static void fscache_invalidate_object(struct fscache_object *); static void fscache_release_object(struct fscache_object *); static void fscache_withdraw_object(struct fscache_object *); static void fscache_enqueue_dependents(struct fscache_object *); @@ -79,6 +83,15 @@ static inline void fscache_done_parent_op(struct fscache_object *object) } /* + * Notify netfs of |
Bug#682007: NULL pointer dereference in __fscache_read_or_alloc_pages
Brian Paul Kroth <bpkroth@gmail.com> 2012-10-11 14:06:
Jonathan Nieder <jrnieder@gmail.com> 2012-10-01 01:25: <snip/> Once again very sorry for the delay :( I forgot to disable the DEBUG_INFO and kept filling up my build VMs disk during compile. Then realized I had grabbed the 3.7 rc code, which these patches don't apply against. "git checkout remotes/stable/linux-3.2.y" (results in head c74a5e1fe4d0672936c8fb63d7484dfeaa30669c and 3.2.28), seemed to fix that. <snip/> Anyways, I just started running that on a machine, so I'll let you know if I noticed anything there first before I think about pushing it to further places. Thanks, Brian Got another panic using this kernel/set of patches. The dump is attached. Let me know if you need anything else. Thanks, Brian Oct 12 13:43:01 kefka [14595.129262] FS-Cache: Unsupported event 2 [44/7] in state OBJECT_DEAD Oct 12 13:43:01 kefka [14595.129317] ------------[ cut here ]------------ Oct 12 13:43:01 kefka [14595.129338] kernel BUG at fs/fscache/object.c:357! Oct 12 13:43:01 kefka [14595.129358] invalid opcode: 0000 [#1] Oct 12 13:43:01 kefka SMP Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.129390] CPU 1 Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.129395] Modules linked in: Oct 12 13:43:01 kefka acpi_cpufreq Oct 12 13:43:01 kefka mperf Oct 12 13:43:01 kefka cpufreq_stats Oct 12 13:43:01 kefka cpufreq_userspace Oct 12 13:43:01 kefka cpufreq_powersave Oct 12 13:43:01 kefka cpufreq_conservative Oct 12 13:43:01 kefka autofs4 Oct 12 13:43:01 kefka kvm_intel Oct 12 13:43:01 kefka kvm Oct 12 13:43:01 kefka cachefiles Oct 12 13:43:01 kefka binfmt_misc Oct 12 13:43:01 kefka nfsd Oct 12 13:43:01 kefka nfs Oct 12 13:43:01 kefka lockd Oct 12 13:43:01 kefka fscache Oct 12 13:43:01 kefka auth_rpcgss Oct 12 13:43:01 kefka nfs_acl Oct 12 13:43:01 kefka sunrpc Oct 12 13:43:01 kefka netconsole Oct 12 13:43:01 kefka configfs Oct 12 13:43:01 kefka ext3 Oct 12 13:43:01 kefka jbd Oct 12 13:43:01 kefka coretemp Oct 12 13:43:01 kefka ipmi_watchdog Oct 12 13:43:01 kefka ipmi_devintf Oct 12 13:43:01 kefka ipmi_si Oct 12 13:43:01 kefka ipmi_msghandler Oct 12 13:43:01 kefka fuse Oct 12 13:43:01 kefka uhci_hcd Oct 12 13:43:01 kefka ohci_hcd Oct 12 13:43:01 kefka tpm_infineon Oct 12 13:43:01 kefka snd_hda_codec_realtek Oct 12 13:43:01 kefka snd_hda_intel Oct 12 13:43:01 kefka snd_hda_codec Oct 12 13:43:01 kefka snd_hwdep Oct 12 13:43:01 kefka snd_pcm_oss Oct 12 13:43:01 kefka snd_mixer_oss Oct 12 13:43:01 kefka snd_pcm Oct 12 13:43:01 kefka snd_seq_midi Oct 12 13:43:01 kefka button Oct 12 13:43:01 kefka hp_wmi Oct 12 13:43:01 kefka snd_rawmidi Oct 12 13:43:01 kefka snd_seq_midi_event Oct 12 13:43:01 kefka processor Oct 12 13:43:01 kefka sparse_keymap Oct 12 13:43:01 kefka rfkill Oct 12 13:43:01 kefka snd_seq Oct 12 13:43:01 kefka psmouse Oct 12 13:43:01 kefka thermal_sys Oct 12 13:43:01 kefka serio_raw Oct 12 13:43:01 kefka joydev Oct 12 13:43:01 kefka evdev Oct 12 13:43:01 kefka tpm_tis Oct 12 13:43:01 kefka tpm Oct 12 13:43:01 kefka i2c_i801 Oct 12 13:43:01 kefka tpm_bios Oct 12 13:43:01 kefka i2c_core Oct 12 13:43:01 kefka wmi Oct 12 13:43:01 kefka snd_timer Oct 12 13:43:01 kefka snd_seq_device Oct 12 13:43:01 kefka snd Oct 12 13:43:01 kefka soundcore Oct 12 13:43:01 kefka snd_page_alloc Oct 12 13:43:01 kefka ext4 Oct 12 13:43:01 kefka mbcache Oct 12 13:43:01 kefka jbd2 Oct 12 13:43:01 kefka crc16 Oct 12 13:43:01 kefka dm_mod Oct 12 13:43:01 kefka raid10 Oct 12 13:43:01 kefka raid456 Oct 12 13:43:01 kefka async_raid6_recov Oct 12 13:43:01 kefka async_pq Oct 12 13:43:01 kefka raid6_pq Oct 12 13:43:01 kefka async_xor Oct 12 13:43:01 kefka xor Oct 12 13:43:01 kefka async_memcpy Oct 12 13:43:01 kefka async_tx Oct 12 13:43:01 kefka raid1 Oct 12 13:43:01 kefka raid0 Oct 12 13:43:01 kefka multipath Oct 12 13:43:01 kefka linear Oct 12 13:43:01 kefka md_mod Oct 12 13:43:01 kefka hid_microsoft Oct 12 13:43:01 kefka usbhid Oct 12 13:43:01 kefka hid Oct 12 13:43:01 kefka sg Oct 12 13:43:01 kefka sr_mod Oct 12 13:43:01 kefka sd_mod Oct 12 13:43:01 kefka cdrom Oct 12 13:43:01 kefka crc_t10dif Oct 12 13:43:01 kefka ahci Oct 12 13:43:01 kefka libahci Oct 12 13:43:01 kefka libata Oct 12 13:43:01 kefka scsi_mod Oct 12 13:43:01 kefka ehci_hcd Oct 12 13:43:01 kefka usbcore Oct 12 13:43:01 kefka e1000e Oct 12 13:43:01 kefka usb_common Oct 12 13:43:01 kefka [last unloaded: microcode] Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.130083] Oct 12 13:43:01 kefka [14595.130101] Pid: 25732, comm: kworker/u:0 Not tainted 3.2.28+ #8 Oct 12 13:43:01 kefka Hewlett-Packard HP Compaq 8200 Elite CMT PC Oct 12 13:43:01 kefka /1494 Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.130149] RIP: 0010:[<ffffffffa0411fe5>] Oct 12 13:43:01 kefka [<ffffffffa0411fe5>] fscache_object_work_func+0x79c/0x7db [fscache] Oct 12 13:43:01 kefka [14595.130192] RSP: 0018:ffff88021ed15e20 EFLAGS: 00010286 Oct 12 13:43:01 kefka [14595.130217] RAX: 000000000000004f RBX: ffff88021f6406c0 RCX: 000000004a524a52 Oct 12 13:43:01 kefka [14595.130240] RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246 Oct 12 13:43:01 kefka [14595.130267] RBP: ffff88021f640740 R08: 0000000000000000 R09: 0000000000000000 Oct 12 13:43:01 kefka [14595.130290] R10: 000000021f2ae402 R11: 0000000000000001 R12: ffff88021d876600 Oct 12 13:43:01 kefka [14595.130317] R13: ffffffffa0411849 R14: 0000000000000000 R15: ffff88021d876605 Oct 12 13:43:01 kefka [14595.130340] FS: 0000000000000000(0000) GS:ffff88022dc80000(0000) knlGS:0000000000000000 Oct 12 13:43:01 kefka [14595.135548] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b Oct 12 13:43:01 kefka [14595.135571] CR2: 00000000010cc318 CR3: 0000000001605000 CR4: 00000000000406e0 Oct 12 13:43:01 kefka [14595.135597] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 Oct 12 13:43:01 kefka [14595.135620] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Oct 12 13:43:01 kefka [14595.135647] Process kworker/u:0 (pid: 25732, threadinfo ffff88021ed14000, task ffff88021f6c0100) Oct 12 13:43:01 kefka [14595.135681] Stack: Oct 12 13:43:01 kefka [14595.135696] ffff88021def5ec0 Oct 12 13:43:01 kefka ffffffff817938c0 Oct 12 13:43:01 kefka ffff88021d876600 Oct 12 13:43:01 kefka ffffffffa0411849 Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.135751] 0000000000000000 Oct 12 13:43:01 kefka ffffffff8105f882 Oct 12 13:43:01 kefka ffff88021f640740 Oct 12 13:43:01 kefka 0000000100000000 Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.135801] 0000000000000004 Oct 12 13:43:01 kefka ffff88021def5ec0 Oct 12 13:43:01 kefka ffffffff817938c0 Oct 12 13:43:01 kefka ffff88021def5ee0 Oct 12 13:43:01 kefka Oct 12 13:43:01 kefka [14595.135853] Call Trace: Oct 12 13:43:01 kefka [14595.135875] [<ffffffffa0411849>] ? fscache_enqueue_dependents+0xa0/0xa0 [fscache] Oct 12 13:43:01 kefka [14595.135912] [<ffffffff8105f882>] ? process_one_work+0x1cc/0x2ea Oct 12 13:43:01 kefka [14595.135935] [<ffffffff8105facd>] ? worker_thread+0x12d/0x247 Oct 12 13:43:01 kefka [14595.135962] [<ffffffff8105f9a0>] ? process_one_work+0x2ea/0x2ea Oct 12 13:43:01 kefka [14595.135984] [<ffffffff8105f9a0>] ? process_one_work+0x2ea/0x2ea Oct 12 13:43:01 kefka [14595.136012] [<ffffffff810632d9>] ? kthread+0x7a/0x82 Oct 12 13:43:01 kefka [14595.136036] [<ffffffff8136c4b4>] ? kernel_thread_helper+0x4/0x10 Oct 12 13:43:01 kefka [14595.136062] [<ffffffff8106325f>] ? kthread_worker_fn+0x147/0x147 Oct 12 13:43:01 kefka [14595.136085] [<ffffffff8136c4b0>] ? gs_change+0x13/0x13 Oct 12 13:43:01 kefka [14595.136110] Code: Oct 12 13:43:01 kefka 44 Oct 12 13:43:01 kefka 89 Oct 12 13:43:01 kefka 65 Oct 12 13:43:01 kefka 80 Oct 12 13:43:01 kefka 66 Oct 12 13:43:01 kefka ff Oct 12 13:43:01 kefka 43 Oct 12 13:43:01 kefka 20 Oct 12 13:43:01 kefka eb Oct 12 13:43:01 kefka 25 Oct 12 13:43:01 kefka 8b Oct 12 13:43:01 kefka 45 Oct 12 13:43:01 kefka 80 Oct 12 13:43:01 kefka 48 Oct 12 13:43:01 kefka 8b Oct 12 13:43:01 kefka 4d Oct 12 13:43:01 kefka b0 Oct 12 13:43:01 kefka 48 Oct 12 13:43:01 kefka c7 Oct 12 13:43:01 kefka c7 Oct 12 13:43:01 kefka 19 Oct 12 13:43:01 kefka 5a Oct 12 13:43:01 kefka 41 Oct 12 13:43:01 kefka a0 Oct 12 13:43:01 kefka 48 Oct 12 13:43:01 kefka 8b Oct 12 13:43:01 kefka 55 Oct 12 13:43:01 kefka b8 Oct 12 13:43:01 kefka 4c Oct 12 13:43:01 kefka 8b Oct 12 13:43:01 kefka 04 Oct 12 13:43:01 kefka c5 Oct 12 13:43:01 kefka 10 Oct 12 13:43:01 kefka 73 Oct 12 13:43:01 kefka 41 Oct 12 13:43:01 kefka a0 Oct 12 13:43:01 kefka 31 Oct 12 13:43:01 kefka c0 Oct 12 13:43:01 kefka e8 Oct 12 13:43:01 kefka 6e Oct 12 13:43:01 kefka 10 Oct 12 13:43:01 kefka f5 Oct 12 13:43:01 kefka e0 Oct 12 13:43:01 kefka 0b Oct 12 13:43:01 kefka eb Oct 12 13:43:01 kefka fe Oct 12 13:43:01 kefka 48 Oct 12 13:43:01 kefka 8b Oct 12 13:43:01 kefka 45 Oct 12 13:43:01 kefka b0 Oct 12 13:43:01 kefka 48 Oct 12 13:43:01 kefka 85 Oct 12 13:43:01 kefka [14595.137301] Pid: 25732, comm: kworker/u:0 Tainted: G D 3.2.28+ #8 Oct 12 13:43:01 kefka [14595.137397] Call Trace: Oct 12 13:43:01 kefka [14595.137479] [<ffffffff81362f3b>] ? panic+0x92/0x1aa Oct 12 13:43:01 kefka [14595.137590] [<ffffffff81049cd0>] ? kmsg_dump+0x41/0xdd Oct 12 13:43:01 kefka [14595.137686] [<ffffffffa0411849>] ? fscache_enqueue_dependents+0xa0/0xa0 [fscache] Oct 12 13:43:01 kefka [14595.137791] [<ffffffff81365c01>] ? oops_end+0xa9/0xb6 Oct 12 13:43:01 kefka [14595.137896] [<ffffffff8100ee7b>] ? do_invalid_op+0x8b/0x95 Oct 12 13:43:01 kefka [14595.138000] [<ffffffffa0411fe5>] ? fscache_object_work_func+0x79c/0x7db [fscache] Oct 12 13:43:01 kefka [14595.138107] [<ffffffff81045e70>] ? try_to_wake_up+0x181/0x190 Oct 12 13:43:01 kefka [14595.138214] [<ffffffff8136c32b>] ? invalid_op+0x1b/0x20 |
| All times are GMT. The time now is 12:13 AM. |
VBulletin, Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.