FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Ubuntu > Ubuntu Kernel Team

 
 
LinkBack Thread Tools
 
Old 05-06-2010, 10:34 AM
Surbhi Palande
 
Default fix for slow unmount on ext4 - bug 543617 on lp

The following two patches fix the bug 543617 on launchpad. The first patch is
a revert of a temporary fix of this bug. The second patch is a hack which
solves this bug. To resolve this issue without any side effects and
further ado, we can for now, consider the second patch for lucid.

However, there is another bigger patch by Jens Axboe which is
submitted as a fix for this regression/bug. We will pick this patch from
stable as and when it is released and at that point revert the second
patch in this series that is intended to fix the issue for now.

More information of the patchsets and conversation can be found at
https://bugzilla.kernel.org/show_bug.cgi?id=15906

Dmitry Monakhov (1):
writeback: fix __sync_filesystem(sb, 0) on umount

Surbhi Palande (1):
Revert "UBUNTU: SAUCE: sync before umount to reduce time taken by
ext4 umount"

fs/fs-writeback.c | 5 +++--
fs/namespace.c | 7 -------
fs/sync.c | 2 +-
include/linux/fs.h | 1 -
4 files changed, 4 insertions(+), 11 deletions(-)


--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 05-06-2010, 11:40 AM
Andy Whitcroft
 
Default fix for slow unmount on ext4 - bug 543617 on lp

On Thu, May 06, 2010 at 01:34:58PM +0300, Surbhi Palande wrote:
> The following two patches fix the bug 543617 on launchpad. The first patch is
> a revert of a temporary fix of this bug. The second patch is a hack which
> solves this bug. To resolve this issue without any side effects and
> further ado, we can for now, consider the second patch for lucid.

This first revert represents the patch which Kees has already requested
we revert as it has some unexpected side effects performance wise on for
example unmounting loop back devices.

> However, there is another bigger patch by Jens Axboe which is
> submitted as a fix for this regression/bug. We will pick this patch from
> stable as and when it is released and at that point revert the second
> patch in this series that is intended to fix the issue for now.

-apw

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-02-2010, 01:09 PM
Stefan Bader
 
Default fix for slow unmount on ext4 - bug 543617 on lp

This is a followup on a previous post which basically consists of the
revert of 5e1941884c700b7b97bcad52c2d8212ac56a7ebc
"UBUNTU: SAUCE: sync before umount to reduce time taken by ext4 umount"
and then applying a patch from bugzilla
"writeback: fix __sync_filesystem(sb, 0) on umount"

However the second patch never went upstream, while the following two
(backported to 2.6.32.y) made it. Sadly, when asking Jens Axboe about
forwarding those to stable, I got the following response:

Jens Axboe wrote:
> > I would send the backports to stable but wanted to check with you
> > beforehand. Would you be ok with a stable submission of these two?

> I would have said yes a few days ago, but Christoph claims that the
> patches make xfs test hang. Unfortunately I'm a bit between jobs and
> don't have proper testing equipment right now, so I had no other option
> than revert exactly those two commits...

Now the question is whether we want to go ahead (because we need to
do the revert as that causes other problems) with the upstream version
that might cause xfs problems (assuming xfs is not our main fs) or
take the patch from bugzilla which might have the same problem or others
yet unknown.

-Stefan

----

From: Dmitry Monakhov <dmonakhov@openvz.org>

BugLink: http://launchpad.net/bugs/543617

__sync_filesystem(sb, 0) is no longer works on umount because sb can
not be pined, Because s_mount sem is downed for write and s_root is NULL.
In fact umount is a special case similar to WB_SYNC_ALL, where
sb is pinned already.

BADCOMMIT: 03ba3782e8dcc5b0e1efe440d33084f066e38cae

Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
(cherry-picked from https://bugzilla.kernel.org/attachment.cgi?id=26224)

Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
---
fs/fs-writeback.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4102f20..937a8ae 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -583,9 +583,9 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
unpin_sb_for_writeback(psb);

/*
- * Caller must already hold the ref for this
+ * Caller must already hold the ref for integrity sync, and on umount
*/
- if (wbc->sync_mode == WB_SYNC_ALL) {
+ if (wbc->sync_mode == WB_SYNC_ALL || !sb->s_root) {
WARN_ON(!rwsem_is_locked(&sb->s_umount));
return 0;
}
@@ -597,6 +597,7 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
spin_unlock(&sb_lock);
goto pinned;
}
+ WARN_ON(1);
/*
* umounted, drop rwsem again and fall through to failure
*/


--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-07-2010, 12:10 PM
Stefan Bader
 
Default fix for slow unmount on ext4 - bug 543617 on lp

Ok, so upstream seems to have officially reverted the changes for now. Given
that the changes in that area seem to have unexpected side effects and we do not
really know the other patch would not do the same. When reverting the
work-around we got huge sync times. IIRC the side effects of that patch were not
as bad timewise (2 seconds?). So maybe for now, we stay with what we got and
wait for the proper upstream solution?

Stefan

On 06/02/2010 03:09 PM, Stefan Bader wrote:
> This is a followup on a previous post which basically consists of the
> revert of 5e1941884c700b7b97bcad52c2d8212ac56a7ebc
> "UBUNTU: SAUCE: sync before umount to reduce time taken by ext4 umount"
> and then applying a patch from bugzilla
> "writeback: fix __sync_filesystem(sb, 0) on umount"
>
> However the second patch never went upstream, while the following two
> (backported to 2.6.32.y) made it. Sadly, when asking Jens Axboe about
> forwarding those to stable, I got the following response:
>
> Jens Axboe wrote:
>>> I would send the backports to stable but wanted to check with you
>>> beforehand. Would you be ok with a stable submission of these two?
>
>> I would have said yes a few days ago, but Christoph claims that the
>> patches make xfs test hang. Unfortunately I'm a bit between jobs and
>> don't have proper testing equipment right now, so I had no other option
>> than revert exactly those two commits...
>
> Now the question is whether we want to go ahead (because we need to
> do the revert as that causes other problems) with the upstream version
> that might cause xfs problems (assuming xfs is not our main fs) or
> take the patch from bugzilla which might have the same problem or others
> yet unknown.
>
> -Stefan
>
> ----
>
> From: Dmitry Monakhov <dmonakhov@openvz.org>
>
> BugLink: http://launchpad.net/bugs/543617
>
> __sync_filesystem(sb, 0) is no longer works on umount because sb can
> not be pined, Because s_mount sem is downed for write and s_root is NULL.
> In fact umount is a special case similar to WB_SYNC_ALL, where
> sb is pinned already.
>
> BADCOMMIT: 03ba3782e8dcc5b0e1efe440d33084f066e38cae
>
> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
> (cherry-picked from https://bugzilla.kernel.org/attachment.cgi?id=26224)
>
> Signed-off-by: Surbhi Palande <surbhi.palande@canonical.com>
> ---
> fs/fs-writeback.c | 5 +++--
> 1 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index 4102f20..937a8ae 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -583,9 +583,9 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
> unpin_sb_for_writeback(psb);
>
> /*
> - * Caller must already hold the ref for this
> + * Caller must already hold the ref for integrity sync, and on umount
> */
> - if (wbc->sync_mode == WB_SYNC_ALL) {
> + if (wbc->sync_mode == WB_SYNC_ALL || !sb->s_root) {
> WARN_ON(!rwsem_is_locked(&sb->s_umount));
> return 0;
> }
> @@ -597,6 +597,7 @@ static int pin_sb_for_writeback(struct writeback_control *wbc,
> spin_unlock(&sb_lock);
> goto pinned;
> }
> + WARN_ON(1);
> /*
> * umounted, drop rwsem again and fall through to failure
> */
>
>


--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-09-2010, 07:06 PM
Kees Cook
 
Default fix for slow unmount on ext4 - bug 543617 on lp

Hi Stefan,

On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
> Ok, so upstream seems to have officially reverted the changes for now. Given
> that the changes in that area seem to have unexpected side effects and we do not
> really know the other patch would not do the same. When reverting the
> work-around we got huge sync times. IIRC the side effects of that patch were not
> as bad timewise (2 seconds?). So maybe for now, we stay with what we got and
> wait for the proper upstream solution?

Upstream gave up on fixing the problem? Argh.

Huge sync times in Lucid? When what was changed?

I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
if there isn't a good fix for the umount slow-down since it solves the
slow umount only under certain conditions, and makes other more common
umount scenarios slower.

-Kees

--
Kees Cook
Ubuntu Security Team

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-10-2010, 06:57 AM
Stefan Bader
 
Default fix for slow unmount on ext4 - bug 543617 on lp

On 06/09/2010 09:06 PM, Kees Cook wrote:
> Hi Stefan,
>
> On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
>> Ok, so upstream seems to have officially reverted the changes for now. Given
>> that the changes in that area seem to have unexpected side effects and we do not
>> really know the other patch would not do the same. When reverting the
>> work-around we got huge sync times. IIRC the side effects of that patch were not
>> as bad timewise (2 seconds?). So maybe for now, we stay with what we got and
>> wait for the proper upstream solution?
>
> Upstream gave up on fixing the problem? Argh.
>

No not given up but currently needing to come up with a better solution.

> Huge sync times in Lucid? When what was changed?
>

Err, wasn't that the reason for the patch below in the first place? Maybe my
description was not completely accurate. It was very long time to unmount. I
might be wrong but in my memory that was as bad as 30m in bad cases.
For which the reason is that sync is done syncronously.

> I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
> if there isn't a good fix for the umount slow-down since it solves the
> slow umount only under certain conditions, and makes other more common
> umount scenarios slower.
>

The question would be how much slower. If it is possible to live with the
current status a little longer, then I'd rather wait for the correct upstream
fix, than to revert this patch to see one regression fixed by another one coming
back.

Stefan

> -Kees
>

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-10-2010, 07:13 PM
Kees Cook
 
Default fix for slow unmount on ext4 - bug 543617 on lp

On Thu, Jun 10, 2010 at 08:57:55AM +0200, Stefan Bader wrote:
> Err, wasn't that the reason for the patch below in the first place? Maybe my
> description was not completely accurate. It was very long time to unmount. I
> might be wrong but in my memory that was as bad as 30m in bad cases.
> For which the reason is that sync is done syncronously.

Without the forced-sync-on-umount work-around patch:
- umount of a single filesystem with barrier support on an idle system can
take upwards of 15 minutes.
- umount of a single filesystem with barrier support on a busy system can
take upwards of 4 hours.
- umount of everything else is fast and normal (less than a hundredth of
a second to umount a bind mount), regardless of system IO load.

With patch:
- umount of a single filesystem with barrier support on an idle system
will take as long as a normal sync and umount (i.e. fast and normal).
- umount of everything else on an idle system takes as long as a sync
and umount (raises trivial bind umounts to at least 1 second).
- umount of everything else on a busy system takes as long as a sync
of the entire system and umount of the filesystem (which can sometimes
take upwards of 4 hours due to the original problem of dirty umounting
of filesystems with barrier support).

The addition of the sync workaround means that busy systems will have
umount blocked for as long as _all_ sync activity takes on the given
system. I'll leave it up to you, but for me in most cases, the patch
made my system significantly worse than without it.

-Kees

--
Kees Cook
Ubuntu Security Team

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-12-2010, 02:18 PM
Daniel J Blueman
 
Default fix for slow unmount on ext4 - bug 543617 on lp

On Wed, Jun 9, 2010 at 8:06 PM, Kees Cook <kees.cook@canonical.com> wrote:
> Hi Stefan,
>
> On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
>> Ok, so upstream seems to have officially reverted the changes for now. Given
>> that the changes in that area seem to have unexpected side effects and we do not
>> really know the other patch would not do the same. When reverting the
>> work-around we got huge sync times. IIRC the side effects of that patch were not
>> as bad timewise (2 seconds?). So maybe for now, we stay with what we got and
>> wait for the proper upstream solution?
>
> Upstream gave up on fixing the problem? *Argh.
>
> Huge sync times in Lucid? *When what was changed?
>
> I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
> if there isn't a good fix for the umount slow-down since it solves the
> slow umount only under certain conditions, and makes other more common
> umount scenarios slower.

Dave Chinner just committed a ratified fix [1] for this upstream,
which may be a good cherry-pick.

Dan

--- [1]

commit d87815cb2090e07b0b0b2d73dc9740706e92c80c
Author: Dave Chinner <dchinner@redhat.com>
Date: Wed Jun 9 10:37:20 2010 +1000

writeback: limit write_cache_pages integrity scanning to current EOF

sync can currently take a really long time if a concurrent writer is
extending a file. The problem is that the dirty pages on the address
space grow in the same direction as write_cache_pages scans, so if
the writer keeps ahead of writeback, the writeback will not
terminate until the writer stops adding dirty pages.

For a data integrity sync, we only need to write the pages dirty at
the time we start the writeback, so we can stop scanning once we get
to the page that was at the end of the file at the time the scan
started.

This will prevent operations like copying a large file preventing
sync from completing as it will not write back pages that were
dirtied after the sync was started. This does not impact the
existing integrity guarantees, as any dirty page (old or new)
within the EOF range at the start of the scan will still be
captured.

This patch will not prevent sync from blocking on large writes into
holes. That requires more complex intervention while this patch only
addresses the common append-case of this sync holdoff.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
--
Daniel J Blueman

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 06-14-2010, 07:58 AM
Stefan Bader
 
Default fix for slow unmount on ext4 - bug 543617 on lp

On 06/12/2010 04:18 PM, Daniel J Blueman wrote:
> On Wed, Jun 9, 2010 at 8:06 PM, Kees Cook <kees.cook@canonical.com> wrote:
>> Hi Stefan,
>>
>> On Mon, Jun 07, 2010 at 02:10:43PM +0200, Stefan Bader wrote:
>>> Ok, so upstream seems to have officially reverted the changes for now. Given
>>> that the changes in that area seem to have unexpected side effects and we do not
>>> really know the other patch would not do the same. When reverting the
>>> work-around we got huge sync times. IIRC the side effects of that patch were not
>>> as bad timewise (2 seconds?). So maybe for now, we stay with what we got and
>>> wait for the proper upstream solution?
>>
>> Upstream gave up on fixing the problem? Argh.
>>
>> Huge sync times in Lucid? When what was changed?
>>
>> I think we need to revert 5e1941884c700b7b97bcad52c2d8212ac56a7ebc even
>> if there isn't a good fix for the umount slow-down since it solves the
>> slow umount only under certain conditions, and makes other more common
>> umount scenarios slower.
>
> Dave Chinner just committed a ratified fix [1] for this upstream,
> which may be a good cherry-pick.
>
> Dan

Hi Daniel,

this looks like an additional corner case from the description (I have not yet
had time to look at the actual patch). The case we are facing is that sync on
umount used to get converted into an asynchronous sync, followed by a
synchronous one. This changed in a way that the sync on umount is done only
synchronously which waits on every single page.
But the patch you pointed to definitely is something to keep in mind as well.

Cheers,
Stefan
>
> --- [1]
>
> commit d87815cb2090e07b0b0b2d73dc9740706e92c80c
> Author: Dave Chinner <dchinner@redhat.com>
> Date: Wed Jun 9 10:37:20 2010 +1000
>
> writeback: limit write_cache_pages integrity scanning to current EOF
>
> sync can currently take a really long time if a concurrent writer is
> extending a file. The problem is that the dirty pages on the address
> space grow in the same direction as write_cache_pages scans, so if
> the writer keeps ahead of writeback, the writeback will not
> terminate until the writer stops adding dirty pages.
>
> For a data integrity sync, we only need to write the pages dirty at
> the time we start the writeback, so we can stop scanning once we get
> to the page that was at the end of the file at the time the scan
> started.
>
> This will prevent operations like copying a large file preventing
> sync from completing as it will not write back pages that were
> dirtied after the sync was started. This does not impact the
> existing integrity guarantees, as any dirty page (old or new)
> within the EOF range at the start of the scan will still be
> captured.
>
> This patch will not prevent sync from blocking on large writes into
> holes. That requires more complex intervention while this patch only
> addresses the common append-case of this sync holdoff.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 
Old 08-03-2010, 04:49 PM
Stefan Bader
 
Default fix for slow unmount on ext4 - bug 543617 on lp

The patches discussed here were causing regresssions and have been reverted
upstream. By now there is a new set (which is even bigger). I am waiting for
some feedback on a backport I did for 2.6.32.y from upstream.

--
kernel-team mailing list
kernel-team@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/kernel-team
 

Thread Tools




All times are GMT. The time now is 09:31 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org