Mel Gorman 07-08-2011 10:39 PM

mm: vmscan: only read new_classzone_idx from pgdat w=
 
BugLink: http://bugs.launchpad.net/bugs/808509

During allocator-intensive workloads, kswapd will be woken frequently
causing free memory to oscillate between the high and min watermark. This
is expected behaviour. Unfortunately, if the highest zone is small, a
problem occurs.

When balance_pgdat() returns, it may be at a lower classzone_idx than it
started because the highest zone was unreclaimable. Before checking if it
should go to sleep though, it checks pgdat->classzone_idx which when there
is no other activity will be MAX_NR_ZONES-1. It interprets this as it has
been woken up while reclaiming, skips scheduling and reclaims again. As
there is no useful reclaim work to do, it enters into a loop of shrinking
slab consuming loads of CPU until the highest zone becomes reclaimable for
a long period of time.

There are two problems here. 1) If the returned classzone or order is
lower, it'll continue reclaiming without scheduling. 2) if the highest
zone was marked unreclaimable but balance_pgdat() returns immediately at
DEF_PRIORITY, the new lower classzone is not communicated back to kswapd()
for sleeping.
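
For the second problem, the diff below lowers end_zone when the zone at
that index is marked all_unreclaimable. The following is a minimal
user-space sketch of that adjustment only, not the kernel code itself.
The function adjust_end_zone() and the all_unreclaimable[] array are
hypothetical names used for illustration, and the loop direction (zone 0
upwards to end_zone) is an assumption, since the hunk does not show the
loop header.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the end_zone adjustment added by the patch.
 * all_unreclaimable[i] stands in for zone->all_unreclaimable of zone i.
 * Assumption: zones are walked from index 0 up to end_zone.
 */
static int adjust_end_zone(const bool *all_unreclaimable, int nr_zones,
			   int end_zone)
{
	int i;

	assert(nr_zones > 0 && end_zone >= 0 && end_zone < nr_zones);

	for (i = 0; i <= end_zone; i++) {
		if (all_unreclaimable[i]) {
			/* Same test as the patch: never go below zone 0. */
			if (end_zone && end_zone == i)
				end_zone--;
			continue;
		}
		/* A reclaimable zone would be scanned here; omitted. */
	}

	return end_zone;
}
```

Under these assumptions a single pass lowers end_zone by at most one,
because the loop terminates as soon as the index passes the new end_zone.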

This patch does two things that are related. If the end_zone is
unreclaimable, this information is communicated back. Second, if the
classzone or order was reduced due to failing to reclaim, new information
is not read from pgdat and instead an attempt is made to go to sleep. Due
to this, it is also necessary that pgdat->classzone_idx be initialised
each time to pgdat->nr_zones - 1 to avoid re-reads being interpreted as
wakeups.
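
The re-read guard described above can be exercised outside the kernel. The
sketch below copies the two conditions from the kswapd() hunk in the diff
into a small user-space model. struct pgdat_model, struct kswapd_model and
kswapd_skips_sleep() are hypothetical names that do not exist in the
kernel, and the model deliberately leaves out balance_pgdat() and
kswapd_try_to_sleep().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pg_data_t fields the patch touches. */
struct pgdat_model {
	unsigned long kswapd_max_order;
	int classzone_idx;
	int nr_zones;
};

/* Hypothetical holder for the local variables of kswapd(). */
struct kswapd_model {
	unsigned long order, new_order;
	int classzone_idx, new_classzone_idx;
};

/*
 * Returns true when the model would skip sleeping and service the new
 * request, false when it would attempt to go to sleep.
 */
static bool kswapd_skips_sleep(struct pgdat_model *pgdat,
			       struct kswapd_model *ks)
{
	assert(pgdat->nr_zones > 0);

	/*
	 * Read a pending request from pgdat only if the last pass did
	 * not come back with a reduced classzone or a different order.
	 */
	if (ks->classzone_idx >= ks->new_classzone_idx &&
	    ks->order == ks->new_order) {
		ks->new_order = pgdat->kswapd_max_order;
		ks->new_classzone_idx = pgdat->classzone_idx;
		pgdat->kswapd_max_order = 0;
		pgdat->classzone_idx = pgdat->nr_zones - 1;
	}

	return ks->order < ks->new_order ||
	       ks->classzone_idx > ks->new_classzone_idx;
}
```

In this model, a pass that comes back with classzone_idx lowered from 2 to
1 leaves the pgdat fields untouched and takes the sleep path, while an
unreduced pass consumes the pending request and resets
pgdat->classzone_idx to nr_zones - 1.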

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Pádraig Brady <P@draigBrady.com>
Tested-by: Pádraig Brady <P@draigBrady.com>
Tested-by: Andrew Lutomirski <luto@mit.edu>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 215ddd6664ced067afca7eebd2d1eb83f064ff5a)

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
---
mm/vmscan.c | 34 +++++++++++++++++++++-------------
1 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 136dc5e..4b8b37c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2381,7 +2381,6 @@ loop_again:
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
-				*classzone_idx = i;
 				break;
 			}
 		}
@@ -2458,8 +2457,11 @@ loop_again:
 			    total_scanned > sc.nr_reclaimed + sc.nr_reclaimed / 2)
 				sc.may_writepage = 1;
 
-			if (zone->all_unreclaimable)
+			if (zone->all_unreclaimable) {
+				if (end_zone && end_zone == i)
+					end_zone--;
 				continue;
+			}
 
 			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), end_zone, 0)) {
@@ -2639,8 +2641,8 @@ static void kswapd_try_to_sleep(pg_data_t *pgdat, int order, int classzone_idx)
  */
 static int kswapd(void *p)
 {
-	unsigned long order;
-	int classzone_idx;
+	unsigned long order, new_order;
+	int classzone_idx, new_classzone_idx;
 	pg_data_t *pgdat = (pg_data_t*)p;
 	struct task_struct *tsk = current;
 
@@ -2670,17 +2672,23 @@ static int kswapd(void *p)
 	tsk->flags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
 	set_freezable();
 
-	order = 0;
-	classzone_idx = MAX_NR_ZONES - 1;
+	order = new_order = 0;
+	classzone_idx = new_classzone_idx = pgdat->nr_zones - 1;
 	for ( ; ; ) {
-		unsigned long new_order;
-		int new_classzone_idx;
 		int ret;
 
-		new_order = pgdat->kswapd_max_order;
-		new_classzone_idx = pgdat->classzone_idx;
-		pgdat->kswapd_max_order = 0;
-		pgdat->classzone_idx = MAX_NR_ZONES - 1;
+		/*
+		 * If the last balance_pgdat was unsuccessful it's unlikely a
+		 * new request of a similar or harder type will succeed soon
+		 * so consider going to sleep on the basis we reclaimed at
+		 */
+		if (classzone_idx >= new_classzone_idx && order == new_order) {
+			new_order = pgdat->kswapd_max_order;
+			new_classzone_idx = pgdat->classzone_idx;
+			pgdat->kswapd_max_order = 0;
+			pgdat->classzone_idx = pgdat->nr_zones - 1;
+		}
+
 		if (order < new_order || classzone_idx > new_classzone_idx) {
 			/*
 			 * Don't sleep if someone wants a larger 'order'
@@ -2693,7 +2701,7 @@ static int kswapd(void *p)
 			order = pgdat->kswapd_max_order;
 			classzone_idx = pgdat->classzone_idx;
 			pgdat->kswapd_max_order = 0;
-			pgdat->classzone_idx = MAX_NR_ZONES - 1;
+			pgdat->classzone_idx = pgdat->nr_zones - 1;
 		}
 
 		ret = try_to_freeze();
-- 
1.7.0.4

