09-20-2010, 08:27 AM
"Thomas Poo"

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

Package: linux-image-2.6.32-bpo.5-amd64
Version: 2.6.32-21~bpo50+1
Severity: important


/var/log/kern.log: http://pastebin.com/FVV25wf4
libc6: 2.7-18lenny4
System: Dell PowerEdge R710

Hi - we have had problems with backported kernels for a few weeks. This is the first time we have got useful output from the kernel, I think because we updated from 2.6.32-15~bpo50+1.
We had a few crashes with the old package that took down both our network and the server itself, and produced an EOI message that we saw in the remote access unit.
The switch had set itself as the root of the spanning tree.

Now, with the new package, we observed a high load (between 80 and 100), and a look at the process table showed kswapd running at 100% CPU. We tried to reboot the system by hand, but nothing happened and we had to reset it. This time the network was not affected.

Feel free to request more information.


--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20100920082712.157570@gmx.net

09-20-2010, 01:40 PM
Ben Hutchings

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

On Mon, Sep 20, 2010 at 10:27:12AM +0200, Thomas Poo wrote:
> Package: linux-image-2.6.32-bpo.5-amd64
> Version: 2.6.32-21~bpo50+1
> Severity: important
>
>
> /var/log/kern.log: http://pastebin.com/FVV25wf4

You must send this information to the bug report, not to some other
web site where it will expire.

Ben.

--
Ben Hutchings
We get into the habit of living before acquiring the habit of thinking.
- Albert Camus



Archive: http://lists.debian.org/20100920134058.GA3204@decadent.org.uk

09-23-2010, 12:05 AM
Ben Hutchings

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

It looks like this system is running short of memory. Can you send the
contents of /proc/meminfo when it gets into this state?

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
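
One way to capture the requested data is sketched below. This is a minimal shell sketch, not something from the thread: the function names `snapshot_meminfo` and `meminfo_kb` and the output file naming are illustrative assumptions. Only the /proc/meminfo path and its "Field: value kB" layout come from the kernel.

```shell
#!/bin/sh
# Hypothetical helper: save a timestamped copy of a meminfo-style file
# and print the path of the copy.
# Usage: snapshot_meminfo [SOURCE] [OUTDIR]
snapshot_meminfo() {
    src=${1:-/proc/meminfo}
    outdir=${2:-.}
    out="$outdir/meminfo-$(date +%Y%m%d-%H%M%S).txt"
    cat "$src" > "$out" || return 1
    printf '%s\n' "$out"
}

# Hypothetical helper: print the numeric value (in kB) of one field,
# for example MemFree or SwapFree.
# Usage: meminfo_kb FIELD [FILE]
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}
```

Running `snapshot_meminfo /proc/meminfo /var/tmp` while kswapd is spinning leaves a file that can be attached to the bug report itself, which avoids the expiring-pastebin problem raised earlier in the thread.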
 
07-28-2011, 01:23 AM
Jonathan Nieder

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

retitle 597489 kswapd using 100% CPU, unable to reboot (soft lockup in pagevec_lookup)
found 597489 linux-2.6/2.6.32-21
fixed 597489 linux-2.6/2.6.38-2
tags 597489 = upstream fixed-upstream
quit

Hi,

Giuseppe Lavagetto wrote:

> I backported 2.6.38 from sid to squeeze, and the machine is running fine
> since then. I tried to reproduce the problem with my test case, with no
> luck. So, it seems that changes that have been performed to the MM
> subsystem (specifically, the patch I referred to in a previous post) do
> solve this issue.

Thanks for checking. Marking accordingly.

Out of curiosity, how did you find v2.6.38-rc1~216 (mm: page
allocator: adjust the per-cpu counter threshold when memory is low,
2011-01-13)? Is there a corresponding RH bugzilla entry with details?

I thought that patch fixed a regression introduced by v2.6.32.23~22
(mm: page allocator: calculate a better estimate of NR_FREE_PAGES when
memory is low and kswapd is awake, 2010-09-09), which would correspond
to version 2.6.32-24 of the linux-2.6 package, but the original report
was against 2.6.32-21. Moreover, 2.6.32-24 was not uploaded until
a couple of weeks after the original report. Hmm.

Thomas, if you get a chance to try a kernel from sid or experimental,
that would be interesting.

Regards,
Jonathan



Archive: http://lists.debian.org/20110728012324.GA31442@elie

12-24-2011, 09:07 AM
Jonathan Nieder

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

tags 597489 + patch moreinfo
quit

Hi Thomas,

Thomas Poo wrote:

> BUG: soft lockup - CPU#11 stuck for 61s! [kswapd1:144]
> Modules linked in: quota_v2 quota_tree ip6table_filter ip6_tables iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables loop snd_pcsp snd_pcm snd_timer joydev snd soundcore snd_page_alloc dcdbas psmouse serio_raw evdev ioatdma dca power_meter button processor usbhid hid ext3 jbd mbcache sg sr_mod cdrom usb_storage ata_generic sd_mod crc_t10dif ses enclosure ata_piix ehci_hcd uhci_hcd libata usbcore nls_base megaraid_sas scsi_mod bnx2 thermal fan thermal_sys [last unloaded: scsi_wait_scan]
> CPU 11:
> Modules linked in: quota_v2 quota_tree ip6table_filter ip6_tables iptable_raw xt_comment xt_recent xt_policy ipt_ULOG ipt_REJECT ipt_REDIRECT ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_ECN ipt_ecn ipt_CLUSTERIP ipt_ah ipt_addrtype nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_nat_h323 nf_nat_ftp nf_nat_amanda ts_kmp nf_conntrack_amanda nf_conntrack_tftp nf_conntrack_sip nf_conntrack_proto_sctp nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nf_conntrack_netbios_ns nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp xt_tcpmss xt_pkttype xt_physdev xt_owner xt_NFQUEUE xt_NFLOG nfnetlink_log xt_multiport xt_MARK xt_mark xt_mac xt_limit xt_length xt_iprange xt_helper xt_hashlimit xt_DSCP xt_dscp xt_dccp xt_conntrack xt_CONNMARK xt_connmark xt_CLASSIFY xt_tcpudp xt_state iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack iptable_mangle nfnetlink iptable_filter ip_tables x_tables loop snd_pcsp snd_pcm snd_timer joydev snd soundcore snd_page_alloc dcdbas psmouse serio_raw evdev ioatdma dca power_meter button processor usbhid hid ext3 jbd mbcache sg sr_mod cdrom usb_storage ata_generic sd_mod crc_t10dif ses enclosure ata_piix ehci_hcd uhci_hcd libata usbcore nls_base megaraid_sas scsi_mod bnx2 thermal fan thermal_sys [last unloaded: scsi_wait_scan]
> Pid: 144, comm: kswapd1 Not tainted 2.6.32-bpo.5-amd64 #1 PowerEdge R710
> RIP: 0010:[<ffffffff810b26cf>] [<ffffffff810b26cf>] find_get_pages+0x3f/0xbb
[...]
> Call Trace:
> [<ffffffff810ba74c>] ? pagevec_lookup+0x17/0x1e
> [<ffffffff810bb509>] ? invalidate_mapping_pages+0xb9/0xdb
> [<ffffffff810fb5bf>] ? d_kill+0x58/0x61
> [<ffffffff810b98f0>] ? throttle_vm_writeout+0x30/0x8d
> [<ffffffff810fe94f>] ? shrink_icache_memory+0xfc/0x228
> [<ffffffff810bdad7>] ? shrink_slab+0xe0/0x153
> [<ffffffff810be37a>] ? kswapd+0x4d9/0x683
> [<ffffffff810bba27>] ? isolate_pages_global+0x0/0x20f
> [<ffffffff810638ce>] ? autoremove_wake_function+0x0/0x2e
> [<ffffffff810bdea1>] ? kswapd+0x0/0x683
> [<ffffffff81063601>] ? kthread+0x79/0x81
> [<ffffffff81011baa>] ? child_rip+0xa/0x20
> [<ffffffff81063588>] ? kthread+0x0/0x81
> [<ffffffff81011ba0>] ? child_rip+0x0/0x20
> BUG: soft lockup - CPU#11 stuck for 61s! [kswapd1:144]

Sorry for the slow reply. Please test the following patch. [1]
explains how.

[1] http://kernel-handbook.alioth.debian.org/ch-common-tasks.html

Thanks,
Jonathan

From: Mel Gorman <mel@csn.ul.ie>
Date: Thu, 13 Jan 2011 15:45:41 -0800
Subject: mm: page allocator: adjust the per-cpu counter threshold when memory is low

commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.

Commit aa45484 ("calculate a better estimate of NR_FREE_PAGES when memory
is low") noted that watermarks were based on the vmstat NR_FREE_PAGES. To
avoid synchronization overhead, these counters are maintained on a per-cpu
basis and drained both periodically and when a per-cpu delta exceeds a
threshold. On large CPU systems, the difference between the estimate and
real value of NR_FREE_PAGES can be very high. The system can get into a
case where pages are allocated far below the min watermark potentially
causing livelock issues. The commit solved the problem by taking a better
reading of NR_FREE_PAGES when memory was low.

Unfortunately, as reported by Shaohua Li this accurate reading can consume a
large amount of CPU time on systems with many sockets due to cache line
bouncing. This patch takes a different approach. For large machines
where counter drift might be unsafe and while kswapd is awake, the per-cpu
thresholds for the target pgdat are reduced to limit the level of drift to
what should be a safe level. This incurs a performance penalty in heavy
memory pressure by a factor that depends on the workload and the machine
but the machine should function correctly without accidentally exhausting
all memory on a node. There is an additional cost when kswapd wakes and
sleeps but the event is not expected to be frequent - in Shaohua's test
case, there was at least one recorded sleep and wake event.

To ensure that kswapd wakes up, a safe version of zone_watermark_ok() is
introduced that takes a more accurate reading of NR_FREE_PAGES when called
from wakeup_kswapd, when deciding whether it is really safe to go back to
sleep in sleeping_prematurely() and when deciding if a zone is really
balanced or not in balance_pgdat(). We are still using an expensive
function but limiting how often it is called.

When the test case is reproduced, the time spent in the watermark
functions is reduced. The following report is on the cumulative percentage
of time spent in the functions zone_nr_free_pages(),
zone_watermark_ok(), __zone_watermark_ok(), zone_watermark_ok_safe(),
zone_page_state_snapshot(), zone_page_state().

vanilla                      11.6615%
disable-threshold            0.2584%

David said:

: We had to pull aa454840 "mm: page allocator: calculate a better estimate
: of NR_FREE_PAGES when memory is low and kswapd is awake" from 2.6.36
: internally because tests showed that it would cause the machine to stall
: as the result of heavy kswapd activity. I merged it back with this fix as
: it is pending in the -mm tree and it solves the issue we were seeing, so I
: definitely think this should be pushed to -stable (and I would seriously
: consider it for 2.6.37 inclusion even at this late date).

[jn: with Michal Hocko's commit 8e44cd712265 ("mm: fix off-by-two in
__zone_watermark_ok()", 2011-12-22) squashed in]

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reported-by: Shaohua Li <shaohua.li@intel.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Tested-by: Nicolas Bareil <nico@chdir.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: <stable@kernel.org> [2.6.37.1, 2.6.36.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
 include/linux/mmzone.h |   10 ++-----
 include/linux/vmstat.h |    5 +++
 mm/mmzone.c            |   21 ---------------
 mm/page_alloc.c        |   35 +++++++++++++++++++------
 mm/vmscan.c            |   20 ++++++++------
 mm/vmstat.c            |   66 +++++++++++++++++++++++++++++++++++++++++++++++-
 6 files changed, 112 insertions(+), 45 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6c31a2a7c18d..b4ea7395eb52 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -467,12 +467,6 @@ static inline int zone_is_oom_locked(const struct zone *zone)
 	return test_bit(ZONE_OOM_LOCKED, &zone->flags);
 }
 
-#ifdef CONFIG_SMP
-unsigned long zone_nr_free_pages(struct zone *zone);
-#else
-#define zone_nr_free_pages(zone) zone_page_state(zone, NR_FREE_PAGES)
-#endif /* CONFIG_SMP */
-
 /*
  * The "priority" of VM scanning is how much of the queues we will scan in one
  * go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
@@ -669,7 +663,9 @@ void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
 void wakeup_kswapd(struct zone *zone, int order);
-int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+bool zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+		int classzone_idx, int alloc_flags);
+bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
 		int classzone_idx, int alloc_flags);
 enum memmap_context {
 	MEMMAP_EARLY,
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 13070d659129..f50ad5eecc78 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -250,6 +250,8 @@ extern void dec_zone_state(struct zone *, enum zone_stat_item);
 extern void __dec_zone_state(struct zone *, enum zone_stat_item);
 
 void refresh_cpu_vm_stats(int);
+void reduce_pgdat_percpu_threshold(pg_data_t *pgdat);
+void restore_pgdat_percpu_threshold(pg_data_t *pgdat);
 #else /* CONFIG_SMP */
 
 /*
@@ -294,6 +296,9 @@ static inline void __dec_zone_page_state(struct page *page,
 #define dec_zone_page_state __dec_zone_page_state
 #define mod_zone_page_state __mod_zone_page_state
 
+static inline void reduce_pgdat_percpu_threshold(pg_data_t *pgdat) { }
+static inline void restore_pgdat_percpu_threshold(pg_data_t *pgdat) { }
+
 static inline void refresh_cpu_vm_stats(int cpu) { }
 #endif
 
diff --git a/mm/mmzone.c b/mm/mmzone.c
index e35bfb82c855..f5b7d1760213 100644
--- a/mm/mmzone.c
+++ b/mm/mmzone.c
@@ -87,24 +87,3 @@ int memmap_valid_within(unsigned long pfn,
 	return 1;
 }
 #endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
-
-#ifdef CONFIG_SMP
-/* Called when a more accurate view of NR_FREE_PAGES is needed */
-unsigned long zone_nr_free_pages(struct zone *zone)
-{
-	unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
-
-	/*
-	 * While kswapd is awake, it is considered the zone is under some
-	 * memory pressure. Under pressure, there is a risk that
-	 * per-cpu-counter-drift will allow the min watermark to be breached
-	 * potentially causing a live-lock. While kswapd is awake and
-	 * free pages are low, get a better estimate for free pages
-	 */
-	if (nr_free_pages < zone->percpu_drift_mark &&
-			!waitqueue_active(&zone->zone_pgdat->kswapd_wait))
-		return zone_page_state_snapshot(zone, NR_FREE_PAGES);
-
-	return nr_free_pages;
-}
-#endif /* CONFIG_SMP */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3ecab7e7bbfa..d2e827ffedf3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1358,24 +1358,24 @@ static inline int should_fail_alloc_page(gfp_t gfp_mask, unsigned int order)
 #endif /* CONFIG_FAIL_PAGE_ALLOC */
 
 /*
- * Return 1 if free pages are above 'mark'. This takes into account the order
+ * Return true if free pages are above 'mark'. This takes into account the order
  * of the allocation.
  */
-int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
-		      int classzone_idx, int alloc_flags)
+static bool __zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+		      int classzone_idx, int alloc_flags, long free_pages)
 {
 	/* free_pages my go negative - that's OK */
 	long min = mark;
-	long free_pages = zone_nr_free_pages(z) - (1 << order) + 1;
 	int o;
 
+	free_pages -= (1 << order) - 1;
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
 	if (alloc_flags & ALLOC_HARDER)
 		min -= min / 4;
 
 	if (free_pages <= min + z->lowmem_reserve[classzone_idx])
-		return 0;
+		return false;
 	for (o = 0; o < order; o++) {
 		/* At the next order, this order's pages become unavailable */
 		free_pages -= z->free_area[o].nr_free << o;
@@ -1384,9 +1384,28 @@ int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
 		min >>= 1;
 
 		if (free_pages <= min)
-			return 0;
+			return false;
 	}
-	return 1;
+	return true;
+}
+
+bool zone_watermark_ok(struct zone *z, int order, unsigned long mark,
+		      int classzone_idx, int alloc_flags)
+{
+	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
+					zone_page_state(z, NR_FREE_PAGES));
+}
+
+bool zone_watermark_ok_safe(struct zone *z, int order, unsigned long mark,
+		      int classzone_idx, int alloc_flags)
+{
+	long free_pages = zone_page_state(z, NR_FREE_PAGES);
+
+	if (z->percpu_drift_mark && free_pages < z->percpu_drift_mark)
+		free_pages = zone_page_state_snapshot(z, NR_FREE_PAGES);
+
+	return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags,
+								free_pages);
 }
 
 #ifdef CONFIG_NUMA
@@ -2251,7 +2270,7 @@ void show_free_areas(void)
 			" all_unreclaimable? %s"
 			"\n",
 			zone->name,
-			K(zone_nr_free_pages(zone)),
+			K(zone_page_state(zone, NR_FREE_PAGES)),
 			K(min_wmark_pages(zone)),
 			K(low_wmark_pages(zone)),
 			K(high_wmark_pages(zone)),
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4649929401f8..8c357ca9b2af 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2033,7 +2033,7 @@ loop_again:
 				shrink_active_list(SWAP_CLUSTER_MAX, zone,
 							&sc, priority, 0);
 
-			if (!zone_watermark_ok(zone, order,
+			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), 0, 0)) {
 				end_zone = i;
 				break;
@@ -2069,7 +2069,7 @@ loop_again:
 					priority != DEF_PRIORITY)
 				continue;
 
-			if (!zone_watermark_ok(zone, order,
+			if (!zone_watermark_ok_safe(zone, order,
 					high_wmark_pages(zone), end_zone, 0))
 				all_zones_ok = 0;
 			temp_priority[i] = priority;
@@ -2088,7 +2088,7 @@ loop_again:
 			 * We put equal pressure on every zone, unless one
 			 * zone has way too many pages free already.
 			 */
-			if (!zone_watermark_ok(zone, order,
+			if (!zone_watermark_ok_safe(zone, order,
 					8*high_wmark_pages(zone), end_zone, 0))
 				shrink_zone(priority, zone, &sc);
 			reclaim_state->reclaimed_slab = 0;
@@ -2227,8 +2227,11 @@ static int kswapd(void *p)
 			 */
 			order = new_order;
 		} else {
-			if (!freezing(current))
+			if (!freezing(current)) {
+				restore_pgdat_percpu_threshold(pgdat);
 				schedule();
+				reduce_pgdat_percpu_threshold(pgdat);
+			}
 
 			order = pgdat->kswapd_max_order;
 		}
@@ -2254,15 +2257,16 @@ void wakeup_kswapd(struct zone *zone, int order)
 	if (!populated_zone(zone))
 		return;
 
+	if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
+		return;
 	pgdat = zone->zone_pgdat;
-	if (zone_watermark_ok(zone, order, low_wmark_pages(zone), 0, 0))
-		return;
 	if (pgdat->kswapd_max_order < order)
 		pgdat->kswapd_max_order = order;
-	if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
-		return;
 	if (!waitqueue_active(&pgdat->kswapd_wait))
 		return;
+	if (zone_watermark_ok_safe(zone, order, low_wmark_pages(zone), 0, 0))
+		return;
+
 	wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 42d76c65e9f2..84e0b1fabe8b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -79,6 +79,30 @@ EXPORT_SYMBOL(vm_stat);
 
 #ifdef CONFIG_SMP
 
+static int calculate_pressure_threshold(struct zone *zone)
+{
+	int threshold;
+	int watermark_distance;
+
+	/*
+	 * As vmstats are not up to date, there is drift between the estimated
+	 * and real values. For high thresholds and a high number of CPUs, it
+	 * is possible for the min watermark to be breached while the estimated
+	 * value looks fine. The pressure threshold is a reduced value such
+	 * that even the maximum amount of drift will not accidentally breach
+	 * the min watermark
+	 */
+	watermark_distance = low_wmark_pages(zone) - min_wmark_pages(zone);
+	threshold = max(1, (int)(watermark_distance / num_online_cpus()));
+
+	/*
+	 * Maximum threshold is 125
+	 */
+	threshold = min(125, threshold);
+
+	return threshold;
+}
+
 static int calculate_threshold(struct zone *zone)
 {
 	int threshold;
@@ -156,6 +180,46 @@ static void refresh_zone_stat_thresholds(void)
 	}
 }
 
+void reduce_pgdat_percpu_threshold(pg_data_t *pgdat)
+{
+	struct zone *zone;
+	int cpu;
+	int threshold;
+	int i;
+
+	get_online_cpus();
+	for (i = 0; i < pgdat->nr_zones; i++) {
+		zone = &pgdat->node_zones[i];
+		if (!zone->percpu_drift_mark)
+			continue;
+
+		threshold = calculate_pressure_threshold(zone);
+		for_each_online_cpu(cpu)
+			zone_pcp(zone, cpu)->stat_threshold = threshold;
+	}
+	put_online_cpus();
+}
+
+void restore_pgdat_percpu_threshold(pg_data_t *pgdat)
+{
+	struct zone *zone;
+	int cpu;
+	int threshold;
+	int i;
+
+	get_online_cpus();
+	for (i = 0; i < pgdat->nr_zones; i++) {
+		zone = &pgdat->node_zones[i];
+		if (!zone->percpu_drift_mark)
+			continue;
+
+		threshold = calculate_threshold(zone);
+		for_each_online_cpu(cpu)
+			zone_pcp(zone, cpu)->stat_threshold = threshold;
+	}
+	put_online_cpus();
+}
+
 /*
  * For use when we know that interrupts are disabled.
  */
@@ -728,7 +792,7 @@ static void zoneinfo_show_print(struct seq_file *m, pg_data_t *pgdat,
 		   "\n        scanned  %lu"
 		   "\n        spanned  %lu"
 		   "\n        present  %lu",
-		   zone_nr_free_pages(zone),
+		   zone_page_state(zone, NR_FREE_PAGES),
 		   min_wmark_pages(zone),
 		   low_wmark_pages(zone),
 		   high_wmark_pages(zone),
--
1.7.8
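
The arithmetic in calculate_pressure_threshold() from the patch above can be tried outside the kernel. The following is a minimal userspace sketch in shell, not kernel code: the function name `pressure_threshold` and the watermark values in the example call are hypothetical. On a real system the per-zone min and low watermarks can be read from /proc/zoneinfo.

```shell
#!/bin/sh
# Userspace model of calculate_pressure_threshold() from the patch.
# Arguments: low watermark (pages), min watermark (pages), online CPUs.
# Prints the per-cpu stat threshold used while kswapd is awake.
pressure_threshold() {
    low=$1
    min=$2
    cpus=$3
    # Distance between the low and min watermarks, shared among all CPUs
    # (integer division, as in the kernel).
    threshold=$(( (low - min) / cpus ))
    # Clamp to the range [1, 125], matching max(1, ...) and min(125, ...).
    if [ "$threshold" -lt 1 ]; then
        threshold=1
    fi
    if [ "$threshold" -gt 125 ]; then
        threshold=125
    fi
    echo "$threshold"
}

# Hypothetical zone: low = 1250 pages, min = 1000 pages, 16 online CPUs.
pressure_threshold 1250 1000 16   # prints 15
```

With 16 CPUs each allowed to drift by at most 15 pages, the combined error stays below the 250-page gap between the two watermarks, which is the property the comment in the patch describes.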




Archive: http://lists.debian.org/20111224100700.GA23262@elie.Belkin

01-26-2012, 08:08 PM
Jonathan Nieder

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

Hi Giuseppe,

Jonathan Nieder wrote:
> Thomas Poo wrote:

>> BUG: soft lockup - CPU#11 stuck for 61s! [kswapd1:144]
[...]
>> Pid: 144, comm: kswapd1 Not tainted 2.6.32-bpo.5-amd64 #1 PowerEdge R710
>> RIP: 0010:[<ffffffff810b26cf>] [<ffffffff810b26cf>] find_get_pages+0x3f/0xbb
[...]
> Sorry for the slow reply. Please test the following patch. [1]
> explains how.
>
> [1] http://kernel-handbook.alioth.debian.org/ch-common-tasks.html
[...]
> commit 88f5acf88ae6a9778f6d25d0d5d7ec2d57764a97 upstream.

Thanks again for the pointer. I'd be interested in seeing this fixed in
squeeze but haven't been able to reproduce it. Would you be able to
check the fix? It works like so:

1. Install some tools:

	apt-get install git build-essential

2. Grab the latest upstream 2.6.32.y kernel:

	git clone -o stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux
	cd linux

Or if you already have a clone of the linux repository:

	git remote add -f stable git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git

3. Build and test pristine 2.6.32.y:

	git checkout stable/linux-2.6.32.y
	cp /boot/config-$(uname -r) .config; # stock configuration
	make localmodconfig; # optional: simplify configuration
	make deb-pkg; # optionally with -j<n> for parallel build
	dpkg -i ../<resulting package>
	reboot
	test test test ...

4. Hopefully it still reproduces the problem. Apply the proposed fix:

	wget 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=87;mbox=yes;bug=597489'
	git am -3sc bugreport.cgi*
	make deb-pkg; # optionally with -j4
	dpkg -i ../<resulting package>
	reboot
	test test test ...
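
During the "test test test" phases of steps 3 and 4, the kernel log can be checked for the soft lockup message quoted from the original report. A minimal shell sketch follows; the function names `count_soft_lockups` and `stuck_tasks` are hypothetical, and the default log path is taken from the original report (/var/log/kern.log), so it may differ on other setups.

```shell
#!/bin/sh
# Hypothetical helper: count soft-lockup reports in a kernel log file.
# Usage: count_soft_lockups [LOGFILE]
count_soft_lockups() {
    log=${1:-/var/log/kern.log}
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    # grep -c prints 0 but exits non-zero when nothing matches; mask that.
    grep -c 'BUG: soft lockup' "$log" || true
}

# Hypothetical helper: list the distinct task names that were reported as
# stuck, e.g. "kswapd1" from "... stuck for 61s! [kswapd1:144]".
# Usage: stuck_tasks [LOGFILE]
stuck_tasks() {
    log=${1:-/var/log/kern.log}
    sed -n 's/.*BUG: soft lockup - CPU#[0-9]* stuck for [0-9]*s! \[\([^:]*\):[0-9]*].*/\1/p' "$log" | sort -u
}
```

A count of 0 after a long test run on the patched kernel, compared with a non-zero count on the pristine one, would be the kind of result worth reporting back to the bug.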

Alternatively, if you are no longer interested in pursuing this,
that's fine, but please do let us know so we can act accordingly.

Thanks and sorry for the trouble,
Jonathan



Archive: http://lists.debian.org/20120126210831.GA5239@burratino

03-27-2012, 04:06 PM
Jonathan Nieder

Bug#597489: kswapd ownes the whole cpu resources, unable to reboot

Hi again,

Jonathan Nieder wrote:
>> Thomas Poo wrote:

>>> BUG: soft lockup - CPU#11 stuck for 61s! [kswapd1:144]
> [...]
>>> Pid: 144, comm: kswapd1 Not tainted 2.6.32-bpo.5-amd64 #1 PowerEdge R710
>>> RIP: 0010:[<ffffffff810b26cf>] [<ffffffff810b26cf>] find_get_pages+0x3f/0xbb
[...]
> Thanks again for the pointer. I'd be interested in seeing this fixed in
> squeeze but haven't been able to reproduce it. Would you be able to
> check the fix? It works like so:

Do you still have access to hardware that can reproduce this bug, and
if so, would you be interested in pursuing a fix in squeeze?

(A squeeze kernel should work without trouble on a wheezy/sid system,
so this doesn't involve installing any new packages except for the
kernel itself. The one complication I know of is the nouveau X
driver, as documented in [1]. There is a simple workaround for it
that I can describe if that case applies to you.)

If the answer is "no", that's fine, but please do let us know so we
can plan accordingly.

Thanks for your help,
Jonathan

[1] /usr/share/doc/xserver-xorg-video-nouveau/README.Debian



Archive: http://lists.debian.org/20120327160633.GA26028@burratino
 
