Giuseppe Lavagetto 03-31-2011 10:30 AM

Bug#597489: Kswapd hanging: patch available from lkml
 
Hi, posting here since I am evidently reproducing this bug.

Under load (relatively mild anyway) a 24-core X5660, 24GB RAM Dell
PowerEdge R710 gets stuck with 100% CPU usage (meaning one core gets
stuck running kswapd). The peculiar thing is that NO swap is being
used: the si and so columns in vmstat output show no swap activity,
even though swap was correctly enabled. Also, the machine was not
running completely out of RAM.
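
A minimal sketch of how the "no swap traffic" observation could be
double-checked without vmstat, assuming a Linux system that exposes the
pswpin and pswpout counters in /proc/vmstat (to my knowledge these are
the counters behind vmstat's si and so columns). The function names are
made up for this illustration and are not part of any tool mentioned in
this thread.

```python
def read_vmstat(path="/proc/vmstat"):
    """Return the raw text of /proc/vmstat (Linux only)."""
    with open(path) as f:
        return f.read()


def swap_counters(vmstat_text):
    """Return (pswpin, pswpout) parsed from the text of /proc/vmstat.

    Counters that are missing from the text are reported as 0.
    """
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_activity(before_text, after_text):
    """Pages swapped in and out between two snapshots of /proc/vmstat."""
    in0, out0 = swap_counters(before_text)
    in1, out1 = swap_counters(after_text)
    return in1 - in0, out1 - out0
```

Taking one read_vmstat() snapshot, waiting a few seconds while kswapd is
spinning, taking a second one and passing both to swap_activity() should
give (0, 0) if no pages are really being swapped.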

The machine eventually froze, so I was not able to get any information
apart from the kernel stack trace, which I post at the end of the
report.

This issue seems to be a known bug in the linux kernel, and as far as I
understand a patch is available (and already included in RH kernels):

http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977

I'll try to reproduce the problem. In the meantime, do you think the
solution Mel proposed could be backported to the stable kernel?

Kernel stack trace (excerpt) is attached.

Best,
Giuseppe
--
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.

Giuseppe Lavagetto 03-31-2011 12:02 PM

Sorry,

I forgot to mention that we are running Debian squeeze (stable) on a
freshly installed system. So the bug seems to still be present in the
stable kernel, and not just in the backported one.

Giuseppe
--
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.




--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/1301572933.2542.51.camel@bunny

Ben Hutchings 04-03-2011 04:43 AM

On Thu, 2011-03-31 at 12:30 +0200, Giuseppe Lavagetto wrote:
> Hi, posting here since I am evidently reproducing this bug.
>
> Under load (relatively mild anyway) a 24-core X5660, 24GB RAM Dell
> Poweredge 710 gets stuck with 100% cpu usage (meaning one core gets
> stuck in running kswapd). The peculiarity of the situation is that NO
> swap is being allocated, si and so columns in vmstat output show no swap
> usage, and swap was correctly mounted. Also, it was not running
> completely out of RAM.
>
> The machine eventually froze, so I was not able to get any information
> apart from the kernel stack trace, which I post at the end of the
> report.
>
> This issue seems to be a known bug in the linux kernel, and as far as I
> understand a patch is available (and already included in RH kernels):
>
> http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977

I really don't think that deals with the same bug you are seeing.

> I'll try to reproduce the problem, in the meantime do you think the
> solution Mel proposed could be ported back to the stable kernel?

Perhaps, if Mel or one of the upstream developers does it. I don't
believe anyone in the Debian kernel team is sufficiently familiar with
the VMM to backport this significant change.

> Kernel stack trace (excerpt) is attached.
>
> Best,
> Giuseppe
> application log attachment (kernel.log)
> [86613.384580] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc drbd ixgbe dca lru_cache cn ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu bonding ext3 jbd mbcache loop snd_pcm dcdbas joydev power_meter processor button psmouse snd_timer serio_raw snd soundcore snd_page_alloc evdev pcspkr xfs exportfs sg sr_mod cdrom ata_generic usbhid hid uhci_hcd sd_mod ses crc_t10dif enclosure thermal ehci_hcd ata_piix usbcore libata megaraid_sas nls_base scsi_mod bnx2 thermal_sys [last unloaded: drbd]
> [86613.384610] CPU 2:
> [86613.384611] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc drbd ixgbe dca lru_cache cn ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu bonding ext3 jbd mbcache loop snd_pcm dcdbas joydev power_meter processor button psmouse snd_timer serio_raw snd soundcore snd_page_alloc evdev pcspkr xfs exportfs sg sr_mod cdrom ata_generic usbhid hid uhci_hcd sd_mod ses crc_t10dif enclosure thermal ehci_hcd ata_piix usbcore libata megaraid_sas nls_base scsi_mod bnx2 thermal_sys [last unloaded: drbd]
> [86613.384635] Pid: 207, comm: kswapd0 Not tainted 2.6.32-5-amd64 #1 PowerEdge R710
> [86613.384636] RIP: 0010:[<ffffffff810b3f19>] [<ffffffff810b3f19>] find_get_pages+0x5f/0xbb
> [86613.384645] RSP: 0018:ffff88062c869bc0 EFLAGS: 00000293
> [86613.384646] RAX: ffffffffffffffff RBX: ffff88062c869c50 RCX: 0000000000000000
> [86613.384648] RDX: 0000000000000040 RSI: ffffea0002bc56e0 RDI: ffffea0002bc56d8
> [86613.384649] RBP: ffffffff8101166e R08: ffff88062c869b80 R09: 0000000000000002
> [86613.384651] R10: 0000000000000040 R11: ffff880093d74ad8 R12: 0000000000000005
> [86613.384653] R13: 0000000000000286 R14: ffff88000000b100 R15: ffff88000000c780
> [86613.384655] FS: 0000000000000000(0000) GS:ffff88033ac20000(0000) knlGS:0000000000000000
> [86613.384656] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [86613.384658] CR2: 00007fffd81dd038 CR3: 0000000001001000 CR4: 00000000000006e0
> [86613.384659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [86613.384661] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [86613.384663] Call Trace:
> [86613.384668] [<ffffffff810bc034>] ? pagevec_lookup+0x17/0x1e
> [86613.384671] [<ffffffff810bcdf1>] ? invalidate_mapping_pages+0xb9/0xdb
> [86613.384675] [<ffffffff81100573>] ? shrink_icache_memory+0xfc/0x228
> [86613.384678] [<ffffffff810bf3f5>] ? shrink_slab+0xe0/0x153
> [86613.384680] [<ffffffff810bfc98>] ? kswapd+0x4d9/0x686
> [86613.384683] [<ffffffff810bd30f>] ? isolate_pages_global+0x0/0x20f
> [86613.384687] [<ffffffff81064e96>] ? autoremove_wake_function+0x0/0x2e
> [86613.384691] [<ffffffff8103aa56>] ? __wake_up_common+0x44/0x72
> [86613.384693] [<ffffffff810bf7bf>] ? kswapd+0x0/0x686
> [86613.384695] [<ffffffff81064bc9>] ? kthread+0x79/0x81
> [86613.384700] [<ffffffff81011baa>] ? child_rip+0xa/0x20
> [86613.384702] [<ffffffff81064b50>] ? kthread+0x0/0x81
> [86613.384703] [<ffffffff81011ba0>] ? child_rip+0x0/0x20
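
For anyone who wants to compare traces like the one above across
reports, here is a rough sketch that pulls the symbol names out of the
"[<address>] symbol+0xoffset/0xsize" entries. The regular expression
and the function name are assumptions made for this illustration; the
third tuple element only records whether the entry lacks the "?" marker,
which the kernel prints for addresses it found on the stack without
being able to verify them.

```python
import re

# Matches entries such as "[<ffffffff810bc034>] ? pagevec_lookup+0x17/0x1e".
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unverified>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_frames(log_text):
    """Return a list of (symbol, offset, verified) tuples, in log order.

    The RIP line is picked up as well, because it has the same
    "[<address>] symbol+0xoffset/0xsize" shape. Leading "> " quote
    markers and timestamps are ignored.
    """
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append(
                (
                    m.group("symbol"),
                    int(m.group("offset"), 16),
                    m.group("unverified") is None,
                )
            )
    return frames
```

Run on the excerpt above, the first entries would be find_get_pages
(from the RIP line), then pagevec_lookup, invalidate_mapping_pages,
shrink_icache_memory, shrink_slab and kswapd.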

Please provide the full log messages for this error. If the messages
seem to be produced continually then send all messages produced in 1
second (the numbers on the left are time in seconds).
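
As a sketch of what "all messages produced in 1 second" could look like
in practice, the following keeps only the log lines whose timestamp lies
within one second of the first timestamped line. It assumes the usual
"[seconds.microseconds]" prefix shown in the trace above; the function
name and the window parameter are made up for this illustration.

```python
import re

# Kernel timestamp such as "[86613.384580]" (possibly space-padded).
TS_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def messages_within(log_lines, window=1.0):
    """Return the lines stamped less than `window` seconds after the first.

    Lines without a timestamp are skipped. The first timestamped line
    defines the start of the window.
    """
    selected = []
    start = None
    for line in log_lines:
        m = TS_RE.search(line)
        if not m:
            continue
        ts = float(m.group(1))
        if start is None:
            start = ts
        if 0 <= ts - start < window:
            selected.append(line)
    return selected
```

Feeding it the lines of the kernel log starting at the first message of
one report should return just that report, leaving out anything logged a
second or more later.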

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

Giuseppe Lavagetto 04-03-2011 07:50 AM

On Sun, 2011-04-03 at 05:43 +0100, Ben Hutchings wrote:
> I really don't think that deals with the same bug you are seeing.
>

Now that I have a test case for the problem, I'll compile 2.6.38.2 and
see whether recent changes to MM help in our situation, or whether
something else is causing the kernel hang (maybe the NFSv3 client, as
Matteo was suggesting?)


> Perhaps, if Mel or one of the upstream developers does it. I don't
> believe anyone in the Debian kernel team is sufficiently familiar with
> the VMM to backport this significant change.
>

Yes, I took a shot at applying the patch myself, but it is not
everybody's cup of tea: MM has changed radically since 2.6.32.

>
> Please provide the full log messages for this error. If the messages
> seem to be produced continually then send all messages produced in 1
> second (the numbers on the left are time in seconds).
>


The messages are produced at 60-second intervals. Sorry for the bad
stack trace in the first posting; I hope the full 1-minute stack trace
in my second posting was enough.

I'll keep you posted on the results of my tests, so that anyone else
running into the same problem can follow my steps if needed.

G.
--
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.




Archive: http://lists.debian.org/1301817049.5670.318.camel@bunny