Bug#597489: Kswapd hanging: patch available from lkml
Hi, posting here since I am evidently reproducing this bug.
Under load (relatively mild anyway) a 24-core X5660, 24GB RAM Dell PowerEdge R710 gets stuck with 100% CPU usage (meaning one core gets stuck in running kswapd). The peculiarity of the situation is that NO swap is being allocated, si and so columns in vmstat output show no swap usage, and swap was correctly mounted. Also, it was not running completely out of RAM.

The machine eventually froze so I was not able to get any information apart from the kernel stack trace, which I post at the end of the report.

This issue seems to be a known bug in the Linux kernel, and as far as I understand a patch is available (and already included in RH kernels):

http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977

I'll try to reproduce the problem, in the meantime do you think the solution Mel proposed could be ported back to the stable kernel?

Kernel stack trace (excerpt) is attached.

Best,
Giuseppe

--
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.
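For anyone who wants to double-check the "no swap activity" observation without watching vmstat by hand: the si and so columns are derived from the cumulative pswpin and pswpout counters in /proc/vmstat, so two samples of those counters can be compared directly. The following is only a minimal sketch in Python; the helper names `swap_counters` and `swap_activity` and the sample values are made up for illustration and are not taken from the affected machine.

```python
def swap_counters(vmstat_text):
    """Extract the cumulative pswpin/pswpout page counts from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # Each /proc/vmstat line is "<name> <value>"; keep only the swap counters.
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_activity(before, after):
    """Pages swapped in and out between two samples of the counters."""
    return {name: after[name] - before[name] for name in ("pswpin", "pswpout")}


# Hypothetical samples; on a real system, read the text from /proc/vmstat twice,
# a few seconds apart.
first = swap_counters("nr_free_pages 1000\npswpin 0\npswpout 0\n")
second = swap_counters("nr_free_pages 900\npswpin 0\npswpout 0\n")
delta = swap_activity(first, second)
```

If both deltas stay at zero while kswapd is pinned at 100% CPU, that matches the behaviour described in this report.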
Bug#597489: Kswapd hanging: patch available from lkml
Sorry,
I forgot to mention that we are running Debian squeeze (stable) on a freshly installed system. So the bug seems to still be there in the stable kernel, and not just in the backported one.

Giuseppe

--
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.
Bug#597489: Kswapd hanging: patch available from lkml
On Thu, 2011-03-31 at 12:30 +0200, Giuseppe Lavagetto wrote:
> Hi, posting here since I am evidently reproducing this bug.
>
> Under load (relatively mild anyway) a 24-core X5660, 24GB RAM Dell
> PowerEdge R710 gets stuck with 100% CPU usage (meaning one core gets
> stuck in running kswapd). The peculiarity of the situation is that NO
> swap is being allocated, si and so columns in vmstat output show no swap
> usage, and swap was correctly mounted. Also, it was not running
> completely out of RAM.
>
> The machine eventually froze so I was not able to get any information
> apart from the kernel stack trace, which I post at the end of the
> report.
>
> This issue seems to be a known bug in the Linux kernel, and as far as I
> understand a patch is available (and already included in RH kernels):
>
> http://kerneltrap.org/mailarchive/linux-kernel/2010/10/27/4637977

I really don't think that deals with the same bug you are seeing.

> I'll try to reproduce the problem, in the meantime do you think the
> solution Mel proposed could be ported back to the stable kernel?

Perhaps, if Mel or one of the upstream developers does it. I don't believe anyone in the Debian kernel team is sufficiently familiar with the VMM to backport this significant change.

> Kernel stack trace (excerpt) is attached.
>
> Best,
> Giuseppe

> application log attachment (kernel.log)
> [86613.384580] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc drbd ixgbe dca lru_cache cn ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu bonding ext3 jbd mbcache loop snd_pcm dcdbas joydev power_meter processor button psmouse snd_timer serio_raw snd soundcore snd_page_alloc evdev pcspkr xfs exportfs sg sr_mod cdrom ata_generic usbhid hid uhci_hcd sd_mod ses crc_t10dif enclosure thermal ehci_hcd ata_piix usbcore libata megaraid_sas nls_base scsi_mod bnx2 thermal_sys [last unloaded: drbd]
> [86613.384610] CPU 2:
> [86613.384611] Modules linked in: nfs lockd fscache nfs_acl auth_rpcgss sunrpc drbd ixgbe dca lru_cache cn ipmi_si mpt2sas scsi_transport_sas mptctl mptbase ipmi_devintf ipmi_msghandler dell_rbu bonding ext3 jbd mbcache loop snd_pcm dcdbas joydev power_meter processor button psmouse snd_timer serio_raw snd soundcore snd_page_alloc evdev pcspkr xfs exportfs sg sr_mod cdrom ata_generic usbhid hid uhci_hcd sd_mod ses crc_t10dif enclosure thermal ehci_hcd ata_piix usbcore libata megaraid_sas nls_base scsi_mod bnx2 thermal_sys [last unloaded: drbd]
> [86613.384635] Pid: 207, comm: kswapd0 Not tainted 2.6.32-5-amd64 #1 PowerEdge R710
> [86613.384636] RIP: 0010:[<ffffffff810b3f19>] [<ffffffff810b3f19>] find_get_pages+0x5f/0xbb
> [86613.384645] RSP: 0018:ffff88062c869bc0 EFLAGS: 00000293
> [86613.384646] RAX: ffffffffffffffff RBX: ffff88062c869c50 RCX: 0000000000000000
> [86613.384648] RDX: 0000000000000040 RSI: ffffea0002bc56e0 RDI: ffffea0002bc56d8
> [86613.384649] RBP: ffffffff8101166e R08: ffff88062c869b80 R09: 0000000000000002
> [86613.384651] R10: 0000000000000040 R11: ffff880093d74ad8 R12: 0000000000000005
> [86613.384653] R13: 0000000000000286 R14: ffff88000000b100 R15: ffff88000000c780
> [86613.384655] FS: 0000000000000000(0000) GS:ffff88033ac20000(0000) knlGS:0000000000000000
> [86613.384656] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [86613.384658] CR2: 00007fffd81dd038 CR3: 0000000001001000 CR4: 00000000000006e0
> [86613.384659] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [86613.384661] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [86613.384663] Call Trace:
> [86613.384668] [<ffffffff810bc034>] ? pagevec_lookup+0x17/0x1e
> [86613.384671] [<ffffffff810bcdf1>] ? invalidate_mapping_pages+0xb9/0xdb
> [86613.384675] [<ffffffff81100573>] ? shrink_icache_memory+0xfc/0x228
> [86613.384678] [<ffffffff810bf3f5>] ? shrink_slab+0xe0/0x153
> [86613.384680] [<ffffffff810bfc98>] ? kswapd+0x4d9/0x686
> [86613.384683] [<ffffffff810bd30f>] ? isolate_pages_global+0x0/0x20f
> [86613.384687] [<ffffffff81064e96>] ? autoremove_wake_function+0x0/0x2e
> [86613.384691] [<ffffffff8103aa56>] ? __wake_up_common+0x44/0x72
> [86613.384693] [<ffffffff810bf7bf>] ? kswapd+0x0/0x686
> [86613.384695] [<ffffffff81064bc9>] ? kthread+0x79/0x81
> [86613.384700] [<ffffffff81011baa>] ? child_rip+0xa/0x20
> [86613.384702] [<ffffffff81064b50>] ? kthread+0x0/0x81
> [86613.384703] [<ffffffff81011ba0>] ? child_rip+0x0/0x20

Please provide the full log messages for this error. If the messages seem to be produced continually then send all messages produced in 1 second (the numbers on the left are time in seconds).

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
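The request above (all messages produced within one second, using the timestamps on the left) could be scripted roughly as follows. This is only a sketch in Python, assuming each log line carries the usual `[seconds.microseconds]` prefix; the function name `messages_in_window` is hypothetical, and the third sample line is invented to stand in for a later repetition of the trace.

```python
import re

# Matches the "[seconds.microseconds]" prefix the kernel puts on each message.
STAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def messages_in_window(lines, window=1.0):
    """Return the lines stamped within `window` seconds of the first stamped line."""
    start = None
    selected = []
    for line in lines:
        match = STAMP.search(line)
        if match is None:
            continue  # skip lines without a kernel timestamp
        stamp = float(match.group(1))
        if start is None:
            start = stamp  # the first stamped line defines the start of the window
        if stamp - start <= window:
            selected.append(line)
    return selected


sample = [
    "[86613.384610] CPU 2:",
    "[86613.384663] Call Trace:",
    "[86673.384610] CPU 2:",  # invented line, 60 seconds later
]
burst = messages_in_window(sample)
```

Here `burst` holds only the first two sample lines; feeding the function the lines of the real kernel log would give the one-second excerpt to attach to the bug report.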
Bug#597489: Kswapd hanging: patch available from lkml
On Sun, 2011-04-03 at 05:43 +0100, Ben Hutchings wrote:
> I really don't think that deals with the same bug you are seeing.

Since I now have a test case for the problem, I'll compile 2.6.38.2 and see whether recent modifications to MM help in our situation, or whether something else is causing the kernel hang (maybe the nfs3 client, as Matteo was suggesting?).

> Perhaps, if Mel or one of the upstream developers does it. I don't
> believe anyone in the Debian kernel team is sufficiently familiar with
> the VMM to backport this significant change.

Yes, I took a shot at applying the patch myself, but it wouldn't be anybody's cup of tea. MM has radically changed since 2.6.32.

> Please provide the full log messages for this error. If the messages
> seem to be produced continually then send all messages produced in 1
> second (the numbers on the left are time in seconds).

The messages are produced at 60-second intervals. Sorry for the bad stack trace in the first posting; I hope the full 1-minute stack trace in my second posting was enough.

I'll keep you posted with the results of my tests, so that anyone else running into the same problem can follow my steps.

G.

--
Giuseppe Lavagetto, Ph.d.
Systems Manager and Developer - Gruppo Immobiliare.it s.r.l.