Please help me understand why oom-killer was invoked?
From the SAR logs we can see that there is lots of memory in use by the system cache, but free is low. How can we ensure there is more memory available in free to avoid triggering oom-killer?
From Kernel Log:
Mar 16 23:30:06 kernel: [49173666.630999] postgres invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Mar 16 23:30:06 kernel: [49173666.631008] Pid: 14865, comm: postgres Not tainted 2.6.26-2-686-bigmem #1
Mar 16 23:30:06 kernel: [49173666.631019]  [<c016004e>] oom_kill_process+0x4f/0x195
Mar 16 23:30:06 kernel: [49173666.631028]  [<c0160478>] out_of_memory+0x14e/0x17f
Mar 16 23:30:06 kernel: [49173666.631032]  [<c01623e0>] __alloc_pages_internal+0x2b8/0x34e
Mar 16 23:30:06 kernel: [49173666.631037]  [<c0162482>] __alloc_pages+0x7/0x9
Mar 16 23:30:06 kernel: [49173666.631041]  [<c028cdd1>] tcp_sendmsg+0x41b/0x8d9
Mar 16 23:30:06 kernel: [49173666.631048]  [<c029468c>] tcp_current_mss+0xaa/0xc8
Mar 16 23:30:06 kernel: [49173666.631052]  [<c025fc13>] sock_sendmsg+0xc7/0xe1
Mar 16 23:30:06 kernel: [49173666.631058]  [<c0138850>] autoremove_wake_function+0x0/0x2d
Mar 16 23:30:06 kernel: [49173666.631062]  [<c029ad8e>] tcp_v4_rcv+0x545/0x597
Mar 16 23:30:06 kernel: [49173666.631066]  [<c028368f>] ip_local_deliver_finish+0xe8/0x183
Mar 16 23:30:06 kernel: [49173666.631069]  [<c028358a>] ip_rcv_finish+0x286/0x2a3
Mar 16 23:30:06 kernel: [49173666.631075]  [<c02604f7>] sys_sendto+0x105/0x130
Mar 16 23:30:06 kernel: [49173666.631077]  [<c013e5e2>] clocksource_get_next+0x39/0x3f
Mar 16 23:30:06 kernel: [49173666.631081]  [<c013d5da>] update_wall_time+0x519/0x68f
Mar 16 23:30:06 kernel: [49173666.631087]  [<c013aaa4>] hrtimer_forward+0xe4/0x100
Mar 16 23:30:06 kernel: [49173666.631090]  [<c013cfc4>] getnstimeofday+0x37/0xbc
Mar 16 23:30:06 kernel: [49173666.631095]  [<c026053b>] sys_send+0x19/0x1d
Mar 16 23:30:06 kernel: [49173666.631098]  [<c0260dd3>] sys_socketcall+0xed/0x19e
Mar 16 23:30:06 kernel: [49173666.631102]  [<c010afdb>] do_IRQ+0x52/0x63
Mar 16 23:30:06 kernel: [49173666.631106]  [<c0108857>] sysenter_past_esp+0x78/0xb1
Mar 16 23:30:06 kernel: [49173666.631111]  =======================
Mar 16 23:30:06 kernel: [49173666.631113] Mem-info:
Mar 16 23:30:06 kernel: [49173666.631114] DMA per-cpu:
Mar 16 23:30:06 kernel: [49173666.631116] CPU    0: hi:    0, btch:   1 usd:   0
Mar 16 23:30:06 kernel: [49173666.631117] CPU    1: hi:    0, btch:   1 usd:   0
Mar 16 23:30:06 kernel: [49173666.631119] CPU    2: hi:    0, btch:   1 usd:   0
Mar 16 23:30:06 kernel: [49173666.631120] CPU    3: hi:    0, btch:   1 usd:   0
Mar 16 23:30:06 kernel: [49173666.631122] Normal per-cpu:
2048kB 0*4096kB = 27432kB
Mar 16 23:30:06 kernel: [49173666.631182] 1086618 total pagecache pages
Mar 16 23:30:06 kernel: [49173666.631184] Swap cache: add 28233, delete 28233, find 12822/15537
Mar 16 23:30:06 kernel: [49173666.631186] Free swap  = 995184kB
Mar 16 23:30:06 kernel: [49173666.631187] Total swap = 995988kB
Mar 16 23:30:06 kernel: [49173666.669460] 2359296 pages of RAM
Mar 16 23:30:06 kernel: [49173666.669463] 2129920 pages of HIGHMEM
Mar 16 23:30:06 kernel: [49173666.669465] 281785 reserved pages
Mar 16 23:30:06 kernel: [49173666.669466] 46131460 pages shared
Mar 16 23:30:06 kernel: [49173666.669467] 0 pages swap cached
Mar 16 23:30:06 kernel: [49173666.669468] 222 pages dirty
Mar 16 23:30:06 kernel: [49173666.669470] 0 pages writeback
Mar 16 23:30:06 kernel: [49173666.669471] 539697 pages mapped
Mar 16 23:30:06 kernel: [49173666.669472] 11962 pages slab
Mar 16 23:30:06 kernel: [49173666.669473] 188949 pages pagetables
Mar 16 23:30:06 kernel: [49173666.669476] Out of memory: kill process 27302 (postgres) score 337392 or a child
Mar 16 23:30:06 kernel: [49173666.669567] Killed process 27304 (postgres)
Mar 16 23:30:06 kernel: [49173666.712469] postgres invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Mar 16 23:30:06 kernel: [49173666.712474] Pid: 14865, comm: postgres Not tainted 2.6.26-2-686-bigmem #1
Mar 16 23:30:06 kernel: [49173666.712489]  [<c016004e>] oom_kill_process+0x4f/0x195
Mar 16 23:30:06 kernel: [49173666.712500]  [<c0160478>] out_of_memory+0x14e/0x17f
Mar 16 23:30:06 kernel: [49173666.712505]  [<c01623e0>] __alloc_pages_internal+0x2b8/0x34e
Mar 16 23:30:06 kernel: [49173666.712510]  [<c0162482>] __alloc_pages+0x7/0x9
Mar 16 23:30:06 kernel: [49173666.712513]  [<c028cdd1>] tcp_sendmsg+0x41b/0x8d9
Mar 16 23:30:06 kernel: [49173666.712521]  [<c029468c>] tcp_current_mss+0xaa/0xc8
Mar 16 23:30:06 kernel: [49173666.712525]  [<c025fc13>] sock_sendmsg+0xc7/0xe1
Mar 16 23:30:06 kernel: [49173666.712532]  [<c0138850>] autoremove_wake_function+0x0/0x2d
Mar 16 23:30:06 kernel: [49173666.712537]  [<c029ad8e>] tcp_v4_rcv+0x545/0x597
Mar 16 23:30:06 kernel: [49173666.712541]  [<c028368f>] ip_local_deliver_finish+0xe8/0x183
Mar 16 23:30:06 kernel: [49173666.712544]  [<c028358a>] ip_rcv_finish+0x286/0x2a3
Mar 16 23:30:06 kernel: [49173666.712547]  [<c0268d31>] netif_receive_skb+0x31c/0x349
Mar 16 23:30:06 kernel: [49173666.712553]  [<c02604f7>] sys_sendto+0x105/0x130
Mar 16 23:30:06 kernel: [49173666.712556]  [<c013e5e2>] clocksource_get_next+0x39/0x3f
Mar 16 23:30:06 kernel: [49173666.712561]  [<c013d5da>] update_wall_time+0x519/0x68f
Mar 16 23:30:06 kernel: [49173666.712567]  [<c013aaa4>] hrtimer_forward+0xe4/0x100
Mar 16 23:30:06 kernel: [49173666.712570]  [<c013cfc4>] getnstimeofday+0x37/0xbc
Mar 16 23:30:06 kernel: [49173666.712575]  [<c026053b>] sys_send+0x19/0x1d
Mar 16 23:30:06 kernel: [49173666.712578]  [<c0260dd3>] sys_socketcall+0xed/0x19e
Mar 16 23:30:06 kernel: [49173666.712582]  [<c010afdb>] do_IRQ+0x52/0x63
Mar 16 23:30:06 kernel: [49173666.712586]  [<c0108857>] sysenter_past_esp+0x78/0xb1
Mar 16 23:30:06 kernel: [49173666.712598]  =======================
Mar 16 23:30:06 kernel: [49173666.712600] Mem-info:
Mar 16 23:30:06 kernel: [49173666.712601] DMA per-cpu:
Mar 16 23:30:06 kernel: [49173666.712603] CPU    0: hi:    0, btch:   1 usd:   0
Mar 16 23:30:06 kernel: [49173666.712670] Swap cache: add 28233, delete 28233, find 12822/15537
Mar 16 23:30:06 kernel: [49173666.712671] Free swap  = 995184kB
Mar 16 23:30:06 kernel: [49173666.712673] Total swap = 995988kB
Mar 16 23:30:06 kernel: [49173666.739537] 2359296 pages of RAM
Mar 16 23:30:06 kernel: [49173666.739537] 2129920 pages of HIGHMEM
Mar 16 23:30:06 kernel: [49173666.739537] 281785 reserved pages
Mar 16 23:30:06 kernel: [49173666.739537] 45818784 pages shared
Mar 16 23:30:06 kernel: [49173666.739537] 0 pages swap cached
Mar 16 23:30:06 kernel: [49173666.739537] 3 pages dirty
Mar 16 23:30:06 kernel: [49173666.739537] 0 pages writeback
Mar 16 23:30:06 kernel: [49173666.739537] 536777 pages mapped
Mar 16 23:30:06 kernel: [49173666.739537] 11962 pages slab
Mar 16 23:30:06 kernel: [49173666.739537] 188949 pages pagetables
Mar 16 23:30:06 kernel: [49173666.739537] Out of memory: kill process 27302 (postgres) score 335971 or a child
Mar 16 23:30:06 kernel: [49173666.739537] Killed process 27305 (postgres)
> Please help me understand why oom-killer was invoked?
You already know the answer: what does "oom" stand for?
> From the SAR logs we can see that there is lots of memory in use by the
> system cache, but free is low. How can we ensure there is more memory
> available in free to avoid triggering oom-killer?
Switch to a 64 bit kernel and user space, including 64bit pgsql. This
gives you a flat memory space so any app can have more than 2GB virtual
address space. You already knew this as well, but you've been fighting
the switch tooth and nail, screaming "but I shouldn't _have to_, it
works now, mostly". It's the "mostly" part that is your current
problem. Switch to 64bit with its large flat memory space and you'll be
much happier.
I still can't understand why so many people are so damn resistant to
going 64bit, *especially* database users, who benefit the most from it.
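
To put rough numbers on the 32-bit limitation, here is a small sketch that converts the page counts from the Mem-info dump in the kernel log into MiB and looks at the gfp_mask of the failing allocation. It assumes 4 KiB pages (the usual size on i386), and the 0x02 value for the __GFP_HIGHMEM bit is an assumption taken from 2.6-era kernel headers, so check it against the source for your kernel. The function and variable names are made up for illustration.

```python
PAGE_KIB = 4  # assumption: 4 KiB pages, the usual i386 page size


def pages_to_mib(pages, page_kib=PAGE_KIB):
    """Convert a page count to MiB."""
    return pages * page_kib / 1024


# Page counts copied from the Mem-info dump in the kernel log.
ram_pages = 2359296
highmem_pages = 2129920
pagetable_pages = 188949

# Whatever is not HIGHMEM is low memory (the DMA and Normal zones).
lowmem_pages = ram_pages - highmem_pages

# gfp_mask=0xd0 is the value logged for the failing allocation.
GFP_MASK = 0xD0
GFP_HIGHMEM_BIT = 0x02  # assumption: __GFP_HIGHMEM in 2.6-era kernels
may_use_highmem = bool(GFP_MASK & GFP_HIGHMEM_BIT)

print(f"RAM:         {pages_to_mib(ram_pages):.0f} MiB")
print(f"HIGHMEM:     {pages_to_mib(highmem_pages):.0f} MiB")
print(f"Low memory:  {pages_to_mib(lowmem_pages):.0f} MiB")
print(f"Page tables: {pages_to_mib(pagetable_pages):.0f} MiB")
print(f"Allocation may use HIGHMEM: {may_use_highmem}")
```

Under the 4 KiB assumption this works out to 9216 MiB of RAM, of which 8320 MiB is HIGHMEM, leaving 896 MiB of low memory; the page tables alone account for roughly 738 MiB. If the flag value is right, the failing allocation could not be served from HIGHMEM, which would fit an OOM kill with plenty of cache still in use. A 64-bit kernel has no HIGHMEM split.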
--
Stan
--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4F67065C.9080506@hardwarefreak.com
03-19-2012, 01:19 PM
hvw59601
Why was oom-killer invoked?
Stan Hoeppner wrote:
> On 3/19/2012 3:48 AM, tim truman wrote:
>> Please help me understand why oom-killer was invoked?
> You already know the answer: what does "oom" stand for?
>> From the SAR logs we can see that there is lots of memory in use by the
>> system cache, but free is low. How can we ensure there is more memory
>> available in free to avoid triggering oom-killer?
> Switch to a 64 bit kernel and user space, including 64bit pgsql. This
> gives you a flat memory space so any app can have more than 2GB virtual
> address space. You already knew this as well, but you've been fighting
> the switch tooth and nail, screaming "but I shouldn't _have to_, it
> works now, mostly". It's the "mostly" part that is your current
> problem. Switch to 64bit with its large flat memory space and you'll be
> much happier.
and some apps run 3x faster...
> I still can't understand why so many people are so damn resistant to
> going 64bit, *especially* database users, who benefit the most from it.
Hugo
> From the SAR logs we can see that there is lots of memory in use by the
> system cache, but free is low. How can we ensure there is more memory
> available in free to avoid triggering oom-killer?
The easiest solution is to add swap space. That will increase your
virtual memory. Then you won't run out of memory and will avoid the
out of memory killer.
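
Before resizing swap it may be worth checking how much of it was in use when the OOM killer ran. The sketch below pulls the two swap lines out of the kernel log quoted earlier in the thread and computes the usage; the helper name parse_swap_kb is made up for illustration, and the regex only assumes the "Free swap = ...kB" / "Total swap = ...kB" wording seen in that log.

```python
import re

# The two swap lines from the Mem-info dump in the kernel log.
LOG = """\
Free swap  = 995184kB
Total swap = 995988kB
"""


def parse_swap_kb(text):
    """Return {'free': kB, 'total': kB} parsed from Mem-info swap lines."""
    found = re.findall(r"(Free|Total) swap\s*=\s*(\d+)kB", text)
    return {name.lower(): int(value) for name, value in found}


swap = parse_swap_kb(LOG)
used_kb = swap["total"] - swap["free"]
used_pct = 100.0 * used_kb / swap["total"]

print(f"swap used: {used_kb} kB of {swap['total']} kB ({used_pct:.2f}%)")
```

With the figures from the log this comes to 804 kB used out of 995988 kB, so swap was almost entirely free at the time of the kill. That suggests confirming which kind of memory actually ran out before relying on extra swap alone.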