Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Debian User (http://www.linux-archive.org/debian-user/)
-   -   Why was oom-killer invoked? (http://www.linux-archive.org/debian-user/646247-why-oom-killer-invoked.html)

tim truman 03-19-2012 07:48 AM

Why was oom-killer invoked?
 
Hi,

Please help me understand why oom-killer was invoked?

From the SAR logs we can see that there is lots of memory in use by the system cache, but free is low.* How can we ensure there is more memory available in free to avoid triggering oom-killer?


From Kernel Log:
Mar 16 23:30:06 kernel: [49173666.630999] postgres invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Mar 16 23:30:06 kernel: [49173666.631008] Pid: 14865, comm: postgres Not tainted 2.6.26-2-686-bigmem #1

Mar 16 23:30:06 kernel: [49173666.631019]* [<c016004e>] oom_kill_process+0x4f/0x195
Mar 16 23:30:06 kernel: [49173666.631028]* [<c0160478>] out_of_memory+0x14e/0x17f
Mar 16 23:30:06 kernel: [49173666.631032]* [<c01623e0>] __alloc_pages_internal+0x2b8/0x34e

Mar 16 23:30:06 kernel: [49173666.631037]* [<c0162482>] __alloc_pages+0x7/0x9
Mar 16 23:30:06 kernel: [49173666.631041]* [<c028cdd1>] tcp_sendmsg+0x41b/0x8d9
Mar 16 23:30:06 kernel: [49173666.631048]* [<c029468c>] tcp_current_mss+0xaa/0xc8

Mar 16 23:30:06 kernel: [49173666.631052]* [<c025fc13>] sock_sendmsg+0xc7/0xe1
Mar 16 23:30:06 kernel: [49173666.631058]* [<c0138850>] autoremove_wake_function+0x0/0x2d
Mar 16 23:30:06 kernel: [49173666.631062]* [<c029ad8e>] tcp_v4_rcv+0x545/0x597

Mar 16 23:30:06 kernel: [49173666.631066]* [<c028368f>] ip_local_deliver_finish+0xe8/0x183
Mar 16 23:30:06 kernel: [49173666.631069]* [<c028358a>] ip_rcv_finish+0x286/0x2a3
Mar 16 23:30:06 kernel: [49173666.631075]* [<c02604f7>] sys_sendto+0x105/0x130

Mar 16 23:30:06 kernel: [49173666.631077]* [<c013e5e2>] clocksource_get_next+0x39/0x3f
Mar 16 23:30:06 kernel: [49173666.631081]* [<c013d5da>] update_wall_time+0x519/0x68f
Mar 16 23:30:06 kernel: [49173666.631087]* [<c013aaa4>] hrtimer_forward+0xe4/0x100

Mar 16 23:30:06 kernel: [49173666.631090]* [<c013cfc4>] getnstimeofday+0x37/0xbc
Mar 16 23:30:06 kernel: [49173666.631095]* [<c026053b>] sys_send+0x19/0x1d
Mar 16 23:30:06 kernel: [49173666.631098]* [<c0260dd3>] sys_socketcall+0xed/0x19e

Mar 16 23:30:06 kernel: [49173666.631102]* [<c010afdb>] do_IRQ+0x52/0x63
Mar 16 23:30:06 kernel: [49173666.631106]* [<c0108857>] sysenter_past_esp+0x78/0xb1
Mar 16 23:30:06 kernel: [49173666.631111]* =======================

Mar 16 23:30:06 kernel: [49173666.631113] Mem-info:
Mar 16 23:30:06 kernel: [49173666.631114] DMA per-cpu:
Mar 16 23:30:06 kernel: [49173666.631116] CPU*** 0: hi:*** 0, btch:** 1 usd:** 0
Mar 16 23:30:06 kernel: [49173666.631117] CPU*** 1: hi:*** 0, btch:** 1 usd:** 0

Mar 16 23:30:06 kernel: [49173666.631119] CPU*** 2: hi:*** 0, btch:** 1 usd:** 0
Mar 16 23:30:06 kernel: [49173666.631120] CPU*** 3: hi:*** 0, btch:** 1 usd:** 0
Mar 16 23:30:06 kernel: [49173666.631122] Normal per-cpu:

Mar 16 23:30:06 kernel: [49173666.631123] CPU*** 0: hi:* 186, btch:* 31 usd: 181
Mar 16 23:30:06 kernel: [49173666.631125] CPU*** 1: hi:* 186, btch:* 31 usd: 179
Mar 16 23:30:06 kernel: [49173666.631126] CPU*** 2: hi:* 186, btch:* 31 usd: 174

Mar 16 23:30:06 kernel: [49173666.631128] CPU*** 3: hi:* 186, btch:* 31 usd: 166
Mar 16 23:30:06 kernel: [49173666.631129] HighMem per-cpu:
Mar 16 23:30:06 kernel: [49173666.631131] CPU*** 0: hi:* 186, btch:* 31 usd: 160

Mar 16 23:30:06 kernel: [49173666.631132] CPU*** 1: hi:* 186, btch:* 31 usd:* 15
Mar 16 23:30:06 kernel: [49173666.631134] CPU*** 2: hi:* 186, btch:* 31 usd: 126
Mar 16 23:30:06 kernel: [49173666.631135] CPU*** 3: hi:* 186, btch:* 31 usd: 150

Mar 16 23:30:06 kernel: [49173666.631138] Active:1648305 inactive:211079 dirty:222 writeback:0 unstable:0
Mar 16 23:30:06 kernel: [49173666.631139]* free:8709 slab:11962 mapped:539697 pagetables:188949 bounce:0
Mar 16 23:30:06 kernel: [49173666.631142] DMA free:3544kB min:68kB low:84kB high:100kB active:8kB inactive:4kB present:16256k

B pages_scanned:0 all_unreclaimable? no
Mar 16 23:30:06 kernel: [49173666.631144] lowmem_reserve[]: 0 873 9128 9128
Mar 16 23:30:06 kernel: [49173666.631148] Normal free:3652kB min:3744kB low:4680kB high:5616kB active:320kB inactive:184kB pr

esent:894080kB pages_scanned:1012 all_unreclaimable? yes
Mar 16 23:30:06 kernel: [49173666.631151] lowmem_reserve[]: 0 0 66040 66040
Mar 16 23:30:06 kernel: [49173666.631155] HighMem free:27640kB min:512kB low:9368kB high:18228kB active:6592892kB inactive:84

4128kB present:8453120kB pages_scanned:0 all_unreclaimable? no
Mar 16 23:30:06 kernel: [49173666.631157] lowmem_reserve[]: 0 0 0 0
Mar 16 23:30:06 kernel: [49173666.631160] DMA: 387*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0

*4096kB = 3588kB
Mar 16 23:30:06 kernel: [49173666.631167] Normal: 21*4kB 6*8kB 2*16kB 3*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB
*0*4096kB = 3780kB
Mar 16 23:30:06 kernel: [49173666.631173] HighMem: 2228*4kB 2061*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*

2048kB 0*4096kB = 27432kB
Mar 16 23:30:06 kernel: [49173666.631182] 1086618 total pagecache pages
Mar 16 23:30:06 kernel: [49173666.631184] Swap cache: add 28233, delete 28233, find 12822/15537
Mar 16 23:30:06 kernel: [49173666.631186] Free swap* = 995184kB

Mar 16 23:30:06 kernel: [49173666.631187] Total swap = 995988kB
Mar 16 23:30:06 kernel: [49173666.669460] 2359296 pages of RAM
Mar 16 23:30:06 kernel: [49173666.669463] 2129920 pages of HIGHMEM
Mar 16 23:30:06 kernel: [49173666.669465] 281785 reserved pages

Mar 16 23:30:06 kernel: [49173666.669466] 46131460 pages shared
Mar 16 23:30:06 kernel: [49173666.669467] 0 pages swap cached
Mar 16 23:30:06 kernel: [49173666.669468] 222 pages dirty
Mar 16 23:30:06 kernel: [49173666.669470] 0 pages writeback

Mar 16 23:30:06 kernel: [49173666.669471] 539697 pages mapped
Mar 16 23:30:06 kernel: [49173666.669472] 11962 pages slab
Mar 16 23:30:06 kernel: [49173666.669473] 188949 pages pagetables
Mar 16 23:30:06 kernel: [49173666.669476] Out of memory: kill process 27302 (postgres) score 337392 or a child

Mar 16 23:30:06 kernel: [49173666.669567] Killed process 27304 (postgres)




Mar 16 23:30:06 kernel: [49173666.712469] postgres invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Mar 16 23:30:06 kernel: [49173666.712474] Pid: 14865, comm: postgres Not tainted 2.6.26-2-686-bigmem #1

Mar 16 23:30:06 kernel: [49173666.712489]* [<c016004e>] oom_kill_process+0x4f/0x195
Mar 16 23:30:06 kernel: [49173666.712500]* [<c0160478>] out_of_memory+0x14e/0x17f
Mar 16 23:30:06 kernel: [49173666.712505]* [<c01623e0>] __alloc_pages_internal+0x2b8/0x34e

Mar 16 23:30:06 kernel: [49173666.712510]* [<c0162482>] __alloc_pages+0x7/0x9
Mar 16 23:30:06 kernel: [49173666.712513]* [<c028cdd1>] tcp_sendmsg+0x41b/0x8d9
Mar 16 23:30:06 kernel: [49173666.712521]* [<c029468c>] tcp_current_mss+0xaa/0xc8

Mar 16 23:30:06 kernel: [49173666.712525]* [<c025fc13>] sock_sendmsg+0xc7/0xe1
Mar 16 23:30:06 kernel: [49173666.712532]* [<c0138850>] autoremove_wake_function+0x0/0x2d
Mar 16 23:30:06 kernel: [49173666.712537]* [<c029ad8e>] tcp_v4_rcv+0x545/0x597

Mar 16 23:30:06 kernel: [49173666.712541]* [<c028368f>] ip_local_deliver_finish+0xe8/0x183
Mar 16 23:30:06 kernel: [49173666.712544]* [<c028358a>] ip_rcv_finish+0x286/0x2a3
Mar 16 23:30:06 kernel: [49173666.712547]* [<c0268d31>] netif_receive_skb+0x31c/0x349

Mar 16 23:30:06 kernel: [49173666.712553]* [<c02604f7>] sys_sendto+0x105/0x130
Mar 16 23:30:06 kernel: [49173666.712556]* [<c013e5e2>] clocksource_get_next+0x39/0x3f
Mar 16 23:30:06 kernel: [49173666.712561]* [<c013d5da>] update_wall_time+0x519/0x68f

Mar 16 23:30:06 kernel: [49173666.712567]* [<c013aaa4>] hrtimer_forward+0xe4/0x100
Mar 16 23:30:06 kernel: [49173666.712570]* [<c013cfc4>] getnstimeofday+0x37/0xbc
Mar 16 23:30:06 kernel: [49173666.712575]* [<c026053b>] sys_send+0x19/0x1d

Mar 16 23:30:06 kernel: [49173666.712578]* [<c0260dd3>] sys_socketcall+0xed/0x19e
Mar 16 23:30:06 kernel: [49173666.712582]* [<c010afdb>] do_IRQ+0x52/0x63
Mar 16 23:30:06 kernel: [49173666.712586]* [<c0108857>] sysenter_past_esp+0x78/0xb1

Mar 16 23:30:06 kernel: [49173666.712598]* =======================
Mar 16 23:30:06 kernel: [49173666.712600] Mem-info:
Mar 16 23:30:06 kernel: [49173666.712601] DMA per-cpu:
Mar 16 23:30:06 kernel: [49173666.712603] CPU*** 0: hi:*** 0, btch:** 1 usd:** 0

Mar 16 23:30:06 kernel: [49173666.712604] CPU*** 1: hi:*** 0, btch:** 1 usd:** 0
Mar 16 23:30:06 kernel: [49173666.712606] CPU*** 2: hi:*** 0, btch:** 1 usd:** 0
Mar 16 23:30:06 kernel: [49173666.712607] CPU*** 3: hi:*** 0, btch:** 1 usd:** 0

Mar 16 23:30:06 kernel: [49173666.712609] Normal per-cpu:
Mar 16 23:30:06 kernel: [49173666.712610] CPU*** 0: hi:* 186, btch:* 31 usd: 156
Mar 16 23:30:06 kernel: [49173666.712612] CPU*** 1: hi:* 186, btch:* 31 usd: 155

Mar 16 23:30:06 kernel: [49173666.712614] CPU*** 2: hi:* 186, btch:* 31 usd: 177
Mar 16 23:30:06 kernel: [49173666.712615] CPU*** 3: hi:* 186, btch:* 31 usd: 180
Mar 16 23:30:06 kernel: [49173666.712616] HighMem per-cpu:

Mar 16 23:30:06 kernel: [49173666.712618] CPU*** 0: hi:* 186, btch:* 31 usd: 159
Mar 16 23:30:06 kernel: [49173666.712620] CPU*** 1: hi:* 186, btch:* 31 usd:* 13
Mar 16 23:30:06 kernel: [49173666.712621] CPU*** 2: hi:* 186, btch:* 31 usd: 126

Mar 16 23:30:06 kernel: [49173666.712623] CPU*** 3: hi:* 186, btch:* 31 usd: 149
Mar 16 23:30:06 kernel: [49173666.712625] Active:1647934 inactive:211078 dirty:3 writeback:0 unstable:0
Mar 16 23:30:06 kernel: [49173666.712626]* free:9053 slab:11962 mapped:537653 pagetables:188949 bounce:0

Mar 16 23:30:06 kernel: [49173666.712629] DMA free:3544kB min:68kB low:84kB high:100kB active:0kB inactive:44kB present:16256
kB pages_scanned:33 all_unreclaimable? no
Mar 16 23:30:06 kernel: [49173666.712632] lowmem_reserve[]: 0 873 9128 9128

Mar 16 23:30:06 kernel: [49173666.712636] Normal free:3852kB min:3744kB low:4680kB high:5616kB active:332kB inactive:140kB pr
esent:894080kB pages_scanned:917 all_unreclaimable? yes
Mar 16 23:30:06 kernel: [49173666.712638] lowmem_reserve[]: 0 0 66040 66040

Mar 16 23:30:06 kernel: [49173666.712642] HighMem free:28816kB min:512kB low:9368kB high:18228kB active:6591432kB inactive:84
4128kB present:8453120kB pages_scanned:0 all_unreclaimable? no
Mar 16 23:30:06 kernel: [49173666.712644] lowmem_reserve[]: 0 0 0 0

Mar 16 23:30:06 kernel: [49173666.712647] DMA: 388*4kB 1*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0
*4096kB = 3592kB
Mar 16 23:30:06 kernel: [49173666.712654] Normal: 83*4kB 6*8kB 2*16kB 3*32kB 1*64kB 1*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB

*0*4096kB = 4028kB
Mar 16 23:30:06 kernel: [49173666.712661] HighMem: 2565*4kB 2063*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*
2048kB 0*4096kB = 28796kB
Mar 16 23:30:06 kernel: [49173666.712668] 1086618 total pagecache pages

Mar 16 23:30:06 kernel: [49173666.712670] Swap cache: add 28233, delete 28233, find 12822/15537
Mar 16 23:30:06 kernel: [49173666.712671] Free swap* = 995184kB
Mar 16 23:30:06 kernel: [49173666.712673] Total swap = 995988kB

Mar 16 23:30:06 kernel: [49173666.739537] 2359296 pages of RAM
Mar 16 23:30:06 kernel: [49173666.739537] 2129920 pages of HIGHMEM
Mar 16 23:30:06 kernel: [49173666.739537] 281785 reserved pages
Mar 16 23:30:06 kernel: [49173666.739537] 45818784 pages shared

Mar 16 23:30:06 kernel: [49173666.739537] 0 pages swap cached
Mar 16 23:30:06 kernel: [49173666.739537] 3 pages dirty
Mar 16 23:30:06 kernel: [49173666.739537] 0 pages writeback
Mar 16 23:30:06 kernel: [49173666.739537] 536777 pages mapped

Mar 16 23:30:06 kernel: [49173666.739537] 11962 pages slab
Mar 16 23:30:06 kernel: [49173666.739537] 188949 pages pagetables
Mar 16 23:30:06 kernel: [49173666.739537] Out of memory: kill process 27302 (postgres) score 335971 or a child

Mar 16 23:30:06 kernel: [49173666.739537] Killed process 27305 (postgres)

Output from SAR:
22:05:01******* 28904** 8281140**** 99.65**** 10552** 4357216*** 995184****** 804***** 0.08******** 0
22:15:02******* 27056** 8282988**** 99.67**** 11428** 4357428*** 995184****** 804***** 0.08******** 0

22:25:01******* 26740** 8283304**** 99.68**** 10404** 4359928*** 995184****** 804***** 0.08******** 0
22:35:01******* 27020** 8283024**** 99.67**** 11748** 4357312*** 995184****** 804***** 0.08******** 0
22:45:01******* 29172** 8280872**** 99.65**** 10872** 4355748*** 995184****** 804***** 0.08******** 0

22:55:02******* 29092** 8280952**** 99.65***** 8352** 4357176*** 995184****** 804***** 0.08******** 0
23:05:01******* 94768** 8215276**** 98.86**** 13384** 4283616*** 995184****** 804***** 0.08******** 0
23:15:01******* 23720** 8286324**** 99.71**** 10372** 4360112*** 995184****** 804***** 0.08******** 0

23:25:01******* 34256** 8275788**** 99.59***** 6896** 4352652*** 995184****** 804***** 0.08******** 0
23:35:01***** 3809472** 4500572**** 54.16***** 1652** 4346532*** 995184****** 804***** 0.08******** 0
23:45:01***** 3220276** 5089768**** 61.25**** 29880** 4891584*** 995184****** 804***** 0.08******** 0

23:55:01***** 3219504** 5090540**** 61.26**** 30656** 4891720*** 995184****** 804***** 0.08******** 0
23:59:01***** 3219276** 5090768**** 61.26**** 30996** 4891776*** 995184****** 804***** 0.08******** 0
00:00:01***** 3218864** 5091180**** 61.27**** 31092** 4891808*** 995184****** 804***** 0.08******** 0

Average:****** 172503** 8137541**** 97.92**** 16271** 4345654*** 995204****** 784***** 0.08******** 4


************ total****** used****** free**** shared*** buffers**** cached
Mem:********* 8115****** 7968******* 146********* 0******** 23****** 5575

-/+ buffers/cache:****** 2369****** 5745
Swap:********* 972******* 127******* 845


TIA,

Mit

Stan Hoeppner 03-19-2012 09:11 AM

Why was oom-killer invoked?
 
On 3/19/2012 3:48 AM, tim truman wrote:

> Please help me understand why oom-killer was invoked?

You already know the answer: what does "oom" stand for?

>>From the SAR logs we can see that there is lots of memory in use by the
> system cache, but free is low. How can we ensure there is more memory
> available in free to avoid triggering oom-killer?

Switch to a 64 bit kernel and user space, including 64bit pgsql. This
gives you a flat memory space so any app can have more than 2GB virtual
address space. You already knew this as well, but you've been fighting
the switch tooth and nail, screaming "but I shouldn't _have to_, it
works now, mostly". It's the "mostly" part that is your current
problem. Switch to 64bit with its large flat memory space and you'll be
much happier.

I still can't understand why so many people are so damn resistant to
going 64bit, *especially* database users, who benefit the most from it.

--
Stan


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: 4F67065C.9080506@hardwarefreak.com">http://lists.debian.org/4F67065C.9080506@hardwarefreak.com

hvw59601 03-19-2012 01:19 PM

Why was oom-killer invoked?
 
Stan Hoeppner wrote:

On 3/19/2012 3:48 AM, tim truman wrote:


Please help me understand why oom-killer was invoked?


You already know the answer: what does "oom" stand for?


>From the SAR logs we can see that there is lots of memory in use by the
system cache, but free is low. How can we ensure there is more memory
available in free to avoid triggering oom-killer?


Switch to a 64 bit kernel and user space, including 64bit pgsql. This
gives you a flat memory space so any app can have more than 2GB virtual
address space. You already knew this as well, but you've been fighting
the switch tooth and nail, screaming "but I shouldn't _have to_, it
works now, mostly". It's the "mostly" part that is your current
problem. Switch to 64bit with its large flat memory space and you'll be
much happier.


and some apps. run 3x faster...




I still can't understand why so many people are so damn resistant to
going 64bit, *especially* database users, who benefit the most from it.



Hugo


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org

Archive: jk7f9s$kce$1@dough.gmane.org">http://lists.debian.org/jk7f9s$kce$1@dough.gmane.org

Bob Proulx 03-20-2012 03:59 AM

Why was oom-killer invoked?
 
tim truman wrote:
> Please help me understand why oom-killer was invoked?

You already know that the reason is that your system was out of
memory. Why even ask.

The OOM killer is a terrible thing. I have ranted about it often.
Turning it off is the best way to avoid it.

I have previously ranted about this in some detail here:

http://lists.debian.org/debian-user/2007/08/msg00022.html

And more recently with a little more rationale here:

http://lists.debian.org/debian-user/2008/04/msg02554.html

> From the SAR logs we can see that there is lots of memory in use by the
> system cache, but free is low. How can we ensure there is more memory
> available in free to avoid triggering oom-killer?

The easiest solution is to add swap space. That will increase your
virtual memory. Then you won't run out of memory and will avoid the
out of memory killer.

> total used free shared buffers cached
> Mem: 8115 7968 146 0 23 5575
> -/+ buffers/cache: 2369 5745
> Swap: 972 127 845

That is a very small amount of memory. I wouldn't have guessed that a
system could run with that small of memory these days.

Bob


All times are GMT. The time now is 04:14 PM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.