04-22-2011, 02:13 PM
Lesław Kopeć

Bug#620297: Same strange load calculation error on 2.6.32-30

Hello,

I've noticed the same strange load calculation inconsistency while
running kernel 2.6.32-30. Load values are reported too low for the given
CPU utilization. As far as I remember, older kernels from the 2.6.32 line
had the same problem. I ran some tests on the same hardware using the
2.6.26-26lenny2 and 2.6.30-6 kernels, and their load seems to be just right.

What's more puzzling is that load is reported correctly on 2.6.32 when
the CPU cores are 100% utilized, e.g. running:
# stress -c 4

reports a load of 4 when 4 cores are busy. However, running about 80
php-cgi processes while idle is greater than 20% (a rough estimate)
results in this load calculation error.

I did some searching and found a similar bug report on Launchpad:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/513848

and a discussion on LKML:
http://lkml.org/lkml/2010/4/13/394

The patch proposed on LKML is the one that got applied to Ubuntu's
kernel version 2.6.32-20.29. I've applied it to Debian's 2.6.32-30, and
the load values now seem correct, although still a bit lower than on
other kernels.

Tests were run simultaneously on different servers (same hardware and same
jobs), and I got similar results when I switched kernels between
machines.

*** 2.6.26-26lenny2
# vmstat 10 4
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 4  0      0 5540920      0 651616    0    0     0     0   102   277 24  2 73  0
 7  0      0 5541300      0 651616    0    0     0     0  4098 12131 23  3 75  0
 1  0      0 5546456      0 651616    0    0     0     0  4032 11665 23  2 75  0
 0  0      0 5545568      0 651616    0    0     0     0  3646 10950 25  2 73  0

# cat /proc/loadavg
2.82 2.99 2.94 1/240 24960

*** 2.6.32-30
# vmstat 10 4
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 2  0      0 5521764      0 676944    0    0     0     0    62   131 27  3 71  0
 6  0      0 5523952      0 676944    0    0     0     0  8339 16585 21  2 76  0
 2  0      0 5525632      0 676944    0    0     0     0  9603 19163 31  3 66  0
 6  0      0 5525188      0 676956    0    0     0     0  9474 17614 31  3 66  0

# cat /proc/loadavg
0.05 0.14 0.28 8/284 13343

*** 2.6.32-30 (patched)
# vmstat 10 4
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 2  0      0 5504608      0 657820    0    0     0     0    24    53 26  2 72  0
 0  0      0 5507128      0 657820    0    0     0     0  8530 17326 23  2 75  0
 6  0      0 5508288      0 657824    0    0     0     0  8634 17051 26  3 72  0
 4  0      0 5506352      0 657824    0    0     0     0  9139 18106 27  3 70  0

# cat /proc/loadavg
2.31 2.10 1.94 5/284 12602
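
One rough way to quantify the discrepancy in the figures above: for a CPU-bound workload on N cores, the average number of busy cores is about N * (1 - idle fraction), and the load average would not be expected to sit far below that. The Python sketch below applies this heuristic to the numbers quoted above. It is only an approximation, not what the kernel computes (the load average also counts runnable tasks waiting for a CPU and tasks in uninterruptible sleep). The function names are made up for illustration, and the core count of 8 is an assumption taken from the System Information block of this report.

```python
def busy_cores(n_cores, idle_pct):
    """Rough average number of busy cores, from vmstat's 'id' column."""
    return n_cores * (1.0 - idle_pct / 100.0)

# (kernel, idle % from vmstat rows 2-4, reported 1-minute load average)
# The first vmstat row is skipped because it reports averages since boot.
samples = [
    ("2.6.26-26lenny2", [75, 75, 73], 2.82),
    ("2.6.32-30", [76, 66, 66], 0.05),
    ("2.6.32-30 (patched)", [75, 72, 70], 2.31),
]

N_CORES = 8  # assumption: taken from the System Information block

def compare(samples, n_cores=N_CORES):
    """Return (kernel, estimated busy cores, reported load) tuples."""
    rows = []
    for kernel, idle, load1 in samples:
        estimate = busy_cores(n_cores, sum(idle) / len(idle))
        rows.append((kernel, round(estimate, 2), load1))
    return rows

for kernel, estimate, load1 in compare(samples):
    print(f"{kernel:22s} estimated busy cores {estimate:5.2f}  "
          f"reported load {load1:5.2f}")
```

Under these assumptions the estimate is a little over 2 busy cores on all three kernels, which is in the same range as the load reported by 2.6.26 and the patched 2.6.32, while the unpatched 2.6.32-30 reports a value far below it.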


-- System Information:
Debian Release: 5.0.8
APT prefers oldstable
APT policy: (500, 'oldstable')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash


--
Lesław Kopeć
Administrator

email: leslaw.kopec@nasza-klasa.pl
tel: +48 519 300 129

Nasza Klasa Sp. z o.o.
ul. Gen. J. Bema 2, 50-265 Wrocław

Sąd Rejonowy dla Wrocławia - Fabrycznej we Wrocławiu,
VI Wydział Gospodarczy Krajowego Rejestru Sądowego,
nr KRS:0000289629, NIP:898-21-22-104, REGON:020586020,
Kapitał zakładowy: 67 850,00 PLN
 
10-28-2011, 11:58 AM
Lesław Kopeć

Bug#620297: Same strange load calculation error on 2.6.32-30

Hello again.

I've finally managed to get back to this bug and do some more testing.
There are two upstream patches that fix the load calculation bug:

74f5187ac8: sched: Cure load average vs NO_HZ woes
0f004f5a69: sched: Cure more NO_HZ load average woes

This time I'm using the 2.6.32-36 kernel version. I have prepared one
image with only the first patch applied (+nk0) and another with both of
them (+nk1). The standard Debian kernels (2.6.26-27 and 2.6.32-36) were
also included in the comparison. Each version was also compiled with
CONFIG_NO_HZ=n to see if this makes any difference. A comment in
0f004f5a69 states that a CONFIG_NO_HZ kernel should produce the same load.

The results are quite confusing:

kernel                 load average
2.6.26-27 (NOHZ):      12.86 12.91 13.08
2.6.26-27 (HZ):        14.35 13.56 13.04
2.6.32-36 (NOHZ):       0.42  0.54  0.50
2.6.32-36 (HZ):         0.74  0.89  0.79
2.6.32-36+nk0 (NOHZ):   2.62  2.48  2.37
2.6.32-36+nk0 (HZ):     9.10  9.78 10.03
2.6.32-36+nk1 (NOHZ):   0.62  0.64  0.58
2.6.32-36+nk1 (HZ):    10.64 10.96 10.68
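
If NO_HZ and HZ builds of the same kernel are supposed to report the same load, the ratio between the two should be close to 1. The Python sketch below computes that ratio from the table above, averaging the three load figures of each row. The dictionary layout and function names are illustrative only; the numbers are copied from the table.

```python
# Load averages from the table above, keyed by kernel and tick mode.
results = {
    "2.6.26-27":     {"NOHZ": (12.86, 12.91, 13.08), "HZ": (14.35, 13.56, 13.04)},
    "2.6.32-36":     {"NOHZ": (0.42, 0.54, 0.50),    "HZ": (0.74, 0.89, 0.79)},
    "2.6.32-36+nk0": {"NOHZ": (2.62, 2.48, 2.37),    "HZ": (9.10, 9.78, 10.03)},
    "2.6.32-36+nk1": {"NOHZ": (0.62, 0.64, 0.58),    "HZ": (10.64, 10.96, 10.68)},
}

def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)

def nohz_to_hz_ratio(entry):
    """Mean NOHZ load divided by mean HZ load for one kernel."""
    return mean(entry["NOHZ"]) / mean(entry["HZ"])

for kernel, entry in results.items():
    print(f"{kernel:15s} NOHZ/HZ = {nohz_to_hz_ratio(entry):.2f}")
```

With these figures only 2.6.26-27 comes out near 1; on every 2.6.32 variant the NOHZ build reports a clearly lower load than the HZ build of the same kernel.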

Running vmstat 60 4 produces almost the same output on all hosts:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd    free   buff   cache   si   so    bi    bo    in    cs us sy id wa
16  0      0 6719880      0 2091432    0    0     0     0     0     1 22  2 76  0
 9  0      0 6665312      0 2091436    0    0     0     0 14685 51017 36  4 60  0
11  0      0 6711220      0 2091428    0    0     0     0 14557 50844 36  4 61  0
15  0      0 6719872      0 2091432    0    0     0     0 15073 52578 38  4 58  0
15  0      0 6716792      0 2091432    0    0     0     0 14822 49843 36  4 60  0

I've attached a comparison of all load charts and one CPU chart. The CPU
usage is very similar on all hosts, and tiny variations shouldn't matter
in the long run, especially since the load values are so different. I've
also attached a modified patch 74f5187ac8 (just a minor correction) that
can be applied cleanly to Debian's 2.6.32 sources.

The bug seems to only appear if the cores go into idle. When I run a
process that hogs a given number of cores the load is increased by a
corresponding amount. This happens on all kernels that I've tested.
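
For context, the load average is an exponentially damped average of the number of active tasks (runnable plus uninterruptible), sampled roughly every 5 seconds. The Python sketch below is a floating-point model of that update rule, not the kernel code (the kernel uses fixed-point arithmetic, with a 1-minute decay constant of 1884/2048). The function names and both sample sequences are illustrative assumptions. They only show why the result depends on what the sampler sees: tasks that are always running are always counted, whereas a workload of short bursts with idle gaps relies on the samples catching a representative number of active tasks.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (the kernel uses about 5 s)

def decay(window_seconds, interval=SAMPLE_INTERVAL):
    """Per-sample decay factor for an averaging window (60, 300 or 900 s)."""
    return math.exp(-interval / window_seconds)

def update_load(load, active, window_seconds):
    """One step of the damped average: old load decays, new sample is mixed in."""
    e = decay(window_seconds)
    return load * e + active * (1.0 - e)

def simulate(active_samples, window_seconds=60.0, load=0.0):
    """Feed a sequence of sampled active-task counts through the average."""
    for active in active_samples:
        load = update_load(load, active, window_seconds)
    return load

# 30 minutes of samples with 4 tasks always running (like 'stress -c 4').
steady = [4] * 360
# Hypothetical bursty case: the sampler sees 4 active tasks on every other
# sample and none in between.
bursty = [0, 4] * 180

print(f"steady: {simulate(steady):.2f}")
print(f"bursty: {simulate(bursty):.2f}")
```

In this model the steady case converges to 4 and the bursty case to about half of that. If the sampling systematically misses active tasks on cores that go idle between samples, the reported average drops accordingly, even though CPU usage is unchanged.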

The hosts are diskless (root on an NFS share) PHP workers that have the
same hardware (24 CPU cores) and do the same kind of work. I am getting
similar results on disk-based systems as well.

Conclusions:
- load values are quite similar on HZ kernels, with the exception of 2.6.32-36
- load is suspiciously low on 2.6.32 NOHZ kernels
- patch 74f5187ac8 seems to be a step in the right direction
- applying patch 0f004f5a69 on top of the first one produces results
  similar to those of an unpatched kernel

Can somebody verify that the patches behave the same way? Should the
patches be applied to kernels from the 2.6.32 line? Is there something I'm
missing? I'm running out of ideas and would appreciate any hints.

--
Lesław Kopeć
 
