Find reason for heavy load

12-30-2009, 03:59 AM
John R Pierce

Noob Centos Admin wrote:
> My Centos 5 server has seen its average load jump through the roof
> recently despite having no major additional clients placed on it.
> Previously, I was looking at an average load of less than 0.6. I had a
> monitoring script that sends an email warning me if the current load
> stayed above 0.6 for more than 2 minutes. This script used to trigger
> perhaps once an hour during peak periods. Even so, I seldom saw
> numbers higher than 1.x.
>
> On 4th Dec, somebody from an Indian IP range started hammering my SMTP
> service, attempting to use it as an open relay. Naturally that didn't
> work and only ended up bulging my typical 400KB daily log report into
> 2MB~4MB affairs.
>
> After observing for a few days to determine the IP range, I started
> blocking the Indian subnet with apf. Initially I had problems
> getting apf to work properly, but after a couple of days I managed to get
> the block working and my daily log went back down to the expected size
> when all those connection attempts disappeared from exim's log.
>
> Now this is when my server load started to shoot through the roof with
> figures like 8.64 5.90 3.62 being reported by my monitoring script,
> triggering so often. I had to raise my threshold to 1.6 to keep my own
> script from spamming myself.
>
> I've tried changing several things on the server, since initially it
> seemed like the high load might be due to I/O wait. So I tried turning off
> non-essential services like OpenNMS to see if that had any effect. I
> also turned off apf and inserted rules manually into iptables to
> reduce the number of iptables rules the system has to process.
>
> All that doesn't seem to help much, I'm still getting consistent
> server loads in the 2.x to 3.x range almost all the time.
>
> The problem is that in top, none of my processes are showing abnormal
> CPU%. Most are well under 5%, and manually adding them up doesn't come
> close to the 200% to 300% that the load figures of 2.x and 3.x indicate.
>
> Even top's own summary says CPU % is in the 20~30% range. What's
> worrying is that System% is also in the same range. I have no idea what
> "system" is doing, since it appears that anything running inside the
> kernel is lumped under "system". Even totalling both percentages, I
> would expect 50~60% to translate to a load of 0.5~0.6, yet the
> system load is 5x what's expected.
>
> I've installed utilities like dstat to try to figure out
> which process is making the system calls that are clogging up the
> server, but either I don't understand it or it's not the right tool.
>
> So I'd appreciate some advice on what I should do next to
> identify the cause. Thanks in advance!

last time I saw something like that, it was a bunch of chinese 'bots'
hammering on my public services like ssh. another admin had turned
pop3 on too. this created a very heavy load, yet they didn't show up in
top (bunches of pop3 and ssh processes showed up in ps -auxww, however,
plus netstat -an).


_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
12-30-2009, 04:55 AM
Ross Walker

On Dec 29, 2009, at 11:44 PM, Noob Centos Admin
<centos.admin@gmail.com> wrote:


Try blocking the IPs on the router and see if that helps.

You can also run iostat and look at the disk usage which also
generates load.

How many cores does your machine have? Load avg is calculated for a
single core, so a quad core would reach 100% utilization at a load of
4, but high iowaits can generate an artificially high load avg as well
(and why one sees greater than 100% utilization).
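
To put a load average next to the core count, one rough approach is to divide each figure by the number of cores. The sketch below is illustrative only: `load_per_core` is a made-up helper name, and the sample string reuses the load figures quoted elsewhere in this thread. On a live Linux box the same text would come from the first three fields of `/proc/loadavg`, and the core count from `os.cpu_count()`.

```python
def load_per_core(loadavg_text, cores):
    """Divide the 1, 5 and 15 minute load averages by the core count.

    loadavg_text is the content of /proc/loadavg (or just its first three
    fields); cores is the number of CPU cores in the machine.
    """
    one, five, fifteen = (float(x) for x in loadavg_text.split()[:3])
    return (one / cores, five / cores, fifteen / cores)


# Load figures reported in this thread, on a dual core machine.
print(load_per_core("3.33 3.97 3.81", 2))
```

A result above 1.0 per core means more tasks were runnable or blocked than there were cores to serve them. It does not say whether they were waiting on CPU or on disk.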

I really wish load would be broken down as CPU/memory/disk instead of
the ambiguous load avg, and show network read/write utilization in
ifconfig.

-Ross

 
12-30-2009, 04:56 AM
Noob Centos Admin

Hi,

> last time I saw something like that, it was a bunch of chinese 'bots'
> hammering on my public services like ssh. another admin had turned
> pop3 on too. this created a very heavy load, yet they didn't show up in
> top (bunches of pop3 and ssh processes showed up in ps -auxww,
> however, plus netstat -an).

Unfortunately the server is meant for web/email purposes so I can't
turn off pop3/smtp. Naturally ps shows a lot of httpd/mysql &
exim/dovecot processes, but a cursory glance doesn't turn up any suspicious
IPs.

Similarly, I took a quick look at netstat -an and most of the IPs are
from the local ISPs that my clients are using.

One thing that occurred to me: does using iptables to block smtp
attempts use more "system" resources than letting the bot
flood my smtp logs with pointless attempts?
 
12-30-2009, 05:05 AM
Noob Centos Admin

Hi,

> Try blocking the IPs on the router and see if that helps.

Unfortunately the server's in a DC so the router is not under our control.

> You can also run iostat and look at the disk usage which also
> generates load.

I did try iostat and its iowait% did coincide with top's report, which
is basically in the low 1~2%.

However, iostat reports much lower %user and %system compared to top
running at the same time, so I'm not quite sure if I can rely on its
figures.

> How many cores does your machine have? Load avg is calculated for a
> single core, so a quad core would reach 100% utilization at a load of
> 4, but high iowaits can generate an artificially high load avg as well
> (and why one sees greater than 100% utilization).

It's a dual core, which is why I was getting concerned, since loads above
2.0 would imply the system's processing capacity was maxed out.
However, load and percentages don't add up.

For example, now I'm seeing
top - 14:04:30 up 171 days, 7:14, 1 user, load average: 3.33, 3.97, 3.81
Tasks: 246 total, 2 running, 236 sleeping, 0 stopped, 8 zombie
Cpu(s): 13.3%us, 16.0%sy, 0.0%ni, 67.5%id, 3.0%wa, 0.0%hi, 0.2%si, 0.0%st

iostat
Linux 2.6.18-128.1.16.el5xen 12/30/2009
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.28    0.20    1.16    2.38    0.01   92.97


> I really wish load would be broken down as CPU/memory/disk instead of
> the ambiguous load avg, and show network read/write utilization in
> ifconfig.

Totally agreed. All the load number is doing is telling me something
is using up resources somewhere but not a single clue otherwise!
Confusing, frustrating and worrying at the same time
 
12-30-2009, 05:21 AM
John R Pierce

Noob Centos Admin wrote:
> However, iostat reports much lower %user and %system compared to top
> running at the same time so I'm not quite sure if I can rely on its
> figures.
> ...
> iostat
> Linux 2.6.18-128.1.16.el5xen 12/30/2009
> avg-cpu: %user %nice %system %iowait %steal %idle
> 3.28 0.20 1.16 2.38 0.01 92.97
>



iostat, if run with no parameters, shows the average since reboot or
statistics reset.

run `iostat -x 5` to a) show details on all devices, and b) show 5
second samples. ignore the first output, as that's the average since
boot. the 2nd and later outputs each represent a 5 second sample.
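
The gap between a since-boot average and an interval sample can be seen from the raw counters in `/proc/stat`. The sketch below is a minimal illustration, not how iostat or top are implemented: the helper names and the two sample lines are made up, and it assumes the usual field order on the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal).

```python
def parse_cpu_line(stat_text):
    """Return the first eight counters of the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep user..steal only; later fields (guest time) are already
            # counted inside user/nice on newer kernels.
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate cpu line found")


def busy_percent(before, after):
    """Percentage of non-idle CPU time between two counter samples.

    idle and iowait (indexes 3 and 4) are treated as not busy.
    """
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 100.0 * (total - idle) / total


# Hypothetical samples taken a few seconds apart (not real measurements).
first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0")
second = parse_cpu_line("cpu  130 0 70 850 50 0 0 0")

zero = [0] * len(first)
print("since boot:", busy_percent(zero, first))      # what a bare iostat reports
print("last interval:", busy_percent(first, second))  # what top and iostat -x 5 report
```

On a real system the two samples would come from reading `/proc/stat` twice, a few seconds apart.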


note, btw, 'load average' isn't CPU usage, it's the average number of
processes that are running or waiting to run. a load average of 8 means
about 8 processes were using or waiting for system resources. this does
include processes in iowait, but doesn't include processes that are
sleeping on semaphores and such, so it can be quite a lot higher than the
cpu workload.
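
On Linux the processes that feed the load average are those in state R (running or runnable) and state D (uninterruptible sleep, usually waiting on I/O). A rough way to see how many there are right now is to read the state field of every `/proc/<pid>/stat`. The function names below are made up for illustration, and the count is a single snapshot, whereas the load average is smoothed over time.

```python
import glob


def state_from_stat(stat_line):
    """Extract the state letter from one /proc/<pid>/stat line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so the state is the first field after the last ')'.
    """
    return stat_line[stat_line.rindex(")") + 1:].split()[0]


def count_load_contributors():
    """Count processes currently in state R or state D."""
    counts = {"R": 0, "D": 0}
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                state = state_from_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        if state in counts:
            counts[state] += 1
    return counts


print(count_load_contributors())
```

A steady stream of D-state processes alongside low CPU percentages would point towards disk or network filesystem waits rather than CPU work.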




 
12-30-2009, 05:24 AM
Ross Walker

On Dec 30, 2009, at 1:05 AM, Noob Centos Admin
<centos.admin@gmail.com> wrote:

> Hi,
>
>> Try blocking the IPs on the router and see if that helps.
>
> Unfortunately the server's in a DC so the router is not under our
> control.

That sucks, oh well.

>> You can also run iostat and look at the disk usage which also
>> generates load.
>
> I did try iostat and its iowait% did coincide with top's report, which
> is basically in the low 1~2%.
>
> However, iostat reports much lower %user and %system compared to top
> running at the same time so I'm not quite sure if I can rely on its
> figures.

Yes, I'm not sure iostat's CPU numbers represent the full CPU
utilization, or only the CPU utilization for IO.

>> How many cores does your machine have? Load avg is calculated for a
>> single core, so a quad core would reach 100% utilization at a load of
>> 4, but high iowaits can generate an artificially high load avg as
>> well
>> (and why one sees greater than 100% utilization).
>
> It's a dual core that's why I was getting concerned since loads above
> 2.0 would imply the system's processing capacity was apparently maxed.
> However, load and percentages don't add up.

They never do because of the time scaled averages.
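
The time scaling can be sketched as an exponentially damped average: the kernel samples the number of active tasks about every 5 seconds and blends it into the 1, 5 and 15 minute figures. The code below is a floating-point illustration of that idea only (the kernel uses fixed-point arithmetic), and the function names are made up.

```python
import math

SAMPLE_SECONDS = 5  # Linux samples the active task count roughly every 5 seconds


def next_load(previous, active_tasks, window_seconds):
    """One update step of an exponentially damped average."""
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    return previous * decay + active_tasks * (1.0 - decay)


def simulate(active_tasks, seconds, window_seconds, start=0.0):
    """Load figure after a constant number of active tasks for some time."""
    load = start
    for _ in range(seconds // SAMPLE_SECONDS):
        load = next_load(load, active_tasks, window_seconds)
    return load


# Four busy tasks starting from an idle system: the 1 minute figure reacts
# quickly, while the 15 minute figure barely moves in the first minute.
print(simulate(4, 60, 60))
print(simulate(4, 60, 900))
```

Because of this smoothing, a short burst and a long steady load can produce the same figure at a given moment, so instantaneous CPU percentages will not line up with it.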

> For example, now I'm seeing
> top - 14:04:30 up 171 days, 7:14, 1 user, load average: 3.33,
> 3.97, 3.81
> Tasks: 246 total, 2 running, 236 sleeping, 0 stopped, 8 zombie
> Cpu(s): 13.3%us, 16.0%sy, 0.0%ni, 67.5%id, 3.0%wa, 0.0%hi,
> 0.2%si, 0.0%st
>
> iostat
> Linux 2.6.18-128.1.16.el5xen 12/30/2009
> avg-cpu: %user %nice %system %iowait %steal %idle
> 3.28 0.20 1.16 2.38 0.01 92.97
>
>
>> I really wish load would be broken down as CPU/memory/disk instead of
>> the ambiguous load avg, and show network read/write utilization in
>> ifconfig.
>
> Totally agreed. All the load number is doing is telling me something
> is using up resources somewhere but not a single clue otherwise!
> Confusing, frustrating and worrying at the same time

Maybe someone could write a command-line utility that outputs the
system load broken down into CPU/memory/disk/network. Call it
'sysload' and take the system configuration into account.

Take a look at your iptables setup, make sure the blocked ip rules are
checked first before any other and drop the packets without any icmp
(give em a black hole to stare at).

-Ross
 
12-30-2009, 06:16 AM
Christoph Maser

On Wednesday, 30.12.2009, at 05:44 +0100, Noob Centos Admin wrote:
> since initially it seems like the high load may be due to I/O wait

Maybe this will help you to identify the IO loading process:

http://dag.wieers.com/blog/red-hat-backported-io-accounting-to-rhel5
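
Per-process I/O accounting of this kind is exposed as `/proc/<pid>/io` on kernels that support it. The sketch below assumes such a kernel and assumes it is run as root, since other users' entries are not readable otherwise. The function names are made up, and the counters are cumulative since each process started, so finding the current offender means sampling twice and comparing.

```python
import glob


def parse_proc_io(io_text):
    """Turn the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in io_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def top_io_processes(limit=5):
    """Return (bytes, pid) pairs for the processes with the most disk I/O."""
    totals = []
    for path in glob.glob("/proc/[0-9]*/io"):
        try:
            with open(path) as f:
                io = parse_proc_io(f.read())
        except OSError:
            continue  # no permission, or the process has exited
        pid = int(path.split("/")[2])
        totals.append((io.get("read_bytes", 0) + io.get("write_bytes", 0), pid))
    return sorted(totals, reverse=True)[:limit]


print(top_io_processes())
```

If the list comes back empty, the kernel most likely lacks the accounting support or the script lacks permission to read the entries.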

Chris


 
12-30-2009, 04:09 PM
Thomas Harold

On 12/29/2009 11:44 PM, Noob Centos Admin wrote:
> My Centos 5 server has seen its average load jump through the roof
> recently despite having no major additional clients placed on it.
> Previously, I was looking at an average load of less than 0.6. I had a
> monitoring script that sends an email warning me if the current load
> stayed above 0.6 for more than 2 minutes. This script used to trigger
> perhaps once an hour during peak periods. Even so, I seldom saw numbers
> higher than 1.x.
>

You should also try out "atop" instead of just using top. The major
advantage is that it gives you more information about the disk and
network utilization.
 
12-30-2009, 08:44 PM
Ugo Bellavance

On 2009-12-29 23:44, Noob Centos Admin wrote:

Dstat could at least tell you if your problem is CPU or I/O.

Even better, run

vmstat 2 10

Look at the first two columns. Which column has the higher numbers? If r,
you're CPU-bound. If b, you're I/O-bound.
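
The same comparison can be scripted against `/proc/stat`, which carries `procs_running` and `procs_blocked` lines. As far as I know these are the counters vmstat bases its r and b columns on, but treat that as an assumption. The helper names and the sample text are made up, and a single snapshot is much noisier than watching `vmstat 2 10` scroll by.

```python
def run_and_blocked(stat_text):
    """Return (procs_running, procs_blocked) from the text of /proc/stat."""
    values = {}
    for line in stat_text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] in ("procs_running", "procs_blocked"):
            values[fields[0]] = int(fields[1])
    return values.get("procs_running", 0), values.get("procs_blocked", 0)


def classify(running, blocked):
    """Apply the rule of thumb from the post above to one snapshot."""
    if running > blocked:
        return "CPU-bound"
    if blocked > running:
        return "I/O-bound"
    return "inconclusive"


# Hypothetical snapshot; on a live system pass open("/proc/stat").read().
sample = "cpu  1 2 3 4\nprocs_running 1\nprocs_blocked 3\n"
print(classify(*run_and_blocked(sample)))
```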

If you're I/O bound, I suggest you use atop to determine which processes
take disk time.

You can also use iostat -x 2 10.

I really suggest you read up on vmstat and iostat, they will always be helpful.

Did you check if you have a defective disk or a rebuilding array? That
could be the cause.

Regards,

 
12-31-2009, 08:14 AM
Noob Centos Admin

Hi,

> > since initially it seems like the high load may be due to I/O wait
> Maybe this will help you to identify the IO loading process:
>
> http://dag.wieers.com/blog/red-hat-backported-io-accounting-to-rhel5

Thanks for the suggestion, I did install dstat earlier while trying to
figure things out on my own. However, I think my kernel, being an
older version, does not support the latest feature the website was
pointing out. Given that it's a live server not within physical reach,
I'm a little wary of doing kernel updates that might just kill it.

I'll try other methods first and see if they help. If not, I'll
probably have to bite the bullet and do it over a weekend when I have
more time to repair any inadvertent damage.