EXTERNAL: RHELv4 and v5 - So slow as to be unusable.
Interesting. You mentioned file I/O maybe being related? Is this a custom kernel? Is it possible you're using a circuitous I/O route to get to disk (like maybe IDE SCSI, or some such)?
-----Original Message-----
From: redhat-list-bounces@redhat.com [mailto:redhat-list-bounces@redhat.com] On Behalf Of Mohammad Zakaria
Sent: Wednesday, October 13, 2010 8:44 AM
To: General Red Hat Linux discussion list
Subject: EXTERNAL:Re: RHELv4 and v5 - So slow as to be unusable.
Hello Gary,
Are those machines all the same brand or hardware?
If you reconnect to your NTP server, does the clock start counting correctly, or does it stay at the same 7.8 rate?
Try resetting the BIOS on one of the machines to its defaults and check the results.
Have you tried disabling one of your processor cores and running with a single processor?
--- On Sat, 10/9/10, Mohammad Zakaria <myz_sa@yahoo.com> wrote:
From: Mohammad Zakaria <myz_sa@yahoo.com>
Subject: Re: RHELv4 and v5 - So slow as to be unusable.
To: "General Red Hat Linux discussion list" <redhat-list@redhat.com>
Date: Saturday, October 9, 2010, 12:58 PM
If you have a single RAM module, try replacing it and check how the box
behaves. If you have a combination of 2 sets, test the system performance
with each set separately. If there is any problem with your RAM hardware,
you should be able to detect it easily and fix it.
________________________________
From: Gary E Barnes <gebarnes@us.ibm.com>
To: redhat-list@redhat.com
Sent: Thu, October 7, 2010 9:17:01 PM
Subject: Re: RHELv4 and v5 - So slow as to be unusable.
> From: Laszlo Beres <laszlo@beres.me>
> Subject: Re: RHELv4 and v5 - So slow as to be unusable.
>
> On Wed, Oct 6, 2010 at 9:22 PM, Gary E Barnes <gebarnes@us.ibm.com> wrote:
>
> > "top" says that nothing is going on although the load average is 3+.
> > "sar" also says that nothing is going on.
>
> There's no such thing as "nothing is going on". You should see CPU
> status, process status, etc. vmstat also can give you some hints about
> the system health.
Oh but there is such a thing. I have one of the machines in this weird
slowdown state right at this moment. It started around 4:45pm yesterday,
after running perfectly for about 3 hours 15 minutes, and I left it
overnight to see if maybe it would get "over it" by itself. Hasn't
happened though.
Here is the very first header from the "top" display of a top I started
just for this example.
The processes that show up in the first line or two of top are things such
as:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
27231 geb       16   0 28316  11m 8412 S  9.8  0.5   0:00.41 gnome-terminal
 5813 root      16   0  165m  31m 6316 S  3.8  1.3  25:05.70 X
 6697 geb       16   0 22280  10m 7644 S  1.9  0.4   0:18.69 wnck-applet
 3817 rpc       15   0  2336  592  484 S  0.2  0.0   0:01.60 portmap
 6472 geb       16   0  3544 1472  876 S  0.2  0.1   0:02.08 gam_server
 6650 geb       16   0  4048 2116 1332 S  0.2  0.1   0:00.87 xalarm
 3805 root      16   0  2492  312  220 S  0.2  0.0   0:00.55 irqbalance
 4867 root      16   0  2800  844  624 D  0.2  0.0   0:03.32 rpc.mountd
    1 root      16   0  2724  512  436 S  0.0  0.0   0:00.68 init
 6451 geb       16   0 12764 7512 1688 S  0.1  0.3   0:01.66 gconfd-2
27229 geb2      16   0  3012 1044  772 R  0.1  0.0   0:00.05 top
27229 geb2      16   0  3012 1044  772 R  0.5  0.0   0:00.07 top
 4996 root      16   0  4772 3104 1536 S  0.2  0.1   0:02.47 hald
 6472 geb       16   0  3544 1472  876 S  0.2  0.1   0:02.09 gam_server
    1 root      16   0  2724  512  436 S  0.0  0.0   0:00.68 init
27229 geb2      16   0  3012 1044  772 R  0.5  0.0   0:00.09 top
23574 geb       16   0  145m  68m  26m S  0.2  2.7   2:44.86 firefox-bin
    1 root      16   0  2724  512  436 S  0.0  0.0   0:00.68 init
As you can see, there is essentially "nothing going on".
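A load average of 3+ alongside an idle CPU is possible on Linux, because
the load average counts tasks in uninterruptible sleep (state D, usually
waiting on I/O) as well as runnable ones. In the listing above, rpc.mountd
is in state D. The following is only a minimal sketch of how one might list
such processes by reading /proc. It assumes a Python 3 interpreter is
available on the machine, and parse_stat and blocked_tasks are hypothetical
helper names made up for this illustration.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split around the first '(' and the last ')'.
    left = stat_line.index("(")
    right = stat_line.rindex(")")
    pid = int(stat_line[:left])
    comm = stat_line[left + 1:right]
    state = stat_line[right + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes currently in uninterruptible sleep."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or stat was unreadable
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    with open("/proc/loadavg") as fh:
        print("loadavg:", fh.read().strip())
    for pid, comm in blocked_tasks():
        print(pid, comm)
```

Running it a few times during a slowdown would show whether the same
processes stay stuck in state D, which would point at whatever they are
waiting on rather than at the CPU.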
And yet the machine is very unresponsive. If I run a command that hasn't
been run in a while (don't know the time frame, but it seems to be only
minutes) then the command takes >30 seconds to execute. For example, I
just did the "date" command and when it finally responded I did the
hwclock command. Both took >30 seconds to run. Now if I repeat those
commands they execute immediately. I'm presuming that this is due to
executable file caching in the operating system. If I wait a while then
the >30 second wait will reappear for those same commands. Presumably
they've left that cache.
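One way to put numbers on the cold-versus-warm difference described above
is to time the same command twice in a row. This is only a sketch, assuming
a Python 3 interpreter is available on the affected machine; time_command
and cold_vs_warm are hypothetical names. Since the machine's own clock is
reported to be running slow, the measured values would likely understate
the real delay, so they are best compared against a watch or another
machine.

```python
import subprocess
import time


def time_command(argv):
    """Run argv once and return the elapsed seconds as seen by this machine."""
    start = time.monotonic()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.monotonic() - start


def cold_vs_warm(argv):
    """Time two back-to-back runs: the first may be cold, the second warm."""
    first = time_command(argv)
    second = time_command(argv)
    return first, second


if __name__ == "__main__":
    first, second = cold_vs_warm(["date"])
    print("first run:  %.2f s" % first)
    print("second run: %.2f s" % second)
```

A large gap between the two runs would be consistent with the caching
explanation, though it would not by itself say why the first read is slow.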
This behavior is observable both in xterms on the console and also
through ssh connections from another machine.
Programs that are already loaded and running seem to be pretty much ok, at
least until they need to go read some new file or write some new file,
then they hang for a while and eventually get going again.
If I run sar (sysstat package) I get essentially the same picture. From a
"sar -A 30 4" here are the averages for two minutes. Load average 3+ and
>99% idle. Nearly no I/O of any sort; not 0 but very low amounts for two
minutes.
The machine entered this state at about 4:45pm yesterday afternoon. It is
now 12:00 noon the next day.
The "date" command says that the system thinks that the time is 7:26PM
yesterday.
In the last 47 minutes the system clock has gained only 6 minutes. That is
a slowdown factor of somewhere around 7.8 (47 / 6).
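The arithmetic behind that figure can be checked directly. The sketch below
is an illustration, not part of the original measurements. It recomputes
the 47-minute sample and then, assuming the system clock was still correct
when the slowdown began at about 4:45pm, derives a second estimate from the
overnight figures (12:00 noon real time versus 7:26PM on the system clock).
The variable names are made up for the example.

```python
from datetime import timedelta

# Short sample: 47 real minutes versus 6 minutes gained by the system clock.
short_ratio = timedelta(minutes=47) / timedelta(minutes=6)

# Overnight sample, assuming the clock was correct at about 4:45pm:
#   real time elapsed:   4:45pm -> 12:00 noon next day = 19 h 15 min
#   system clock moved:  4:45pm -> 7:26pm              =  2 h 41 min
wall_elapsed = timedelta(hours=19, minutes=15)
clock_elapsed = timedelta(hours=2, minutes=41)
long_ratio = wall_elapsed / clock_elapsed

print("short sample slowdown:", round(short_ratio, 1))  # 7.8
print("overnight slowdown:   ", round(long_ratio, 1))   # 7.2
```

The two estimates are in the same range, which suggests the clock has been
running slow at a roughly steady rate since the slowdown started.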
Another interesting little symptom: when this slowdown is in effect, the
keyboard autorepeat on keys stops working.
If this was the only machine doing this I'd think it was a hardware
problem. But (a) it isn't the only machine and (b) while it seems to
always happen to these machines, it is only after running for at least a
few hours without problems.
Gary
--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
10-13-2010, 02:45 PM
"Burke, Thomas G."
EXTERNAL: RHELv4 and v5 - So slow as to be unusable.
Oh, 2nd question. SSDs? There are issues there, too...