Old 10-04-2010, 10:26 PM
Kenneth Kirchner
 
Default RHELv4 and v5 - So slow as to be unusable.

I would recommend installing some kind of performance monitoring software like Nagios or just SNMPd and Cacti. These will track the performance of your machine and let you see memory, CPU, disk I/O, processes, etc. over time to help identify what is going on. They aren't system intensive and give you much better visibility when problems like this do occur.
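
Until a full monitoring stack is in place, the same idea of tracking a metric over time can be sketched in a few lines of Python. This is only an illustration, not a replacement for Nagios or Cacti: it assumes a Linux /proc filesystem, and the function names `parse_loadavg` and `sample` are made up for this example.

```python
import time
from datetime import datetime


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a small dict."""
    fields = text.split()
    # The fourth field has the form "runnable/total" scheduling entities.
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),
        "total": int(total),
    }


def sample(path="/proc/loadavg"):
    """Read and parse the current load average (Linux only)."""
    with open(path) as f:
        return parse_loadavg(f.read())


if __name__ == "__main__":
    # Take three samples one second apart and print them with a timestamp.
    for _ in range(3):
        try:
            print(datetime.now().isoformat(timespec="seconds"), sample())
        except OSError as exc:
            print("could not read /proc/loadavg:", exc)
            break
        time.sleep(1)
```

Redirecting the output to a file from cron or a long-running loop gives a crude history that can be lined up against the time the slowdown starts.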

My two cents anyway...

-Ken

On Oct 4, 2010, at 10:58 AM, Gary E Barnes wrote:

> The past week I upgraded our RHELv3 machines to v4. Previously we had
> several v3's, one v4, and one v5. The v5 has never worked. Now the new
> v4's are acting up.
>
> Boot the machine, things are fine. Wait overnight and the machine may
> take ten minutes to unlock the screen, may take several tens of seconds to
> do an ls, and generally simply isn't usable.
>
> The v4's if you reboot them seem to be fine for the day.
> The v5 if you reboot it is fine for maybe 15 minutes.
>
> The v4's, there will be a load average of 3 to 4, but top says nothing
> whatsoever (other than top and the xterm) is running.
> The v5, there will be a load average of 0.1 or less and top again says
> nothing is running.
>
> SELinux is turned off. Firewall is turned off. I've even tried turning
> off every service that isn't vital to being able to simply boot the
> machines.
>
> Any ideas?
>
> Gary
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list


 
Old 10-04-2010, 10:55 PM
James Jones
 
Default RHELv4 and v5 - So slow as to be unusable.

I agree with Ken: you need to figure out what is eating your lunch, as they
say. Several programs, such as top and System Monitor, can give you a
high-level look at what is going on. Also, how much memory is on the machines
and what type of CPUs are installed?

If you can provide some output from top or System Monitor, that may help us
give you more specific assistance.

jim




--
James R. Jones
System Manager
UAF-BBC
PO Box 1070
Dillingham, AK 99576
907-842-8312
 
Old 10-04-2010, 11:12 PM
"Geofrey Rainey"
 
Default RHELv4 and v5 - So slow as to be unusable.

I don't know if anyone has said this yet, but have you installed the
sysstat package and used the "sar" utility? Perhaps you've got I/O
issues which sar will reveal.

 
Old 10-06-2010, 07:22 PM
Gary E Barnes
 
Default RHELv4 and v5 - So slow as to be unusable.

The machines are both Pentium 4s, around 6 years old, with 2 GB of memory and
IDE disks. Runlevel is 5 (full GUI). No obvious errors are showing up in
/var/log/messages or in dmesg. I have three machines in another room that
are the same model, upgraded the same way, that seem to be happy. Those
machines are only used remotely and not generally from their consoles.

"top" says that nothing is going on although the load average is 3+.
"sar" also says that nothing is going on.

Yesterday I turned off the BIOS power management (so no disks spin down and
no monitors turn off and such). I also changed /etc/ntp.conf to more
closely match the latest ".rpmnew" version from Red Hat. It references our
company's internal ntp servers but otherwise it is out-of-the-box from Red
Hat. And I thought that maybe I had found something in doing that. The
machines ran fine all the rest of the day.

Then at about 7:18pm last night both machines essentially stopped working.
I had "sar" running on the one machine dumping data every 30 seconds.
According to the sar output, at about 7:01pm last night that machine
essentially stopped having any work to do. Disk activity went to near zero,
the machine went to 99.99% idle, there was network activity every once in a
while, the occasional hint of I/O activity but nothing else.

The user on that machine was logged in remotely and they said that at about
7:15pm last night the connection suddenly got so slow that they couldn't
work any more. The other machine was not actively in use at the time
although the user was logged in and the screen locked.

This morning coming in, both machines still thought it was about 7:18pm.
The "date" was the day before at 7:18pm (give or take depending on which
machine) and the /usr/sbin/hwclock was correct about the actual time.

As a separate problem, last evening I discovered that NFS mount points being
exported from all of the RHELv4 machines can be mounted by Solaris v6, v7,
v8, and v9 machines but Solaris v10 machines, both Sparc and X86 based, are
unable to mount the RHELv4 mount points. And there are no errors in
/var/log/messages. HP, SGI, and AIX machines can mount those points, but
not Solaris 10.

I may have "fixed" the RHELv5 version of the problem. I had noticed that
netstat was reporting around 2200 TIME_WAIT sockets, nearly all for the NIS
or DNS servers. I find that by setting the sysctl tcp_tw_reuse flag to 1
(default is 0 on RHELv3, RHELv4, and RHELv5) the number of sockets in
TIME_WAIT drops to what I see on other machines *and* the RHELv5 machine no
longer develops its version of the slowdown problem.
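
For anyone who wants to watch the TIME_WAIT count without running netstat on a sluggish box, here is a rough Python sketch. It assumes a Linux /proc filesystem and IPv4 only (it reads /proc/net/tcp, where state 06 is TIME_WAIT); the helper names `count_tcp_states` and `read_tw_reuse` are invented for this example and are not part of any tool mentioned in the thread.

```python
import os
from collections import Counter

# A few TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def count_tcp_states(text):
    """Count sockets per state from the text of /proc/net/tcp."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) > 3:
            # Unknown codes are counted under their raw hex value.
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


def read_tw_reuse(path="/proc/sys/net/ipv4/tcp_tw_reuse"):
    """Return the current value of the tcp_tw_reuse sysctl."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    if os.path.exists("/proc/net/tcp"):
        with open("/proc/net/tcp") as f:
            print(dict(count_tcp_states(f.read())))
        print("tcp_tw_reuse =", read_tw_reuse())
    else:
        print("no /proc/net/tcp here; this sketch is Linux-only")
```

Running it before and after changing the flag would show whether the TIME_WAIT population really drops the way netstat suggested.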

If I can ever get one of the problem RHELv4 machines to run a netstat while
the slowdown effect is in effect I'll have to see if something similar helps
there. It is hard to get their "attention" when the slowdown effect is going
on. It can be done but have coffee handy.

Gary

> ------------------------------------------------------------
> From: "Marti, Robert" <RJM002@shsu.edu>
>
> What kind of disks are you using? I'd tend to look at IO issues with
> that kind of description.
> ------------------------------------------------------------
> From: m.roth@5-cent.us
>
> What runlevel? Any clues in /var/log/messages? Or dmesg?
> ------------------------------------------------------------
> From: "Mr. Paul M. Whitney" <paul.whitney@me.com>
>
> Do you have any other software installed? anti-virus? other third-party
> software? It could be a memory leak.
> ------------------------------------------------------------
 
Old 10-07-2010, 12:20 PM
Laszlo Beres
 
Default RHELv4 and v5 - So slow as to be unusable.

On Wed, Oct 6, 2010 at 9:22 PM, Gary E Barnes <gebarnes@us.ibm.com> wrote:

> "top" says that nothing is going on although the load average is 3+.
> "sar" also says that nothing is going on.

There's no such thing as "nothing is going on". You should see CPU
status, process status, etc. vmstat can also give you some hints about
the system health.

--
László Béres      Unix system engineer
http://www.google.com/profiles/beres.laszlo

 
Old 10-07-2010, 07:17 PM
Gary E Barnes
 
Default RHELv4 and v5 - So slow as to be unusable.

> From: Laszlo Beres <laszlo@beres.me>
> Subject: Re: RHELv4 and v5 - So slow as to be unusable.
>
> On Wed, Oct 6, 2010 at 9:22 PM, Gary E Barnes <gebarnes@us.ibm.com> wrote:
>
> > "top" says that nothing is going on although the load average is 3+.
> > "sar" also says that nothing is going on.
>
> There's no such thing "nothing is going on". You should see CPU
> status, process status, etc. vmstat also can give you some hints about
> the system health.

Oh but there is such a thing. I have one of the machines in this weird
slowdown state right at this moment. It started around 4:45pm yesterday,
after running perfectly for about 3 hours 15 minutes, and I left it
overnight to see if maybe it would get "over it" by itself. Hasn't
happened though.

Here is the very first header from the "top" display of a top I started
just for this example.

top - 19:18:20 up 4:33, 4 users, load average: 3.56, 3.58, 3.54
Tasks: 159 total, 16 running, 143 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.3% us, 0.4% sy, 2.9% ni, 95.1% id, 0.3% wa, 0.0% hi, 0.0% si
Mem: 2586400k total, 1880032k used, 706368k free, 193036k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220324k cached

  PID USER  PR  NI  VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
 5813 root  16   0  164m   31m  6124 R  0.2  1.3  25:05.54 X
 6650 geb   16   0  4048  2116  1332 S  0.2  0.1   0:00.86 xalarm
27229 geb2  16   0  3008   960   696 R  0.2  0.0   0:00.01 top
    1 root  16   0  2724   512   436 S  0.0  0.0   0:00.68 init

And here is the first refresh of that display (I'm capturing this in an
Emacs buffer if you're curious).

Tasks: 161 total, 3 running, 158 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9% us, 0.5% sy, 0.1% ni, 98.5% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2586400k total, 1885680k used, 700720k free, 193036k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220584k cached

Here is the second, notice the 100% idle value.

Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2586400k total, 1885696k used, 700704k free, 193044k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220576k cached

There is memory available. There is swap available.
Idle occasionally drops to 99.9%.

Tasks: 161 total, 2 running, 159 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0% us, 0.1% sy, 0.0% ni, 99.8% id, 0.0% wa, 0.0% hi, 0.0% si
Mem: 2586400k total, 1885464k used, 700936k free, 193076k buffers
Swap: 4192956k total, 0k used, 4192956k free, 1220544k cached

The processes that show up in the first line or two of top are things such
as:

  PID USER  PR  NI   VIRT   RES   SHR S %CPU %MEM     TIME+ COMMAND
27231 geb   16   0  28316   11m  8412 S  9.8  0.5   0:00.41 gnome-terminal
 5813 root  16   0   165m   31m  6316 S  3.8  1.3  25:05.70 X
 6697 geb   16   0  22280   10m  7644 S  1.9  0.4   0:18.69 wnck-applet

 3817 rpc   15   0   2336   592   484 S  0.2  0.0   0:01.60 portmap
 6472 geb   16   0   3544  1472   876 S  0.2  0.1   0:02.08 gam_server
 6650 geb   16   0   4048  2116  1332 S  0.2  0.1   0:00.87 xalarm
 3805 root  16   0   2492   312   220 S  0.2  0.0   0:00.55 irqbalance
 4867 root  16   0   2800   844   624 D  0.2  0.0   0:03.32 rpc.mountd
    1 root  16   0   2724   512   436 S  0.0  0.0   0:00.68 init
 6451 geb   16   0  12764  7512  1688 S  0.1  0.3   0:01.66 gconfd-2
27229 geb2  16   0   3012  1044   772 R  0.1  0.0   0:00.05 top
27229 geb2  16   0   3012  1044   772 R  0.5  0.0   0:00.07 top
 4996 root  16   0   4772  3104  1536 S  0.2  0.1   0:02.47 hald
 6472 geb   16   0   3544  1472   876 S  0.2  0.1   0:02.09 gam_server
    1 root  16   0   2724   512   436 S  0.0  0.0   0:00.68 init
27229 geb2  16   0   3012  1044   772 R  0.5  0.0   0:00.09 top
23574 geb   16   0   145m   68m   26m S  0.2  2.7   2:44.86 firefox-bin

    1 root  16   0   2724   512   436 S  0.0  0.0   0:00.68 init

As you can see, there is essentially "nothing going on".
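
One thing worth keeping in mind when reading the numbers above: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R), so a load of 3+ with an idle CPU is consistent with a few tasks stuck waiting inside the kernel. The listing does show rpc.mountd in state D. The sketch below lists every process currently in R or D; it assumes a Linux /proc filesystem, only looks at thread-group leaders (not individual threads under /proc/<pid>/task), and the function names are made up for this example.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    return int(pid), comm, right.split()[0]


def tasks_in_states(states=("R", "D"), proc="/proc"):
    """List (pid, comm, state) for processes whose state is in `states`."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited while we were looking
        if state in states:
            found.append((pid, comm, state))
    return found


if __name__ == "__main__":
    if os.path.isdir("/proc"):
        for pid, comm, state in sorted(tasks_in_states()):
            print(f"{pid:>6} {state} {comm}")
    else:
        print("no /proc here; this sketch is Linux-only")
```

If the same few PIDs keep showing up in state D across several runs, those are the processes to look at first.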

And yet the machine is very unresponsive. If I run a command that hasn't
been run in a while (don't know the time frame, but it seems to be only
minutes) then the command takes >30 seconds to execute. For example, I
just did the "date" command and when it finally responded I did the
hwclock command. Both took >30 seconds to run. Now if I repeat those
commands they execute immediately. I'm presuming that this is due to
executable file caching in the operating system. If I wait a while then
the >30 second wait will reappear for those same commands. Presumably
they've left that cache.

This behavior is observable both in xterms on the console and also
through ssh connections from another machine.

Programs that are already loaded and running seem to be pretty much ok, at
least until they need to go read some new file or write some new file,
then they hang for a while and eventually get going again.

If I run sar (sysstat package) I get essentially the same picture. From a
"sar -A 30 4" here are the averages for two minutes. Load average 3+ and
>99% idle. Nearly no I/O of any sort; not 0 but very low amounts for two
minutes.

Average: proc/s
Average: 0.12

Average: cswch/s
Average: 48.90

Average: CPU %user %nice %system %iowait %idle
Average: all 0.29 0.01 0.12 0.07 99.51
Average: 0 0.25 0.01 0.13 0.00 99.60
Average: 1 0.33 0.01 0.11 0.13 99.42

Average: INTR intr/s
Average: sum 11.87

Average: pgpgin/s pgpgout/s fault/s majflt/s
Average: 0.01 8.55 34.76 0.00

Average: pswpin/s pswpout/s
Average: 0.00 0.00

Average: tps rtps wtps bread/s bwrtn/s
Average: 1.35 0.00 1.35 0.02 17.11

Average: frmpg/s bufpg/s campg/s
Average: -0.73 0.02 -0.02

Average:  CPU  i000/s  i001/s  i008/s  i009/s  i012/s  i014/s  i015/s  i177/s  i185/s  i193/s  i201/s  i209/s
Average:    0    0.02    0.00    0.00    0.00    0.00    0.00    0.09    7.76    0.00    0.00    0.00    0.00
Average:    1    0.01    2.65    0.00    0.00    0.00    1.35    0.00    0.00    0.00    0.00    0.00    0.00

Average:     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
Average:        lo      0.00      0.00      0.12      0.12      0.00      0.00      0.00
Average:      eth0      3.35      2.25    364.83    196.51      0.00      0.00      0.00
Average:      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Average:     IFACE   rxerr/s   txerr/s    coll/s  rxdrop/s  txdrop/s  txcarr/s  rxfram/s  rxfifo/s  txfifo/s
Average:        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:      sit0      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Average: DEV tps rd_sec/s wr_sec/s
Average: dev1-0 0.00 0.00 0.00
Average: dev1-1 0.00 0.00 0.00
Average: dev1-2 0.00 0.00 0.00
Average: dev1-3 0.00 0.00 0.00
Average: dev1-4 0.00 0.00 0.00
Average: dev1-5 0.00 0.00 0.00
Average: dev1-6 0.00 0.00 0.00
Average: dev1-7 0.00 0.00 0.00
Average: dev1-8 0.00 0.00 0.00
Average: dev1-9 0.00 0.00 0.00
Average: dev1-10 0.00 0.00 0.00
Average: dev1-11 0.00 0.00 0.00
Average: dev1-12 0.00 0.00 0.00
Average: dev1-13 0.00 0.00 0.00
Average: dev1-14 0.00 0.00 0.00
Average: dev1-15 0.00 0.00 0.00
Average: dev3-0 1.35 0.02 17.11
Average: dev3-1 0.00 0.00 0.00
Average: dev3-2 0.13 0.02 3.35
Average: dev3-3 0.00 0.00 0.00
Average: dev3-4 0.00 0.00 0.00
Average: dev3-5 1.22 0.01 13.75
Average: dev22-64 0.00 0.00 0.00
Average: dev22-65 0.00 0.00 0.00
Average: dev22-0 0.00 0.00 0.00
Average: dev2-0 0.00 0.00 0.00
Average: dev9-0 0.00 0.00 0.00

Average: kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
Average:    696608   1889792     73.07    193202   1220938   4192956         0      0.00         0

Average: dentunusd   file-sz  inode-sz  super-sz %super-sz  dquot-sz %dquot-sz  rtsig-sz %rtsig-sz
Average:    216815      3285    185128         0      0.00         0      0.00         0      0.00

Average: totsck tcpsck udpsck rawsck ip-frag
Average: 343 56 8 0 0

Average: runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15
Average: 14 186 3.35 3.61 3.59

The machine entered this state at about 4:45pm yesterday afternoon. It is
now 12:00 noon the next day.
The "date" command says that the system thinks that the time is 7:26pm
yesterday.
In the last 47 minutes the system clock has advanced only 6 minutes, so it is
running roughly 7.8 times slower than real time.
Another interesting little symptom: when this slowdown is in effect the
keyboard autorepeat on keys stops working.
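
The clock arithmetic can be written out as a quick check. This is just a worked calculation from the times quoted above. The second estimate assumes the system clock was still correct when the slowdown began at about 4:45pm, which the post does not state outright.

```python
def slowdown_factor(wall_minutes, clock_minutes):
    """How many real minutes pass for each minute the system clock advances."""
    return wall_minutes / clock_minutes


# Short window: 47 real minutes, during which the system clock advanced 6.
short_window = slowdown_factor(47, 6)

# Whole episode: 4:45pm yesterday to 12:00 noon today in real time,
# versus 4:45pm to 7:26pm as seen by the system clock.
wall = (24 * 60 - (16 * 60 + 45)) + 12 * 60   # 1155 minutes
clock = (19 * 60 + 26) - (16 * 60 + 45)       # 161 minutes
whole_episode = slowdown_factor(wall, clock)

print(f"short window:  {short_window:.1f}x")   # prints "short window:  7.8x"
print(f"whole episode: {whole_episode:.1f}x")  # prints "whole episode: 7.2x"
```

The two figures are in the same ballpark, which suggests the clock has been running slow at a fairly steady rate since the slowdown started rather than stopping and restarting.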

If this was the only machine doing this I'd think it was a hardware
problem. But (a) it isn't the only machine and (b) while it seems to
always happen to these machines, it is only after running for at least a
few hours without problems.

Gary
 
Old 10-07-2010, 08:42 PM
Gary E Barnes
 
Default RHELv4 and v5 - So slow as to be unusable.

> From: lists-redhat <replies-lists-b3z1-redhat@listmail.innovate.net>
> Date: 10/07/2010 12:44 PM
> Subject: Re: RHELv4 and v5 - So slow as to be unusable.
>
> I think you indicated in one of your earlier messages that you have
> ntpd running on these machines. Have you tried turning it off (and
> killing the running process) to see what happens?
>
> I don't normally run ntpd on my laptops but one got it
> installed/enabled on a recent install. When not connected to a
> network I was seeing things somewhat like what you're seeing. The
> load goes up, but nothing shows in "top", and the keyboard becomes
> fairly unresponsive. Killing off ntpd seemed to clean up the problem.
>
> If that fixes it, then you can try to debug what's going on, e.g.,
> confirming that they really do have access to your in-house ntp
> server.
>
> Of course, it may not be this at all, but since I'm not seeing a lot
> of ideas I thought I'd pass along this one.
>
> - Richard

ntp is on my list of suspects. I've been considering running it under a
tracing program so that I can watch the system calls and try to correlate
them with the problem when it occurs. But killing ntpd, or doing
"/etc/init.d/ntpd stop" doesn't fix the problem. I'll have to try turning
it off, reboot, and run for a day without it and see what happens. One
frustration with this problem is that I only get one or two shots a day at
"seeing" anything.

Gary
 
Old 10-08-2010, 11:33 AM
Romeo Theriault
 
Default RHELv4 and v5 - So slow as to be unusable.



Odd problem. I might also try booting into another, possibly older,
kernel to see if that affects things at all.


--
Romeo Theriault

 
Old 10-08-2010, 04:22 PM
Peter Skensved
 
Default RHELv4 and v5 - So slow as to be unusable.

>
> > "top" says that nothing is going on although the load average is 3+.
> > "sar" also says that nothing is going on.
>


An unresponsive DNS server would slow NFS to a crawl. If that is the case
you should probably put the relevant IP addresses in /etc/hosts.

Every time you type a command the shell searches the directories in your PATH
to find the right executable, so it may have to wait for DNS and NFS. To
check whether this is happening, type a command and time the response,
then type the same command using its full path. If there is a huge difference
in response time, you're having an NFS problem.
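
The bare-name versus full-path comparison can also be scripted. The sketch below is one way to do it in Python and is only an approximation of the shell test: `subprocess` does its own PATH lookup for a bare name, much as the shell does, and the helper names `time_command` and `compare` are made up for this example. Two caveats: the full-path run goes first here so that file caching does not unfairly favor it, and on a machine whose system clock is itself running slow the measured times will be off, so a wristwatch may be more trustworthy.

```python
import shutil
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once, discard its output, and return elapsed seconds."""
    start = time.monotonic()
    subprocess.run(
        argv,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return time.monotonic() - start


def compare(name):
    """Time `name` by full path and by bare name; return (by_name, by_path)."""
    full_path = shutil.which(name)
    if full_path is None:
        print(f"{name}: not found on PATH", file=sys.stderr)
        return None
    by_path = time_command([full_path])  # no PATH search needed
    by_name = time_command([name])       # PATH is searched directory by directory
    return by_name, by_path


if __name__ == "__main__":
    result = compare("date")
    if result is not None:
        by_name, by_path = result
        print(f"bare name: {by_name:.3f}s   full path: {by_path:.3f}s")
```

If the bare-name run is much slower even though it ran second, a slow directory early in PATH (for example one on an NFS mount) is a reasonable suspect.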

peter


----

Peter Skensved Email : peter at nospam dot SNO dot Phy at QueensU dot CA
Dept. of Physics,
Queen's University,
Kingston, Ontario,
Canada

 
Old 10-08-2010, 05:34 PM
Yong Huang
 
Default RHELv4 and v5 - So slow as to be unusable.

Gary,

As you proved, not all performance problems can be identified by
performance monitoring tools. In this case, "performance" is not a good
word. "Locking" may be better.

We recently had a problem with TrendMicro on our RHEL 5 box. Copying a 1 GB
file with cp took 35 minutes for the prompt to come back, even though the
copied file already had the same checksum and size after about 1 minute.
/proc/<cp pid>/status showed the disk sleep state. The cp command was not
killable, indicating it was stuck in kernel mode. Running strace or pstack on
the process hung (although strace or pstack itself was killable). The messages
in /var/log/messages shed light on the problem:

Sep 26 11:02:11 ourhostname kernel: INFO: task cp:10658 blocked for more than 120 seconds.
Sep 26 11:02:11 ourhostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...
Sep 26 11:02:11 ourhostname kernel: Call Trace:
Sep 26 11:02:11 ourhostname kernel: [<ffffffff884a45a8>] :splxmod:closeHook+0x784/0x9d8

So some splxmod module's closeHook function is the suspect since it's at
the top of the call stack. Searching on Google indicates it's a module
in TrendMicro's software. We contacted them and they quickly provided a patch.
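
To check a log for the same kind of message, something like the following Python sketch could be used. The pattern is taken from the log lines quoted above; the function name `find_hung_tasks` is made up for this example, and reading /var/log/messages normally requires root.

```python
import re

# Matches e.g. "INFO: task cp:10658 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def find_hung_tasks(lines):
    """Return (command, pid, seconds) for every hung-task warning in lines."""
    hits = []
    for line in lines:
        m = HUNG_TASK.search(line)
        if m:
            hits.append((m.group("comm"), int(m.group("pid")), int(m.group("secs"))))
    return hits


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as f:
            for comm, pid, secs in find_hung_tasks(f):
                print(f"{comm} (pid {pid}) blocked for more than {secs}s")
    except OSError as exc:
        print("could not read /var/log/messages:", exc)
```

The lines that follow each match in the log (the "Call Trace:" section) are what point to the responsible module, so it is worth reading them by hand once a hit is found.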

RHEL 4 doesn't have /proc/sys/kernel/hung_task_timeout_secs. I'm not sure
if the kernel can be reconfigured to add that. For those interested, the
source code is at
http://koders.com/c/fidFAF17DCD13DB287057ACC4136EEEFE2D9644BA9A.aspx

In your case, can you try pstack and strace on a simple process such as
date (both programs need to be installed)? And tell us /proc/<pid>/status.
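
Since a command like date finishes (or hangs) before you can easily look up its PID, a small helper that launches the command and polls /proc/<pid>/status may be handy. This is only a sketch assuming a Linux /proc filesystem; `parse_state` and `watch` are hypothetical names, not existing tools.

```python
import subprocess
import time


def parse_state(status_text):
    """Extract (code, description) from the State: line of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("State:"):
            value = line.split(":", 1)[1].strip()  # e.g. "D (disk sleep)"
            code, _, desc = value.partition(" ")
            return code, desc.strip("()")
    return None


def watch(argv, interval=0.5):
    """Start argv and print its process state until it exits."""
    proc = subprocess.Popen(argv)
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/status") as f:
                print(proc.pid, parse_state(f.read()))
        except OSError:
            break  # the process went away between poll() and open()
        time.sleep(interval)
    return proc.wait()
```

For example, `watch(["date"])` would print one line per half second while date is still running. A long run of `('D', 'disk sleep')` would match the behavior described above for cp.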

Yong Huang



