09-17-2012, 02:35 PM
Uwe Bolick

Problem with cpu time

Hi,

We have observed strange behaviour on our compute nodes, both on nodes
upgraded to squeeze and on new nodes freshly installed with squeeze.

Every process that runs longer than about 24.8 days starts reporting a
"nonsense" cpu time. Below is an example output of "ps -u username f"
over time:

[2012-05-29 05:49:33] 30590 ? R 35793:27 ./fortran_kdis 2 25
[2012-05-29 05:49:38] 30590 ? R 35793:32 ./fortran_kdis 2 25
[2012-05-29 05:49:43] 30590 ? R 35793:37 ./fortran_kdis 2 25
[2012-05-29 05:49:48] 30590 ? R 11129636:45 ./fortran_kdis 2 25
[2012-05-29 05:49:53] 30590 ? R 11129636:45 ./fortran_kdis 2 25
[2012-05-29 05:49:58] 30590 ? R 11129636:45 ./fortran_kdis 2 25
[2012-05-29 11:20:36] 30590 ? R 11129636:45 ./fortran_kdis 2 25

Several days later, the accumulated cpu time value remains the same.
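
For reference, the TIME column in this "ps" output format is the
accumulated CPU time as minutes:seconds. The short Python sketch below
converts the last sane value and the stuck value into seconds and days
(the helper name ps_time_to_seconds is made up for this illustration).
It also prints 2^31 milliseconds expressed in days for comparison, since
that is also close to 24.86 days; whether a 32-bit millisecond counter
is really involved is only a guess and is not established by anything
in this post.

```python
SECONDS_PER_DAY = 86400


def ps_time_to_seconds(field: str) -> int:
    """Convert a ps TIME field of the form MMM:SS into seconds."""
    minutes, seconds = field.split(":")
    return int(minutes) * 60 + int(seconds)


# Values copied from the ps output above.
last_sane = ps_time_to_seconds("35793:37")
stuck = ps_time_to_seconds("11129636:45")

print("last sane value:", last_sane, "s =", last_sane / SECONDS_PER_DAY, "days")
print("stuck value:    ", stuck, "s =", stuck / SECONDS_PER_DAY, "days")

# Comparison value only (assumption): 2^31 milliseconds expressed in days.
print("2^31 ms:        ", 2**31 / 1000 / SECONDS_PER_DAY, "days")
```

The last sane value corresponds to roughly 24.86 days of CPU time, while
the stuck value would correspond to more than 7700 days, which is
clearly impossible for this job.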

The daily report of the "Grid Engine 2011.11" job scheduler for this
job shows:

...
...:86412.030000:6925734.008546:...
...:86380.140000:6923160.882762:...
...:86423.790000:6926644.923450:...
...:30016546.230000:2405779823.509468:... <---- day with jump
...:0.000000:0.000000:...
...:0.000000:0.000000:...
...:0.000000:0.000000:...
...:0.000000:0.000000:...
...:0.000000:0.000000:...
...:0.000000:0.000000::...
...:0.000000:0.000000:...
...:17112.340000:1371446.414438:...
...:86395.480000:6924655.520745:...
...:86411.810000:6926306.216313:...
...:86415.170000:6926575.536817:...
...:85071.220000:6818939.616130:...
...

The two numbers between the "..." are the values for "ru_utime" and
"ru_stime". The ru_utime values for the days before the "jump" are
correct; afterwards they are nonsense for several days and then OK
again (this job was running at 100% cpu usage the whole time!). The
ru_stime values, however, all look strange. Keep in mind:
1 day == 86400 sec.
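
To make the report easier to judge, here is a small Python sketch that
expresses a few of the ru_utime values in days and compares the two
columns. The numbers are copied from the report above; the variable
names are made up for this illustration. Purely as arithmetic on the
posted values, the second column comes out at roughly 80.15 times the
first on every line, including the day with the jump. What that implies
about the field is not clear from the report alone.

```python
SECONDS_PER_DAY = 86400

# (ru_utime, ru_stime) pairs copied from the daily report above.
samples = [
    (86412.030000, 6925734.008546),
    (86380.140000, 6923160.882762),
    (86423.790000, 6926644.923450),
    (30016546.230000, 2405779823.509468),  # day with the jump
    (17112.340000, 1371446.414438),        # first non-zero day afterwards
    (86395.480000, 6924655.520745),
]

for utime, stime in samples:
    days = utime / SECONDS_PER_DAY
    ratio = stime / utime
    print(f"{utime:>16.2f} s = {days:8.3f} days, second/first = {ratio:.3f}")

# Ratio of the second column to the first, one entry per sample.
ratios = [stime / utime for utime, stime in samples]
```

A normal day is close to 1.0 days of CPU time, as expected for a job at
100% cpu usage, whereas the jump day alone would account for about 347
days.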

In addition, every job that shows this behaviour jumps after 35793:37,
but the accumulated cpu time it jumps to differs from job to job:

[2012-05-29 05:18:32] 30591 ? R 10557290:44 ./fortran_kdis 2 27
[2012-05-29 05:34:42] 30636 ? R 11129626:19 ./fortran_kdis 2 31
[2012-05-29 05:58:20] 30637 ? R 12274089:59 ./fortran_kdis 2 30
[2012-05-29 06:02:37] 30630 ? R 12274256:17 ./fortran_kdis 2 28
[2012-05-29 06:03:12] 30634 ? R 11129641:38 ./fortran_kdis 2 29
[2012-05-29 06:09:44] 30638 ? R 12274280:17 ./fortran_kdis 2 32
[2012-05-29 06:23:55] 30587 ? R 11701990:44 ./fortran_kdis 2 26
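
The per-job values can be compared more easily by sorting them and
looking at the gaps. The Python sketch below does that with the minute
part of each TIME value listed above (the names are made up for this
illustration). As a plain observation on these numbers, the values are
not scattered arbitrarily: they fall into a few clusters that are
roughly 572,000 minutes apart. No explanation for that spacing is
claimed here.

```python
# Accumulated CPU minutes (the part before ":") per PID, from the list above.
stuck_minutes = {
    30591: 10557290,
    30636: 11129626,
    30637: 12274089,
    30630: 12274256,
    30634: 11129641,
    30638: 12274280,
    30587: 11701990,
}

# Sort by value and compute the gap between consecutive values.
ordered = sorted(stuck_minutes.items(), key=lambda item: item[1])
gaps = [b[1] - a[1] for a, b in zip(ordered, ordered[1:])]

for (pid, minutes), gap in zip(ordered, [0] + gaps):
    print(f"pid {pid}: {minutes} min (+{gap} min since previous)")
```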

The kernel and architecture in use are:
# uname -a
Linux warg09 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux

Any help in getting rid of this issue would be highly appreciated.

Thanks in advance...

--
Uwe Bolick
Zentrum für Astronomie und Astrophysik
Technische Universität Berlin
EW 8-1, Hardenbergstr. 36, D-10623 Berlin (Germany)


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120917143547.GA15147@astro.physik.tu-berlin.de
 
