Old 08-15-2011, 06:32 AM
 
Default Options to stop processes that can't be killed -9 other than reboot

killall <process name>

-- mukesh

-----Original Message-----
From: sunhux G <sunhux@gmail.com>
Sender: redhat-list-bounces@redhat.com
Date: Mon, 15 Aug 2011 10:53:45
To: General Red Hat Linux discussion list<redhat-list@redhat.com>
Reply-To: General Red Hat Linux discussion list <redhat-list@redhat.com>
Subject: Options to stop processes that can't be killed -9 other than reboot

Hi

I have 2 processes (shown by ps -ef below) which have 'jammed' the tape
drive & I can't "kill -9" them.

Is there any way short of a reboot to stop them, say "service xxx restart" or
anything else other than rebooting this Linux 4.x server? Since a reboot
involves doing "service xxx stop" for various services, surely one of the
xxx must be able to stop the processes (just an educated guess). We
face this issue with our Data Protector quite often, so frequent reboots
are not an option.

# ps -ef |grep -i bma |grep -v grep
root 10197 1 0 Aug13 ? 00:00:08 /opt/omni/lbin/vbda
-bmaname HP:Ultrium 4-SCSI_4 -type 2 -start 1313175661 -level 0
-access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
1313175612 -volume / -profile -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy 2 -no_nthlink -archattr -share_info -objname 02
xxxdgjt1.ss.de:/ // / -no_aligned
root 23303 1 0 Aug13 ? 00:00:03 /opt/omni/lbin/vbda
-bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313192083 -level 0
-access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
1313192026 -volume / -profile -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy 2 -no_nthlink -archattr -share_info -objname 02
xxxdgjt1.ss.de:/ // / -no_aligned
root 25618 1 0 Aug13 ? 00:00:03 /opt/omni/lbin/vbda
-bmaname HP:Ultrium 4-SCSI_1 -type 2 -start 1313195066 -level 0
-access 1 0 -protection 2 1209600 -name / -ma xxxdgjt1.ss.de 22000 -id
1313195016 -volume / -profile -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy 2 -no_nthlink -archattr -share_info -objname 02
xxxdgjt1.ss.de:/ // / -no_aligned


They hold these TCP connections (in CLOSE_WAIT):

[root@xxxdgjt1 ~]# netstat -antp | grep 25618
tcp 21 0 172.17.1.47:5555 172.17.12.12:2128 CLOSE_WAIT 25618/vbda
[root@xxxdgjt1 ~]# netstat -antp | grep 23303
tcp 21 0 172.17.1.47:5555 172.17.12.12:2073 CLOSE_WAIT 23303/vbda


Running fuser on all the other partitions shows no processes locking or
opening files; only the root (i.e. /) partition does:

# fuser / |grep 25618 ==> shows 25618 & 25618r among the processes
# fuser / |grep 23303 ==> shows 23303 & 23303r among the processes


# cd /etc
# ls */*omni*
xinetd.d/omni

opt/omni:
client server

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list

 
Old 08-15-2011, 06:55 AM
sunhux G
 

If "kill -9 pid" doesn't work, would killall work?


On Mon, Aug 15, 2011 at 2:32 PM, <mukesh.kale09@gmail.com> wrote:
> Killall <process name >
>
> -- mukesh

 
Old 08-19-2011, 05:35 AM
Cameron Simpson
 

On 15Aug2011 14:55, sunhux G <sunhux@gmail.com> wrote:
| On Mon, Aug 15, 2011 at 2:32 PM, <mukesh.kale09@gmail.com> wrote:
| > Killall <process name >
| > -- mukesh
|
| If "kill -9 pid" doesn't work, would killall work?

Of course not.

Basically, if kill -9 doesn't remove the process, it is wedged on some
kernel-level resource. When that comes good the process will exit, but
not before.

Does "lsof -p 25618" (adjust for whatever PID) tell you anything useful
about the hung process? Is the tape drive ok? Does the command "strace
-p 25618" (again, adjust) tell you what function call is in progress,
and what it is accessing? Probably the tape drive, but check out the
system call file descriptor against those shown by lsof.
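
The cross-check described above (which system call is in progress, and which file descriptor it touches) can be sketched in Python on kernels that provide /proc/<pid>/syscall. That file is newer than the RHEL 4 kernel in this thread, so treat this as an illustration for later systems, not something verified against the hung vbda processes. The helper names are hypothetical, and the idea that the first argument is a file descriptor only holds for I/O calls such as read, write or ioctl.

```python
import os


def parse_syscall_line(line):
    """Parse the single line found in /proc/<pid>/syscall (format per proc(5))."""
    fields = line.split()
    if not fields or fields[0] == "running":
        # The process is not blocked at all.
        return {"running": True, "syscall": None, "args": []}
    number = int(fields[0])
    if number == -1:
        # Blocked, but not inside a system call.
        return {"running": False, "syscall": None, "args": []}
    # The syscall number is followed by six hexadecimal arguments.
    args = [int(a, 16) for a in fields[1:7]]
    return {"running": False, "syscall": number, "args": args}


def fd_target(pid, fd):
    """Return what file descriptor `fd` of `pid` points to, or None if unreadable."""
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except OSError:
        return None


def describe(pid):
    """Combine the syscall in progress with the target of its first argument."""
    with open(f"/proc/{pid}/syscall") as f:
        info = parse_syscall_line(f.read())
    if info["syscall"] is not None:
        # ASSUMPTION: for read/write/ioctl-style calls the first argument is an fd.
        info["first_arg_target"] = fd_target(pid, info["args"][0])
    return info
```

Calling describe(25618) as root would then give a syscall number plus, where the assumption holds, the device or socket the call is waiting on. The syscall number still has to be looked up for the machine's architecture.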
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

I will not compromise, just to look good in your eyes. - Falling Joys

 
Old 08-19-2011, 04:49 PM
Yong Huang
 

> | If "kill -9 pid" doesn't work, would killall work?
>
> Of course not.
>
> Basically, if kill -9 doesn't remove the process, it is wedged on some
> kernel-level resource. When that comes good the process will exit, but
> not before.
>
> Does "lsof -p 25618" (adjust for whatever PID) tell you anything useful
> about the hung process? Is the tape drive ok? Does the command "strace
> -p 25618" (again, adjust) tell you what function call is in progress,
> and what it is accessing? Probably the tape drive, but check out the
> system call file descriptor against those shown by lsof.
> --
> Cameron Simpson <cs@zip.com.au>

Indeed, no OS command can "kill" a process that kill -9 cannot "kill".
But if the process is tied to some hardware, you can try doing
something to that hardware. For instance, pulling a cable may fire a
trigger inside the device driver and wake the stuck process up from
kernel space.

Yong Huang

 
Old 08-30-2011, 01:21 AM
"Furnish, Trever G"
 

Those are HP Data Protector processes, and as you have hopefully gleaned from the other responses to your question, it's not the process that's hung the tape drive, but rather the tape drive that's hung the process.

If you use 'ps auxww' instead of 'ps -ef', then you'll see process state as the 8th column. My guess is that this column will be either "D" (uninterruptible sleep) or "Z" (defunct). If it's Z, stop worrying about it. If it's D...give up or start jiggling cables. :-) D typically means it's waiting on hardware, and it'll wait forever. If it's "S", then it'll wake up when the kernel wakes it up (but it should respond to the kill -9 in that case).

Here's the process state list from the man page of RHEL's ps:

PROCESS STATE CODES
Here are the different values that the s, stat and state output specifiers
(header "STAT" or "S") will display to describe the state of a process.

    D    Uninterruptible sleep (usually IO)
    R    Running or runnable (on run queue)
    S    Interruptible sleep (waiting for an event to complete)
    T    Stopped, either by a job control signal or because it is being traced.
    W    paging (not valid since the 2.6.xx kernel)
    X    dead (should never be seen)
    Z    Defunct ("zombie") process, terminated but not reaped by its parent.
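
As a rough illustration of the state check above, here is a Python sketch that pulls the same state letter out of /proc/<pid>/stat and maps it to the advice given in this message. The function names and the advice strings are made up for the example; only the state letters come from the ps man page.

```python
def state_from_stat(stat_text):
    """Extract the one-letter process state from the text of /proc/<pid>/stat.

    The command name sits in parentheses and may itself contain spaces or
    ')' characters, so split after the LAST closing parenthesis.
    """
    after_comm = stat_text[stat_text.rindex(")") + 1:]
    return after_comm.split()[0]


def advice(state):
    """Summarise the advice from the message above for a given state letter."""
    if state == "D":
        return "waiting in the kernel; kill -9 only takes effect once the wait ends"
    if state == "Z":
        return "already exited; only the parent's wait() removes it"
    return "should respond to kill -9"


def read_state(pid):
    """Read the current state letter of a live process (Linux /proc only)."""
    with open(f"/proc/{pid}/stat") as f:
        return state_from_stat(f.read())
```

For example, advice(read_state(7892)) would report the "waiting in the kernel" case for a process that ps shows as "Ds" (the trailing "s" in ps output only marks a session leader).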



 
Old 09-13-2011, 02:01 PM
sunhux G
 

Hi Trever,


It just happened again on another Linux media server
& looks like it's a "D" (uninterruptible sleep) :


# ps axfu | grep 7892
root 17369 0.0 0.0 5024 640 pts/2 S+ 20:39 0:00
\_ grep 7892
root 7892 0.0 0.1 6316 2808 ? Ds Sep12 0:13
/opt/omni/lbin/vbda -bmaname HP:Ultrium 4-SCSI_3 -type 2 -start
1315817521 -level 0 -access 1 0 -protection 2 2332800 -load 1.000000
-name hostname.ss.de [/] -ma hostname.ss.de 22000 -id 1315817438
-volume / -profile -trees / -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy 2 -no_nthlink -archattr -share_info -objname 02
hostname.ss.de:/ // hostname.ss.de [/] -no_aligned
root 15604 0.0 0.0 5480 560 ? D 15:05 0:00 lsof -a -p 7892
root 15636 0.0 0.0 5400 560 ? D 15:10 0:00 lsof -a -p 7892


# lsof -a -p 7892
< above command just pauses/hangs there; it's been over 2 hrs already >

# strace -p7892
Process 7892 attached - interrupt to quit
< it pauses there & Ctrl-C did not yield any response >
(have to do 'pkill -9 strace' to exit it)


# kill -9 7892
(& it's still there as shown below: )

# ps axfu |grep bma | grep omni |grep 7892
root 7892 0.0 0.1 6316 2808 ? Ds Sep12 0:13
/opt/omni/lbin/vbda -bmaname HP:Ultrium 4-SCSI_3 -type 2 -start
1315817521 -level 0 -access 1 0 -protection 2 2332800 -load 1.000000
-name hostname.ss.de [/] -ma hostname.ss.de 22000 -id 1315817438
-volume / -profile -trees / -no_lock -hlink -no_touch -no_encode
-no_expand_sparse -no_nwuncompress -no_compress -no_preview -profile
-report 0 -on_busy 2 -no_nthlink -archattr -share_info -objname 02
hostname.ss.de:/ // hostname.ss.de [/] -no_aligned


There are several CLOSE_WAIT sessions which have been there for hours.
I also logged in to the remote server cellsvmgr (Win 2003 server) & issued
"netstat -ano" to look for those sessions (e.g. remote ports 3215, 3453,
2578), but none were there:

# lsof -i :5555
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
vbda 1235 root 0u IPv4 1824346 TCP hostname.ss.de:omni->cellsvmgr.ss.de:4033 (ESTABLISHED)
vbda 1235 root 1u IPv4 1824346 TCP hostname.ss.de:omni->cellsvmgr.ss.de:4033 (ESTABLISHED)
vbda 3998 root 0u IPv4 1942430 TCP hostname.ss.de:omni->cellsvmgr.ss.de:3366 (ESTABLISHED)
vbda 3998 root 1u IPv4 1942430 TCP hostname.ss.de:omni->cellsvmgr.ss.de:3366 (ESTABLISHED)
vbda 4757 root 0u IPv4 1832833 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2798 (ESTABLISHED)
vbda 4757 root 1u IPv4 1832833 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2798 (ESTABLISHED)
vbda 7892 root 0u IPv4 1950188 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
vbda 7892 root 1u IPv4 1950188 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
vbda 9475 root 0u IPv4 1955789 TCP hostname.ss.de:omni->cellsvmgr.ss.de:3215 (CLOSE_WAIT)
vbda 9475 root 1u IPv4 1955789 TCP hostname.ss.de:omni->cellsvmgr.ss.de:3215 (CLOSE_WAIT)
vbda 10177 root 0u IPv4 1956998 TCP hostname.ss.de:omni->cellsvmgr.ss.de:3453 (CLOSE_WAIT)
vbda 10177 root 1u IPv4 1956998 TCP hostname.ss.de:omni->cellsvmgr.ss.de:3453 (CLOSE_WAIT)
fsbrda 14948 root 0u IPv4 1969734 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2578 (CLOSE_WAIT)
fsbrda 14948 root 1u IPv4 1969734 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2578 (CLOSE_WAIT)
xinetd 15500 root 5u IPv4 1971072 TCP *:omni (LISTEN)

# lsof -i :2356
COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME
vbda 7892 root 0u IPv4 1950188 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)
vbda 7892 root 1u IPv4 1950188 TCP hostname.ss.de:omni->cellsvmgr.ss.de:2356 (ESTABLISHED)

 
Old 09-13-2011, 02:14 PM
 

sunhux G wrote:
<snip>
<snip>
> # lsof -a -p 7892
> < above command just pauses/hangs there; it's been over 2 hrs already >
>
> # strace -p7892
> Process 7892 attached - interrupt to quit
> < it pauses there & Ctrl-C did not yield any response >
> (have to do 'pkill -9 strace' to exit it)
<snip>
As a side note, did you try <ctrl-D>?

mark


 
Old 09-14-2011, 12:38 AM
Cameron Simpson
 

On 13Sep2011 10:14, m.roth@5-cent.us <m.roth@5-cent.us> wrote:
| sunhux G wrote:
| <snip>
| <snip>
| > # lsof -a -p 7892
| > < above command just pauses/hangs there; it's been over 2 hrs already >
| >
| > # strace -p7892
| > Process 7892 attached - interrupt to quit
| > < it pauses there & Ctrl-C did not yield any response >
| > (have to do 'pkill -9 strace' to exit it)
| <snip>
| As a side note, did you try <crtl-D>?
| mark

Sigh. Whatever for? At least ctrl-C sends SIGINT. ctrl-D just flushes
the input stream, generating a zero-length read if the stream is already
flushed, and a zero-length read means EOF. But the program has to be running
_and_ reading its input to notice this.

In short, if "kill -9" doesn't get you anywhere, ctrl-D doesn't even
make it into the waste-of-time category, because it is not even a signal.
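
To make the "zero-length read means EOF" point concrete, here is a small Python sketch. It uses a pipe as a stand-in for the terminal (an assumption made so the example runs anywhere on Unix; a real ctrl-D goes through the tty driver), and the function name is invented for the illustration.

```python
import os


def read_after_writer_closes():
    """Show that end-of-input is just a read() that returns zero bytes."""
    read_end, write_end = os.pipe()
    os.write(write_end, b"last line\n")
    os.close(write_end)               # no more input will ever arrive
    first = os.read(read_end, 1024)   # the data that was still buffered
    second = os.read(read_end, 1024)  # zero-length read: this is EOF
    os.close(read_end)
    return first, second
```

Nothing is delivered to the reader here. It only learns about EOF because it is running and calls read(), which is exactly what a process stuck in the kernel cannot do.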

Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

Well, if you didn't struggle so much, you wouldn't get rope burns.

 
Old 09-14-2011, 05:18 PM
"Allen, Jack"
 

> Mike Burger wrote:
>> If you have a process that is stuck in a zombie mode and kill -9 isn't
>> getting rid of it, you may need to do something with the parent process
>> that spawned it in the first place.
>
> Yeah, but too often, the parent process has gone, and the zombie's now got
> a parent of 1.

That would stink, yeah. :-(

[Jack Allen] I thought I would add a few comments about this type of
problem. If a process will not exit after a "kill -9 PID" has been done,
then it is stuck waiting on the kernel to complete something on its
behalf. When you "send a signal" to a PID, you are not really sending a
signal; you are only setting a bit in the process that indicates a
signal has been posted for that process. When the kernel schedules the
process to run again, the bits are looked at and handled as set up by
the process's signal catchers. But -9 cannot be caught and processed by
the process; the kernel will cause the process to exit.
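
The "posted bit" idea can be observed from user space. The Python sketch below blocks SIGUSR1, sends it to the current thread, and then checks that it shows up as pending instead of being acted on. A second helper checks that no handler can be installed for SIGKILL. The function names are invented for the example, and blocking a signal in one's own process is only an analogy for a process that the kernel has not yet scheduled again.

```python
import signal
import threading


def pending_while_blocked(signum=signal.SIGUSR1):
    """Post a signal to ourselves while it is blocked and watch it stay pending."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # Direct the signal at this thread so it is recorded here.
        signal.pthread_kill(threading.get_ident(), signum)
        was_pending = signum in signal.sigpending()
        # Consume the pending signal so it is never delivered.
        signal.sigwait({signum})
        still_pending = signum in signal.sigpending()
    finally:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return was_pending, still_pending


def can_install_handler(signum):
    """Return False if a handler for `signum` cannot be installed here.

    For SIGKILL the kernel always refuses. A ValueError instead means we
    were not called from the main thread, which Python also rejects.
    """
    try:
        previous = signal.signal(signum, lambda s, f: None)
    except (OSError, ValueError):
        return False
    if previous is not None:
        signal.signal(signum, previous)  # put the old handler back
    return True
```

The first function shows a signal being recorded without anything happening yet. The second shows why -9 is special: the process gets no say in how it is handled.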

Now, how can a process get stuck waiting on the kernel? Here is an
example that used to happen quite often when 9trk tape drives were used.
Many of you may have never seen one. Anyway, say some type of backup was
being written to a 9trk tape drive holding a tape that is 2400 feet long.
When the backup completed it might display a message to that effect and
then close the file descriptor associated with the tape drive, causing it
to rewind the tape. Well, it takes maybe 20 to 30 seconds or more to
rewind the tape, and the operator would push the online button during that
time to take it offline, and then push the unload button. The process is
waiting for the kernel to let it know the tape has rewound and is back at
load point and considered closed. This will never happen because the tape
drive is now offline and will not generate an interrupt when the tape
completes the rewind and is at load point. Therefore the operator does not
get their prompt back or whatever should have happened next. You can do
"kill -9 PID" on the process but it is not going to terminate. All they
had to do was thread the tape and put it online again; the kernel then
received an interrupt from the device, determined that a process was
waiting to be woken up, and woke it up. If a "kill -9 PID" had been
done, the process would then terminate; if not, it might display something
else for the operator to do, like mount another tape.

Now about a Zombie process. A Zombie is a process that has exited,
whether that be because it called exit() or received some signal that
caused it to end. It is in the Zombie state because its parent has not
done a wait() to pick up its exit status. If the parent has exited then
it is inherited by PID 1 (init). This is by design. When this happens,
PID 1 is woken up and does a wait(), which returns the PID and the exit
status. It determines that it was not a PID it started and just ignores
it. But because it did the wait(), the PID is removed from the
process table.
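
The life cycle just described can be reproduced in a few lines of Python on a Linux system with /proc mounted (an assumption of this sketch). A child is started and allowed to exit, its state is read while the parent has not yet waited, and then it is reaped. The function names are made up for the example.

```python
import os
import subprocess
import sys


def proc_state(pid):
    """Return the state letter from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            text = f.read()
    except OSError:
        return None
    # The state is the first field after the parenthesised command name.
    return text[text.rindex(")") + 1:].split()[0]


def zombie_demo():
    """Start a child, let it exit, and look at it before and after wait()."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    # Block until the child has exited, but leave it unreaped (WNOWAIT).
    os.waitid(os.P_PID, child.pid, os.WEXITED | os.WNOWAIT)
    before = proc_state(child.pid)  # exited, exit status not yet collected
    child.wait()                    # the parent's wait(): the child is reaped
    after = proc_state(child.pid)   # the process table slot has been freed
    return before, after
```

On a typical Linux box this returns "Z" for the first value and None for the second, matching the description: the zombie is only a process table entry holding an exit status, and the wait() is what removes it.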

So if you do "kill -9 PID" and the process does not become a Zombie,
then it is stuck waiting on the kernel. It will do no good to kill the
parent. If it does become a Zombie, and the parent does not do a wait(),
then the parent has a bug or it is waiting on something and just has not
gotten around to doing a wait().

-----
Jack Allen

 
Old 09-14-2011, 10:37 PM
Cameron Simpson
 

On 14Sep2011 10:13, m.roth@5-cent.us <m.roth@5-cent.us> wrote:
| > If you have a process that is stuck in a zombie mode and kill -9 isn't
| > getting rid of it, you may need to do something with the parent process
| > that spawned it in the first place.
|
| Yeah, but too often, the parent process has gone, and the zombie's now got
| a parent of 1.

Then it is not a zombie. Zombies are exited processes which still have their
parent, but the parent has not (yet) collected their exit status.

There's no point trying to kill a zombie - it has already exited and is
using no resources other than a process table slot to preserve the PID
against reuse until the parent has waited, and to hold the exit status
the parent will collect.

In the cited case, the process hasn't exited, so it is not a zombie anyway.

Cheers,
--
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

I was moving so fast I started using Him as a braking marker.
- motorcyclist test pilot

 
