Old 11-30-2010, 07:33 PM
Dave Anderson
 
Default Show missing tasks in ps

----- "Michael Holzheu" <holzheu@linux.vnet.ibm.com> wrote:

> Hi Dave,
>
> I have an s390x dump of a Linux 2.6.36 system in which a task (kmcheck, pid=44) is
> missing from the ps output. I debugged the problem and I think I found the
> reason:
>
> It looks like crash does not walk the linked list of the pid hash table
> to the end if it finds a NULL pointer in the pid.tasks[PIDTYPE_PID=0]
> array. Unfortunately, this condition is true for the struct pid that precedes
> our lost task in the linked list, so crash does not find our task.
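
The failure mode described above can be illustrated with a small, self-contained C sketch. It is not the crash source: struct sim_pid, count_tasks() and demo_count() are hypothetical names made up for this illustration, and the structure is a much-simplified stand-in for the kernel's struct pid/struct upid. It only shows the difference between abandoning a hash chain at the first entry whose PIDTYPE_PID task pointer is NULL and skipping just that entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a pid hash chain entry; the real
 * struct pid / struct upid layouts are different. */
struct sim_pid {
    int nr;                     /* numeric pid */
    const char *task;           /* stands in for tasks[PIDTYPE_PID].first */
    struct sim_pid *chain_next; /* next entry in the same hash slot */
};

/* Walk every hash slot and count the tasks found.
 * If stop_on_null is nonzero, an entry without a task abandons the rest
 * of its chain (the behavior described in the report). Otherwise only
 * that entry is skipped and the walk moves on to the next chain element. */
static int count_tasks(struct sim_pid *const *slots, size_t nslots,
                       int stop_on_null)
{
    int cnt = 0;

    assert(slots != NULL);
    for (size_t i = 0; i < nslots; i++) {
        for (struct sim_pid *p = slots[i]; p != NULL; p = p->chain_next) {
            if (p->task == NULL) {
                if (stop_on_null)
                    break;      /* rest of this chain is never visited */
                continue;       /* skip this entry, keep walking */
            }
            cnt++;
        }
    }
    return cnt;
}

/* One hash slot shaped like the reported case: pid 565 has no
 * PIDTYPE_PID task and is chained in front of pid 44 (kmcheck). */
static int demo_count(int stop_on_null)
{
    struct sim_pid kmcheck = { 44, "kmcheck", NULL };
    struct sim_pid first = { 565, NULL, &kmcheck };
    struct sim_pid *slots[1] = { &first };

    return count_tasks(slots, 1, stop_on_null);
}
```

With this layout, demo_count(1) misses kmcheck because the walk gives up at pid 565, while demo_count(0) finds it.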

That sounds similar to the fix Bob Montgomery made in 5.0.7:

- Fix for the potential to miss one or more tasks in 2.6.23 and earlier
kernels, presumably due to catching an entry in the kernel's pid_hash[]
chain in transition. Without the patch, the task will simply not be
seen in the gathered task list.
(bob.montgomery@hp.com)

where this was his patch posting -- which fixed refresh_hlist_task_table_v2():

[Crash-utility] Missing PID 1 is crash problem with losing tasks
https://www.redhat.com/archives/crash-utility/2010-August/msg00049.html

and where your patch fixes refresh_hlist_task_table_v3().

I'll give it a test run...

Thanks,
Dave


> The attached patch seems to fix this problem.
>
> Here is my crash debug log with the 2.6.36 dump:
> ---------------------------------------------
> Task "kmcheck" is in hash slot 2941 in the linked list at position 2:
>
> crash> print pid_hash[2941]
> $4 = {
> first = 0x3f5fb7f8
> }
>
> crash> upid
> struct upid {
> int nr;
> struct pid_namespace *ns;
> struct hlist_node pid_chain;
> }
> SIZE: 32
>
> crash> upid.pid_chain
> struct upid {
> [16] struct hlist_node pid_chain;
> }
>
> crash> eval 0x3f5fb7f8 - 16
> hexadecimal: 3f5fb7e8
>
> crash> upid 3f5fb7e8 <<<<---- the first upid in the list
> struct upid {
> nr = 565,
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x3edea2b0,
> pprev = 0x96554e8
> }
> }
>
> crash> pid
> struct pid {
> atomic_t count;
> unsigned int level;
> struct hlist_head tasks[3];
> struct rcu_head rcu;
> struct upid numbers[1];
> }
> SIZE: 80
>
> crash> pid.numbers
> struct pid {
> [48] struct upid numbers[1];
> }
>
> crash> eval 3f5fb7e8 - 48
> hexadecimal: 3f5fb7b8
>
> crash> pid 3f5fb7b8
> struct pid {
> count = {
> counter = 1
> },
> level = 0,
> tasks = {{
> first = 0x0 <<<----------- tasks[0] is NULL
> }, {
> first = 0x3d488620
> }, {
> first = 0x0
> }},
> rcu = {
> next = 0x5a5a5a5a5a5a5a5a,
> func = 0x5a5a5a5a5a5a5a5a
> },
> numbers = {{
> nr = 565,
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x3edea2b0, <<<--------- Pointer to second element in list
> pprev = 0x96554e8
> }
> }}
> }
>
> crash> eval 0x3edea2b0 - 16
> hexadecimal: 3edea2a0 <<<-- The second upid in the list
>
> crash> upid 0x3edea2a0
> struct upid {
> nr = 44, <<<--- Our missing pid=44 (kmcheck)
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x0,
> pprev = 0x3f5fb7f8
> }
> }
>
> crash> eval 0x3edea2a0 - 48
> hexadecimal: 3edea270
>
> crash> pid 3edea270
> struct pid {
> count = {
> counter = 5
> },
> level = 0,
> tasks = {{
> first = 0x3e799908 <<<--- Pointer to our task_struct.pids
> }, {
> first = 0x0
> }, {
> first = 0x0
> }},
> rcu = {
> next = 0x5a5a5a5a5a5a5a5a,
> func = 0x5a5a5a5a5a5a5a5a
> },
> numbers = {{
> nr = 44,
> ns = 0x81d8f8,
> pid_chain = {
> next = 0x0,
> pprev = 0x3f5fb7f8
> }
> }}
> }
>
> crash> task_struct.pids
> struct task_struct {
> [712] struct pid_link pids[3];
> }
>
> crash> eval 0x3e799908 - 712
> hexadecimal: 3e799640
>
> crash> task_struct 3e799640 | grep comm
> comm = "kmcheck0000000000000000", <<<--- here it is
> ---
> task.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> --- a/task.c
> +++ b/task.c
> @@ -2006,7 +2006,7 @@ do_chained:
> }
>
> if (pid_tasks_0 == 0)
> - continue;
> + goto chain_next;
>
> next = pid_tasks_0 - OFFSET(task_struct_pids);
>
> @@ -2042,7 +2042,7 @@ do_chained:
> }
>
> cnt++;
> -
> +chain_next:
> if (pnext) {
> kpp = pnext;
> upid = pnext - OFFSET(upid_pid_chain);
>
>
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
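
The "eval <address> - <offset>" steps in the debug log above are container-of arithmetic: subtract a member's offset from the member's address to get the address of the enclosing structure. A minimal C sketch of the same calculation follows. The helper names are hypothetical, and the offsets (16, 48, 712) are the ones crash reported for this particular s390x 2.6.36 dump, so they should not be assumed for other kernels or architectures. The pid step also assumes the upid is numbers[0], as it is here with level = 0.

```c
#include <assert.h>
#include <stdint.h>

/* Member offsets as printed by crash for this dump only. */
enum {
    UPID_PID_CHAIN_OFFSET   = 16,  /* upid.pid_chain   */
    PID_NUMBERS_OFFSET      = 48,  /* pid.numbers      */
    TASK_STRUCT_PIDS_OFFSET = 712  /* task_struct.pids */
};

/* Address of the enclosing structure, given a member address and the
 * member's offset inside that structure. */
static uint64_t container_addr(uint64_t member_addr, uint64_t member_offset)
{
    assert(member_addr >= member_offset);
    return member_addr - member_offset;
}

/* hlist_node in a pid_hash[] chain -> struct upid that embeds it. */
static uint64_t upid_from_chain(uint64_t hlist_node_addr)
{
    return container_addr(hlist_node_addr, UPID_PID_CHAIN_OFFSET);
}

/* struct upid -> struct pid, assuming the upid is numbers[0]. */
static uint64_t pid_from_upid(uint64_t upid_addr)
{
    return container_addr(upid_addr, PID_NUMBERS_OFFSET);
}

/* pid.tasks[PIDTYPE_PID].first (points at task_struct.pids[0]) ->
 * task_struct. */
static uint64_t task_from_pid_link(uint64_t pid_link_addr)
{
    return container_addr(pid_link_addr, TASK_STRUCT_PIDS_OFFSET);
}
```

Applied to the values in the log, upid_from_chain(0x3edea2b0) gives 0x3edea2a0, pid_from_upid(0x3edea2a0) gives 0x3edea270, and task_from_pid_link(0x3e799908) gives 0x3e799640, the task_struct of kmcheck.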

 
Old 12-01-2010, 12:38 PM
Dave Anderson
 
Default Show missing tasks in ps

Hi Michael,

Works well -- it's a rare occurrence, but the patch uncovered a total of
seven missing tasks in a test run on a sample set of 50 "v3" dumpfiles.

Queued for next release.

Thanks,
Dave



 
