01-22-2010, 01:18 PM
Michael Holzheu

Running idle threads show wrong CPU numbers

Hi Dave,

I have a problem with a dump where I have defined five CPUs and two of
them are offline. In fact the logical CPUs are defined as follows:

0 on
1 on
2 off
3 off
4 on

The CPU online map looks correct:

crash> print/x *cpu_online_mask
$4 = {
bits = {0x13} ---> b10011
}
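
For reference, a minimal sketch of how the mask value above decodes into logical CPU numbers. This is standalone illustration code, not crash source; the helper name list_online_cpus is made up for the example, and it assumes the mask fits in a single word.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of crash): store the logical CPU numbers
 * whose bits are set in a one-word online mask into out[], and return how
 * many were stored (at most max).
 */
static size_t
list_online_cpus(unsigned long mask, int *out, size_t max)
{
	size_t n = 0;
	int cpu;

	assert(out != NULL);

	/* Walk the mask from bit 0 upwards; bit N set means CPU N is online. */
	for (cpu = 0; mask != 0 && n < max; cpu++, mask >>= 1) {
		if (mask & 1UL)
			out[n++] = cpu;
	}
	return n;
}
```

Called with the mask 0x13 from this dump, the helper stores CPUs 0, 1 and 4, which matches the on/off list at the top of the post.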

When I issue "ps" I see that all running tasks are idle, but the CPU
numbers are not correct (0,1,2 and not 0,1,4):

   PID    PPID  CPU    TASK    ST  %MEM   VSZ   RSS  COMM
>     0      0   0    800ef0   RU   0.0     0     0  [swapper]
>     0      0   1  18c24240   RU   0.0     0     0  [swapper]
>     0      0   2  18c2c340   RU   0.0     0     0  [swapper]

I tried to debug the problem, but got stuck somewhere in "task.c". I
think there is a problem with the idle threads initialization, where the
online map is not considered.

Maybe you can see the bug immediately. Otherwise I will have to spend more
effort debugging the problem. I hope not :-)

Michael




--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
01-22-2010, 01:32 PM
Dave Anderson

Does "sys" show 5 or 3 cpus? I'm guessing it shows 3, but should show 5.
It looks like the s390/s390x files need to use "get_highest_cpu_online()+1"
(like x86_64 and ppc64) in order to determine the number of cpus to account
for. As it is now, they do this, and would therefore only account for the
first 3 cpus:

int
s390x_get_smp_cpus(void)
{
        return get_cpus_online();
}

int
s390_get_smp_cpus(void)
{
        return get_cpus_online();
}
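
To make the difference concrete, here is a small standalone model of the two counts. The functions below only mimic what the names get_cpus_online() and get_highest_cpu_online() suggest; they take the online mask as a parameter, use made-up model_ names, and are not the crash implementations.

```c
#include <assert.h>

/* Model only: number of bits set in the online mask. */
static int
model_cpus_online(unsigned long mask)
{
	int count = 0;

	for (; mask != 0; mask >>= 1)
		count += (int)(mask & 1UL);
	return count;
}

/* Model only: highest logical CPU number whose bit is set, or -1 if none. */
static int
model_highest_cpu_online(unsigned long mask)
{
	int highest = -1;
	int cpu;

	for (cpu = 0; mask != 0; cpu++, mask >>= 1) {
		if (mask & 1UL)
			highest = cpu;
	}
	return highest;
}

/* Number of CPU slots to walk so that every online CPU is covered. */
static int
model_smp_cpus(unsigned long mask)
{
	int highest = model_highest_cpu_online(mask);

	assert(highest >= 0);	/* at least one CPU must be online */
	return highest + 1;
}
```

For the mask 0x13 reported in this thread, model_cpus_online() gives 3 while model_smp_cpus() gives 5. Walking only slots 0 through 2 never reaches CPU 4, which would be consistent with the "ps" output showing CPUs 0, 1 and 2.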

Dave




01-22-2010, 05:57 PM
Dave Anderson

----- "Dave Anderson" <anderson@redhat.com> wrote:

> Does "sys" show 5 or 3 cpus? I'm guessing it shows 3, but should show 5.
> It looks like the s390/s390x files need to use "get_highest_cpu_online()+1"
> (like x86_64 and ppc64) in order to determine the number of cpus to account
> for. As it is now, they do this, and would therefore only account for the
> first 3 cpus:
>
> int
> s390x_get_smp_cpus(void)
> {
> return get_cpus_online();
> }
>
> int
> s390_get_smp_cpus(void)
> {
> return get_cpus_online();
> }

In other words, just have the two functions above do this:

return (get_highest_cpu_online() + 1);

The offline cpus will still show their swapper tasks and their
runqueues given that they still exist, although quiescent.

Dave

02-10-2010, 12:32 PM
Michael Holzheu

Hello Dave,

On Fri, 2010-01-22 at 09:32 -0500, Dave Anderson wrote:
> Does "sys" show 5 or 3 cpus? I'm guessing it shows 3, but should show 5.

Yes it shows 3.

> It looks like the s390/s390x files need to use "get_highest_cpu_online()+1"
> (like x86_64 and ppc64) in order to determine the number of cpus to account
> for. As it is now, they do this, and would therefore only account for the
> first 3 cpus:
>
> int
> s390x_get_smp_cpus(void)
> {
> return get_cpus_online();
> }
>
> int
> s390_get_smp_cpus(void)
> {
> return get_cpus_online();
> }

Hmmm ok...

When I change get_smp_cpus() to return "get_highest_cpu_online() + 1" I
see five swapper idle tasks when using "ps". The problem I now have is
that I have to provide a backtrace for the offline cpus. But the offline
CPUs do not have any stack on s390. Is there a way to tell crash that
there is no backtrace available? Probably I overlooked something...

Michael

02-10-2010, 02:00 PM
Michael Holzheu

Hi again,

On Wed, 2010-02-10 at 14:32 +0100, Michael Holzheu wrote:
> When I change get_smp_cpus() to return "get_highest_cpu_online() + 1" I
> see five swapper idle tasks when using "ps". The problem I now have is
> that I have to provide a backtrace for the offline cpus. But the offline
> CPUs do not have any stack on s390. Is there a way to tell crash that
> there is no backtrace available? Probably I overlooked something...

Ok, I think I got it now. For an offline CPU, I will use
"task_struct_thread_ksp", as I already do for inactive tasks.

With that change, the swapper tasks on the offline CPUs show:

PID: 0 TASK: 18d38340 CPU: 2 COMMAND: "swapper"
#0 [18d3feb8] ret_from_fork at 117e12

PID: 0 TASK: 18d40440 CPU: 3 COMMAND: "swapper"
#0 [18d47eb8] ret_from_fork at 117e12


Michael

02-10-2010, 02:08 PM
Dave Anderson

----- "Michael Holzheu" <holzheu@linux.vnet.ibm.com> wrote:

> Hi again,

> > When I change get_smp_cpus() to return "get_highest_cpu_online() + 1" I
> > see five swapper idle tasks when using "ps". The problem I now have is
> > that I have to provide a backtrace for the offline cpus. But the offline
> > CPUs do not have any stack on s390. Is there a way to tell crash that
> > there is no backtrace available? Probably I overlooked something...
>
> Ok, I think I got it now. In case of an offline CPU, I will use
> "task_struct_thread_ksp" like I do it for non active tasks.
>
> When I do that I get for the swapper tasks with the offline CPUs:
>
> PID: 0 TASK: 18d38340 CPU: 2 COMMAND: "swapper"
> #0 [18d3feb8] ret_from_fork at 117e12
>
> PID: 0 TASK: 18d40440 CPU: 3 COMMAND: "swapper"
> #0 [18d47eb8] ret_from_fork at 117e12

I'm not sure why you should do anything. The cpu is offline and for all
practical purposes it doesn't exist, so why bother?

The patch I have queued just uses get_highest_cpu_online()+1 and
does nothing else. But I only tested it on a live system, and
any backtrace attempt on the offlined swapper task just shows
(active). What happens when you do a "bt -a" with a dumpfile?

Dave

02-10-2010, 02:42 PM
Michael Holzheu

On Wed, 2010-02-10 at 10:08 -0500, Dave Anderson wrote:
> I'm not sure why you should do anything. The cpu is offline and for all
> practical purposes it doesn't exist, so why bother?

Because you can do a "bt" on the swapper task with the offline CPU.
Then s390x_get_stack_frame() is called where I figure out the stack
pointer and instruction address. In that function I check if the task is
currently running on a CPU and in that case I get the information from
the associated s390 lowcore, where the registers are stored in case of a
dump. If the task is not running I get the information from the thread
struct.
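
Roughly, the selection described above looks like the following simplified model. All types and names here (model_task, frame_source, pick_frame_source) are invented for illustration; the real s390x_get_stack_frame() works on crash's own structures and reads the values from the dump.

```c
#include <assert.h>
#include <stddef.h>

enum frame_source { FROM_LOWCORE, FROM_THREAD_STRUCT };

/* Invented stand-in for the few facts the decision depends on. */
struct model_task {
	int running_on_cpu;		/* nonzero if the task currently owns a CPU */
	unsigned long lowcore_r15;	/* stack pointer saved in the CPU's lowcore */
	unsigned long thread_ksp;	/* stack pointer saved in the thread struct */
};

/*
 * Decide where the stack pointer comes from and store it in *sp:
 * the lowcore for a task that was running when the dump was taken,
 * the thread struct otherwise.
 */
static enum frame_source
pick_frame_source(const struct model_task *t, unsigned long *sp)
{
	assert(t != NULL && sp != NULL);

	if (t->running_on_cpu) {
		*sp = t->lowcore_r15;
		return FROM_LOWCORE;
	}
	*sp = t->thread_ksp;
	return FROM_THREAD_STRUCT;
}
```

The point of the sketch is that everything hinges on the "is it running on a CPU" answer: if that answer is wrong for a task, the stack pointer is taken from the wrong place.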

> The patch I have queued just uses get_highest_cpu_online()+1 and
> does nothing else. But I only tested it on a live system, and
> any backtrace attempt on the offlined swapper task just shows
> (active). What happens when you do a "bt -a" with a dumpfile?

It shows all swapper tasks (online and offline), but I get errors for
the backtrace for the offline CPUs.

The attached patch would solve the problem (and eliminate most of the
probably redundant s390(x)_has_cpu() function).


With this patch "ps" shows:

   PID    PPID  CPU    TASK    ST  %MEM   VSZ   RSS  COMM
>     0      0   0    800ef0   RU   0.0     0     0  [swapper]
>     0      0   1  18d30240   RU   0.0     0     0  [swapper]
>     0      0   2  18d38340   RU   0.0     0     0  [swapper]
>     0      0   3  18d40440   RU   0.0     0     0  [swapper]
>     0      0   4  18d48540   RU   0.0     0     0  [swapper]
      1      0   1  18d18040   IN   0.2  2244  1020  init
...

And "bt -a" shows:

PID: 0 TASK: 800ef0 CPU: 0 COMMAND: "swapper"
LOWCORE INFO:
-psw : 0x0706000180000000 0x0000000000115564
-function : vtime_stop_cpu at 115564
-prefix : 0x18d28000
-cpu timer: 0x7fff00c1 0x00c584ef
...

PID: 0 TASK: 18d30240 CPU: 1 COMMAND: "swapper"
LOWCORE INFO:
-psw : 0x0706000180000000 0x0000000000115564
-function : vtime_stop_cpu at 115564
...

PID: 0 TASK: 18d38340 CPU: 2 COMMAND: "swapper"
#0 [18d3feb8] ret_from_fork at 117e12
...

PID: 0 TASK: 18d40440 CPU: 3 COMMAND: "swapper"
#0 [18d47eb8] ret_from_fork at 117e12
...

PID: 0 TASK: 18d48540 CPU: 4 COMMAND: "swapper"
LOWCORE INFO:
-psw : 0x0706000180000000 0x0000000000115564
-function : vtime_stop_cpu at 115564
-prefix : 0x1416a000

Michael
02-10-2010, 03:09 PM
Dave Anderson

----- Forwarded Message -----
From: "Dave Anderson" <anderson@redhat.com>

----- "Michael Holzheu" <holzheu@linux.vnet.ibm.com> wrote:

> It shows all swapper tasks (online and offline), but I get errors for
> the backtrace for the offline CPUs.

What kind of errors?

>
> The attached patch would solve the problem (and eliminate most of the
> probably redundant s390(x)_has_cpu() function).

I don't see what's being solved by the patch (not the s390x_get_smp_cpus
parts) -- does the "old" s390x_has_cpu() fail?

Even though the cpu is offline, the runqueue will still show its per-cpu
swapper task as the current task on that cpu.

Dave


02-10-2010, 05:45 PM
Michael Holzheu

On Wed, 2010-02-10 at 11:09 -0500, Dave Anderson wrote:
> What kind of errors?

The problem is that s390x_get_stack_frame() is called for the offline
swapper tasks. In that function I use s390x_has_cpu() to check whether the
task is currently running on a CPU. Because the CPU online check is
missing, s390x_has_cpu() returns TRUE, so I try to read the CPU registers
from the lowcore of that CPU. The lowcore pointer is zero because the CPU
is offline, so the stack pointer read from it (register 15) is wrong and
the backtrace fails.

> >
> > The attached patch would solve the problem (and eliminate most of the
> > probably redundant s390(x)_has_cpu() function).
>
> I don't see what's being solved by the patch (not the s390x_get_smp_cpus
> parts) -- does the "old" s390x_has_cpu() fail?

The old s390x_has_cpu() returns TRUE for the offline swapper tasks. And
I think that this is wrong.

The new implementation of s390x_has_cpu() should return TRUE if the task
is running on an online CPU and FALSE otherwise:

+       if (is_task_active(bt->task) && (kt->cpu_flags[cpu] & ONLINE))
+               return TRUE;
+       else
+               return FALSE;
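
A self-contained model of that check, for anyone who wants to play with it outside of crash. The flag value, the per-CPU flags array and the function names are placeholders chosen for the sketch, and the "old" variant is a simplification of the behaviour described in this thread; only the condition in the "new" variant mirrors the hunk above.

```c
#include <assert.h>

#define MODEL_NR_CPUS	5
#define MODEL_ONLINE	0x1UL	/* placeholder flag value, not crash's ONLINE */

/* Placeholder per-CPU flags for the dump in this thread: CPUs 0, 1, 4 online. */
static const unsigned long model_cpu_flags[MODEL_NR_CPUS] = {
	MODEL_ONLINE, MODEL_ONLINE, 0, 0, MODEL_ONLINE
};

/* Simplified old behaviour: an active task always "has" its CPU. */
static int
model_has_cpu_old(int task_active, int cpu)
{
	(void)cpu;	/* the online state of the CPU is not consulted */
	return task_active ? 1 : 0;
}

/* Proposed behaviour: the task must be active and its CPU must be online. */
static int
model_has_cpu_new(int task_active, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (task_active && (model_cpu_flags[cpu] & MODEL_ONLINE)) ? 1 : 0;
}
```

For the swapper task on offline CPU 2, the old model returns 1 and the new one returns 0, so with the new check the backtrace would fall back to the thread struct instead of the empty lowcore.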


Michael

02-10-2010, 06:01 PM
Dave Anderson

----- "Michael Holzheu" <holzheu@linux.vnet.ibm.com> wrote:

> > > It shows all swapper tasks (online and offline), but I get errors for
> > > the backtrace for the offline CPUs.
> >
> > What kind of errors?
>
> The problem is that for the offline swapper tasks
> s390x_get_stack_frame() is called. In that function I check with
> s390x_has_cpu() if the task is currently running on a CPU. Because of
> the missing CPU online check, s390x_has_cpu() returns TRUE. Therefore I
> try to read the CPU registers from the lowcore of that CPU. The lowcore
> pointer is zero, because the CPU is offline. Therefore the read stack
> pointer (register 15) is wrong and the backtrace fails.
>
> > >
> > > The attached patch would solve the problem (and eliminate most of the
> > > probably redundant s390(x)_has_cpu() function).
> >
> > I don't see what's being solved by the patch (not the s390x_get_smp_cpus
> > parts) -- does the "old" s390x_has_cpu() fail?
>
> The old s390x_has_cpu() returns TRUE for the offline swapper tasks. And
> I think that this is wrong.

Hmmm... To me, it is TRUE, i.e., the existing-but-idle swapper task for
an offline cpu actually *does* own that cpu.

And that's why I was wondering about what error message gets shown.

>
> The new implementation of s390x_has_cpu() should return TRUE if the task
> is running on an online CPU and FALSE otherwise:
>
> + if (is_task_active(bt->task) && (kt->cpu_flags[cpu] & ONLINE))
> + return TRUE;
> + else
> + return FALSE;

This is probably OK, although I am slightly hesitant about throwing out all
of the old backwards-compatibility code in the s390[x]_has_cpu() functions.
I thought maybe it would be safer to leave well enough alone, and not
worry about any error messages from backtraces of offline cpus.
It might even be more useful to have error messages that alert the user
that the cpu is not online?

Dave

