03-05-2010, 07:01 PM
Dave Anderson

Display online cpus value in preference to kt->cpus

----- "Luciano Chavez" <lnx1138@linux.vnet.ibm.com> wrote:

> Howdy,
>
> crash 5.0.0 introduced a change to ppc64_paca_init() in ppc64.c to
> manipulate kt->cpus to fix a 4.0-8.11 regression when cpu_possible_map
> has more cpus than cpu_online_map. The change basically adjusts kt->cpus
> to the highest cpu index + 1. In situations where cpus from 0 through
> highest index are all online, this will equal online cpus.
> On IBM POWER-based systems supporting SMT, it can be dynamically
> enabled and disabled, so we can go from:
>
> brucelp3:~ # ppc64_cpu --smt=on
> brucelp3:~ # cat /sys/devices/system/cpu/online
> 0-9
>
> to
>
> brucelp3:~ # ppc64_cpu --smt=off
> brucelp3:~ # cat /sys/devices/system/cpu/online
> 0,2,4,6,8
>
> In this situation, the new code will determine that kt->cpus is 9. crash
> will display:
>
> KERNEL: /boot/vmlinux
> DUMPFILE: /dev/mem
> CPUS: 9 ===============> Rather than 5
> DATE: Fri Feb 26 06:06:51 2010
> UPTIME: 04:22:34
> LOAD AVERAGE: 0.49, 0.14, 0.05
> TASKS: 320
> NODENAME: brucelp3
> RELEASE: 2.6.32.8-0.3-ppc64
> VERSION: #1 SMP 2010-02-22 16:22:25 +0100
> MACHINE: ppc64 (unknown Mhz)
> MEMORY: 1 GB
> PID: 19948
> COMMAND: "crash"
> TASK: c00000002433be50 [THREAD_INFO: c0000000238a0000]
> CPU: 2
> STATE: TASK_RUNNING (ACTIVE)
>
> kernel_init() does initially come up with 5 for kt->cpus, before
> the machdep init routine (ppc64_paca_init) ends up changing it to 9 in
> the above situation.
>
> Because of the way other parts of the code seem to iterate, allowing kt->cpus
> to get set to the number of online cpus (5) would make them not work properly
> either. Case in point, the ps command. It would iterate through the first 5
> cpus for the swapper tasks and stop, providing no information for swapper tasks
> on online cpus 6 and 8.
>
> Rather than displaying kt->cpus as the cpu count shown to users, can we call
> get_cpus_online() itself? This solves the problem by keeping kt->cpus
> separate from get_cpus_online(), so commands like ps, set, etc. can still rely on
> kt->cpus as set for each architecture. Additionally, we need to consider
> providing the value as it exists after machine-dependent init in case
> the symbols required by get_cpus_online() are not available.
>
> The patch provided creates a new routine called get_cpus_to_display() in
> kernel.c that simply attempts to retrieve the count of online CPUs with
> get_cpus_online(). However, since the cpu_online_map symbol may not be
> available with all kernels, the get_cpus_to_display() routine will
> return the value of kt->cpus if get_cpus_online() returns 0, to
> maintain backwards compatibility in that case.
>
> With this in mind, all locations (mainly the display_machine_stats() routines
> for each architecture, display_sys_stats() and dump_kernel_table()) in the
> source that would print the cpu count using kt->cpus now call get_cpus_to_display()
> to obtain that value.
>
> This should hopefully provide the user with an expected CPU count regardless of
> the internal manipulation that is sometimes done to the kt->cpus value.
>
> regards,
> --
> Luciano Chavez <lnx1138@linux.vnet.ibm.com>
> IBM Linux Technology Center
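
To make the two counts in the message above concrete, here is a minimal sketch in C. It is not crash source: the mask type and all helper names are made up for illustration, and it assumes the online map fits in one unsigned long with bit N meaning cpu N is online.

```c
#include <stdio.h>

/* Hypothetical stand-in for an online cpu map: bit N set means cpu N is online. */
typedef unsigned long cpu_mask_t;

/* Number of set bits, i.e. the count of online cpus. */
int count_online(cpu_mask_t mask)
{
    int n = 0;
    for (; mask; mask >>= 1)
        n += (int)(mask & 1UL);
    return n;
}

/* Highest set bit index + 1, the value kt->cpus is described as being adjusted to. */
int highest_online_plus_one(cpu_mask_t mask)
{
    int n = 0;
    for (; mask; mask >>= 1)
        n++;
    return n;
}

/* Online cpus that a loop over indices 0..limit-1 would never visit. */
cpu_mask_t missed_by_loop(cpu_mask_t mask, int limit)
{
    cpu_mask_t visited;

    if (limit <= 0)
        visited = 0UL;
    else if (limit >= (int)(8 * sizeof(cpu_mask_t)))
        visited = ~0UL;
    else
        visited = (1UL << limit) - 1UL;
    return mask & ~visited;
}

/* Print both counts side by side for a given mask. */
void print_cpu_summary(cpu_mask_t mask)
{
    printf("online: %d, highest+1: %d\n",
           count_online(mask), highest_online_plus_one(mask));
}
```

For the mask 0x155 (cpus 0,2,4,6,8) the helpers give 5 online and 9 for highest+1, and a loop limited to 5 indices would skip cpus 6 and 8, which is the ps concern described above.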

Right -- I thought of tinkering with the initial system banner and
the "sys" output to show the total cpus (highest+1) plus the online
count, i.e., something like:

crash> sys
KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
DUMPFILE: /dev/crash
CPUS: 4 (3 online)
DATE: Fri Mar 5 14:48:23 2010
UPTIME: 32 days, 06:20:13
LOAD AVERAGE: 0.10, 0.18, 0.17
TASKS: 269
NODENAME: crash.usersys.redhat.com
RELEASE: 2.6.18-128.el5
VERSION: #1 SMP Wed Dec 17 11:41:38 EST 2008
MACHINE: x86_64 (1995 Mhz)
MEMORY: 1 GB
crash>

But I decided against even doing that because, say, in the example
above, there could be 8 cpus, with cpus 0,1,2 and 3 online. In
that case, the output would be "CPUS: 4 (4 online)" which would also
be somewhat misleading. In that case, the cpu_present_mask would
need to be consulted, but again, that's a fairly recent kernel addition.
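
As a rough illustration of why the combined format can still mislead, the sketch below builds the proposed line from an online mask alone. The function name and the single-word mask are assumptions for this example, not crash code.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format "CPUS: <highest+1> (<online> online)"
 * using only the online mask (bit N set means cpu N is online).
 * Returns whatever snprintf() returns.
 */
int format_cpus_line(char *buf, size_t len, unsigned long online_mask)
{
    int online = 0;   /* number of set bits */
    int total = 0;    /* highest set bit index + 1 */
    unsigned long m;

    for (m = online_mask; m; m >>= 1) {
        online += (int)(m & 1UL);
        total++;
    }
    return snprintf(buf, len, "CPUS: %d (%d online)", total, online);
}
```

With cpus 0, 1 and 3 online (mask 0x0B) this yields "CPUS: 4 (3 online)". With cpus 0-3 online (mask 0x0F) it yields "CPUS: 4 (4 online)" whether the machine has 4 cpus or 8, since nothing in the online mask records the offline cpus above the highest online index.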

So I left it purposely as it is now. Yes the count shown may
be more than what's online, but I kind of like the idea of having
the "CPUS" count reflect what will be seen when running commands
that cycle through all cpus. Showing just the online count is
kind of misleading in that case.

I don't much care about the "mach" output though, and as a compromise,
I have no problem changing those to show "ONLINE CPUS:" instead of
just "CPUS".

Comments?

Dave


>
> diff -up crash-5.0.1/alpha.c.orig crash-5.0.1/alpha.c
> --- crash-5.0.1/alpha.c.orig 2010-03-04 07:19:23.000000000 -0600
> +++ crash-5.0.1/alpha.c 2010-03-04 07:20:43.000000000 -0600
> @@ -2641,7 +2641,7 @@ alpha_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/defs.h.orig crash-5.0.1/defs.h
> --- crash-5.0.1/defs.h.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/defs.h 2010-03-04 07:20:43.000000000 -0600
> @@ -3717,6 +3717,7 @@ int get_cpus_online(void);
>  int get_cpus_present(void);
>  int get_cpus_possible(void);
>  int get_highest_cpu_online(void);
> +int get_cpus_to_display(void);
>  int in_cpu_map(int, int);
>  void paravirt_init(void);
>  void print_stack_text_syms(struct bt_info *, ulong, ulong);
> diff -up crash-5.0.1/ia64.c.orig crash-5.0.1/ia64.c
> --- crash-5.0.1/ia64.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/ia64.c 2010-03-04 07:20:43.000000000 -0600
> @@ -2325,7 +2325,7 @@ ia64_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/kernel.c.orig crash-5.0.1/kernel.c
> --- crash-5.0.1/kernel.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/kernel.c 2010-03-04 07:20:43.000000000 -0600
> @@ -3871,7 +3871,7 @@ display_sys_stats(void)
>  	}
>
>
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	if (ACTIVE())
>  		get_symbol_data("xtime", sizeof(struct timespec), &kt->date);
>  	fprintf(fp, " DATE: %s\n",
> @@ -4289,7 +4289,7 @@ dump_kernel_table(int verbose)
>  	fprintf(fp, " init_begin: %lx\n", kt->init_begin);
>  	fprintf(fp, " init_end: %lx\n", kt->init_end);
>  	fprintf(fp, " end: %lx\n", kt->end);
> -	fprintf(fp, " cpus: %d\n", kt->cpus);
> +	fprintf(fp, " cpus: %d\n", get_cpus_to_display());
>  	fprintf(fp, " cpus_override: %s\n", kt->cpus_override);
>  	fprintf(fp, " NR_CPUS: %d (compiled-in to this version of %s)\n",
>  		NR_CPUS, pc->program_name);
> @@ -6257,6 +6257,16 @@ get_cpus_possible()
>  }
>
>  /*
> + * For when displaying cpus, return the number of cpus online if possible, otherwise kt->cpus.
> + */
> +int
> +get_cpus_to_display(void)
> +{
> +	int online = get_cpus_online();
> +	return (online ? online : kt->cpus);
> +}
> +
> +/*
>   * Xen machine-address to pseudo-physical-page translator.
>   */
>  ulonglong
> diff -up crash-5.0.1/ppc64.c.orig crash-5.0.1/ppc64.c
> --- crash-5.0.1/ppc64.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/ppc64.c 2010-03-04 07:20:43.000000000 -0600
> @@ -2215,7 +2215,7 @@ ppc64_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/ppc.c.orig crash-5.0.1/ppc.c
> --- crash-5.0.1/ppc.c.orig 2010-03-04 07:20:14.000000000 -0600
> +++ crash-5.0.1/ppc.c 2010-03-04 07:20:43.000000000 -0600
> @@ -1355,7 +1355,7 @@ ppc_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/s390.c.orig crash-5.0.1/s390.c
> --- crash-5.0.1/s390.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/s390.c 2010-03-04 07:20:43.000000000 -0600
> @@ -1032,7 +1032,7 @@ s390_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/s390x.c.orig crash-5.0.1/s390x.c
> --- crash-5.0.1/s390x.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/s390x.c 2010-03-04 07:20:43.000000000 -0600
> @@ -1284,7 +1284,7 @@ s390x_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/x86_64.c.orig crash-5.0.1/x86_64.c
> --- crash-5.0.1/x86_64.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/x86_64.c 2010-03-04 07:20:43.000000000 -0600
> @@ -4412,7 +4412,7 @@ x86_64_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
> diff -up crash-5.0.1/x86.c.orig crash-5.0.1/x86.c
> --- crash-5.0.1/x86.c.orig 2010-02-17 15:21:24.000000000 -0600
> +++ crash-5.0.1/x86.c 2010-03-04 07:20:43.000000000 -0600
> @@ -3950,7 +3950,7 @@ x86_display_machine_stats(void)
>
>  	fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
>  	fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> -	fprintf(fp, " CPUS: %d\n", kt->cpus);
> +	fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
>  	fprintf(fp, " PROCESSOR SPEED: ");
>  	if ((mhz = machdep->processor_speed()))
>  		fprintf(fp, "%ld Mhz\n", mhz);
>
>
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility

 
03-05-2010, 07:45 PM
Luciano Chavez

On Fri, 2010-03-05 at 15:01 -0500, Dave Anderson wrote:
> Right -- I thought of tinkering with the initial system banner and
> the "sys" output to show the total cpus (highest+1) plus the online
> count, i.e., something like:
>
> crash> sys
> KERNEL: /usr/lib/debug/lib/modules/2.6.18-128.el5/vmlinux
> DUMPFILE: /dev/crash
> CPUS: 4 (3 online)
> DATE: Fri Mar 5 14:48:23 2010
> UPTIME: 32 days, 06:20:13
> LOAD AVERAGE: 0.10, 0.18, 0.17
> TASKS: 269
> NODENAME: crash.usersys.redhat.com
> RELEASE: 2.6.18-128.el5
> VERSION: #1 SMP Wed Dec 17 11:41:38 EST 2008
> MACHINE: x86_64 (1995 Mhz)
> MEMORY: 1 GB
> crash>
>
> But I decided against even doing that because, say, in the example
> above, there could be 8 cpus, with cpus 0,1,2 and 3 online. In
> that case, the output would be "CPUS: 4 (4 online)" which would also
> be somewhat misleading. In that case, the cpu_present_mask would
> need to be consulted, but again, that's a fairly recent kernel addition.
>
> So I left it purposely as it is now. Yes the count shown may
> be more than what's online, but I kind of like the idea of having
> the "CPUS" count reflect what will be seen when running commands
> that cycle through all cpus. Showing just the online count is
> kind of misleading in that case.
>
> I don't much care about the "mach" output though, and as a compromise,
> I have no problem changing those to show "ONLINE CPUS:" instead of
> just "CPUS".
>
> Comments?
>
> Dave
>
>

Hi Dave,

Thinking about backward compatibility, would displaying "ONLINE CPUS"
still seem OK for the case where kernel_init() finds the smp_num_cpus
symbol (as for a 2.4 kernel)? Before there were the various cpu maps, I
think smp_num_cpus was analogous to the possible cpus as opposed to
online. I can see this requiring some thought as to what CPUS in the
output means when you have various different maps now (online, possible,
and present). That being said, it would be good to leave no doubt and
explicitly state the count is for the present or online CPUS with the
latter being my suggestion.

I forgot to mention that I suspect the problem I mentioned before would
get stranger for POWER7, which offers 4 threads per core. I didn't have
access to a POWER7 machine to see just what it would do if we tried
disabling SMT as before, but it should follow the same pattern: the count
displayed would be way off from the online count.
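
The expected gap can be estimated with simple arithmetic. The sketch below is not a measurement: it assumes cpu numbering of the form core * threads_per_core + thread, and that disabling SMT leaves only thread 0 of each core online, which matches the 0,2,4,6,8 pattern seen with 2 threads per core. Both function names are hypothetical.

```c
/* Online cpus once SMT is off: one thread per core (assumed). */
int online_count_smt_off(int cores)
{
    return cores > 0 ? cores : 0;
}

/*
 * Highest online cpu index + 1 with SMT off, assuming
 * cpu index = core * threads_per_core + thread and only thread 0 online.
 */
int highest_plus_one_smt_off(int cores, int threads_per_core)
{
    if (cores <= 0 || threads_per_core <= 0)
        return 0;
    return (cores - 1) * threads_per_core + 1;
}
```

For 5 cores this gives 9 with 2 threads per core, as in the earlier example, and 17 with 4 threads per core, against an online count of 5 in both cases.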

--
Luciano Chavez <lnx1138@linux.vnet.ibm.com>
IBM Linux Technology Center

 
03-05-2010, 08:54 PM
Dave Anderson

----- "Luciano Chavez" <lnx1138@linux.vnet.ibm.com> wrote:

> Hi Dave,
>
> Thinking about backward compatibility, would displaying "ONLINE CPUS"
> still seem OK for the case where kernel_init() finds the smp_num_cpus
> symbol (as for a 2.4 kernel)? Before there were the various cpu maps, I
> think smp_num_cpus was analogous to the possible cpus as opposed to
> online. I can see this requiring some thought as to what CPUS in the
> output means when you have various different maps now (online, possible,
> and present). That being said, it would be good to leave no doubt and
> explicitly state the count is for the present or online CPUS with the
> latter being my suggestion.
>
> I forgot to mention that I suspect the problem I mentioned before would
> get stranger for POWER7, which offers 4 threads per core. I didn't have
> access to a POWER7 machine to see just what it would do if we tried
> disabling SMT as before, but it should follow the same pattern: the count
> displayed would be way off from the online count.

I just ran through a bunch of stashed dumpfiles I have on hand, and
it gets even murkier when dealing with Xen or KVM kernels, because
as part of the post-crash shutdown (or forced dump), all but one of
the cpus may be taken "offline". So even though there may be 4 vcpus,
and crash correctly shows 4 "CPUS", the cpu_online_map shows only one
cpu bit. So if we went ahead and displayed a number based upon the
cpu_online_map, it would be completely misleading. Incorrect, actually...

Dave

 
03-08-2010, 01:49 PM
Dave Anderson

----- "Dave Anderson" <anderson@redhat.com> wrote:

> I just ran through a bunch of stashed dumpfiles I have on hand, and
> it gets even murkier when dealing with Xen or KVM kernels, because
> as part of the post-crash shutdown (or forced dump), all but one of
> the cpus may be taken "offline". So even though there may be 4 vcpus,
> and crash correctly shows 4 "CPUS", the cpu_online_map shows only one
> cpu bit. So if we went ahead and displayed a number based upon the
> cpu_online_map, it would be completely misleading. Incorrect,
> actually...

You can always dump the possible/present/online map information with
the "help -k" debug option.

So for example, taking a 2.6.9-era (RHEL4) xen kernel that crashed
on vcpu 3 due to a NULL reference, the hypervisor made a callback to
the other vcpus to shut them down prior to the core dumping procedure:

crash> help -k
...
cpu_possible_map: (does not exist)
cpu_present_map: 0 1 2 3
cpu_online_map: 3
...

So the online map cannot be used for the cpu count, and for that
matter, it wouldn't make any sense to even display the online map
count.
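
The mismatch in that output can be checked mechanically. The following is a small sketch, not part of crash: it parses a space-separated cpu list like the ones printed by "help -k" into a bitmask and counts the cpus in it. The function names are hypothetical, and it assumes all cpu indices fit in one unsigned long.

```c
#include <stdlib.h>

/*
 * Parse a space-separated list of cpu indices ("0 1 2 3") into a bitmask.
 * Text with no leading number, such as "(does not exist)", yields 0.
 */
unsigned long parse_cpu_list(const char *s)
{
    unsigned long mask = 0;
    char *end;

    for (;;) {
        long cpu = strtol(s, &end, 10);
        if (end == s)           /* no further number found */
            break;
        if (cpu >= 0 && cpu < (long)(8 * sizeof(mask)))
            mask |= 1UL << cpu;
        s = end;
    }
    return mask;
}

/* Count the cpus present in a mask. */
int count_cpus(unsigned long mask)
{
    int n = 0;
    for (; mask; mask >>= 1)
        n += (int)(mask & 1UL);
    return n;
}
```

Fed the lines above, the present map "0 1 2 3" counts 4 cpus while the online map "3" counts only 1, so an online-based count would report a single cpu for a 4-vcpu guest.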

In any case, for now I prefer not to change things, at least for the
other architectures.

That being said, I defer machine-specific items for ppc64, s390
and s390x to the IBM maintainers, and to HP for ia64. (The ppc
and alpha architectures have no active "maintainers" any more,
so those arches are pretty much withering on the vine.)

So if you want to do something specifically for ppc64, please
re-post a patch for just that architecture.

Dave

 
03-08-2010, 04:28 PM
Luciano Chavez

On Mon, 2010-03-08 at 09:49 -0500, Dave Anderson wrote:
> You can always dump the possible/present/online map information with
> the "help -k" debug option.
>
> So for example, taking a 2.6.9-era (RHEL4) xen kernel that crashed
> on vcpu 3 due to a NULL reference, the hypervisor made a callback to
> the other vcpus to shut them down prior to the core dumping procedure:
>
> crash> help -k
> ...
> cpu_possible_map: (does not exist)
> cpu_present_map: 0 1 2 3
> cpu_online_map: 3
> ...
>
> So the online map cannot be used for the cpu count, and for that
> matter, it wouldn't make any sense to even display the online map
> count.
>
> In any case, for now I prefer not to change things, at least for the
> other architectures.
>
> That being said, I defer machine-specific items for ppc64, s390
> and s390x to the IBM maintainers, and to HP for ia64. (The ppc
> and alpha architectures have no active "maintainers" any more,
> so those arches are pretty much withering on the vine.)
>
> So if you want to do something specifically for ppc64, please
> re-post a patch for just that architecture.
>
> Dave
>

Dave,

Thanks for taking a good look at the many cases that would make a
general solution based on the online cpu count messy. I originally did
want to make this change applicable only to ppc64. The catch was that
ppc64_display_machine_stats() was the only ppc64-specific function I
could change, so making the displayed value consistent also required
changing display_sys_stats() and dump_kernel_table().

So, re-thinking this as a ppc64-specific change, where CPUS is
displayed as the online count when possible and everyone else keeps
doing what they do now, which is to display kt->cpus, I suggest the
following:

1. Add a get_cpus_to_display as a machdep function
2. For ppc64, initialize machdep->get_cpus_to_display to
ppc64_get_cpus_to_display(), which will attempt to use get_cpus_online()
or fall back to using kt->cpus
3. For all other architectures, have them initialize
machdep->get_cpus_to_display to generic_get_cpus_to_display() which
returns kt->cpus to maintain the status quo of the code as it is now
4. Replace kt->cpus in display_sys_stats() and dump_kernel_table() in
kernel.c to invoke machdep->get_cpus_to_display() when displaying CPUS
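
The four steps above could be sketched roughly as follows. This is only an illustration of the function-pointer idea, not crash's actual code: the two structs are cut-down, hypothetical stand-ins for crash's kernel and machdep tables, the cpus_online field stands in for whatever get_cpus_online() would return, and cpus_to_display() is a made-up name for the call site.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for crash's kernel and machdep tables. */
struct kernel_table {
    int cpus;         /* highest cpu index + 1, as adjusted by the machdep init code */
    int cpus_online;  /* stand-in for get_cpus_online(); 0 if the map is unavailable */
};

struct machdep_table {
    int (*get_cpus_to_display)(const struct kernel_table *kt);
};

/* Step 3: every other architecture keeps showing kt->cpus. */
int generic_get_cpus_to_display(const struct kernel_table *kt)
{
    assert(kt != NULL);
    return kt->cpus;
}

/* Step 2: ppc64 prefers the online count and falls back to kt->cpus. */
int ppc64_get_cpus_to_display(const struct kernel_table *kt)
{
    assert(kt != NULL);
    return kt->cpus_online ? kt->cpus_online : kt->cpus;
}

/* Step 4: display code goes through the hook instead of reading kt->cpus. */
int cpus_to_display(const struct machdep_table *md, const struct kernel_table *kt)
{
    assert(md != NULL && md->get_cpus_to_display != NULL);
    return md->get_cpus_to_display(kt);
}
```

With kt->cpus at 9 and 5 cpus online (the SMT-off case from the start of this thread), the generic hook would report 9 and the ppc64 hook 5.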

Let me know what you think. I think this solution leaves room for
other architectures to individually change what they display for the
cpu count, should they need to in the future.

regards,
--
Luciano Chavez <lnx1138@linux.vnet.ibm.com>
IBM Linux Technology Center

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 03-08-2010, 06:52 PM
Dave Anderson
 
Default Display online cpus value in preference to kt->cpus

----- "Luciano Chavez" <lnx1138@linux.vnet.ibm.com> wrote:

> > So if you want to do something specifically for ppc64, please
> > re-post a patch for just that architecture.
> >
> > Dave
> >
>
> Dave,
>
> Thanks for taking a good look at the many cases that would make a
> general solution based on the online cpu count messy. I originally did
> want to make this change applicable only to ppc64. The catch was that
> ppc64_display_machine_stats() was the only ppc64-specific function I
> could change, so making the displayed value consistent also required
> changing display_sys_stats() and dump_kernel_table().
>
> So, re-thinking this as a ppc64-specific change, where CPUS is
> displayed as the online count when possible and everyone else keeps
> doing what they do now, which is to display kt->cpus, I suggest the
> following:
>
> 1. Add a get_cpus_to_display as a machdep function
> 2. For ppc64, initialize machdep->get_cpus_to_display to ppc64_get_cpus_to_display(),
> which will attempt to use get_cpus_online() or fall back to using kt->cpus
> 3. For all other architectures, have them initialize machdep->get_cpus_to_display
> to generic_get_cpus_to_display() which returns kt->cpus to maintain the status
> quo of the code as it is now
> 4. Replace kt->cpus in display_sys_stats() and dump_kernel_table() in kernel.c to
> invoke machdep->get_cpus_to_display() when displaying CPUS

Well, we certainly don't want to change the "cpus:" output of dump_kernel_table()
because its purpose there *is* specifically to dump the kt->cpus value.

>
> Let me know what you think. I think this solution leaves room for
> other architectures to individually change what they display for the
> cpu count, should they need to in the future.

The fact of the matter is that it's really not machine-specific in the
sense that your function is just parsing the architecture-neutral cpu maps.
And even the "online-oddities" that I mentioned were not machine-specific,
but rather virtual-vs-baremetal issues.
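
To make the difference between the two candidate numbers concrete, here is a minimal sketch that treats a cpu map as a single-word bitmask. That is a simplifying assumption for illustration (the real maps can span several words), and both function names are hypothetical rather than taken from the crash sources.

```c
#include <assert.h>

/* Count the cpus present in a map held in one unsigned long. */
int count_cpus_in_map(unsigned long map)
{
    int count = 0;

    while (map) {
        map &= map - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}

/* Return the highest cpu index set in the map, or -1 for an empty map. */
int highest_cpu_in_map(unsigned long map)
{
    int highest = -1;

    while (map) {
        map >>= 1;
        highest++;
    }
    return highest;
}
```

For the Xen dump mentioned earlier (present map 0 1 2 3, online map 3), the online map counts a single cpu while its highest index plus one is 4. For the SMT-off ppc64 case (online 0,2,4,6,8), the count is 5 and the highest index plus one is 9.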

For now, all I was thinking would be to simply change display_sys_stats()
to something like:

    if (machine_type("PPC64"))
        your_function();
    else
        fprintf(fp, " CPUS: %d\n", kt->cpus);

and since your_function() does not need to be in a machine-specific
file, just put it in kernel.c. And you can also call it from the
ppc64_display_machine_stats() function.

Dave



 
Old 03-08-2010, 07:30 PM
Luciano Chavez
 
Default Display online cpus value in preference to kt->cpus

On Mon, 2010-03-08 at 14:52 -0500, Dave Anderson wrote:
>
> For now, all I was thinking would be to simply change display_sys_stats()
> to something like:
>
>     if (machine_type("PPC64"))
>         your_function();
>     else
>         fprintf(fp, " CPUS: %d\n", kt->cpus);
>
> and since your_function() does not need to be in a machine-specific
> file, just put it in kernel.c. And you can also call it from the
> ppc64_display_machine_stats() function.

Hi Dave,

I like this particular solution best! Simple and straightforward. I'll
work on a patch that does exactly as suggested. I'll post it for review
once I have it done this afternoon. Thanks for the help!

regards,
--
Luciano Chavez <lnx1138@linux.vnet.ibm.com>
IBM Linux Technology Center

 
Old 03-08-2010, 08:23 PM
Luciano Chavez
 
Default Display online cpus value in preference to kt->cpus

On Mon, 2010-03-08 at 20:30 +0000, Luciano Chavez wrote:
> On Mon, 2010-03-08 at 14:52 -0500, Dave Anderson wrote:
> >
> > For now, all I was thinking would be to simply change display_sys_stats()
> > to something like:
> >
> >     if (machine_type("PPC64"))
> >         your_function();
> >     else
> >         fprintf(fp, " CPUS: %d\n", kt->cpus);
> >
> > and since your_function() does not need to be in a machine-specific
> > file, just put it in kernel.c. And you can also call it from the
> > ppc64_display_machine_stats() function.
>
> Hi Dave,
>
> I like this particular solution best! Simple and straightforward. I'll
> work on a patch that does exactly as suggested. I'll post it for review
> once I have it done this afternoon. Thanks for the help!

Below is the simpler revised patch. It applies cleanly to the latest
crash 5.0.1 source and compiles.

I still need to verify that it fixes the original problem. I am
confident it will, but wanted to post the patch first.

BTW, I was able to briefly access a POWER7 box this morning and
confirmed that the displayed CPU count would be even further off on
one of those systems when SMT is disabled and the current code is
used without the patch.

~ # cat /sys/devices/system/cpu/online
0-23
~ # ppc64_cpu --smt=off
~ # cat /sys/devices/system/cpu/online
0,4,8,12,16,20

The current 5.0.1 code would display CPUS: 21 rather than CPUS: 6
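
The arithmetic behind those two numbers can be sketched with a small parser for the sysfs cpu list format shown above. This is an illustration only: summarize_cpu_list() and its struct are hypothetical names, and the parser assumes well-formed input (comma-separated entries, each either N or N-M).

```c
#include <assert.h>
#include <stdlib.h>

struct cpu_list_summary {
    int count;     /* number of cpus named by the list */
    int highest;   /* highest cpu index named, or -1 if none */
};

/*
 * Parse a sysfs-style cpu list such as "0-23" or "0,4,8,12,16,20".
 * Assumes well-formed input; parsing stops at the first malformed entry.
 */
struct cpu_list_summary summarize_cpu_list(const char *list)
{
    struct cpu_list_summary s = { 0, -1 };
    const char *p = list;
    char *end;

    assert(list != NULL);
    while (*p != '\0' && *p != '\n') {
        long first = strtol(p, &end, 10);
        long last = first;

        if (end == p)
            break;                /* not a number: stop parsing */
        p = end;
        if (*p == '-') {          /* range entry "first-last" */
            last = strtol(p + 1, &end, 10);
            if (end == p + 1)
                break;
            p = end;
        }
        s.count += (int)(last - first + 1);
        if (last > s.highest)
            s.highest = (int)last;
        if (*p == ',')
            p++;
    }
    return s;
}
```

For "0,4,8,12,16,20" this gives a count of 6 and a highest index of 20, so highest index plus one is the 21 quoted above, while the online count is 6.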

Once I confirm the patch fixes this problem, I will post a followup.

--
Luciano Chavez <lnx1138@linux.vnet.ibm.com>
IBM Linux Technology Center

diff -up crash-5.0.1/defs.h.old crash-5.0.1/defs.h
--- crash-5.0.1/defs.h.old 2010-03-08 14:23:57.000000000 -0600
+++ crash-5.0.1/defs.h 2010-03-08 14:34:29.000000000 -0600
@@ -3717,6 +3717,7 @@ int get_cpus_online(void);
int get_cpus_present(void);
int get_cpus_possible(void);
int get_highest_cpu_online(void);
+int get_cpus_to_display(void);
int in_cpu_map(int, int);
void paravirt_init(void);
void print_stack_text_syms(struct bt_info *, ulong, ulong);
diff -up crash-5.0.1/kernel.c.old crash-5.0.1/kernel.c
--- crash-5.0.1/kernel.c.old 2010-03-08 14:23:45.000000000 -0600
+++ crash-5.0.1/kernel.c 2010-03-08 15:09:05.000000000 -0600
@@ -3871,7 +3871,8 @@ display_sys_stats(void)
}


- fprintf(fp, " CPUS: %d\n", kt->cpus);
+ fprintf(fp, " CPUS: %d\n",
+ machine_type("PPC64") ? get_cpus_to_display() : kt->cpus);
if (ACTIVE())
get_symbol_data("xtime", sizeof(struct timespec), &kt->date);
fprintf(fp, " DATE: %s\n",
@@ -6256,6 +6257,18 @@ get_cpus_possible()
return possible;
}

+
+/*
+ * For when displaying cpus, return the number of cpus online if possible, otherwise kt->cpus.
+ */
+int
+get_cpus_to_display(void)
+{
+ int online = get_cpus_online();
+
+ return (online ? online : kt->cpus);
+}
+
/*
* Xen machine-address to pseudo-physical-page translator.
*/
diff -up crash-5.0.1/ppc64.c.old crash-5.0.1/ppc64.c
--- crash-5.0.1/ppc64.c.old 2010-03-08 14:24:07.000000000 -0600
+++ crash-5.0.1/ppc64.c 2010-03-08 14:38:10.000000000 -0600
@@ -2215,7 +2215,7 @@ ppc64_display_machine_stats(void)

fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
- fprintf(fp, " CPUS: %d\n", kt->cpus);
+ fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
fprintf(fp, " PROCESSOR SPEED: ");
if ((mhz = machdep->processor_speed()))
fprintf(fp, "%ld Mhz\n", mhz);


 
Old 03-08-2010, 10:38 PM
Luciano Chavez
 
Default Display online cpus value in preference to kt->cpus

On Mon, 2010-03-08 at 21:23 +0000, Luciano Chavez wrote:
> On Mon, 2010-03-08 at 20:30 +0000, Luciano Chavez wrote:
> > On Mon, 2010-03-08 at 14:52 -0500, Dave Anderson wrote:
> > >
> > > For now, all I was thinking would be to simply change display_sys_stats()
> > > to something like:
> > >
> > >     if (machine_type("PPC64"))
> > >         your_function();
> > >     else
> > >         fprintf(fp, " CPUS: %d\n", kt->cpus);
> > >
> > > and since your_function() does not need to be in a machine-specific
> > > file, just put it in kernel.c. And you can also call it from the
> > > ppc64_display_machine_stats() function.
> >
> > Hi Dave,
> >
> > I like this particular solution best! Simple and straightforward. I'll
> > work on a patch that does exactly as suggested. I'll post it for review
> > once I have it done this afternoon. Thanks for the help!
>
> Below is the simpler revised patch. It applies cleanly to the latest
> crash 5.0.1 source and compiles.
>
> I still need to verify that it fixes the original problem. I am
> confident it will, but wanted to post the patch first.
>
> BTW, I was able to briefly access a POWER7 box this morning and
> confirmed that the displayed CPU count would be even further off on
> one of those systems when SMT is disabled and the current code is
> used without the patch.
>
> ~ # cat /sys/devices/system/cpu/online
> 0-23
> ~ # ppc64_cpu --smt=off
> ~ # cat /sys/devices/system/cpu/online
> 0,4,8,12,16,20
>
> The current 5.0.1 code would display CPUS: 21 rather than CPUS: 6
>
> Once I confirm the patch fixes this problem, I will post a followup.
>

I tested the patch and confirmed that it displays the expected result
after disabling SMT on a ppc64 test system. Previously, the CPUS count
would have been displayed incorrectly as 15 on this test system.

[root@elm3b130 crash-5.0.1]# ppc64_cpu --smt=on
[root@elm3b130 crash-5.0.1]# cat /sys/devices/system/cpu/cpu*/online | grep -c 1
16
[root@elm3b130 crash-5.0.1]# ppc64_cpu --smt=off
[root@elm3b130 crash-5.0.1]# cat /sys/devices/system/cpu/cpu*/online | grep -c 1
8
[root@elm3b130 crash-5.0.1]# ./crash

crash 5.0.1
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...

KERNEL: /usr/lib/debug/lib/modules/2.6.18-190.el5/vmlinux
DUMPFILE: /dev/mem
CPUS: 8
DATE: Mon Mar 8 18:36:44 2010
UPTIME: 01:49:41
LOAD AVERAGE: 0.54, 0.54, 0.36
TASKS: 176
NODENAME: elm3b130
RELEASE: 2.6.18-190.el5
VERSION: #1 SMP Mon Feb 22 19:18:34 EST 2010
MACHINE: ppc64 (1654 Mhz)
MEMORY: 4 GB
PID: 20625
COMMAND: "crash"
TASK: c000000090456c00 [THREAD_INFO: c0000000ceb84000]
CPU: 14
STATE: TASK_RUNNING (ACTIVE)

crash> sys
KERNEL: /usr/lib/debug/lib/modules/2.6.18-190.el5/vmlinux
DUMPFILE: /dev/mem
CPUS: 8
DATE: Mon Mar 8 18:36:49 2010
UPTIME: 01:49:45
LOAD AVERAGE: 0.50, 0.53, 0.36
TASKS: 177
NODENAME: elm3b130
RELEASE: 2.6.18-190.el5
VERSION: #1 SMP Mon Feb 22 19:18:34 EST 2010
MACHINE: ppc64 (1654 Mhz)
MEMORY: 4 GB
crash>


--
Luciano Chavez <lnx1138@linux.vnet.ibm.com>
IBM Linux Technology Center

 
Old 03-09-2010, 12:36 PM
Dave Anderson
 
Default Display online cpus value in preference to kt->cpus

----- "Luciano Chavez" <lnx1138@linux.vnet.ibm.com> wrote:

> Hi Dave,
> >
> > I like this particular solution best! Simple and straightforward. I'll
> > work on a patch that does exactly as suggested. I'll post it for review
> > once I have it done this afternoon. Thanks for the help!
>
> Below is the simpler revised patch. It applies cleanly to the latest
> crash 5.0.1 source and compiles.
>
> I still need to verify that it fixes the original problem. I am
> confident it will, but wanted to post the patch first.
>
> BTW, I was able to briefly access a POWER7 box this morning and
> confirmed that the displayed CPU count would be even further off on
> one of those systems when SMT is disabled and the current code is
> used without the patch.

Queued for the next release.

Thanks,
Dave

>
> ~ # cat /sys/devices/system/cpu/online
> 0-23
> ~ # ppc64_cpu --smt=off
> ~ # cat /sys/devices/system/cpu/online
> 0,4,8,12,16,20
>
> The current 5.0.1 code would display CPUS: 21 rather than CPUS: 6
>
> Once I confirm the patch fixes this problem, I will post a followup.
>
> --
> Luciano Chavez <lnx1138@linux.vnet.ibm.com>
> IBM Linux Technology Center
>
> diff -up crash-5.0.1/defs.h.old crash-5.0.1/defs.h
> --- crash-5.0.1/defs.h.old 2010-03-08 14:23:57.000000000 -0600
> +++ crash-5.0.1/defs.h 2010-03-08 14:34:29.000000000 -0600
> @@ -3717,6 +3717,7 @@ int get_cpus_online(void);
> int get_cpus_present(void);
> int get_cpus_possible(void);
> int get_highest_cpu_online(void);
> +int get_cpus_to_display(void);
> int in_cpu_map(int, int);
> void paravirt_init(void);
> void print_stack_text_syms(struct bt_info *, ulong, ulong);
> diff -up crash-5.0.1/kernel.c.old crash-5.0.1/kernel.c
> --- crash-5.0.1/kernel.c.old 2010-03-08 14:23:45.000000000 -0600
> +++ crash-5.0.1/kernel.c 2010-03-08 15:09:05.000000000 -0600
> @@ -3871,7 +3871,8 @@ display_sys_stats(void)
> }
>
>
> - fprintf(fp, " CPUS: %d\n", kt->cpus);
> + fprintf(fp, " CPUS: %d\n",
> + machine_type("PPC64") ? get_cpus_to_display() : kt->cpus);
> if (ACTIVE())
> get_symbol_data("xtime", sizeof(struct timespec), &kt->date);
> fprintf(fp, " DATE: %s\n",
> @@ -6256,6 +6257,18 @@ get_cpus_possible()
> return possible;
> }
>
> +
> +/*
> + * For when displaying cpus, return the number of cpus online if possible, otherwise kt->cpus.
> + */
> +int
> +get_cpus_to_display(void)
> +{
> + int online = get_cpus_online();
> +
> + return (online ? online : kt->cpus);
> +}
> +
> /*
> * Xen machine-address to pseudo-physical-page translator.
> */
> diff -up crash-5.0.1/ppc64.c.old crash-5.0.1/ppc64.c
> --- crash-5.0.1/ppc64.c.old 2010-03-08 14:24:07.000000000 -0600
> +++ crash-5.0.1/ppc64.c 2010-03-08 14:38:10.000000000 -0600
> @@ -2215,7 +2215,7 @@ ppc64_display_machine_stats(void)
>
> fprintf(fp, " MACHINE TYPE: %s\n", uts->machine);
> fprintf(fp, " MEMORY SIZE: %s\n", get_memory_size(buf));
> - fprintf(fp, " CPUS: %d\n", kt->cpus);
> + fprintf(fp, " CPUS: %d\n", get_cpus_to_display());
> fprintf(fp, " PROCESSOR SPEED: ");
> if ((mhz = machdep->processor_speed()))
> fprintf(fp, "%ld Mhz\n", mhz);
>
>

 
