11-11-2009, 01:52 PM
Dave Anderson

invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"

----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:

> I have a dump from a 2.6.31-based x86_64 system where the number of
> "possible" cpus equals the system's NR_CPUS (32).
> On that system, the __per_cpu_offset table in the kernel consists of 32
> valid offset pointers.
>
> When crash loads this table into its __per_cpu_offset[NR_CPUS=4096]
> array in struct kernel_table, it knows the length of the kernel's array
> (32*sizeof(long)), and copies the 32 pointers, leaving the rest of its
> (much longer) array full of 0x0s.
>
> (This happens in kernel.c)
>
>         if (symbol_exists("__per_cpu_offset")) {
>                 if (LKCD_KERNTYPES())
>                         i = get_cpus_possible();
>                 else
>                         i = get_array_length("__per_cpu_offset", NULL, 0);
>                 get_symbol_data("__per_cpu_offset",
>                         sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
>                         &kt->__per_cpu_offset[0]);
>                 kt->flags |= PER_CPU_OFF;
>         }
>
> Later, in a couple of places, crash checks for the maximum valid
> __per_cpu_offset by reading the cpu_number value out of each per_cpu
> area and comparing it to the expected number until the comparison fails.
> (Remember NR_CPUS in crash is much larger than the kernel's NR_CPUS, and
> that's OK).
>
> From x86_64.c:
>
>         for (i = cpus = 0; i < NR_CPUS; i++) {
>                 readmem(symbol_value("per_cpu__cpu_number") +
>                         kt->__per_cpu_offset[i], KVADDR,
>                         &cpunumber, sizeof(int),
>                         "cpu number (per_cpu)", FAULT_ON_ERROR);
>                 if (cpunumber != cpus)
>                         break;
>                 cpus++;
>         }
>
> This works well when the kernel's array has fewer real per_cpu_offsets
> than its own NR_CPUS, since the kernel preloads its array with a pointer
> (BOOT_PERCPU_OFFSET) and when this loop runs past the real
> per_cpu_offset pointers and tries to use the BOOT_PERCPU_OFFSET, it
> reads a bogus value for cpunumber and terminates.
>
> But when the kernel's table is full of valid per_cpu_offset pointers,
> this loop continues off the end of that into the part of crash's
> __per_cpu_offset array that has the 0x0 initial values, and dies with:
>
> crash: invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
>
> The cc08 comes from the symbol_value of per_cpu__cpu_number:
> 000000000000cc08 D per_cpu__cpu_number
>
> Bottom line: Crash relies on a terminator for the kernel's
> __per_cpu_offset array (a pointer that points to an invalid
> cpu_number), and that terminator is not always present.
>
> The included patch adds an additional loop termination so that crash
> doesn't run off the end of what it loaded from the dump. It just checks
> for a NULL 0x0 value in kt->__per_cpu_offset[i].
>
> Bob Montgomery,
> Working at HP

I have a similar-but-different fix queued for this. Rather than
checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
readmem() call to use RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR,
like this:

        if (!readmem(symbol_value("per_cpu__cpu_number") +
            kt->__per_cpu_offset[i],
            KVADDR, &cpunumber, sizeof(int),
            "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
                break;

That should prevent the failure you're seeing.
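
As a rough illustration of why returning on a failed read ends the loop
cleanly, here is a minimal self-contained C model. It does not use
crash's real readmem(); sim_readmem(), fill_offsets(),
count_cpus_model() and the SIM_NR_CPUS, AREA_BASE and AREA_SIZE values
are hypothetical names and numbers chosen for the sketch. Only the
0xcc08 symbol value comes from the report.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS    64          /* stand-in for crash's own, larger NR_CPUS */
#define CPU_NUMBER_SYM 0xcc08UL    /* per_cpu__cpu_number value from the report */
#define AREA_BASE      0x100000UL  /* hypothetical offset of cpu 0's per-cpu area */
#define AREA_SIZE      0x10000UL   /* hypothetical spacing between areas */

/* Fill the first kernel_cpus entries with valid offsets and the rest
 * with 0, mirroring how crash's longer array stays zeroed past the
 * part copied from the dump. */
void fill_offsets(unsigned long *offsets, int kernel_cpus)
{
    int i;

    assert(offsets != NULL && kernel_cpus <= SIM_NR_CPUS);
    for (i = 0; i < SIM_NR_CPUS; i++)
        offsets[i] = (i < kernel_cpus) ? AREA_BASE + i * AREA_SIZE : 0;
}

/* Simulated read: returns 1 and stores the cpu number only when addr
 * is the cpu_number slot of an existing per-cpu area; returns 0
 * otherwise, the way a read that returns on error would. */
static int sim_readmem(unsigned long addr, int kernel_cpus, int *out)
{
    unsigned long delta, cpu;

    if (addr < AREA_BASE + CPU_NUMBER_SYM)
        return 0;                  /* e.g. the bare 0xcc08 address */
    delta = addr - AREA_BASE - CPU_NUMBER_SYM;
    if (delta % AREA_SIZE != 0)
        return 0;
    cpu = delta / AREA_SIZE;
    if (cpu >= (unsigned long)kernel_cpus)
        return 0;
    *out = (int)cpu;
    return 1;
}

/* Same shape as the counting loop, but a failed read breaks out
 * instead of aborting. */
int count_cpus_model(const unsigned long *offsets, int kernel_cpus)
{
    int i, cpus = 0, cpunumber = -1;

    for (i = 0; i < SIM_NR_CPUS; i++) {
        if (!sim_readmem(CPU_NUMBER_SYM + offsets[i], kernel_cpus, &cpunumber))
            break;
        if (cpunumber != cpus)
            break;
        cpus++;
    }
    return cpus;
}
```

With a table filled for 32 cpus, the model stops at entry 32, where the
zero offset yields the unreadable address 0xcc08, and reports 32 cpus.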

But another question is the (extremely) rare case of a
non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
would be all NULL, and the symbol_value("per_cpu__cpu_number")
call would return the qualified unity-mapped address. So the
virtual address calculation should work in x86_64_per_cpu_init(),
and the loop wouldn't even be entered in x86_64_get_smp_cpus().

That being said, I don't think I've seen a recent x86_64 kernel
that was not compiled CONFIG_SMP, so I can't confirm that it's
ever been tested.

So for sanity's sake, maybe your patch should also be applied,
but should also check if the "i" index is non-zero?

Thanks,
Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
11-11-2009, 05:54 PM
Dave Anderson

----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:

> On Wed, 2009-11-11 at 14:52 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> >
> > > I have a dump from a 2.6.31-based x86_64 system where the number of
> > > "possible" cpus equals the system's NR_CPUS (32).
> > > On that system, the __per_cpu_offset table in the kernel consists of 32
> > > valid offset pointers.
>
> > I have a similar-but-different fix queued for this, but instead of
> > checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
> > readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
> > like this:
> >
> >         if (!readmem(symbol_value("per_cpu__cpu_number") +
> >             kt->__per_cpu_offset[i],
> >             KVADDR, &cpunumber, sizeof(int),
> >             "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
> >                 break;
>
> > That should prevent the failure you're seeing.
>
> I did that first, and thought it was sort of cheating :-)

Sort of. But at that point in time we're still kind of blindly
wading around in the murk trying to figure out what we're
running on...

>
> > But another question is in the (extremely) rare circumstance of a
> > non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
> > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > call would return the qualified unity-mapped address. So the
> > virtual address calculation should work in x86_64_per_cpu_init(),
> > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> >
> > That being said, I don't think I've seen a recent x86_64 kernel
> > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > ever been tested.
> >
> > So for sanity's sake, maybe your patch should also be applied,
> > but should also check if the "i" index is non-zero?
>
> So like this?
> +               if (i && (kt->__per_cpu_offset[i] == NULL))
> +                       break;

Yes.

>
> So it's always ok to try the readmem on the first element of
> the array. And the RETURN_ON_ERROR would deal with something going
> wrong with that, although that case would presumably be a real
> problem with the dump, right? (cpus == 0)

Most likely yes. The motivation for my fix was a failure
attempting to readmem() a legitimate virtual address that was in
an excluded page of a makedumpfile-generated dump. If I recall
correctly, it was an in-house kexec-tools bugzilla, but I can't
find it.

Dave



 
11-12-2009, 12:39 PM
Dave Anderson

----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:

> On Wed, 2009-11-11 at 18:54 +0000, Dave Anderson wrote:
>
> > > > But another question is in the (extremely) rare circumstance of a
> > > > non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
> > > > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > > > call would return the qualified unity-mapped address. So the
> > > > virtual address calculation should work in x86_64_per_cpu_init(),
> > > > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> > > >
> > > > That being said, I don't think I've seen a recent x86_64 kernel
> > > > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > > > ever been tested.
> > > >
> > > > So for sanity's sake, maybe your patch should also be applied,
> > > > but should also check if the "i" index is non-zero?
>
> Now I'm thinking that test won't be needed for the non-CONFIG_SMP
> kernel. If the array is full of 0x0s, the loop will compute the first
> address as (0x0 + symbol_value("per_cpu__cpu_number")) and read a
> cpunumber of 0. Then on the next iteration, it will calculate the very
> same address again, and read the same cpunumber of 0. But now the test
> is against cpus==1, so that test will fail and we'll drop out of the
> loop, right?

Right!
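
The two iterations described above can be traced with a small
standalone C sketch. It is only a model: count_cpus_by_value() and its
"memory" array are hypothetical, with array indices standing in for
kernel addresses, so offset 0 always lands on the single cpu_number
variable of a non-SMP kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Walk the offset table the way the counting loop does. memory[]
 * plays the role of the dump: memory[offsets[i]] is the cpu_number
 * value found at that per-cpu offset. */
int count_cpus_by_value(const unsigned long *offsets,
                        const int *memory, int nr_entries)
{
    int i, cpus = 0;

    assert(offsets != NULL && memory != NULL);
    for (i = 0; i < nr_entries; i++) {
        int cpunumber = memory[offsets[i]];

        /* Iteration 0: reads 0, cpus is 0, so the count becomes 1.
         * Iteration 1: a zero offset reads the same 0 again, but
         * cpus is now 1, so the comparison fails and the loop ends. */
        if (cpunumber != cpus)
            break;
        cpus++;
    }
    return cpus;
}
```

Called with an all-zero offset table and a single cpu_number value of
0, the model returns 1, which matches the non-SMP reasoning.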

> In the real smp case, we'll still try to read the small offset (cc08)
> like an address, but be spared any embarrassment by the QUIET|
> RETURN_ON_ERROR fix.

Just to be clear, I think that we agree that:

(1) the QUIET|RETURN_ON_ERROR be applied in both functions,
(2) the kt->__per_cpu_offset[] NULL-check should be completely dropped
in x86_64_per_cpu_init(), and
(3) the kt->__per_cpu_offset[] NULL-check should still be applied in
x86_64_get_smp_cpus() since that loop pre-requires that it's SMP.
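
A hedged sketch of how points (1) and (3) could sit together in the
SMP counting loop follows. It is not the committed crash code:
smp_cpus_guarded(), demo_read(), the read_fn callback type and the
DEMO_* values are all hypothetical, and the callback merely stands in
for a read that returns on error. The read counter is there only so
the effect of the NULL-check can be observed.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SYM    0xcc08UL   /* per_cpu__cpu_number value from the report */
#define DEMO_STRIDE 0x10000UL  /* hypothetical spacing of per-cpu areas */

/* Hypothetical reader: returns 1 on success, 0 on failure. */
typedef int (*read_fn)(unsigned long addr, int *value, void *ctx);

/* Demo reader: cpu k lives at offset (k + 1) * DEMO_STRIDE, and
 * *ctx holds how many cpus actually exist. */
int demo_read(unsigned long addr, int *value, void *ctx)
{
    unsigned long ncpus = (unsigned long)*(int *)ctx;
    unsigned long d, k;

    if (addr < DEMO_SYM + DEMO_STRIDE)
        return 0;
    d = addr - DEMO_SYM;
    if (d % DEMO_STRIDE != 0)
        return 0;
    k = d / DEMO_STRIDE - 1;
    if (k >= ncpus)
        return 0;
    *value = (int)k;
    return 1;
}

int smp_cpus_guarded(const unsigned long *offsets, size_t nr,
                     unsigned long sym, read_fn rd, void *ctx,
                     int *reads_attempted)
{
    size_t i;
    int cpus = 0, cpunumber = -1;

    assert(offsets != NULL && rd != NULL && reads_attempted != NULL);
    *reads_attempted = 0;
    for (i = 0; i < nr; i++) {
        /* Point (3): past the first entry, a zero offset means we
         * ran off the data loaded from the dump. */
        if (i && offsets[i] == 0)
            break;
        (*reads_attempted)++;
        /* Point (1): a failed read ends the loop quietly. */
        if (!rd(sym + offsets[i], &cpunumber, ctx))
            break;
        if (cpunumber != cpus)
            break;
        cpus++;
    }
    return cpus;
}
```

In this model a table of four valid offsets followed by zeros yields 4
cpus after exactly 4 reads: the NULL-check stops the loop before the
bare 0xcc08 address is ever tried, and the return-on-error path remains
as a backstop.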

Dave



 
11-18-2009, 07:14 PM
Dave Anderson

----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:

> On Thu, 2009-11-12 at 13:39 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
>
> >
> > > In the real smp case, we'll still try to read the small offset (cc08)
> > > like an address, but be spared any embarrassment by the QUIET|
> > > RETURN_ON_ERROR fix.
> >
> > Just to be clear, I think that we agree that:
> >
> > (1) the QUIET|RETURN_ON_ERROR be applied in both functions,
> > (2) the kt->__per_cpu_offset[] NULL-check should be completely dropped
> > in x86_64_per_cpu_init(), and
> > (3) the kt->__per_cpu_offset[] NULL-check should still be applied in
> > x86_64_get_smp_cpus() since that loop pre-requires that it's SMP.
>
> I think (3) makes it apparent what we're trying to prevent, but even
> without the NULL-check, if we go ahead and access cc08, the QUIET|
> RETURN_ON_ERROR fix alone would save us, I think. Either way my
> problem goes away :-)
>
> Is the next version getting close, or do we need to patch 4.1.0
> internally for a while?

Yeah, I can update to 4.1.1 this week...

Dave
