invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> I have a dump from a 2.6.31-based x86_64 system where the number of
> "possible" cpus equals the system's NR_CPUS (32).
> On that system, the __per_cpu_offset table in the kernel consists of 32
> valid offset pointers.
>
> When crash loads this table into its __per_cpu_offset[NR_CPUS=4096]
> array in struct kernel_table, it knows the length of the kernel's array
> (32*sizeof(long)), and copies the 32 pointers, leaving the rest of its
> (much longer) array full of 0x0s.
>
> (This happens in kernel.c)
>
>         if (symbol_exists("__per_cpu_offset")) {
>                 if (LKCD_KERNTYPES())
>                         i = get_cpus_possible();
>                 else
>                         i = get_array_length("__per_cpu_offset", NULL, 0);
>                 get_symbol_data("__per_cpu_offset",
>                     sizeof(long)*((i && (i <= NR_CPUS)) ? i : NR_CPUS),
>                     &kt->__per_cpu_offset[0]);
>                 kt->flags |= PER_CPU_OFF;
>         }
>
> Later, in a couple of places, crash checks for the maximum valid
> __per_cpu_offset by reading the cpu_number value out of each per_cpu
> area and comparing it to the expected number until the comparison fails.
> (Remember NR_CPUS in crash is much larger than the kernel's NR_CPUS, and
> that's OK).
>
> From x86_64.c:
>
>         for (i = cpus = 0; i < NR_CPUS; i++) {
>                 readmem(symbol_value("per_cpu__cpu_number") +
>                         kt->__per_cpu_offset[i], KVADDR,
>                         &cpunumber, sizeof(int),
>                         "cpu number (per_cpu)", FAULT_ON_ERROR);
>                 if (cpunumber != cpus)
>                         break;
>                 cpus++;
>         }
>
> This works well when the kernel's array has fewer real per_cpu_offsets
> than its own NR_CPUS, since the kernel preloads its array with a pointer
> (BOOT_PERCPU_OFFSET) and when this loop runs past the real
> per_cpu_offset pointers and tries to use the BOOT_PERCPU_OFFSET, it
> reads a bogus value for cpunumber and terminates.
>
> But when the kernel's table is full of valid per_cpu_offset pointers,
> this loop continues off the end of that into the part of crash's
> __per_cpu_offset array that has the 0x0 initial values, and dies with:
>
> crash: invalid kernel virtual address: cc08 type: "cpu number (per_cpu)"
>
> The cc08 comes from the symbol_value of per_cpu__cpu_number:
> 000000000000cc08 D per_cpu__cpu_number
>
> Bottom line: Crash is assuming an insufficient array termination for
> the kernel's __per_cpu_offset array (a pointer that points to an invalid
> cpu_number).
>
> The included patch adds an additional loop termination so that crash
> doesn't run off the end of what it loaded from the dump. It just checks
> for a NULL 0x0 value in kt->__per_cpu_offset[i].
>
> Bob Montgomery,
> Working at HP

I have a similar-but-different fix queued for this, but instead of
checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
like this:

        if (!readmem(symbol_value("per_cpu__cpu_number") +
            kt->__per_cpu_offset[i],
            KVADDR, &cpunumber, sizeof(int),
            "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
                break;

That should prevent the failure you're seeing.

But another question is in the (extremely) rare circumstance of a
non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
would be all NULL, and the symbol_value("per_cpu__cpu_number")
call would return the qualified unity-mapped address. So the
virtual address calculation should work in x86_64_per_cpu_init(),
and the loop wouldn't even be entered in x86_64_get_smp_cpus().

That being said, I don't think I've seen a recent x86_64 kernel
that was not compiled CONFIG_SMP, so I can't confirm that it's
ever been tested.

So for sanity's sake, maybe your patch should also be applied,
but should also check if the "i" index is non-zero?

Thanks,
Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
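The overrun described in this message can be reproduced in a small stand-alone model. The sketch below is not crash source. `load_offsets`, `fake_readmem`, `count_cpus`, the 64-entry `TOOL_NR_CPUS`, and the per-cpu base and stride addresses are all made up for illustration; only the 0xcc08 symbol value and the shape of the loop come from the thread. It assumes a table whose kernel entries are all valid and are followed by zeros, and it compares the two proposed terminations: stopping on a failed read, and stopping on a zero offset at a non-zero index.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Everything here is a hypothetical stand-in, not a crash definition. */
#define TOOL_NR_CPUS      64                            /* tool-side array size (4096 in crash) */
#define CPU_NUMBER_SYM    UINT64_C(0xcc08)              /* per_cpu__cpu_number, from the thread */
#define FAKE_PERCPU_BASE  UINT64_C(0xffff880000000000)  /* made-up base of cpu 0's area */
#define FAKE_STRIDE       UINT64_C(0x10000)             /* made-up distance between areas */

static uint64_t offsets[TOOL_NR_CPUS];  /* models kt->__per_cpu_offset[] */
static int n_valid;                     /* how many entries the "kernel" supplied */

/* Copy n valid offsets and leave the rest of the longer array zeroed. */
static void load_offsets(int n)
{
    assert(n >= 0 && n <= TOOL_NR_CPUS);
    memset(offsets, 0, sizeof(offsets));
    for (int i = 0; i < n; i++)
        offsets[i] = FAKE_PERCPU_BASE + (uint64_t)i * FAKE_STRIDE;
    n_valid = n;
}

/* Return 1 and store the cpu number if addr is the cpu_number slot of a
 * valid per-cpu area; return 0 for any other address (e.g. bare 0xcc08). */
static int fake_readmem(uint64_t addr, int *cpunumber)
{
    if (addr < FAKE_PERCPU_BASE + CPU_NUMBER_SYM)
        return 0;
    uint64_t rel = addr - FAKE_PERCPU_BASE - CPU_NUMBER_SYM;
    if (rel % FAKE_STRIDE != 0 || rel / FAKE_STRIDE >= (uint64_t)n_valid)
        return 0;
    *cpunumber = (int)(rel / FAKE_STRIDE);
    return 1;
}

/* Count cpus. A failed read ends the scan instead of aborting; when
 * null_check is set, a zero offset at i > 0 ends it before the read. */
static int count_cpus(int null_check, int *failed_reads)
{
    int i, cpus, cpunumber;

    *failed_reads = 0;
    for (i = cpus = 0; i < TOOL_NR_CPUS; i++) {
        if (null_check && i && offsets[i] == 0)
            break;
        if (!fake_readmem(CPU_NUMBER_SYM + offsets[i], &cpunumber)) {
            (*failed_reads)++;
            break;
        }
        if (cpunumber != cpus)
            break;
        cpus++;
    }
    return cpus;
}
```

In this model, after `load_offsets(32)`, `count_cpus(0, &failed)` returns 32 with one failed read (the attempt at 0xcc08), and `count_cpus(1, &failed)` returns 32 without attempting that read at all. Either termination gives the same count here; the zero-offset check only avoids the bad access.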
----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> On Wed, 2009-11-11 at 14:52 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> >
> > > I have a dump from a 2.6.31-based x86_64 system where the number of
> > > "possible" cpus equals the system's NR_CPUS (32).
> > > On that system, the __per_cpu_offset table in the kernel consists of 32
> > > valid offset pointers.
>
> > I have a similar-but-different fix queued for this, but instead of
> > checking for a NULL kt->__per_cpu_offset[i] entry, it changes the
> > readmem() call to RETURN_ON_ERROR|QUIET instead of FAULT_ON_ERROR
> > like this:
> >
> >         if (!readmem(symbol_value("per_cpu__cpu_number") +
> >             kt->__per_cpu_offset[i],
> >             KVADDR, &cpunumber, sizeof(int),
> >             "cpu number (per_cpu)", QUIET|RETURN_ON_ERROR))
> >                 break;
> >
> > That should prevent the failure you're seeing.
>
> I did that first, and thought it was sort of cheating :-)

Sort of. But at that point in time we're still kind of blindly wading
around in the murk trying to figure out what we're running on...

> > But another question is in the (extremely) rare circumstance of a
> > non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
> > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > call would return the qualified unity-mapped address. So the
> > virtual address calculation should work in x86_64_per_cpu_init(),
> > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> >
> > That being said, I don't think I've seen a recent x86_64 kernel
> > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > ever been tested.
> >
> > So for sanity's sake, maybe your patch should also be applied,
> > but should also check if the "i" index is non-zero?
>
> So like this?
> +               if (i && (kt->__per_cpu_offset[i] == NULL))
> +                       break;

Yes.

> So it's always ok to try the readmem on the first element of
> the array. And the RETURN_ON_ERROR would deal with something going
> wrong with that, although that case would presumably be a real
> problem with the dump, right? (cpus == 0)

Most likely yes. The motivation for my fix was a failure when
attempting to readmem() a legitimate virtual address that was in an
excluded page of a makedumpfile-generated dump. If I recall correctly,
it was an in-house kexec-tools bugzilla, but I can't find it.

Dave
----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> On Wed, 2009-11-11 at 18:54 +0000, Dave Anderson wrote:
>
> > > > But another question is in the (extremely) rare circumstance of a
> > > > non-CONFIG_SMP kernel. In that case, the kt->__per_cpu_offset[] array
> > > > would be all NULL, and the symbol_value("per_cpu__cpu_number")
> > > > call would return the qualified unity-mapped address. So the
> > > > virtual address calculation should work in x86_64_per_cpu_init(),
> > > > and the loop wouldn't even be entered in x86_64_get_smp_cpus()
> > > >
> > > > That being said, I don't think I've seen a recent x86_64 kernel
> > > > that was not compiled CONFIG_SMP, so I can't confirm that it's
> > > > ever been tested.
> > > >
> > > > So for sanity's sake, maybe your patch should also be applied,
> > > > but should also check if the "i" index is non-zero?
>
> Now I'm thinking that test won't be needed for the non-CONFIG_SMP
> kernel. If the array is full of 0x0s, the loop will compute the first
> address as (0x0 + symbol_value("per_cpu__cpu_number")) and read a
> cpunumber of 0. Then on the next iteration, it will calculate the very
> same address again, and read the same cpunumber of 0. But now the test
> is against cpus==1, so that test will fail and we'll drop out of the
> loop, right?

Right!

> In the real smp case, we'll still try to read the small offset (cc08)
> like an address, but be spared any embarrassment by the QUIET|
> RETURN_ON_ERROR fix.

Just to be clear, I think that we agree that:

 (1) the QUIET|RETURN_ON_ERROR be applied in both functions,
 (2) the kt->__per_cpu_offset[] NULL-check should be completely dropped
     in x86_64_per_cpu_init(), and
 (3) the kt->__per_cpu_offset[] NULL-check should still be applied in
     x86_64_get_smp_cpus() since that loop pre-requires that it's SMP.

Dave
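The uniprocessor argument in this exchange can be traced in a few lines. This is a sketch under stated assumptions, not crash code: `count_cpus_up`, `read_cpu_number_up`, the `guard` enum, the 8-entry `TOOL_NR_CPUS`, and the direct-mapped symbol address are all hypothetical. It assumes a non-SMP kernel where every offset is zero and every read of the cpu number succeeds and yields 0, then runs the counting loop with no NULL-check, with a NULL-check that lacks the "i" test, and with the `i &&` form proposed earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOOL_NR_CPUS 8   /* hypothetical; crash uses a much larger NR_CPUS */

enum guard {
    GUARD_NONE,             /* no NULL-check at all */
    GUARD_ANY_ZERO,         /* offset == 0, without the "i &&" */
    GUARD_ZERO_AFTER_FIRST  /* i && offset == 0, as proposed in the thread */
};

/* Hypothetical reader for a non-SMP kernel: all offsets are zero, so every
 * iteration computes the same address and reads the same cpu number, 0. */
static int read_cpu_number_up(uint64_t addr, int *cpunumber)
{
    assert(cpunumber != NULL);
    (void)addr;
    *cpunumber = 0;
    return 1;
}

/* Run the counting loop over an all-zero offset table. */
static int count_cpus_up(enum guard g)
{
    const uint64_t sym = UINT64_C(0xffffffff8000cc08); /* made-up direct address */
    uint64_t offsets[TOOL_NR_CPUS] = {0};
    int i, cpus, cpunumber;

    for (i = cpus = 0; i < TOOL_NR_CPUS; i++) {
        if (g == GUARD_ANY_ZERO && offsets[i] == 0)
            break;                      /* fires at i == 0: nothing is counted */
        if (g == GUARD_ZERO_AFTER_FIRST && i && offsets[i] == 0)
            break;                      /* fires at i == 1, after cpu 0 is counted */
        if (!read_cpu_number_up(sym + offsets[i], &cpunumber))
            break;
        if (cpunumber != cpus)
            break;                      /* second read yields 0, expected 1 */
        cpus++;
    }
    return cpus;
}
```

Under these assumptions `count_cpus_up(GUARD_NONE)` and `count_cpus_up(GUARD_ZERO_AFTER_FIRST)` both return 1, while `count_cpus_up(GUARD_ANY_ZERO)` returns 0. That matches the reasoning in the thread: the plain loop already terminates correctly on a non-SMP kernel, and a NULL-check is only safe there if it skips index 0.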
----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> On Thu, 2009-11-12 at 13:39 +0000, Dave Anderson wrote:
> > ----- "Bob Montgomery" <bob.montgomery@hp.com> wrote:
> >
> > > In the real smp case, we'll still try to read the small offset (cc08)
> > > like an address, but be spared any embarrassment by the QUIET|
> > > RETURN_ON_ERROR fix.
> >
> > Just to be clear, I think that we agree that:
> >
> > (1) the QUIET|RETURN_ON_ERROR be applied in both functions,
> > (2) the kt->__per_cpu_offset[] NULL-check should be completely dropped
> >     in x86_64_per_cpu_init(), and
> > (3) the kt->__per_cpu_offset[] NULL-check should still be applied in
> >     x86_64_get_smp_cpus() since that loop pre-requires that it's SMP.
>
> I think (3) makes it apparent what we're trying to prevent, but even
> without the NULL-check, if we go ahead and access cc08, the QUIET|
> RETURN_ON_ERROR fix alone would save us, I think. Either way my
> problem goes away :-)
>
> Is the next version getting close, or do we need to patch 4.1.0
> internally for a while?

Yeah, I can update to 4.1.1 this week...

Dave