crash 5.0.3
Copyright (C) 2002-2010 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.0
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc64-unknown-linux-gnu"...
> Hi Everyone,
>
> We are facing a problem while analysing the vmcore on PPC64 systems
> running SLES11 SP1.
>
> ===============
> please wait... (gathering kmem slab cache data)
>
> crash: seek error: kernel virtual address: c0000000af715480 type: "kmem_cache buffer"
>
> crash: unable to initialize kmem slab cache subsystem
>
> please wait... (gathering module symbol data)
> WARNING: cannot access vmalloc'd module memory
>
> please wait... (gathering task table data)
> crash: cannot read pid_hash upid
>
> crash: cannot read pid_hash upid
>
> crash: cannot read pid_hash upid
> =====================
>
> Version: crash-5.0.3
>
> Command used:
> #crash vmlinux-2.6.32.10-0.4.99.25.62005-ppc64.debug
> vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 vmcore
>
> Attaching the output of the above command..
>
>
> Using crash -d8 for above command,
> ========================
> <snip>
> <readmem: c00000000134ffa0, KVADDR, "memory section", 32, (FOE), 122f94b0>
> addr: c00000000134ffa0 paddr: 134ffa0 cnt: 32
> <readmem: c00000000134ffc0, KVADDR, "memory section", 32, (FOE), 122f94b0>
> addr: c00000000134ffc0 paddr: 134ffc0 cnt: 32
> <readmem: c00000000134ffe0, KVADDR, "memory section", 32, (FOE), 122f94b0>
> addr: c00000000134ffe0 paddr: 134ffe0 cnt: 32
> NOTE: page_hash_table does not exist in this kernel
>
> please wait... (gathering kmem slab cache data)
> <readmem: c0000000012fc718, KVADDR, "cache_chain", 8, (FOE), fffeff7f108>
> addr: c0000000012fc718 paddr: 12fc718 cnt: 8
> GETBUF(248 -> 1)
> FREEBUF(1)
> GETBUF(10344 -> 1)
> <readmem: c000000000d8af90, KVADDR, "kmem_cache buffer", 10344, (FOE), 1082f5d8>
> addr: c000000000d8af90 paddr: d8af90 cnt: 10344
> GETBUF(248 -> 2)
> FREEBUF(2)
> FREEBUF(1)
> GETBUF(10344 -> 1)
> <readmem: c0000000af715480, KVADDR, "kmem_cache buffer", 10344, (ROE), 1082f5d8>
> addr: c0000000af715480 paddr: af715480 cnt: 10344
>
> crash: seek error: kernel virtual address: c0000000af715480 type: "kmem_cache buffer"
> FREEBUF(1)
>
> crash: unable to initialize kmem slab cache subsystem
> =================================
>
> NOTE: Crash was able to read a vmcore on the same system that was
> manually generated using: echo c > /proc/sysrq-trigger.
The cause for seek errors depends upon the type
of dumpfile.
You didn't mention which type of dumpfile the vmcore
is, so I'll presume that it's either an ELF-format
kdump or a compressed kdump created by makedumpfile.
If it's an ELF-format kdump, seek errors are returned
by the read_netdump() function in netdump.c. If the
ELF header indicates that the physical address is contained
within one of the PT_LOAD segments, it calculates the
vmcore file offset from that, and simply does this:
    if (lseek(nd->ndfd, offset, SEEK_SET) == -1)
            return SEEK_ERROR;
But that's highly unlikely to fail, even if the lseek
offset is beyond the end of the file. And if it went
beyond the end of the vmcore file, the subsequent read()
would fail, and return a READ_ERROR instead. Also, if
none of the ELF header PT_LOAD segments contain the requested
physical address, it also would have returned a READ_ERROR.
So presuming that it's a compressed kdump, the seek error
most likely comes from here in read_diskdump() in diskdump.c:
    if ((pfn >= dd->header->max_mapnr) || !page_is_ram(pfn))
            return SEEK_ERROR;
where the pfn of the requested physical address is greater than
or equal to the max_mapnr value advertised in the header.
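The diskdump test can be modeled in isolation. In this sketch the is_ram flag stands in for crash's page_is_ram(), and check_pfn() plus the status values are hypothetical. The point it illustrates is that the request is refused purely on the header's numbers, which is why a truncated compressed dump yields "seek error" rather than a read error.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status values; only the SEEK_ERROR name comes from crash. */
enum pfn_status { PFN_OK = 0, SEEK_ERROR = -1 };

/*
 * Model of the quoted test in read_diskdump(): a pfn at or beyond
 * max_mapnr, or one that is not RAM, is refused outright.
 * is_ram stands in for page_is_ram(pfn).
 */
static enum pfn_status check_pfn(uint64_t pfn, uint64_t max_mapnr, bool is_ram)
{
    if (pfn >= max_mapnr || !is_ram)
        return SEEK_ERROR;
    return PFN_OK;
}
```

Note that pfn == max_mapnr is already out of range, since valid pfns run from 0 to max_mapnr - 1.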
When you do any "crash -d# ...", the dumpfile header will
be dumped first. What does that show?
Dave
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
04-21-2010, 06:09 AM
Pavan Naregundi
crash seek error, failed to read vmcore file
On Tue, 2010-04-20 at 09:14 -0400, Dave Anderson wrote:
> ----- "Pavan Naregundi" <pavan@linux.vnet.ibm.com> wrote:
>
> The cause for seek errors depends upon the type
> of dumpfile.
>
> You didn't mention which type of dumpfile the vmcore
> is, so I'll presume that it's either an ELF-format
> kdump or a compressed kdump created by makedumpfile.
>
> So presuming that it's a compressed kdump, the seek error
> most likely comes from here in read_diskdump() in diskdump.c:
>
>     if ((pfn >= dd->header->max_mapnr) || !page_is_ram(pfn))
>             return SEEK_ERROR;
>
> where the pfn of the requested physical address is greater than
> or equal to the max_mapnr value advertised in the header.
>
> When you do any "crash -d# ...", the dumpfile header will
> be dumped first. What does that show?
>
> Dave
Dave,
The dumpfile is a compressed kdump created by makedumpfile.
The header shows the following values:
max_mapnr: 32768
block_shift: 16
Yes. Adding some debug printf's shows that the (pfn >=
dd->header->max_mapnr) test is what triggers the error.
For example, in the first seek error:
crash: seek error: kernel virtual address: c0000000af715480 type: "kmem_cache buffer"
paddr: af715480 => pfn=44913
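The pfn above follows from block_shift 16 (64KB pages): shift the physical address right by block_shift, then compare the result against max_mapnr. A small sketch of that arithmetic, with hypothetical helper names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Convert a physical address to a page frame number, given log2(page size). */
static uint64_t paddr_to_pfn(uint64_t paddr, unsigned block_shift)
{
    return paddr >> block_shift;
}

/* True if the address falls on a page frame the dump header does not cover. */
static bool beyond_dump(uint64_t paddr, unsigned block_shift, uint64_t max_mapnr)
{
    return paddr_to_pfn(paddr, block_shift) >= max_mapnr;
}
```

paddr_to_pfn(0xaf715480, 16) gives 0xaf71, i.e. 44913, which is at least 32768, so beyond_dump() is true for this address.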
crash -d8 log: http://pastebin.com/qrCvyPfR
Thanks..Pavan
> Dave,
>
> The dumpfile is a compressed kdump created by makedumpfile.
>
> The header shows the following values:
> max_mapnr: 32768
> block_shift: 16
>
> Yes. Adding some debug printf's shows that the (pfn >=
> dd->header->max_mapnr) test is what triggers the error.
>
> For example, in the first seek error:
> crash: seek error: kernel virtual address: c0000000af715480 type: "kmem_cache buffer"
>
> paddr: af715480 => pfn=44913
>
> crash -d8 log: http://pastebin.com/qrCvyPfR
>
> Thanks..Pavan
OK, so the compressed dumpfile covers exactly 32768 64KB pages (block_shift 16)
of physical memory, or exactly 2GB. That being the case, the crash utility
will fail all readmem attempts above that value, and obviously
there is critical data above the artificial 2GB threshold.
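The 2GB figure is max_mapnr multiplied by the page size (1 << block_shift). A sketch with hypothetical function names, assuming max_mapnr is non-zero:

```c
#include <stdint.h>

/* Bytes of physical memory described by the dump header. */
static uint64_t dump_coverage_bytes(uint64_t max_mapnr, unsigned block_shift)
{
    return max_mapnr << block_shift;
}

/* Highest physical address still readable from the dump (assumes max_mapnr > 0). */
static uint64_t last_readable_paddr(uint64_t max_mapnr, unsigned block_shift)
{
    return dump_coverage_bytes(max_mapnr, block_shift) - 1;
}
```

With max_mapnr 32768 and block_shift 16 this gives 2^31 bytes, so nothing at or above physical address 0x80000000 can be read.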
The question at hand is why kdump is creating a truncated dumpfile
with a max_mapnr of 32768:
(1) makedumpfile determines the "max_mapnr" value based upon the
highest physical address found in any of the PT_LOAD segments
of the /proc/vmcore file on the secondary kernel.
(2) the /proc/vmcore PT_LOAD segments were pre-calculated during
the primary kernel's kdump initialization phase, based upon
the values found in the set of "/proc/device-tree/memory@xxx/reg"
files existing in the primary kernel, where the "xxx" is the
starting physical address of the memory region, and the "reg"
file in that directory contains the size of the memory region.
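The reg file is binary and holds (base address, size) pairs rather than just a size. This sketch assumes the layout commonly used on ppc64, #address-cells = 2 and #size-cells = 2, so each entry is a big-endian 64-bit base followed by a big-endian 64-bit size; the real cell counts come from the parent node, and be64_at()/reg_highest_end() are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian 64-bit value (two 32-bit device-tree cells). */
static uint64_t be64_at(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/*
 * Walk a raw "reg" blob of (base, size) pairs, assuming 2 address cells
 * and 2 size cells. Returns the end of the highest region (base + size)
 * and stores the summed region sizes in *total.
 */
static uint64_t reg_highest_end(const unsigned char *blob, size_t len, uint64_t *total)
{
    uint64_t highest = 0;
    *total = 0;
    for (size_t off = 0; off + 16 <= len; off += 16) {
        uint64_t base = be64_at(blob + off);
        uint64_t size = be64_at(blob + off + 8);
        *total += size;
        if (base + size > highest)
            highest = base + size;
    }
    return highest;
}
```

A single entry with base 0 and size 0x80000000 yields a highest end of 2GB, matching the truncated dump.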
For whatever reason, those files showed a maximum of 2GB of
physical memory. (If you do not use makedumpfile, and instead run
"readelf -a" on the resulting vmcore file, you will see
the PT_LOAD segment values.)
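Step (1) can be approximated by taking the highest p_paddr + p_memsz over the PT_LOAD program headers and converting it to a page count, rounding up. This is a simplified sketch using the Elf64_Phdr type from <elf.h>, not makedumpfile's actual code; max_mapnr_from_phdrs() is hypothetical and the page size is passed as a shift (16 for 64KB pages).

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Highest physical end address across the PT_LOAD segments, in pages (rounded up). */
static uint64_t max_mapnr_from_phdrs(const Elf64_Phdr *ph, size_t n, unsigned page_shift)
{
    uint64_t max_end = 0;
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;   /* PT_NOTE and friends describe no memory */
        uint64_t end = ph[i].p_paddr + ph[i].p_memsz;
        if (end > max_end)
            max_end = end;
    }
    uint64_t page = (uint64_t)1 << page_shift;
    return (max_end + page - 1) >> page_shift;
}
```

If the PT_LOAD segments stop at 0x80000000, the result with 64KB pages is 32768, the value seen in the header.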
Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel
contain this patch?:
author Brian King <brking@linux.vnet.ibm.com>
Mon, 19 Oct 2009 05:51:34 +0000 (05:51 +0000)
committer Benjamin Herrenschmidt <benh@kernel.crashing.org>
Fri, 30 Oct 2009 06:20:56 +0000 (17:20 +1100)
commit 8be8cf5b47f72096e42bf88cc3afff7a942a346c
tree 9adff0fa02123f48fbfa40abb55a5c01be8c2fa4
parent 6cff46f4bc6cc4a8a4154b0b6a2e669db08e8fd2
powerpc: Add kdump support to Collaborative Memory Manager
When running Active Memory Sharing, the Collaborative Memory Manager (CMM)
may mark some pages as "loaned" with the hypervisor. Periodically, the
CMM will query the hypervisor for a loan request, which is a single signed
value. When kexec'ing into a kdump kernel, the CMM driver in the kdump
kernel is not aware of the pages the previous kernel had marked as "loaned",
so the hypervisor and the CMM driver are out of sync. Fix the CMM driver
to handle this scenario by ignoring requests to decrease the number of loaned
pages if we don't think we have any pages loaned. Pages that are marked as
"loaned" which are not in the balloon will automatically get switched to "active"
the next time we touch the page. This also fixes the case where totalram_pages
is smaller than min_mem_mb, which can occur during kdump.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
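To make the rule in the commit message concrete, here is a deliberately simplified model. It is not the kernel's CMM driver code: it assumes, for illustration only, that the hypervisor's signed loan request is applied as a delta to the pages currently loaned and that a negative result is clamped to zero, so a kdump kernel that believes it has nothing loaned ignores requests to return pages. cmm_loan_target() is a hypothetical name.

```c
/*
 * Hypothetical model of the CMM rule described in the commit message.
 * loaned_pages: pages this kernel believes it has loaned.
 * request:      signed loan request from the hypervisor, in pages;
 *               positive = loan more, negative = return some.
 * Returns the new loan target.
 */
static long cmm_loan_target(long loaned_pages, long request)
{
    long target = loaned_pages + request;

    /* Never try to return more pages than we think we have loaned. */
    if (target < 0)
        target = 0;
    return target;
}
```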
I ask because we also have an outstanding bugzilla that exhibits similar
behavior, where an abnormally small ppc64 vmcore file gets created
because there was only a single /proc/device-tree/memory@0 directory,
whose reg file showed just a small subset of the total physical memory.
Typically there are many of those "memory@xxx" directories, but in
the failing scenario, there was only one /proc/device-tree/memory@0
directory.
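One quick check for this symptom is to count the memory@xxx nodes under /proc/device-tree. The sketch below does that with opendir()/readdir(); count_memory_nodes() is a hypothetical helper, and on an affected system one would expect it to report 1 rather than many.

```c
#include <dirent.h>
#include <string.h>

/*
 * Count "memory@..." entries under a device-tree root such as
 * /proc/device-tree. Returns -1 if the directory cannot be opened.
 */
static int count_memory_nodes(const char *dt_root)
{
    DIR *dir = opendir(dt_root);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strncmp(de->d_name, "memory@", strlen("memory@")) == 0)
            count++;
    }
    closedir(dir);
    return count;
}
```

Usage: count_memory_nodes("/proc/device-tree") on the primary kernel, before any crash occurs.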
Anyway, there's (unproven) speculation that the kernel patch above
is related to the problem.
In any case, unfortunately, there's nothing that can be done from the crash
utility's perspective.
Dave
04-22-2010, 10:07 AM
Pavan Naregundi
crash seek error, failed to read vmcore file
On Wed, 2010-04-21 at 09:58 -0400, Dave Anderson wrote:
> ----- "Pavan Naregundi" <pavan@linux.vnet.ibm.com> wrote:
>
> OK, so the compressed dumpfile has exactly 32768 pages of physical
> memory, or exactly 2GB. That being the case, the crash utility
> will fail all readmem attempts above that value, and obviously
> there is critical data above the artificial 2GB threshold.
>
> The question at hand is why kdump is creating a truncated dumpfile
> with a max_mapnr of 32768:
>
> (1) makedumpfile determines the "max_mapnr" value based upon the
> highest physical address found in any of the PT_LOAD segments
> of the /proc/vmcore file on the secondary kernel.
> (2) the /proc/vmcore PT_LOAD segments were pre-calculated during
> the primary kernel's kdump initialization phase, based upon
> the values found in the set of "/proc/device-tree/memory@xxx/reg"
> files existing in the primary kernel, where the "xxx" is the
> starting physical address of the memory region, and the "reg"
> file in that directory contains the size of the memory region.
>
> For whatever reason, those files showed a maximum of 2GB of
> physical memory. (If you do not use makedumpfile, and then do
> a "readelf -a" of the resultant vmcore file, you will see
> the PT_LOAD segment values.)
>
> Does the SLES11 vmlinux-2.6.32.10-0.4.99.25.62005-ppc64 kernel
> contain this patch?:
>
> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=8be8cf5b47f72096e42bf88cc3afff7a942a346c
>
> I ask because we also have an outstanding bugzilla that exhibits similar
> behavior, where an abnormally small ppc64 vmcore file gets created
> because there was only a single /proc/device-tree/memory@0 directory,
> whose reg file showed just a small subset of the total physical memory.
> Typically there are many of those "memory@xxx" directories, but in
> the failing scenario, there was only one /proc/device-tree/memory@0
> directory.
>
> Anyway, there's (unproven) speculation that the kernel patch above
> is related to the problem.
>
> In any case, unfortunately, there's nothing that can be done from the crash
> utility's perspective.
>
> Dave
Thank you Dave.
Our SLES11 kernel does not have the patch you mentioned, but at the same
time, the system is not AMS-enabled, and CONFIG_CMM is not set in the
config file.
This system also has only the /proc/device-tree/memory@0 directory.
Regards..Pavan
> >
> > In any case, unfortunately, there's nothing that can be done from the crash
> > utility's perspective.
> >
> > Dave
>
> Thank you Dave.
>
> Our SLES11 kernel does not have the patch you mentioned, but at the same
> time, the system is not AMS-enabled, and CONFIG_CMM is not set in the config file.
>
> This system also has only the /proc/device-tree/memory@0 directory.
I don't have access to the original "problem" ppc64 machine,
but here I'm logged into another ppc64, where the memory
advertised in /proc/device-tree is as expected. It has
these memory@xxx/reg files, showing a set of contiguous
memory chunks. The first one is 128MB, followed by a series
of 16MB chunks:
ZONE  NAME     SIZE   MEM_MAP           START_PADDR  START_MAPNR
  0   DMA      31744  f000000000000000       0            0
  1   Normal       0                 0       0            0
  2   Movable      0                 0       0            0
...
So everything looks fine.
But if your system has just a single /proc/device-tree/memory@0
directory whose size doesn't match up with what the live kernel is
using, then that's the kernel bug.
> In any case, unfortunately, there's nothing that can be done from the crash
> utility's perspective.
BTW, you can get minimal data from your truncated vmcore using
the --minimal switch that IBM contributed a while back:
# crash --minimal vmcore vmlinux
It at least offers the log, dis, rd, sym and eval commands, which may
or may not help. It's actually come in quite handy a few times.
Anyway, if you guys come up with a kernel fix, can you post it here
as well?
Thanks,
Dave
04-23-2010, 06:43 AM
Pavan Naregundi
crash seek error, failed to read vmcore file
On Thu, 2010-04-22 at 09:31 -0400, Dave Anderson wrote:
>
> Anyway, if you guys come up with a kernel fix, can you post it here
> as well?
>
> Thanks,
> Dave
Sure. I will do that..
Thanks..Pavan