We have a custom kernel based on 2.6.27.39. This kernel
has a 2/2 memory split. Now we have one crash dump that can be
successfully opened with crash 4.0-8.8 but not with crash 5.0.
The crash was caused by a double free of a memory block, so there
might be some memory corruption in the cache data area.
Unfortunately I cannot pinpoint the exact version where this
starts to happen because I could not find older crash releases.
> Hello,
>
> We have a custom kernel based on 2.6.27.39. This kernel
> has a 2/2 memory split. Now we have one crash dump that can be
> successfully opened with crash 4.0-8.8 but not with crash 5.0.
> The crash was caused by a double free of a memory block, so there
> might be some memory corruption in the cache data area.
>
> Unfortunately I cannot pinpoint the exact version where this
> starts to happen because I could not find older crash releases.
>
> Here is some debug info.
>
> The tail of crash -d 10 output
> ...
> NOTE: page_hash_table does not exist in this kernel
> please wait... (gathering kmem slab cache data)<readmem: 8075801c, KVADDR, "cache_chain", 4, (FOE), ffb944f8>
> addr: 8075801c paddr: 75801c cnt: 4
> GETBUF(128 -> 0)
> FREEBUF(0)
> GETBUF(204 -> 0)
> <readmem: 8067f1c0, KVADDR, "kmem_cache buffer", 204, (FOE), 8520f00>
> addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> GETBUF(128 -> 1)
> FREEBUF(1)
> GETBUF(128 -> 1)
> FREEBUF(1)
>
> kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
> kmem_cache_downsize: nr_node_ids: 1
> FREEBUF(0)
>
> crash: zero-size memory allocation! (called from 80b7b7b)
> >
> addr2line -e crash 80b7b7b
> /workarea/build/packages/crash/crash-5.0.0-32bit/memory.c:7439
>
> I'm happy to test patches.
Nice bug report!
Here's what's happening:
It's related to this patch that went into 4.1.0:
- Fix for a potential failure to initialize the kmem slab cache
subsystem on 2.6.22 and later CONFIG_SLAB kernels if the dumpfile
has pages excluded by the makedumpfile facility. Without the patch,
the following error message would be displayed during initialization:
"crash: page excluded: kernel virtual address: <address> type:
kmem_cache_s buffer", followed by "crash: unable to initialize kmem
slab cache subsystem".
(anderson@redhat.com)
The patch was put in place due to this definition of the kmem_cache data structure:
struct kmem_cache {
/* 1) per-cpu data, touched during every alloc/free */
        struct array_cache *array[NR_CPUS];
/* 2) Cache tunables. Protected by cache_chain_mutex */
        unsigned int batchcount;
        unsigned int limit;
... [ snip ] ...
        /*
         * We put nodelists[] at the end of kmem_cache, because we want to size
         * this array to nr_node_ids slots instead of MAX_NUMNODES
         * (see kmem_cache_init())
         * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
         * is statically defined, so we reserve the max number of nodes.
         */
        struct kmem_list3 *nodelists[MAX_NUMNODES];
        /*
         * Do not add fields after nodelists[]
         */
};
where every kmem_cache structure in the kernel *except* the head
"cache_cache" kmem_cache structure has its nodelists[] array downsized
to whatever "nr_node_ids" is initialized to. The actual size of the
downsized kmem_cache data structures can be found in the head
"cache_cache.buffer_size" field.
But when the crash utility queries gdb for the size of a kmem_cache
structure it gets the "full" size as declared in the vmlinux debuginfo
data. And so whenever a kmem_cache structure was read by crash, it
was using the "full" size instead of the downsized size. Doing that
type of over-sized read could potentially extend into the next page,
and there was a reported case where doing that happened to extend into
a page that was excluded by makedumpfile. Hence the kmem_cache_downsize()
function added to memory.c.
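To make the size mismatch concrete, here is a minimal C sketch. It is not crash or kernel source: struct kmem_cache_model is a cut-down stand-in, the NR_CPUS and MAX_NUMNODES values are made up, and downsized_size() is a hypothetical helper. It assumes the downsized size is the offset of nodelists[] plus nr_node_ids pointer slots (the kernel may additionally round that value up for alignment).

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS      32  /* assumed value, for illustration only */
#define MAX_NUMNODES 4   /* assumed value, for illustration only */

struct array_cache;      /* opaque: only pointers to these are stored */
struct kmem_list3;

/* Cut-down stand-in for the kernel's struct kmem_cache. */
struct kmem_cache_model {
	struct array_cache *array[NR_CPUS];
	unsigned int batchcount;
	unsigned int limit;
	/* ... remaining fields omitted ... */
	struct kmem_list3 *nodelists[MAX_NUMNODES]; /* must stay last */
};

/* Bytes actually occupied when nodelists[] holds only nr_node_ids slots. */
static size_t downsized_size(unsigned int nr_node_ids)
{
	assert(nr_node_ids >= 1 && nr_node_ids <= MAX_NUMNODES);
	return offsetof(struct kmem_cache_model, nodelists)
	       + nr_node_ids * sizeof(struct kmem_list3 *);
}
```

With nr_node_ids equal to 1, downsized_size(1) is smaller than sizeof(struct kmem_cache_model); that difference is the over-read that can spill into the next page.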
In vm_init() there was an initial STRUCT_SIZE_INIT(kmem_cache_s, ...)
that set the size to 204 bytes. But then kmem_cache_downsize() was
called to downsize it to whatever cache_cache.buffer_size contains.
But your kernel shows cache_cache.buffer_size set to zero, and the
ASSIGN_SIZE(kmem_cache_s) in kmem_cache_downsize() dutifully downsized
the data structure size from 204 to zero. Later on, that size was used
to allocate a kmem_cache buffer, which failed when GETBUF() was called
with a zero size.
I guess a check could be made there for a zero cache_cache.buffer_size,
but why would that ever be?
Try this:
# crash --no_kmem_cache vmlinux vmcore
which will allow you to get past the kmem_cache initialization.
Then enter:
crash> p cache_cache
Does the "buffer_size" member really show zero?
BTW, you can work around the problem by commenting out the call
to kmem_cache_downsize() in vm_init(). (And if you're using
makedumpfile with excluded pages, hope that the problem I described
above doesn't occur...)
Dave
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
01-13-2010, 08:59 AM
crash-5.0: zero-size memory-allocation
> > From:
> >
> > Dave Anderson <anderson@redhat.com>
> >
> ...
> > But your kernel shows cache_cache.buffer_size set to zero -- and the
> > ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
> > size from 204 to zero. Later on, that size was used to allocate a
> > kmem_cache buffer, which failed when a GETBUF() was called with a zero-size.
> >
> > I guess a check could be made above for a zero cache_cache.buffer_size,
> > but why would that ever be?
> >
> > Try this:
> >
> > # crash --no_kmem_cache vmlinux vmcore
> >
> > which will allow you to get past the kmem_cache initialization.
> >
> > Then enter:
> >
> > crash> p cache_cache
> >
> > Does the "buffer_size" member really show zero?
>
> Yes it seems so!
> initialize_task_state: using old defaults
> <readmem: 8067a300, KVADDR, "fill_task_struct", 868, (ROE), 86e3f78>
> addr: 8067a300 paddr: 67a300 cnt: 868
> STATE: TASK_RUNNING (PANIC)
>
> crash> p cache_cache
> cache_cache = GETBUF(128 -> 0)
> <readmem: 8067f1c0, KVADDR, "gdb_readmem_callback", 204, (ROE), 8ac00d8>
> addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> $3 = {
> array = {0x0, 0x8067f1c4, 0x8067f1c4, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0xf7813e00, 0xf7849400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
> batchcount = 0,
> limit = 0,
> shared = 0,
> buffer_size = 0,
> reciprocal_buffer_size = 0,
> flags = 0,
> num = 0,
> gfporder = 0,
> gfpflags = 60,
> colour = 120,
> colour_off = 8,
> slabp_cache = 0x100,
> slab_size = 16777216,
> dflags = 0,
> ctor = 0xf,
> name = 0x0,
> next = {
> next = 0x0,
> prev = 0x2
> },
> nodelists = {0x40}
> }
> FREEBUF(0)
That's some serious corruption!
> >
> > BTW, you can work around the problem by commenting out the call
> > to kmem_cache_downsize() in vm_init().
>
> This workaround works ok.
But even with the call to kmem_cache_downsize() commented out,
the kmem_cache_init() function could not have done anything useful,
because the "cache_cache.next.next" pointer, which should point to
the first of the chain of kmem_cache slab cache headers, is corrupted
with a NULL. I'm surprised it managed to continue without running into
another roadblock. Did it display the "crash: unable to initialize kmem
slab cache subsystem" error message?
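To illustrate why a NULL link stops the walk of the cache chain, here is a small C sketch of a defensive traversal. It works on ordinary in-process pointers, whereas crash reads each link from the dump file, and count_chain() and max_entries are hypothetical names rather than crash functions.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/*
 * Count the entries on a circular list starting at head.
 * Returns -1 if a NULL link is met, or if more than max_entries
 * are seen (a corrupted chain that never returns to head).
 */
static int count_chain(const struct list_head *head, int max_entries)
{
	const struct list_head *p;
	int n = 0;

	assert(head != NULL);
	p = head->next;
	while (p != head) {
		if (p == NULL || n >= max_entries)
			return -1;
		n++;
		p = p->next;
	}
	return n;
}
```

A head whose next pointer is NULL, as in the dump above, makes count_chain() return -1 immediately instead of dereferencing the bad link.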
> > (And if you're using makedumpfile with excluded pages, hope that
> > the problem I described above doesn't occur...)
> >
> We are not excluding pages, so this is not a big issue. Also,
> --no_kmem_cache lets me open the dump and already do quite a lot
> with it.
Like I mentioned before, I could put a check in kmem_cache_downsize()
to check for a zero buffer_size, but the odds of that happening are
absurdly small. I suppose I could check whether the value is less
than the kmem_cache.nodelists structure offset.
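A check along those lines might look like the following C sketch. It is an illustration only, not the patch attached later in this thread: pick_kmem_cache_size() and its parameters are hypothetical names, and the caller is assumed to have already obtained the three values.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: decide which size to use when reading a
 * kmem_cache structure from the dump.
 *   full_size        - structure size reported by the debuginfo
 *   nodelists_offset - offset of the trailing nodelists[] member
 *   buffer_size      - value read from cache_cache.buffer_size
 */
static long pick_kmem_cache_size(long full_size, long nodelists_offset,
				 long buffer_size)
{
	assert(nodelists_offset >= 0 && nodelists_offset <= full_size);

	/* A valid downsized structure must at least reach nodelists[]. */
	if (buffer_size < nodelists_offset) {
		fprintf(stderr,
			"WARNING: cache_cache.buffer_size (%ld) looks corrupt; "
			"keeping full size %ld\n", buffer_size, full_size);
		return full_size;
	}
	/* Only ever shrink the read size, never grow it. */
	return buffer_size < full_size ? buffer_size : full_size;
}
```

For instance, pick_kmem_cache_size(204, 200, 0) keeps 204 rather than shrinking to zero; 204 and 0 are the values seen in this thread, while 200 is an assumed offset.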
Dave
01-14-2010, 05:06 AM
crash-5.0: zero-size memory-allocation
> That would be useful, just warn that some major corruption seems to have
> happened. It is always good to get at least some crash info out, for example
> dmesg and bt. I'll gladly test patches, if needed.
Patch attached...
> Also one question. Is there some hidden option that will show all the
> hidden crash command line options, e.g. --no_kmem_cache and the like?
No, for the most part they are there for debugging crash itself,
or were put in place as a result of specific odd-ball vmcores,
or short-time kernels that were missing a key ingredient, etc.
So, for example, with the attached patch, --no_kmem_cache should
not be needed, even with your horrifically corrupted vmcore...
Dave
01-15-2010, 08:01 AM
crash-5.0: zero-size memory-allocation
crash-utility-bounces@redhat.com wrote on 14.01.2010 16:18:53:
> From:
>
> Dave Anderson <anderson@redhat.com>
>
> > That would be useful, just warn that some major corruption seems to have
> > happened. It is always good to get at least some crash info out, for example
> > dmesg and bt. I'll gladly test patches, if needed.
>
> Patch attached...
This patch works well. Thank you!
>
> > Also one question. Is there some hidden option that will show all the
> > hidden crash command line options, e.g. --no_kmem_cache and the like?
>
> No, for the most part they are there for debugging crash itself,
> or were put in place as a result of specific odd-ball vmcores,
> or short-time kernels that were missing a key ingredient, etc.
>
> So, for example, with the attached patch, --no_kmem_cache should
> not be needed, even with your horrifically corrupted vmcore...
>
OK, thanks for the explanation.