 
Old 01-12-2010, 09:57 AM
 
Default crash-5.0: zero-size memory-allocation

Hello,

We have a custom kernel based on 2.6.27.39. This kernel
has a 2/2 memory split. We now have one crash dump that can be
opened successfully with crash 4.0-8.8 but not with crash 5.0.
The crash itself was caused by a double free of a memory block, so there
may be some memory corruption in the slab cache data area.

Unfortunately I cannot pinpoint the exact version where this
started to happen, because I could not find older crash releases.

Here is some debug info.

The tail of crash -d 10 output
...
NOTE: page_hash_table does not exist in this kernel
please wait... (gathering kmem slab cache data)<readmem: 8075801c, KVADDR, "cache_chain", 4, (FOE), ffb944f8>
addr: 8075801c paddr: 75801c cnt: 4
GETBUF(128 -> 0)
FREEBUF(0)
GETBUF(204 -> 0)
<readmem: 8067f1c0, KVADDR, "kmem_cache buffer", 204, (FOE), 8520f00>
addr: 8067f1c0 paddr: 67f1c0 cnt: 204
GETBUF(128 -> 1)
FREEBUF(1)
GETBUF(128 -> 1)
FREEBUF(1)

kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
kmem_cache_downsize: nr_node_ids: 1
FREEBUF(0)

crash: zero-size memory allocation! (called from 80b7b7b)
>
addr2line -e crash 80b7b7b
/workarea/build/packages/crash/crash-5.0.0-32bit/memory.c:7439

I'm happy to test patches.


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
 
Old 01-12-2010, 01:54 PM
Dave Anderson
 
Default crash-5.0: zero-size memory-allocation

----- "ville mattila" <ville.mattila@stonesoft.com> wrote:

> ... [ full report quoted above -- snip ] ...
>
> I'm happy to test patches.

Nice bug report!

Here's what's happening:

It's related to this patch that went into 4.1.0:

- Fix for a potential failure to initialize the kmem slab cache
subsystem on 2.6.22 and later CONFIG_SLAB kernels if the dumpfile
has pages excluded by the makedumpfile facility. Without the patch,
the following error message would be displayed during initialization:
"crash: page excluded: kernel virtual address: <address> type:
kmem_cache_s buffer", followed by "crash: unable to initialize kmem
slab cache subsystem".
(anderson@redhat.com)

The patch was put in place due to this definition of the kmem_cache data structure:

struct kmem_cache {
	/* 1) per-cpu data, touched during every alloc/free */
	struct array_cache *array[NR_CPUS];
	/* 2) Cache tunables. Protected by cache_chain_mutex */
	unsigned int batchcount;
	unsigned int limit;

	... [ snip ] ...

	/*
	 * We put nodelists[] at the end of kmem_cache, because we want to size
	 * this array to nr_node_ids slots instead of MAX_NUMNODES
	 * (see kmem_cache_init())
	 * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
	 * is statically defined, so we reserve the max number of nodes.
	 */
	struct kmem_list3 *nodelists[MAX_NUMNODES];
	/*
	 * Do not add fields after nodelists[]
	 */
};

where every kernel instance of the kmem_cache data structure *except* the
head "cache_cache" structure has its nodelists[] array downsized to
whatever "nr_node_ids" is initialized to. The actual size of all of the
downsized kmem_cache data structures can be found in the head
"cache_cache.buffer_size" field.

But when the crash utility queries gdb for the size of a kmem_cache
structure it gets the "full" size as declared in the vmlinux debuginfo
data. And so whenever a kmem_cache structure was read by crash, it
was using the "full" size instead of the downsized size. Doing that
type of over-sized read could potentially extend into the next page,
and there was a reported case where doing that happened to extend into
a page that was excluded by makedumpfile. Hence the kmem_cache_downsize()
function added to memory.c.

Anyway, given that your debug output shows:

kmem_cache_downsize: SIZE(kmem_cache_s): 204 cache_cache.buffer_size: 0
kmem_cache_downsize: nr_node_ids: 1

In vm_init() there was an initial STRUCT_SIZE_INIT(kmem_cache_s, ...)
that set the size to 204 bytes. But then kmem_cache_downsize() was
called to downsize to whatever cache_cache.buffer_size contains:

	...

	buffer_size = UINT(cache_buf +
		MEMBER_OFFSET("kmem_cache", "buffer_size"));

	if (buffer_size < SIZE(kmem_cache_s)) {
		ASSIGN_SIZE(kmem_cache_s) = buffer_size;

		if (kernel_symbol_exists("nr_node_ids")) {
			get_symbol_data("nr_node_ids", sizeof(int),
				&nr_node_ids);
			vt->kmem_cache_len_nodes = nr_node_ids;
		} else
			vt->kmem_cache_len_nodes = 1;

		if (CRASHDEBUG(1)) {
			fprintf(fp,
			    "\nkmem_cache_downsize: SIZE(kmem_cache_s): %ld "
			    "cache_cache.buffer_size: %d\n",
			    STRUCT_SIZE("kmem_cache"), buffer_size);
			fprintf(fp,
			    "kmem_cache_downsize: nr_node_ids: %ld\n",
			    vt->kmem_cache_len_nodes);
		}
	}

But your kernel shows cache_cache.buffer_size set to zero -- and the
ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
size from 204 to zero. Later on, that size was used to allocate a
kmem_cache buffer, which failed when a GETBUF() was called with a zero-size.

I guess a check could be made above for a zero cache_cache.buffer_size,
but why would that ever be?

Try this:

# crash --no_kmem_cache vmlinux vmcore

which will allow you to get past the kmem_cache initialization.

Then enter:

crash> p cache_cache

Does the "buffer_size" member really show zero?

BTW, you can work around the problem by commenting out the call
to kmem_cache_downsize() in vm_init(). (And if you're using
makedumpfile with excluded pages, hope that the problem I described
above doesn't occur...)

Dave

 
Old 01-13-2010, 08:59 AM

Default crash-5.0: zero-size memory-allocation

> From: Dave Anderson <anderson@redhat.com>
>
> ...
> But your kernel shows cache_cache.buffer_size set to zero -- and the
> ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
> size from 204 to zero. Later on, that size was used to allocate a
> kmem_cache buffer, which failed when a GETBUF() was called with a zero-size.
>
> I guess a check could be made above for a zero cache_cache.buffer_size,
> but why would that ever be?
>
> Try this:
>
>   # crash --no_kmem_cache vmlinux vmcore
>
> which will allow you to get past the kmem_cache initialization.
>
> Then enter:
>
>   crash> p cache_cache
>
> Does the "buffer_size" member really show zero?

Yes, it seems so!

initialize_task_state: using old defaults
<readmem: 8067a300, KVADDR, "fill_task_struct", 868, (ROE), 86e3f78>
    addr: 8067a300  paddr: 67a300  cnt: 868
 STATE: TASK_RUNNING (PANIC)

crash> p cache_cache
cache_cache = GETBUF(128 -> 0)
<readmem: 8067f1c0, KVADDR, "gdb_readmem_callback", 204, (ROE), 8ac00d8>
    addr: 8067f1c0  paddr: 67f1c0  cnt: 204
$3 = {
  array = {0x0, 0x8067f1c4, 0x8067f1c4, 0x0, 0x0, 0x0, 0x0, 0x0,
    0xf7813e00, 0xf7849400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
  batchcount = 0,
  limit = 0,
  shared = 0,
  buffer_size = 0,
  reciprocal_buffer_size = 0,
  flags = 0,
  num = 0,
  gfporder = 0,
  gfpflags = 60,
  colour = 120,
  colour_off = 8,
  slabp_cache = 0x100,
  slab_size = 16777216,
  dflags = 0,
  ctor = 0xf,
  name = 0x0,
  next = {
    next = 0x0,
    prev = 0x2
  },
  nodelists = {0x40}
}
FREEBUF(0)

> BTW, you can work around the problem by commenting out the call
> to kmem_cache_downsize() in vm_init().

This workaround works OK.

> (And if you're using makedumpfile with excluded pages, hope that
> the problem I described above doesn't occur...)

We are not excluding pages, so this is not a big issue. Also,
--no_kmem_cache lets me open the dump and already do quite a lot
with it.

- Ville
 
Old 01-13-2010, 01:09 PM
Dave Anderson
 
Default crash-5.0: zero-size memory-allocation

----- "ville mattila" <ville.mattila@stonesoft.com> wrote:

> > From:
> >
> > Dave Anderson <anderson@redhat.com>
> >
> ...
> > But your kernel shows cache_cache.buffer_size set to zero -- and the
> > ASSIGN_SIZE(kmem_cache_s) above dutifully downsized the data structure
> > size from 204 to zero. Later on, that size was used to allocate a
> > kmem_cache buffer, which failed when a GETBUF() was called with a zero-size.
> >
> > I guess a check could be made above for a zero cache_cache.buffer_size,
> > but why would that ever be?
> >
> > Try this:
> >
> > # crash --no_kmem_cache vmlinux vmcore
> >
> > which will allow you to get past the kmem_cache initialization.
> >
> > Then enter:
> >
> > crash> p cache_cache
> >
> > Does the "buffer_size" member really show zero?
>
> Yes it seems so!
> initialize_task_state: using old defaults
> <readmem: 8067a300, KVADDR, "fill_task_struct", 868, (ROE), 86e3f78>
> addr: 8067a300 paddr: 67a300 cnt: 868
> STATE: TASK_RUNNING (PANIC)
>
> crash> p cache_cache
> cache_cache = GETBUF(128 -> 0)
> <readmem: 8067f1c0, KVADDR, "gdb_readmem_callback", 204, (ROE), 8ac00d8>
> addr: 8067f1c0 paddr: 67f1c0 cnt: 204
> $3 = {
> array = {0x0, 0x8067f1c4, 0x8067f1c4, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0xf7813e00, 0xf7849400, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
> 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0},
> batchcount = 0,
> limit = 0,
> shared = 0,
> buffer_size = 0,
> reciprocal_buffer_size = 0,
> flags = 0,
> num = 0,
> gfporder = 0,
> gfpflags = 60,
> colour = 120,
> colour_off = 8,
> slabp_cache = 0x100,
> slab_size = 16777216,
> dflags = 0,
> ctor = 0xf,
> name = 0x0,
> next = {
> next = 0x0,
> prev = 0x2
> },
> nodelists = {0x40}
> }
> FREEBUF(0)

That's some serious corruption!

> >
> > BTW, you can work around the problem by commenting out the call
> > to kmem_cache_downsize() in vm_init().
>
> This workaround works ok.

But even then, if you comment out the call to kmem_cache_downsize(),
the kmem_cache_init() function could not have done anything useful,
because the "cache_cache.next.next" pointer -- which should point to the
first of the chain of kmem_cache slab cache headers -- is corrupted
with a NULL. I'm surprised it managed to continue without running into
another roadblock -- did it display the "crash: unable to initialize
kmem slab cache subsystem" error message?

> > (And if you're using makedumpfile with excluded pages, hope that
> > the problem I described above doesn't occur...)
> >
> We are not excluding files so this is not a big issue. Also
> the --no_kmem_cache lets me open dump and let me do quite many things
> already.

Like I mentioned before, I could put a check in kmem_cache_downsize()
to check for a zero buffer_size, but the odds of that happening are
absurdly small. I suppose I could check whether the value is less
than the kmem_cache.nodelists structure offset.

Dave


 
Old 01-14-2010, 05:06 AM

Default crash-5.0: zero-size memory-allocation

> From: Dave Anderson <anderson@redhat.com>
> Date: 13.01.2010 16:14
> Subject: Re: [Crash-utility] crash-5.0: zero-size memory-allocation
>
> ... [ snip ] ...
>
> That's some serious corruption!

Yes, this double free caused a lot of head scratching!

> But even then, if you comment out the call to kmem_cache_downsize(),
> the kmem_cache_init() function could not have done anything useful,
> because the "cache_cache.next.next" pointer -- which should point to the
> first of the chain of kmem_cache slab cache headers -- is corrupted
> with a NULL. I'm surprised it managed to continue without running into
> another roadblock -- did it display the "crash: unable to initialize
> kmem slab cache subsystem" error message?

No, there are no other error messages.

> Like I mentioned before, I could put a check in kmem_cache_downsize()
> to check for a zero buffer_size, but the odds of that happening are
> absurdly small. I suppose I could check whether the value is less
> than the kmem_cache.nodelists structure offset.

That would be useful -- just warn that some major corruption seems to
have happened. It is always good to get at least some crash info out,
for example dmesg and bt. I'll gladly test patches, if needed.

Also, one question: is there a hidden option that will show all the
hidden crash command-line options, e.g. --no_kmem_cache and the like?

- Ville
 
Old 01-14-2010, 01:18 PM
Dave Anderson
 
Default crash-5.0: zero-size memory-allocation

----- "ville mattila" <ville.mattila@stonesoft.com> wrote:

> That would be useful -- just warn that some major corruption seems to
> have happened. It is always good to get at least some crash info out,
> for example dmesg and bt. I'll gladly test patches, if needed.

Patch attached...

> Also, one question: is there a hidden option that will show all the
> hidden crash command-line options, e.g. --no_kmem_cache and the like?

No, for the most part they are there for debugging crash itself,
or were put in place as a result of specific odd-ball vmcores,
or short-time kernels that were missing a key ingredient, etc.

So, for example, with the attached patch, --no_kmem_cache should
not be needed, even with your horrifically corrupted vmcore...

Dave

 
Old 01-15-2010, 08:01 AM

Default crash-5.0: zero-size memory-allocation

crash-utility-bounces@redhat.com wrote on 14.01.2010 16:18:53:

> From: Dave Anderson <anderson@redhat.com>
>
> > That would be useful -- just warn that some major corruption seems to
> > have happened. It is always good to get at least some crash info out,
> > for example dmesg and bt. I'll gladly test patches, if needed.
>
> Patch attached...

This patch works well. Thank you!

> > Also, one question: is there a hidden option that will show all the
> > hidden crash command-line options, e.g. --no_kmem_cache and the like?
>
> No, for the most part they are there for debugging crash itself,
> or were put in place as a result of specific odd-ball vmcores,
> or short-time kernels that were missing a key ingredient, etc.
>
> So, for example, with the attached patch, --no_kmem_cache should
> not be needed, even with your horrifically corrupted vmcore...

OK, thanks for the explanation.
 
