I have never seen this problem before, so the behavior you see is exactly what I have seen before. However, with a fairly new kernel I did not get the correct crash_notes. The investigation led to the patch for the problem described in my previous mail.
I have not investigated whether some patch in newer kernels changes this behavior and, if so, where it comes from (it could be a patch of ours). However, since the algorithm for reading crash_notes is wrong, in that it depends on a variable that is not yet initialized, I think it should be corrected anyway. I have tested my patch with both newer and older kernels and it works as intended.
Jan
Jan Karlsson
Senior Software Engineer
MIB
Sony Mobile Communications
Tel: +46703062174
sonymobile.com
-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Wednesday, 18 July 2012 15:13
To: Discussion list for crash utility usage, maintenance and development
Cc: Fänge, Thomas
Subject: Re: [Crash-utility] ARM: crash registers might be read from the wrong physical address
----- Original Message -----
> Hi Dave
>
> I found a problem in arm.c that arm_get_crash_notes() is called too
> early. This has never been a problem until now.
>
> arm_get_crash_notes() in arm.c
> calls readmem(, KVADDR, )
> which calls kvtop()
> which calls machdep->kvtop that is arm_kvtop which uses
> vt->vmalloc_start
> vt->vmalloc_start is initialized in vm_init
>
> From main_loop:
>
> machdep_init(POST_GDB);
> vm_init();
> machdep_init(POST_VM);
>
> arm_get_crash_notes() is currently called in the POST_GDB section of
> machdep_init, but should be moved to the POST_VM section. I put the
> comment and the code just before:
>
> if (init_unwind_tables()) {
>
> and then it works fine. Without this fix the crash registers might be
> read from the wrong physical address.
>
> Jan
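The ordering problem described above can be illustrated with a small model. This is only a sketch and not the code in arm.c: the names model_vm_table, model_unity_vtop and model_kvtop are hypothetical, the PAGE_OFFSET and PHYS_OFFSET values are assumptions (chosen to be consistent with the read_kdump line quoted later in the thread), and the fallback to the unity-map formula while vmalloc_start is still 0 is an assumption about how the failure shows up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for illustration only; real values depend on the kernel. */
#define MODEL_PAGE_OFFSET 0xc0000000u  /* assumed start of the unity-mapped region */
#define MODEL_PHYS_OFFSET 0x80200000u  /* assumed physical base of RAM */

/* Hypothetical stand-in for the part of the vm table that matters here. */
struct model_vm_table {
    uint32_t vmalloc_start;  /* stays 0 until the model's "vm_init" has run */
};

/* Unity-mapped translation: a constant offset between virtual and physical. */
static uint32_t model_unity_vtop(uint32_t kvaddr)
{
    return kvaddr - MODEL_PAGE_OFFSET + MODEL_PHYS_OFFSET;
}

/*
 * Returns 1 when the address was translated with the unity-map formula,
 * 0 when a page-table walk would be required instead (not modelled here).
 */
static int model_kvtop(const struct model_vm_table *vt, uint32_t kvaddr,
                       uint32_t *paddr)
{
    assert(vt != NULL && paddr != NULL);

    /* With vmalloc_start still 0, every kernel address looks unity-mapped. */
    if (vt->vmalloc_start == 0 || kvaddr < vt->vmalloc_start) {
        *paddr = model_unity_vtop(kvaddr);
        return 1;
    }
    return 0;
}
```

In this model, calling model_kvtop() on a vmalloc address before vmalloc_start is set silently applies the unity-map formula and yields a physical address that has nothing to do with the real mapping. Once vmalloc_start is set, the same address is deferred to the page-table walk instead, which is why moving the call after vm_init() matters.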
Looking at the 2.6.38-based SMP ARM sample kernel I have, arm_get_crash_notes() does not make any readmem() calls on vmalloc addresses, only on unity-mapped ones.
Have newer ARM kernels changed how percpu addresses are translated, such that the notes_ptrs[] entries become vmalloc addresses here in arm_get_crash_notes()?
	if (symbol_exists("__per_cpu_offset")) {
		/* Add __per_cpu_offset for each cpu to form the pointer to the notes */
		for (i = 0; i<kt->cpus; i++)
			notes_ptrs[i] = notes_ptrs[kt->cpus-1] + kt->__per_cpu_offset[i];
	}
Dave
--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility
07-20-2012, 07:49 AM
"Karlsson, Jan"
ARM: crash registers might be read from the wrong physical address
What I see is the following:
crash> p crash_notes
crash_notes = $29 = (note_buf_t *) 0xf662e000
crash> p/x __per_cpu_offset
$31 = {0x39b2000, 0x39ba000, 0x39c2000, 0x39ca000}
0xf662e000 + 0x39b2000 = 0xf9fe0000 which is the address seen in readmem.
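The same arithmetic can be repeated for every CPU. The helper below is a minimal sketch, assuming 32-bit addresses; percpu_note_addr is a hypothetical name used only for illustration and is not a crash function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Address of the crash note buffer for one CPU: the base pointer stored in
 * crash_notes plus that CPU's entry in __per_cpu_offset.
 */
static uint32_t percpu_note_addr(uint32_t crash_notes,
                                 const uint32_t *per_cpu_offset,
                                 size_t ncpus, size_t cpu)
{
    assert(per_cpu_offset != NULL);
    assert(cpu < ncpus);
    return crash_notes + per_cpu_offset[cpu];
}
```

With the values printed above, CPU 0 gives 0xf9fe0000, the address seen in readmem. The other three CPUs come out at 0xf9fe8000, 0xf9ff0000 and 0xf9ff8000, so all four note buffers sit high in the kernel address space rather than near the unity-mapped kernel image.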
These are the interesting lines I see in source code (both newer and older kernels):
note_buf_t *crash_notes;
crash_notes = alloc_percpu(note_buf_t);
I do not really understand this in detail, but it seems that alloc_percpu uses "chunks" and may allocate new chunks if there is not enough memory in the currently available ones. So what might have happened is that in the older cases there was space in the first(??) chunk, while in the newer case a new chunk had to be allocated.
Jan
-----Original Message-----
From: crash-utility-bounces@redhat.com [mailto:crash-utility-bounces@redhat.com] On Behalf Of Dave Anderson
Sent: Thursday, 19 July 2012 14:42
To: Discussion list for crash utility usage, maintenance and development
Cc: Fänge, Thomas
Subject: Re: [Crash-utility] ARM: crash registers might be read from the wrong physical address
----- Original Message -----
> These are the same lines in my case.
>
> <readmem: c0d2af6c, KVADDR, "crash_notes", 4, (ROE), 85ba84c>
> <read_kdump: addr: c0d2af6c paddr: 80f2af6c cnt: 4>
> <readmem: f9fe0000, KVADDR, "note_buf_t", 560, (ROE), 85bac40> <--- !!
> <readmem: c0004000, KVADDR, "pgd page", 16384, (FOE), 914e8d0>
>
> I have never seen this problem before, so the behavior you see is
> exactly what I have seen before. However with a fairly new kernel I
> did not get the correct crash_notes. The investigation led to the
> patch for the problem described in my previous mail.
>
> I have not investigated if there is any patch in newer kernels that
> changes this behavior and in that case where it comes from (it could
> be a patch by us). However as the algorithm for reading crash_notes is
> wrong, as it depends on a variable that is not yet initialized, I
> think it should be corrected anyhow. I have tested my patch with both
> newer and older kernels and it works as intended.
OK, good. And so apparently the per-cpu region has been moved up into vmalloc space. I'll queue the change into crash-6.0.9.
For curiosity's sake, can you show me the per-cpu symbol list? In my sample ARM kernel, it's located in the unity-mapped region just below the .text section, and can be seen like this:
07-20-2012, 08:03 AM
"Karlsson, Jan"
ARM: crash registers might be read from the wrong physical address
I forgot to say that the __per_cpu_start symbol is placed at an address similar to the one in your example. So there is no change in the handling of the basic per_cpu area.
Jan