Linux Archive

Linux Archive (http://www.linux-archive.org/)
-   Crash Utility (http://www.linux-archive.org/crash-utility/)
-   -   Fwd: s390x fixes (http://www.linux-archive.org/crash-utility/661918-fwd-s390x-fixes.html)

Dave Anderson 05-01-2012 06:29 PM

Fwd: s390x fixes
 
Mistakenly cc'd to "crash-utility-owner@redhat.com" instead of this
list...

----- Forwarded Message -----
From: "Dave Anderson" <anderson@redhat.com>
To: "Michael Holzheu" <holzheu@linux.vnet.ibm.com>
Cc: crash-utility-owner@redhat.com
Sent: Monday, April 30, 2012 4:53:46 PM
Subject: s390x fixes


Hi Michael,

I've got a couple simple bug fixes for s390x that I want to
run by you, plus a third one that I don't have a fix for.

First the easy ones:

(1) "bt -t" and "bt -T" fail on the active task on a live system:

crash> bt -t
PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
bt: invalid/stale stack pointer for this task: 0
crash> bt -T
PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
bt: invalid/stale stack pointer for this task: 0
crash>

That can be fixed by adding a !LIVE() check to s390x_get_stack_frame()
so that it will use (bt->task + OFFSET(task_struct_thread_ksp):

/* get the stack pointer */
if(esp){
- if(s390x_has_cpu(bt)){
+ if (!LIVE() && s390x_has_cpu(bt)) {
ksp = ULONG(lowcore + MEMBER_OFFSET("_lowcore",
"gpregs_save_area") + (15 * S390X_WORD_SIZE));
} else {
readmem(bt->task + OFFSET(task_struct_thread_ksp),
KVADDR, &ksp, sizeof(void *),
"thread_struct ksp", FAULT_ON_ERROR);
}
*esp = ksp;
} else {


(2) "vm -p" can show bogus data when a page is not mapped, like this example:

crash> vm -p 1
PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
MM PGD RSS TOTAL_VM
14f48400 14f4c000 344k 3116k
VMA START END FLAGS FILE
14b88c80 2aab283b000 2aab2862000 8001875 /sbin/init
VIRTUAL PHYSICAL
2aab283b000 SWAP: (unknown swap location) OFFSET: 0
2aab283c000 SWAP: (unknown swap location) OFFSET: 0
2aab283d000 SWAP: (unknown swap location) OFFSET: 0
2aab283e000 SWAP: (unknown swap location) OFFSET: 0
2aab283f000 SWAP: (unknown swap location) OFFSET: 0
2aab2840000 SWAP: (unknown swap location) OFFSET: 0
2aab2841000 SWAP: (unknown swap location) OFFSET: 0
...

And that's because when a "machdep->uvtop()" operation is done on a user
page that is not resident, the machine-dependent function should return
FALSE -- but it should return the PTE value in the paddr pointer field
so that it can be translated by vm_area_page_dump(). The s390x_uvtop()
does not return the PTE, so the failed output can vary, because it's using
an uninitialized "paddr" stack variable. But this is another easy fix,
in this case to s390x_vtop():

/* lookup virtual address in page tables */
int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr, int verbose)
{
ulong entry, paddr;
int level, len;

+ *phys_addr = 0;


(3) Even with the (2) applied, however, "vm -p" can fail to translate
user addresses in another situation. If you try this, you'll
see a number of failures like this:

crash> foreach user vm -p | grep PID
PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
PID: 599 TASK: 14fbc140 CPU: 1 COMMAND: "udevd"
PID: 955 TASK: 14343620 CPU: 0 COMMAND: "udevd"
PID: 961 TASK: 13f19220 CPU: 1 COMMAND: "udevd"
PID: 1246 TASK: 14cc0ab0 CPU: 0 COMMAND: "auditd"
PID: 1247 TASK: 14f88240 CPU: 0 COMMAND: "auditd"
PID: 1271 TASK: 140a3320 CPU: 0 COMMAND: "rsyslogd"
vm: read error: kernel virtual address: 0 type: "entry"
PID: 1272 TASK: 14b11520 CPU: 0 COMMAND: "rs:main Q:Reg"
vm: read error: kernel virtual address: 0 type: "entry"
PID: 1273 TASK: 16a32440 CPU: 1 COMMAND: "rsyslogd"
vm: read error: kernel virtual address: 0 type: "entry"
PID: 1274 TASK: 14c3cbb0 CPU: 0 COMMAND: "rsyslogd"
vm: read error: kernel virtual address: 0 type: "entry"
...

And if I take a particular case:

crash> vm -p
PID: 5088 TASK: 14399420 CPU: 1 COMMAND: "mingetty"
MM PGD RSS TOTAL_VM
14e49c00 147f8000 116k 2180k
... [ cut ] ...
VMA START END FLAGS FILE
14c49bc0 8dee1000 8df02000 100073
VIRTUAL PHYSICAL
8dee1000 ef03000
8dee2000 (not mapped)
8dee3000 (not mapped)
8dee4000 (not mapped)
8dee5000 (not mapped)
8dee6000 (not mapped)
8dee7000 (not mapped)
8dee8000 (not mapped)
8dee9000 (not mapped)
8deea000 (not mapped)
8deeb000 (not mapped)
8deec000 (not mapped)
8deed000 (not mapped)
8deee000 (not mapped)
8deef000 (not mapped)
8def0000 (not mapped)
8def1000 (not mapped)
8def2000 (not mapped)
8def3000 (not mapped)
8def4000 (not mapped)
8def5000 (not mapped)
8def6000 (not mapped)
8def7000 (not mapped)
8def8000 (not mapped)
8def9000 (not mapped)
8defa000 (not mapped)
8defb000 (not mapped)
8defc000 (not mapped)
8defd000 (not mapped)
8defe000 (not mapped)
8deff000 (not mapped)
vm: read error: kernel virtual address: 0 type: "entry"
crash>

So in this example, the page that's failing is 8df00000, which is
located in the VMA's range from 8dee1000 to 8df02000. But the
machdep->uvtop() operation fails unexpectedly:

crash> vtop -u 8df00000 -u
VIRTUAL PHYSICAL
vtop: read error: kernel virtual address: 0 type: "entry"
crash>

And that "entry" readmem() is in s390x.c code that I don't wish
to screw around with...

Hoping you can help,
Dave


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Dave Anderson 05-01-2012 07:24 PM

Fwd: s390x fixes
 
Hi Michael,

With respect to the 3rd "vm -p" bug, I did some cursory debugging, and
here's what I found.

In all cases, the readmem() failure occurs in _kl_pg_table_deref_s390x()
as a result of transitioning from one page of PTEs to the next, because
the pointer to the "next" page of PTES contains 0x20, which looks to be
_SEGMENT_ENTRY_INV or _REGION_ENTRY_INV? (not sure of the s390x nomenclature...)

So you'll see something like this in the page table that points
to the pages of PTEs:

...
c6386e0: 0000000000000020 0000000000000020 ....... .......
c6386f0: 000000001608c800 0000000000000020 ...............
c638700: 0000000000000020 0000000000000020 ....... .......
...

The vaddr's in the page of PTEs pointed to by c6386f0 (at 000000001608c800)
all resolve as expected, but when the virtual address bumps it to c6386f8,
it reads the 0x20, and passes it to _kl_pg_table_deref_s390x(). The user
vaddr(s) that resolve to that next page of PTEs are legitimate, given that
they are in the virtual region defined by the vm_area_struct. But they
certainly may not be mapped.

Anyway, it seems that there should be something that catches the invalid entry
in s390x_vtop() -- prior to calling _kl_pg_table_deref_s390x()-- and return
FALSE at that point.

So if I make this kludge:

...

/* Check if this is a large page. */
if (entry & 0x400ULL) {
/* Add the 1MB page offset and return the final value. */
*phys_addr = table + (vaddr & 0xfffffULL);
return TRUE;
}

======> if (entry == 0x20) return FALSE;

/* Get the page table entry */
entry = _kl_pg_table_deref_s390x(vaddr, entry & ~0x7ffULL);
if (!entry)
return FALSE;

/* Isolate the page origin from the page table entry. */
paddr = entry & ~0xfffULL;

/* Add the page offset and return the final value. */
*phys_addr = paddr + (vaddr & 0xfffULL);

return TRUE;
}

then everything seems to work OK.

So unless the calculation of the next page of PTEs is incorrect, which
seems unlikely, it seems that the 0x20 is legitimate, and should be
recognized? What do you think?

Dave


----- Original Message -----
>
> Mistakenly cc'd to "crash-utility-owner@redhat.com" instead of this
> list...
>
> ----- Forwarded Message -----
> From: "Dave Anderson" <anderson@redhat.com>
> To: "Michael Holzheu" <holzheu@linux.vnet.ibm.com>
> Cc: crash-utility-owner@redhat.com
> Sent: Monday, April 30, 2012 4:53:46 PM
> Subject: s390x fixes
>
>
> Hi Michael,
>
> I've got a couple simple bug fixes for s390x that I want to
> run by you, plus a third one that I don't have a fix for.
>
> First the easy ones:
>
> (1) "bt -t" and "bt -T" fail on the active task on a live system:
>
> crash> bt -t
> PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
> bt: invalid/stale stack pointer for this task: 0
> crash> bt -T
> PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
> bt: invalid/stale stack pointer for this task: 0
> crash>
>
> That can be fixed by adding a !LIVE() check to
> s390x_get_stack_frame()
> so that it will use (bt->task + OFFSET(task_struct_thread_ksp):
>
> /* get the stack pointer */
> if(esp){
> - if(s390x_has_cpu(bt)){
> + if (!LIVE() && s390x_has_cpu(bt)) {
> ksp = ULONG(lowcore +
> MEMBER_OFFSET("_lowcore",
> "gpregs_save_area") + (15 *
> S390X_WORD_SIZE));
> } else {
> readmem(bt->task +
> OFFSET(task_struct_thread_ksp),
> KVADDR, &ksp, sizeof(void *),
> "thread_struct ksp", FAULT_ON_ERROR);
> }
> *esp = ksp;
> } else {
>
>
> (2) "vm -p" can show bogus data when a page is not mapped, like this
> example:
>
> crash> vm -p 1
> PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
> MM PGD RSS TOTAL_VM
> 14f48400 14f4c000 344k 3116k
> VMA START END FLAGS FILE
> 14b88c80 2aab283b000 2aab2862000 8001875
> /sbin/init
> VIRTUAL PHYSICAL
> 2aab283b000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283c000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283d000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283e000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283f000 SWAP: (unknown swap location) OFFSET: 0
> 2aab2840000 SWAP: (unknown swap location) OFFSET: 0
> 2aab2841000 SWAP: (unknown swap location) OFFSET: 0
> ...
>
> And that's because when a "machdep->uvtop()" operation is done on a
> user
> page that is not resident, the machine-dependent function should
> return
> FALSE -- but it should return the PTE value in the paddr pointer
> field
> so that it can be translated by vm_area_page_dump(). The
> s390x_uvtop()
> does not return the PTE, so the failed output can vary, because it's
> using
> an uninitialized "paddr" stack variable. But this is another easy
> fix,
> in this case to s390x_vtop():
>
> /* lookup virtual address in page tables */
> int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr, int
> verbose)
> {
> ulong entry, paddr;
> int level, len;
>
> + *phys_addr = 0;
>
>
> (3) Even with the (2) applied, however, "vm -p" can fail to translate
> user addresses in another situation. If you try this, you'll
> see a number of failures like this:
>
> crash> foreach user vm -p | grep PID
> PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
> PID: 599 TASK: 14fbc140 CPU: 1 COMMAND: "udevd"
> PID: 955 TASK: 14343620 CPU: 0 COMMAND: "udevd"
> PID: 961 TASK: 13f19220 CPU: 1 COMMAND: "udevd"
> PID: 1246 TASK: 14cc0ab0 CPU: 0 COMMAND: "auditd"
> PID: 1247 TASK: 14f88240 CPU: 0 COMMAND: "auditd"
> PID: 1271 TASK: 140a3320 CPU: 0 COMMAND: "rsyslogd"
> vm: read error: kernel virtual address: 0 type: "entry"
> PID: 1272 TASK: 14b11520 CPU: 0 COMMAND: "rs:main
> Q:Reg"
> vm: read error: kernel virtual address: 0 type: "entry"
> PID: 1273 TASK: 16a32440 CPU: 1 COMMAND: "rsyslogd"
> vm: read error: kernel virtual address: 0 type: "entry"
> PID: 1274 TASK: 14c3cbb0 CPU: 0 COMMAND: "rsyslogd"
> vm: read error: kernel virtual address: 0 type: "entry"
> ...
>
> And if I take a particular case:
>
> crash> vm -p
> PID: 5088 TASK: 14399420 CPU: 1 COMMAND: "mingetty"
> MM PGD RSS TOTAL_VM
> 14e49c00 147f8000 116k 2180k
> ... [ cut ] ...
> VMA START END FLAGS FILE
> 14c49bc0 8dee1000 8df02000 100073
> VIRTUAL PHYSICAL
> 8dee1000 ef03000
> 8dee2000 (not mapped)
> 8dee3000 (not mapped)
> 8dee4000 (not mapped)
> 8dee5000 (not mapped)
> 8dee6000 (not mapped)
> 8dee7000 (not mapped)
> 8dee8000 (not mapped)
> 8dee9000 (not mapped)
> 8deea000 (not mapped)
> 8deeb000 (not mapped)
> 8deec000 (not mapped)
> 8deed000 (not mapped)
> 8deee000 (not mapped)
> 8deef000 (not mapped)
> 8def0000 (not mapped)
> 8def1000 (not mapped)
> 8def2000 (not mapped)
> 8def3000 (not mapped)
> 8def4000 (not mapped)
> 8def5000 (not mapped)
> 8def6000 (not mapped)
> 8def7000 (not mapped)
> 8def8000 (not mapped)
> 8def9000 (not mapped)
> 8defa000 (not mapped)
> 8defb000 (not mapped)
> 8defc000 (not mapped)
> 8defd000 (not mapped)
> 8defe000 (not mapped)
> 8deff000 (not mapped)
> vm: read error: kernel virtual address: 0 type: "entry"
> crash>
>
> So in this example, the page that's failing is 8df00000, which is
> located in the VMA's range from 8dee1000 to 8df02000. But the
> machdep->uvtop() operation fails unexpectedly:
>
> crash> vtop -u 8df00000 -u
> VIRTUAL PHYSICAL
> vtop: read error: kernel virtual address: 0 type: "entry"
> crash>
>
> And that "entry" readmem() is in s390x.c code that I don't wish
> to screw around with...
>
> Hoping you can help,
> Dave
>
>
> --
> Crash-utility mailing list
> Crash-utility@redhat.com
> https://www.redhat.com/mailman/listinfo/crash-utility
>

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Dave Anderson 05-02-2012 12:55 PM

Fwd: s390x fixes
 
----- Forwarded Message -----
From: "Michael Holzheu" <holzheu@linux.vnet.ibm.com>
To: "Dave Anderson" <anderson@redhat.com>
Cc: crash-utility-owner@redhat.com
Sent: Wednesday, May 2, 2012 8:49:56 AM
Subject: Re: s390x fixes

Hi Dave,


On Mon, 30 Apr 2012 16:53:46 -0400 (EDT)
Dave Anderson <anderson@redhat.com> wrote:
> I've got a couple simple bug fixes for s390x that I want to
> run by you, plus a third one that I don't have a fix for.
>
> First the easy ones:
>
> (1) "bt -t" and "bt -T" fail on the active task on a live system:
>
> crash> bt -t
> PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
> bt: invalid/stale stack pointer for this task: 0
> crash> bt -T
> PID: 34875 TASK: 14342540 CPU: 1 COMMAND: "crash"
> bt: invalid/stale stack pointer for this task: 0
> crash>
>
> That can be fixed by adding a !LIVE() check to s390x_get_stack_frame()
> so that it will use (bt->task + OFFSET(task_struct_thread_ksp):
>
> /* get the stack pointer */
> if(esp){
> - if(s390x_has_cpu(bt)){
> + if (!LIVE() && s390x_has_cpu(bt)) {
> ksp = ULONG(lowcore +
> MEMBER_OFFSET("_lowcore", "gpregs_save_area") + (15 *
> S390X_WORD_SIZE)); } else {
> readmem(bt->task +
> OFFSET(task_struct_thread_ksp), KVADDR, &ksp, sizeof(void *),
> "thread_struct ksp", FAULT_ON_ERROR);
> }
> *esp = ksp;
> } else {
>

Looks good, thanks!

> (2) "vm -p" can show bogus data when a page is not mapped, like this
> example:
>
> crash> vm -p 1
> PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
> MM PGD RSS TOTAL_VM
> 14f48400 14f4c000 344k 3116k
> VMA START END FLAGS FILE
> 14b88c80 2aab283b000 2aab2862000
> 8001875 /sbin/init VIRTUAL PHYSICAL
> 2aab283b000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283c000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283d000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283e000 SWAP: (unknown swap location) OFFSET: 0
> 2aab283f000 SWAP: (unknown swap location) OFFSET: 0
> 2aab2840000 SWAP: (unknown swap location) OFFSET: 0
> 2aab2841000 SWAP: (unknown swap location) OFFSET: 0
> ...
>
> And that's because when a "machdep->uvtop()" operation is done on a
> user page that is not resident, the machine-dependent function should
> return FALSE -- but it should return the PTE value in the paddr
> pointer field so that it can be translated by vm_area_page_dump().
> The s390x_uvtop() does not return the PTE, so the failed output can
> vary, because it's using an uninitialized "paddr" stack variable.
> But this is another easy fix, in this case to s390x_vtop():
>
> /* lookup virtual address in page tables */
> int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr, int
> verbose) {
> ulong entry, paddr;
> int level, len;
>
> + *phys_addr = 0;


Looks also good. But probably I should implement a better fix that
returns the pte for s390x swap entries.

>
> (3) Even with the (2) applied, however, "vm -p" can fail to translate
> user addresses in another situation. If you try this, you'll
> see a number of failures like this:
>
> crash> foreach user vm -p | grep PID
> PID: 1 TASK: 17b91120 CPU: 1 COMMAND: "init"
> PID: 599 TASK: 14fbc140 CPU: 1 COMMAND: "udevd"
> PID: 955 TASK: 14343620 CPU: 0 COMMAND: "udevd"
> PID: 961 TASK: 13f19220 CPU: 1 COMMAND: "udevd"
> PID: 1246 TASK: 14cc0ab0 CPU: 0 COMMAND: "auditd"
> PID: 1247 TASK: 14f88240 CPU: 0 COMMAND: "auditd"
> PID: 1271 TASK: 140a3320 CPU: 0 COMMAND: "rsyslogd"
> vm: read error: kernel virtual address: 0 type: "entry"
> PID: 1272 TASK: 14b11520 CPU: 0 COMMAND: "rs:main
> Q:Reg" vm: read error: kernel virtual address: 0 type: "entry"
> PID: 1273 TASK: 16a32440 CPU: 1 COMMAND: "rsyslogd"
> vm: read error: kernel virtual address: 0 type: "entry"
> PID: 1274 TASK: 14c3cbb0 CPU: 0 COMMAND: "rsyslogd"
> vm: read error: kernel virtual address: 0 type: "entry"
> ...
>
> And if I take a particular case:
>
> crash> vm -p
> PID: 5088 TASK: 14399420 CPU: 1 COMMAND: "mingetty"
> MM PGD RSS TOTAL_VM
> 14e49c00 147f8000 116k 2180k
> ... [ cut ] ...
> VMA START END FLAGS FILE
> 14c49bc0 8dee1000 8df02000 100073
> VIRTUAL PHYSICAL
> 8dee1000 ef03000
> 8dee2000 (not mapped)
> 8dee3000 (not mapped)
> 8dee4000 (not mapped)
> 8dee5000 (not mapped)
> 8dee6000 (not mapped)
> 8dee7000 (not mapped)
> 8dee8000 (not mapped)
> 8dee9000 (not mapped)
> 8deea000 (not mapped)
> 8deeb000 (not mapped)
> 8deec000 (not mapped)
> 8deed000 (not mapped)
> 8deee000 (not mapped)
> 8deef000 (not mapped)
> 8def0000 (not mapped)
> 8def1000 (not mapped)
> 8def2000 (not mapped)
> 8def3000 (not mapped)
> 8def4000 (not mapped)
> 8def5000 (not mapped)
> 8def6000 (not mapped)
> 8def7000 (not mapped)
> 8def8000 (not mapped)
> 8def9000 (not mapped)
> 8defa000 (not mapped)
> 8defb000 (not mapped)
> 8defc000 (not mapped)
> 8defd000 (not mapped)
> 8defe000 (not mapped)
> 8deff000 (not mapped)
> vm: read error: kernel virtual address: 0 type: "entry"
> crash>
>
> So in this example, the page that's failing is 8df00000, which is
> located in the VMA's range from 8dee1000 to 8df02000. But the
> machdep->uvtop() operation fails unexpectedly:
>
> crash> vtop -u 8df00000 -u
> VIRTUAL PHYSICAL
> vtop: read error: kernel virtual address: 0 type: "entry"
> crash>
>
> And that "entry" readmem() is in s390x.c code that I don't wish
> to screw around with...

See reply to your other note.

Michael

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Michael Holzheu 05-02-2012 12:58 PM

Fwd: s390x fixes
 
Hi Dave,

On Tue, 01 May 2012 15:24:17 -0400 (EDT)
Dave Anderson <anderson@redhat.com> wrote:

> With respect to the 3rd "vm -p" bug, I did some cursory debugging, and
> here's what I found.
>
> In all cases, the readmem() failure occurs in
> _kl_pg_table_deref_s390x() as a result of transitioning from one page
> of PTEs to the next, because the pointer to the "next" page of PTES
> contains 0x20, which looks to be _SEGMENT_ENTRY_INV or
> _REGION_ENTRY_INV? (not sure of the s390x nomenclature...)
>
> So you'll see something like this in the page table that points
> to the pages of PTEs:
>
> ...
> c6386e0: 0000000000000020
> 0000000000000020 ....... ....... c6386f0: 000000001608c800
> 0000000000000020 ............... c638700: 0000000000000020
> 0000000000000020 ....... ....... ...
>
> The vaddr's in the page of PTEs pointed to by c6386f0 (at
> 000000001608c800) all resolve as expected, but when the virtual
> address bumps it to c6386f8, it reads the 0x20, and passes it to
> _kl_pg_table_deref_s390x(). The user vaddr(s) that resolve to that
> next page of PTEs are legitimate, given that they are in the virtual
> region defined by the vm_area_struct. But they certainly may not be
> mapped.
>
> Anyway, it seems that there should be something that catches the
> invalid entry in s390x_vtop() -- prior to calling
> _kl_pg_table_deref_s390x()-- and return FALSE at that point.
>
> So if I make this kludge:
>
> ...
>
> /* Check if this is a large page. */
> if (entry & 0x400ULL) {
> /* Add the 1MB page offset and return the final
> value. */ *phys_addr = table + (vaddr & 0xfffffULL);
> return TRUE;
> }
>
> ======> if (entry == 0x20) return FALSE;
>
> /* Get the page table entry */
> entry = _kl_pg_table_deref_s390x(vaddr, entry & ~0x7ffULL);
> if (!entry)
> return FALSE;
>
> /* Isolate the page origin from the page table entry. */
> paddr = entry & ~0xfffULL;
>
> /* Add the page offset and return the final value. */
> *phys_addr = paddr + (vaddr & 0xfffULL);
>
> return TRUE;
> }
>
> then everything seems to work OK.
>
> So unless the calculation of the next page of PTEs is incorrect,
> which seems unlikely, it seems that the 0x20 is legitimate, and
> should be recognized? What do you think?

0x20 means that the invalid bit of a region table entry is set (bit 58).
The problem is a wrong check in _kl_rsg_table_deref_s390x() where we
check for 0x40 (instead of 0x20).

The following is the correct fix for the problem:
---
s390x.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/s390x.c
+++ b/s390x.c
@@ -568,7 +568,7 @@ static ulong _kl_rsg_table_deref_s390x(u
if ((entry & 0xcULL) != (level << 2))
return 0;
/* Check if the region table entry has the invalid bit set. */
- if (entry & 0x40ULL)
+ if (entry & 0x20ULL)
return 0;
/* Region table entry is valid and well formed. */
return entry;


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Dave Anderson 05-02-2012 01:08 PM

Fwd: s390x fixes
 
----- Original Message -----
> Hi Dave,
>
> On Tue, 01 May 2012 15:24:17 -0400 (EDT)
> Dave Anderson <anderson@redhat.com> wrote:
>
> > With respect to the 3rd "vm -p" bug, I did some cursory debugging,
> > and
> > here's what I found.
> >
> > In all cases, the readmem() failure occurs in
> > _kl_pg_table_deref_s390x() as a result of transitioning from one
> > page
> > of PTEs to the next, because the pointer to the "next" page of PTES
> > contains 0x20, which looks to be _SEGMENT_ENTRY_INV or
> > _REGION_ENTRY_INV? (not sure of the s390x nomenclature...)
> >
> > So you'll see something like this in the page table that points
> > to the pages of PTEs:
> >
> > ...
> > c6386e0: 0000000000000020
> > 0000000000000020 ....... ....... c6386f0: 000000001608c800
> > 0000000000000020 ............... c638700: 0000000000000020
> > 0000000000000020 ....... ....... ...
> >
> > The vaddr's in the page of PTEs pointed to by c6386f0 (at
> > 000000001608c800) all resolve as expected, but when the virtual
> > address bumps it to c6386f8, it reads the 0x20, and passes it to
> > _kl_pg_table_deref_s390x(). The user vaddr(s) that resolve to that
> > next page of PTEs are legitimate, given that they are in the virtual
> > region defined by the vm_area_struct. But they certainly may not be
> > mapped.
> >
> > Anyway, it seems that there should be something that catches the
> > invalid entry in s390x_vtop() -- prior to calling
> > _kl_pg_table_deref_s390x()-- and return FALSE at that point.
> >
> > So if I make this kludge:
> >
> > ...
> >
> > /* Check if this is a large page. */
> > if (entry & 0x400ULL) {
> > /* Add the 1MB page offset and return the final
> > value. */ *phys_addr = table + (vaddr & 0xfffffULL);
> > return TRUE;
> > }
> >
> > ======> if (entry == 0x20) return FALSE;
> >
> > /* Get the page table entry */
> > entry = _kl_pg_table_deref_s390x(vaddr, entry & ~0x7ffULL);
> > if (!entry)
> > return FALSE;
> >
> > /* Isolate the page origin from the page table entry. */
> > paddr = entry & ~0xfffULL;
> >
> > /* Add the page offset and return the final value. */
> > *phys_addr = paddr + (vaddr & 0xfffULL);
> >
> > return TRUE;
> > }
> >
> > then everything seems to work OK.
> >
> > So unless the calculation of the next page of PTEs is incorrect,
> > which seems unlikely, it seems that the 0x20 is legitimate, and
> > should be recognized? What do you think?
>
> 0x20 means that the invalid bit of a region table entry is set (bit 58).
> The problem is a wrong check in _kl_rsg_table_deref_s390x() where we
> check for 0x40 (instead of 0x20).
>
> The following is the correct fix for the problem:
> ---
> s390x.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> --- a/s390x.c
> +++ b/s390x.c
> @@ -568,7 +568,7 @@ static ulong _kl_rsg_table_deref_s390x(u
> if ((entry & 0xcULL) != (level << 2))
> return 0;
> /* Check if the region table entry has the invalid bit set. */
> - if (entry & 0x40ULL)
> + if (entry & 0x20ULL)
> return 0;
> /* Region table entry is valid and well formed. */
> return entry;

Much better -- queued for crash-6.0.7.

Thanks,
Dave




--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Dave Anderson 05-02-2012 01:21 PM

Fwd: s390x fixes
 
----- Original Message -----

> >
> > And that's because when a "machdep->uvtop()" operation is done on a
> > user page that is not resident, the machine-dependent function should
> > return FALSE -- but it should return the PTE value in the paddr
> > pointer field so that it can be translated by vm_area_page_dump().
> > The s390x_uvtop() does not return the PTE, so the failed output can
> > vary, because it's using an uninitialized "paddr" stack variable.
> > But this is another easy fix, in this case to s390x_vtop():
> >
> > /* lookup virtual address in page tables */
> > int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr, int
> > verbose) {
> > ulong entry, paddr;
> > int level, len;
> >
> > + *phys_addr = 0;
>
>
> Looks also good. But probably I should implement a better fix that
> returns the pte for s390x swap entries.

That's true -- so since you've got the __swp_type() and __swp_offset()
macros #define'd for s390x, presumably it should work if you just pass
it back as is? Even though you've got the common s390x_vtop() function
used by both s390x_kvtop() and s390x_uvtop(), I think it would be
safe to pass the PTE entry back in either case, because kvtop() operations
don't care about the paddr contents if you return FALSE.

Dave

--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Michael Holzheu 05-02-2012 04:02 PM

Fwd: s390x fixes
 
Hi Dave,

On Wed, 02 May 2012 09:21:57 -0400 (EDT)
Dave Anderson <anderson@redhat.com> wrote:
>
> ----- Original Message -----
>
> > >
> > > And that's because when a "machdep->uvtop()" operation is done on
> > > a user page that is not resident, the machine-dependent function
> > > should return FALSE -- but it should return the PTE value in the
> > > paddr pointer field so that it can be translated by
> > > vm_area_page_dump(). The s390x_uvtop() does not return the PTE,
> > > so the failed output can vary, because it's using an
> > > uninitialized "paddr" stack variable. But this is another easy
> > > fix, in this case to s390x_vtop():
> > >
> > > /* lookup virtual address in page tables */
> > > int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr,
> > > int verbose) {
> > > ulong entry, paddr;
> > > int level, len;
> > >
> > > + *phys_addr = 0;
> >
> >
> > Looks also good. But probably I should implement a better fix that
> > returns the pte for s390x swap entries.
>
> That's true -- so since you've got the __swp_type() and __swp_offset()
> macros #define'd for s390x, presumably it should work if you just pass
> it back as is? Even though you've got the common s390x_vtop()
> function used by both s390x_kvtop() and s390x_uvtop(), I think it
> would be safe to pass the PTE entry back in either case, because
> kvtop() operations don't care about the paddr contents if you return
> FALSE.

The following patch implements swap entry support for s390x and
fixes your issues (2) and (3).

Michael
---
s390x.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)

--- a/s390x.c
+++ b/s390x.c
@@ -574,6 +574,19 @@ static ulong _kl_rsg_table_deref_s390x(u
return entry;
}

+/* Check for swap entry */
+static int swap_entry(ulong entry)
+{
+ if (THIS_KERNEL_VERSION < LINUX(2,6,19)) {
+ if ((entry & 0x601ULL) == 0x600ULL)
+ return 1;
+ } else {
+ if ((entry & 0x403ULL) == 0x403ULL)
+ return 1;
+ }
+ return 0;
+}
+
/* Page table traversal function */
static ulong _kl_pg_table_deref_s390x(ulong vaddr, ulong table)
{
@@ -583,13 +596,11 @@ static ulong _kl_pg_table_deref_s390x(ul
readmem(table + offset, KVADDR, &entry, sizeof(entry), "entry",
FAULT_ON_ERROR);
/*
- * Check if the page table entry could be read and doesn't have
- * any of the reserved bits set.
+ * Return zero if the page table entry has any of the reserved bits
+ * set (0x900) or the invalid bit (0x400) is set and it is not a
+ * swap entry.
*/
- if (entry & 0x900ULL)
- return 0;
- /* Check if the page table entry has the invalid bit set. */
- if (entry & 0x400ULL)
+ if ((entry & 0xd00ULL) && !swap_entry(entry))
return 0;
/* Page table entry is valid and well formed. */
return entry;
@@ -601,6 +612,7 @@ int s390x_vtop(ulong table, ulong vaddr,
ulong entry, paddr;
int level, len;

+ *phys_addr = 0;
/*
* Walk the region and segment tables.
* We assume that the table length field in the asce is set to the
@@ -619,7 +631,7 @@ int s390x_vtop(ulong table, ulong vaddr,
while (level >= 0) {
entry = _kl_rsg_table_deref_s390x(vaddr, table, len, level);
if (!entry)
- return 0;
+ return FALSE;
table = entry & ~0xfffULL;
len = entry & 0x3ULL;
level--;
@@ -637,6 +649,12 @@ int s390x_vtop(ulong table, ulong vaddr,
if (!entry)
return FALSE;

+ /* For swap entries we have to return FALSE and phys_addr = PTE */
+ if (swap_entry(entry)) {
+ *phys_addr = entry;
+ return FALSE;
+ }
+
/* Isolate the page origin from the page table entry. */
paddr = entry & ~0xfffULL;


--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility

Dave Anderson 05-02-2012 06:00 PM

Fwd: s390x fixes
 
----- Original Message -----
> Hi Dave,
>
> On Wed, 02 May 2012 09:21:57 -0400 (EDT)
> Dave Anderson <anderson@redhat.com> wrote:
> >
> > ----- Original Message -----
> >
> > > >
> > > > And that's because when a "machdep->uvtop()" operation is done on
> > > > a user page that is not resident, the machine-dependent function
> > > > should return FALSE -- but it should return the PTE value in the
> > > > paddr pointer field so that it can be translated by
> > > > vm_area_page_dump(). The s390x_uvtop() does not return the PTE,
> > > > so the failed output can vary, because it's using an
> > > > uninitialized "paddr" stack variable. But this is another easy
> > > > fix, in this case to s390x_vtop():
> > > >
> > > > /* lookup virtual address in page tables */
> > > > int s390x_vtop(ulong table, ulong vaddr, physaddr_t *phys_addr,
> > > > int verbose) {
> > > > ulong entry, paddr;
> > > > int level, len;
> > > >
> > > > + *phys_addr = 0;
> > >
> > >
> > > Looks also good. But probably I should implement a better fix that
> > > returns the pte for s390x swap entries.
> >
> > That's true -- so since you've got the __swp_type() and __swp_offset()
> > macros #define'd for s390x, presumably it should work if you just pass
> > it back as is? Even though you've got the common s390x_vtop()
> > function used by both s390x_kvtop() and s390x_uvtop(), I think it
> > would be safe to pass the PTE entry back in either case, because
> > kvtop() operations don't care about the paddr contents if you return
> > FALSE.
>
> The following patch implements swap entry support for s390x and
> fixes your issues (2) and (3).
>
> Michael
> ---
> s390x.c | 32 +++++++++++++++++++++++++-------
> 1 file changed, 25 insertions(+), 7 deletions(-)
>
> --- a/s390x.c
> +++ b/s390x.c
> @@ -574,6 +574,19 @@ static ulong _kl_rsg_table_deref_s390x(u
> return entry;
> }
>
> +/* Check for swap entry */
> +static int swap_entry(ulong entry)
> +{
> + if (THIS_KERNEL_VERSION < LINUX(2,6,19)) {
> + if ((entry & 0x601ULL) == 0x600ULL)
> + return 1;
> + } else {
> + if ((entry & 0x403ULL) == 0x403ULL)
> + return 1;
> + }
> + return 0;
> +}
> +
> /* Page table traversal function */
> static ulong _kl_pg_table_deref_s390x(ulong vaddr, ulong table)
> {
> @@ -583,13 +596,11 @@ static ulong _kl_pg_table_deref_s390x(ul
> readmem(table + offset, KVADDR, &entry, sizeof(entry), "entry",
> FAULT_ON_ERROR);
> /*
> - * Check if the page table entry could be read and doesn't have
> - * any of the reserved bits set.
> + * Return zero if the page table entry has any of the reserved bits
> + * set (0x900) or the invalid bit (0x400) is set and it is not a
> + * swap entry.
> */
> - if (entry & 0x900ULL)
> - return 0;
> - /* Check if the page table entry has the invalid bit set. */
> - if (entry & 0x400ULL)
> + if ((entry & 0xd00ULL) && !swap_entry(entry))
> return 0;
> /* Page table entry is valid and well formed. */
> return entry;
> @@ -601,6 +612,7 @@ int s390x_vtop(ulong table, ulong vaddr,
> ulong entry, paddr;
> int level, len;
>
> + *phys_addr = 0;
> /*
> * Walk the region and segment tables.
> * We assume that the table length field in the asce is set to the
> @@ -619,7 +631,7 @@ int s390x_vtop(ulong table, ulong vaddr,
> while (level >= 0) {
> entry = _kl_rsg_table_deref_s390x(vaddr, table, len, level);
> if (!entry)
> - return 0;
> + return FALSE;
> table = entry & ~0xfffULL;
> len = entry & 0x3ULL;
> level--;
> @@ -637,6 +649,12 @@ int s390x_vtop(ulong table, ulong vaddr,
> if (!entry)
> return FALSE;
>
> + /* For swap entries we have to return FALSE and phys_addr = PTE */
> + if (swap_entry(entry)) {
> + *phys_addr = entry;
> + return FALSE;
> + }
> +
> /* Isolate the page origin from the page table entry. */
> paddr = entry & ~0xfffULL;

Nice -- queued for crash-6.0.7.

Thanks,
Dave




--
Crash-utility mailing list
Crash-utility@redhat.com
https://www.redhat.com/mailman/listinfo/crash-utility


All times are GMT. The time now is 09:41 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.