FAQ Search Today's Posts Mark Forums Read
» Video Reviews

» Linux Archive

Linux-archive is a website aiming to archive linux email lists and to make them easily accessible for linux users/developers.


» Sponsor

» Partners

» Sponsor

Go Back   Linux Archive > Debian > Debian Kernel

 
 
LinkBack Thread Tools
 
Old 11-23-2009, 02:25 PM
Ian Campbell
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On Sun, 2009-11-22 at 09:54 +0000, Bastian Blank wrote:
> On Tue, Nov 17, 2009 at 10:04:36PM +0300, William Pitcock wrote:
> > [ 1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
>
> Hmm, this looks like the old Debian bug 544145[1]. For some reason the
> hypervisor jumps back into 64bit mode after a syscall instruction.
> Workaround: vdso32=0 or deinstall libc6-i686,

I just noticed that one of my test boxes has a AMD processor so I took a
bit of a look into this.

The issue seems to be with this bit of code in the hypervisor
(xen/arch/x86/x86_64/entry.S):

restore_all_guest:
ASSERT_INTERRUPTS_DISABLED
RESTORE_ALL
testw $TRAP_syscall,4(%rsp)
jz iret_exit_to_guest

addq $8,%rsp
popq %rcx # RIP
popq %r11 # CS
cmpw $FLAT_USER_CS32,%r11
popq %r11 # RFLAGS
popq %rsp # RSP
je 1f
sysretq
1: sysretl

We are attempting to return to the Linux defined __USER_CS32 (0x23)
which does not match the test for the Xen defined FLAT_USER_CS32
(0xe023) and therefore we hit the sysretq instead of the sysretl which
causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
0xe023.

This patch to the kernel fixes things but doesn't seem that
satisfactory:

diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
index 02f496a..203586d 100644
--- a/arch/x86/xen/xen-asm_64.S
+++ b/arch/x86/xen/xen-asm_64.S
@@ -93,7 +93,7 @@ ENTRY(xen_sysret32)
pushq $__USER32_DS
pushq PER_CPU_VAR(old_rsp)
pushq %r11
- pushq $__USER32_CS
+ pushq $FLAT_USER_CS32
pushq %rcx

pushq $VGCF_in_syscall

Coming from the other angle we could fix this in the hypervisor by
always returning to guest (user or kernel) via iret instead of sysret:

diff -r e7a1eab70fac xen/arch/x86/x86_64/entry.S
--- a/xen/arch/x86/x86_64/entry.S Mon Nov 09 10:24:54 2009 +0000
+++ b/xen/arch/x86/x86_64/entry.S Mon Nov 23 15:15:39 2009 +0000
@@ -48,22 +48,6 @@
restore_all_guest:
ASSERT_INTERRUPTS_DISABLED
RESTORE_ALL
- testw $TRAP_syscall,4(%rsp)
- jz iret_exit_to_guest
-
- addq $8,%rsp
- popq %rcx # RIP
- popq %r11 # CS
- cmpw $FLAT_USER_CS32,%r11
- popq %r11 # RFLAGS
- popq %rsp # RSP
- je 1f
- sysretq
-1: sysretl
-
- ALIGN
-/* No special register assumptions. */
-iret_exit_to_guest:
addq $8,%rsp
.Lft0: iretq

I think much of the issue stems from Xen defining several segment
descriptors which are essentially equivalent to the ones Linux uses. It
seems a bit ugly to expose these Xen defined descriptors to the guest
when it hasn't explicitly asked for them. On the other hand I'm not sure
what can realistically do since doing the Right Thing would seem to
involve looking up the descriptor in the GDT to determine if the
selector is 32 or 64 bit and/or context switching IA32_STAR in some
fashion to allow guests to specify their own userspace CS for sysret 32
and 64.

Perhaps simply not returning guest userspace with sysret (as above)
makes most sense, a syscall already takes a trap through the hypervisor
on both entry and exit so I'm not sure the difference between sysret and
iret is going to be noticeable.

Another option might be to define VGCF_compat_mode as a new flag to
HYPERVISOR_iret and select sysretq/sysretl based on that. This would
still expose Xen descriptors to guests which didn't ask for one but at
least it would (probably) be a compatible descriptor.

> It does not happen on XenSource 2.6.18 kernel

I assume that this kernel (perhaps coincidentally) manages to use
FLAT_USER_CS32 for compat mode processes.

> , or the Debian 2.6.26 kernel.

This was a forward ported 2.6.18-style kernel so I guess the same reason
as 2.6.18.

Ian.





--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 03:31 PM
Bastian Blank
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On Mon, Nov 23, 2009 at 03:25:35PM +0000, Ian Campbell wrote:
> We are attempting to return to the Linux defined __USER_CS32 (0x23)
> which does not match the test for the Xen defined FLAT_USER_CS32
> (0xe023) and therefore we hit the sysretq instead of the sysretl which
> causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> 0xe023.

Well, the problem is that this code ignores what the AMD spec stats:

| Because a SYSCALLed operating system can be entered from either 64-bit
| mode or compatibility mode, the corresponding SYSRET must know the mode
| to which it must return. [...] In the service-routine entry point
| code, a flag can be set indicating which type of SYSRET is needed upon
| exiting the called routine.

The code actually have to know if it was called from 64 or compatibility
mode, not assume it. And it also say that you have to use sysret, and
not iret.

Bastian

--
I have never understood the female capacity to avoid a direct answer to
any question.
-- Spock, "This Side of Paradise", stardate 3417.3



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 03:31 PM
"Jan Beulich"
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

>>> Ian Campbell <Ian.Campbell@citrix.com> 23.11.09 16:25 >>>
>Perhaps simply not returning guest userspace with sysret (as above)
>makes most sense, a syscall already takes a trap through the hypervisor
>on both entry and exit so I'm not sure the difference between sysret and
>iret is going to be noticeable.

But this is not just the return-to-user-space path you're changing, but
also the hypercall one. You certainly don't want an iret in that case.

I wonder though whether it wouldn't be possible to alter the TRAP_syscall
value (stored when entering the hypervisor) in do_iret(), so that whatever
do_iret() puts on the stack will be processed by iret.

>> It does not happen on XenSource 2.6.18 kernel
>
>I assume that this kernel (perhaps coincidentally) manages to use
>FLAT_USER_CS32 for compat mode processes.
>
>> , or the Debian 2.6.26 kernel.
>
>This was a forward ported 2.6.18-style kernel so I guess the same reason
>as 2.6.18.

If your analysis was right, 2.6.18 as well as our forward ported kernels
should also be affected (both ia32_sysenter_target and ia32_cstar_target
store __USER32_CS to the frame, and return via HYPERVISOR_iret), yet
supposedly they don't have the problem (though I can't say why that
would be). So perhaps there's some other yet un-described aspect to
this, or I'm being confused by something...

Jan




--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 03:42 PM
Ian Campbell
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On Mon, 2009-11-23 at 16:31 +0000, Bastian Blank wrote:
> On Mon, Nov 23, 2009 at 03:25:35PM +0000, Ian Campbell wrote:
> > We are attempting to return to the Linux defined __USER_CS32 (0x23)
> > which does not match the test for the Xen defined FLAT_USER_CS32
> > (0xe023) and therefore we hit the sysretq instead of the sysretl which
> > causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> > 0xe023.
>
> Well, the problem is that this code ignores what the AMD spec stats:
>
> | Because a SYSCALLed operating system can be entered from either 64-bit
> | mode or compatibility mode, the corresponding SYSRET must know the mode
> | to which it must return. [...] In the service-routine entry point
> | code, a flag can be set indicating which type of SYSRET is needed upon
> | exiting the called routine.
>
> The code actually have to know if it was called from 64 or compatibility
> mode, not assume it.

Sounds correct. This is tricky for a hypervisor since we don't know the
mode of the guest user-mode processes which made the syscall. The guest
kernel does know this which is why I proposed an additional
VGCF_compat_mode flag.

> And it also say that you have to use sysret, and not iret.

I don't believe that is the case (the processor would have to carry some
state for the entire duration of a syscall for it to make any
difference). I think the spec simply assumes that an OS author would
want to use sysret if they used syscall.

Ian.





--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 03:44 PM
Ian Campbell
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On Mon, 2009-11-23 at 16:31 +0000, Jan Beulich wrote:
> >>> Ian Campbell <Ian.Campbell@citrix.com> 23.11.09 16:25 >>>
> >Perhaps simply not returning guest userspace with sysret (as above)
> >makes most sense, a syscall already takes a trap through the hypervisor
> >on both entry and exit so I'm not sure the difference between sysret and
> >iret is going to be noticeable.
>
> But this is not just the return-to-user-space path you're changing, but
> also the hypercall one. You certainly don't want an iret in that case.

Don't the hypercalls already always go via iret?
- testw $TRAP_syscall,4(%rsp)
- jz iret_exit_to_guest
IOW if TRAP_syscall is not set (i.e. this is a hypercall not a syscall)
then exit via iret.

> I wonder though whether it wouldn't be possible to alter the TRAP_syscall
> value (stored when entering the hypervisor) in do_iret(), so that whatever
> do_iret() puts on the stack will be processed by iret.

That would make the VGCF_in_syscall passed to the iret hypercall
meaningless/useless?

>
> >> It does not happen on XenSource 2.6.18 kernel
> >
> >I assume that this kernel (perhaps coincidentally) manages to use
> >FLAT_USER_CS32 for compat mode processes.
> >
> >> , or the Debian 2.6.26 kernel.
> >
> >This was a forward ported 2.6.18-style kernel so I guess the same reason
> >as 2.6.18.
>
> If your analysis was right, 2.6.18 as well as our forward ported kernels
> should also be affected (both ia32_sysenter_target and ia32_cstar_target
> store __USER32_CS to the frame, and return via HYPERVISOR_iret), yet
> supposedly they don't have the problem (though I can't say why that
> would be). So perhaps there's some other yet un-described aspect to
> this, or I'm being confused by something...

I didn't try any of these kernels myself so I don't really know what
happens.

Ian.




--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 04:13 PM
Keir Fraser
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On 23/11/2009 16:44, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:

>> But this is not just the return-to-user-space path you're changing, but
>> also the hypercall one. You certainly don't want an iret in that case.
>
> Don't the hypercalls already always go via iret?
> - testw $TRAP_syscall,4(%rsp)
> - jz iret_exit_to_guest
> IOW if TRAP_syscall is not set (i.e. this is a hypercall not a syscall)
> then exit via iret.

I think not -- here TRAP_syscall means 'entered Xen via SYSCALL
instruction', not 'entered to do a syscall'. TRAP_syscall should be set
regardless of whether the SYSCALL instruction was executed by guest userland
or guest kernel.

-- Keir





--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 04:17 PM
Ian Campbell
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On Mon, 2009-11-23 at 17:13 +0000, Keir Fraser wrote:
> On 23/11/2009 16:44, "Ian Campbell" <Ian.Campbell@citrix.com> wrote:
>
> >> But this is not just the return-to-user-space path you're changing, but
> >> also the hypercall one. You certainly don't want an iret in that case.
> >
> > Don't the hypercalls already always go via iret?
> > - testw $TRAP_syscall,4(%rsp)
> > - jz iret_exit_to_guest
> > IOW if TRAP_syscall is not set (i.e. this is a hypercall not a syscall)
> > then exit via iret.
>
> I think not -- here TRAP_syscall means 'entered Xen via SYSCALL
> instruction', not 'entered to do a syscall'. TRAP_syscall should be set
> regardless of whether the SYSCALL instruction was executed by guest userland
> or guest kernel.

Oh yes, I was confused into thinking it was the same as VGCF_in_syscall
for some reason.

Ian.
>
> -- Keir
>
>





--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 04:23 PM
Bastian Blank
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On Mon, Nov 23, 2009 at 04:42:59PM +0000, Ian Campbell wrote:
> On Mon, 2009-11-23 at 16:31 +0000, Bastian Blank wrote:
> > The code actually have to know if it was called from 64 or compatibility
> > mode, not assume it.
> Sounds correct. This is tricky for a hypervisor since we don't know the
> mode of the guest user-mode processes which made the syscall. The guest
> kernel does know this which is why I proposed an additional
> VGCF_compat_mode flag.

Yeah.

> > And it also say that you have to use sysret, and not iret.
> I don't believe that is the case (the processor would have to carry some
> state for the entire duration of a syscall for it to make any
> difference). I think the spec simply assumes that an OS author would
> want to use sysret if they used syscall.

Well, it is documented this way. If you ignore it, it can work (and does
in this case) but is undefined behaviour.

Bastian

--
Bones: "The man's DEAD, Jim!"



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 11:39 PM
Jeremy Fitzhardinge
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On 11/23/09 07:25, Ian Campbell wrote:
> On Sun, 2009-11-22 at 09:54 +0000, Bastian Blank wrote:
>
>> On Tue, Nov 17, 2009 at 10:04:36PM +0300, William Pitcock wrote:
>>
>>> [ 1.254927] init[1] general protection ip:f779042f sp:ff9b0340 error:0
>>>
>> Hmm, this looks like the old Debian bug 544145[1]. For some reason the
>> hypervisor jumps back into 64bit mode after a syscall instruction.
>> Workaround: vdso32=0 or deinstall libc6-i686,
>>
> I just noticed that one of my test boxes has a AMD processor so I took a
> bit of a look into this.
>
> The issue seems to be with this bit of code in the hypervisor
> (xen/arch/x86/x86_64/entry.S):
>
> restore_all_guest:
> ASSERT_INTERRUPTS_DISABLED
> RESTORE_ALL
> testw $TRAP_syscall,4(%rsp)
> jz iret_exit_to_guest
>
> addq $8,%rsp
> popq %rcx # RIP
> popq %r11 # CS
> cmpw $FLAT_USER_CS32,%r11
> popq %r11 # RFLAGS
> popq %rsp # RSP
> je 1f
> sysretq
> 1: sysretl
>
> We are attempting to return to the Linux defined __USER_CS32 (0x23)
> which does not match the test for the Xen defined FLAT_USER_CS32
> (0xe023) and therefore we hit the sysretq instead of the sysretl which
> causes us to return with CS 0xe033 (FLAT_USER_CS64) instead of CS
> 0xe023.
>

Ah, good detective work.

> This patch to the kernel fixes things but doesn't seem that
> satisfactory:
>

It is a bit ugly. I guess you could just assert that FLAT_USER_CS32 is
part of the iret ABI so the guest has to use it, which appears to be the
defacto definition. The downside is that usermode could observe that it
has a non-standard cs selector; however, the Linux ABI doesn't define
the selector values (and they're different in native 32 bit vs compat
anyway, I think).

> diff --git a/arch/x86/xen/xen-asm_64.S b/arch/x86/xen/xen-asm_64.S
> index 02f496a..203586d 100644
> --- a/arch/x86/xen/xen-asm_64.S
> +++ b/arch/x86/xen/xen-asm_64.S
> @@ -93,7 +93,7 @@ ENTRY(xen_sysret32)
> pushq $__USER32_DS
> pushq PER_CPU_VAR(old_rsp)
> pushq %r11
> - pushq $__USER32_CS
> + pushq $FLAT_USER_CS32
> pushq %rcx
>
> pushq $VGCF_in_syscall
>
> Coming from the other angle we could fix this in the hypervisor by
> always returning to guest (user or kernel) via iret instead of sysret:
>
> diff -r e7a1eab70fac xen/arch/x86/x86_64/entry.S
> --- a/xen/arch/x86/x86_64/entry.S Mon Nov 09 10:24:54 2009 +0000
> +++ b/xen/arch/x86/x86_64/entry.S Mon Nov 23 15:15:39 2009 +0000
> @@ -48,22 +48,6 @@
> restore_all_guest:
> ASSERT_INTERRUPTS_DISABLED
> RESTORE_ALL
> - testw $TRAP_syscall,4(%rsp)
> - jz iret_exit_to_guest
> -
> - addq $8,%rsp
> - popq %rcx # RIP
> - popq %r11 # CS
> - cmpw $FLAT_USER_CS32,%r11
> - popq %r11 # RFLAGS
> - popq %rsp # RSP
> - je 1f
> - sysretq
> -1: sysretl
> -
> - ALIGN
> -/* No special register assumptions. */
> -iret_exit_to_guest:
> addq $8,%rsp
> .Lft0: iretq
>
> I think much of the issue stems from Xen defining several segment
> descriptors which are essentially equivalent to the ones Linux uses. It
> seems a bit ugly to expose these Xen defined descriptors to the guest
> when it hasn't explicitly asked for them. On the other hand I'm not sure
> what can realistically do since doing the Right Thing would seem to
> involve looking up the descriptor in the GDT to determine if the
> selector is 32 or 64 bit and/or context switching IA32_STAR in some
> fashion to allow guests to specify their own userspace CS for sysret 32
> and 64.
>

That would be a bit awkward to do in the iret fast path.

> Perhaps simply not returning guest userspace with sysret (as above)
> makes most sense, a syscall already takes a trap through the hypervisor
> on both entry and exit so I'm not sure the difference between sysret and
> iret is going to be noticeable.
>
> Another option might be to define VGCF_compat_mode as a new flag to
> HYPERVISOR_iret and select sysretq/sysretl based on that. This would
> still expose Xen descriptors to guests which didn't ask for one but at
> least it would (probably) be a compatible descriptor.
>

I don't think that's much of an improvement over using the Xen selector
for cs. Of course, it requires that the Xen selectors are actually part
of the ABI and won't change at some later point.

J



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
Old 11-23-2009, 11:52 PM
Jeremy Fitzhardinge
 
Default Bug#544145: Crash with paravirt-ops 2.6.31.6 kernel

On 11/23/09 09:23, Bastian Blank wrote:
>> I don't believe that is the case (the processor would have to carry some
>> state for the entire duration of a syscall for it to make any
>> difference). I think the spec simply assumes that an OS author would
>> want to use sysret if they used syscall.
>>
> Well, it is documented this way. If you ignore it, it can work (and does
> in this case) but is undefined behaviour.
>

Linux freely uses iret to return from syscall for things like fork and
exec. They are complimentary instructions, but nothing requires them to
be paired.

J



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 

Thread Tools




All times are GMT. The time now is 09:50 AM.

VBulletin, Copyright ©2000 - 2014, Jelsoft Enterprises Ltd.
Content Relevant URLs by vBSEO ©2007, Crawlability, Inc.
Copyright 2007 - 2008, www.linux-archive.org