Timo Juhani Lindfors 02-02-2011 04:42 PM

Bug#611832: linux-image-2.6.32-5-amd64: general protection fault at reboot under qemu: native_stop_other_cpus+0x86/0x90
 
Package: linux-2.6
Version: 2.6.32-30
Severity: normal

Sometimes when I use

shutdown -r now

under qemu I get a general protection fault:

<6>[ 103.542142] e1000 0000:00:03.0: PCI INT A disabled
<0>[ 103.543710] Restarting system.
<4>[ 103.543772] machine restart
<0>[ 103.544118] general protection fault: fff2 [#1] SMP
<0>[ 103.544118] last sysfs file: /sys/devices/pci0000:00/0000:00:01.1/host0/target0:0:0/0:0:0:0/scsi_disk/0:0:0:0/manage_start_stop
<4>[ 103.544118] CPU 0
<4>[ 103.544118] Modules linked in: parport_pc psmouse mtdchar parport i2c_piix4 processor button pcspkr evdev serio_raw i2c_core ext2 mbcache softdog jffs2 zlib_deflate lzo_decompress lzo_compress mtdblock mtd_blkdevs mtdram mtd sg sr_mod cdrom sd_mod crc_t10dif ata_generic ata_piix thermal libata floppy thermal_sys scsi_mod e1000 [last unloaded: scsi_wait_scan]
<6>[ 103.544118] Pid: 1020, comm: reboot Not tainted 2.6.32-5-amd64 #1 Bochs
<6>[ 103.544118] RIP: 0010:[<ffffffff810239db>] [<ffffffff810239db>] native_stop_other_cpus+0x86/0x90
<6>[ 103.544118] RSP: 0018:ffff88001f2b3e08 EFLAGS: 00000246
<6>[ 103.544118] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000001
<6>[ 103.544118] RDX: 0101010101010101 RSI: 00000000000000ff RDI: 0000000000000246
<6>[ 103.544118] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000008
<6>[ 103.544118] R10: 0000000000000000 R11: ffffffff81027dfc R12: ffffffff814d6740
<6>[ 103.544118] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
<6>[ 103.544118] FS: 00007ff3e26fe700(0000) GS:ffff880001800000(0000) knlGS:0000000000000000
<6>[ 103.544118] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<6>[ 103.544118] CR2: 00007ff3e2701000 CR3: 000000001ddda000 CR4: 00000000000006f0
<6>[ 103.544118] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<6>[ 103.544118] DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
<4>[ 103.544118] Process reboot (pid: 1020, threadinfo ffff88001f2b2000, task ffff88001e89e350)
<0>[ 103.544118] Stack:
<4>[ 103.544118] 0000000001234567 0000000028121969 00000000fee1dead ffffffff8102388b
<4>[ 103.544118] <0> 0000000000000000 ffffffff810235f4 0000000001234567 ffffffff8105f961
<4>[ 103.544118] <0> 0000000000000000 0000000000000000 0000000002359000 00007ff3e2186000
<0>[ 103.544118] Call Trace:
<4>[ 103.544118] [<ffffffff8102388b>] ? native_machine_shutdown+0x56/0x6f
<4>[ 103.544118] [<ffffffff810235f4>] ? native_machine_restart+0x21/0x37
<4>[ 103.544118] [<ffffffff8105f961>] ? sys_reboot+0x146/0x190
<4>[ 103.544118] [<ffffffff810cea3e>] ? free_pgtables+0x9c/0xbe
<4>[ 103.544118] [<ffffffff810fea38>] ? dput+0x2c/0x15e
<4>[ 103.544118] [<ffffffff81103295>] ? mntput_no_expire+0x23/0xee
<4>[ 103.544118] [<ffffffff810ecd86>] ? filp_close+0x5b/0x62
<4>[ 103.544118] [<ffffffff81010b42>] ? system_call_fastpath+0x16/0x1b
<0>[ 103.544118] Code: 76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 e8 1e 0b 00 00 48 89 df 57 9d <66> 66 90 66 90 5b 5d 41 5c c3 48 83 ec 08 48 8b 05 b0 27 4b 00
<1>[ 103.544118] RIP [<ffffffff810239db>] native_stop_other_cpus+0x86/0x90
<4>[ 103.544118] RSP <ffff88001f2b3e08>
<4>[ 103.544118] ---[ end trace c7434bd0d5312ada ]---
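
A side note on the error code in the first line of the oops ("general protection fault: fff2"). The following is a minimal Python sketch, not part of the original report, that splits the value into fields, assuming the standard x86 selector-style error code layout (bit 0 EXT, bit 1 IDT, bit 2 TI, bits 3-15 index). The function name decode_selector_error_code is made up for illustration.

```python
def decode_selector_error_code(code):
    """Split an x86 selector-style exception error code into its fields."""
    return {
        "ext": bool(code & 0x1),        # event external to the program
        "idt": bool(code & 0x2),        # index refers to an IDT gate
        "ti": bool(code & 0x4),         # LDT instead of GDT (only if idt is clear)
        "index": (code >> 3) & 0x1fff,  # descriptor or gate index
    }


# Value taken from "general protection fault: fff2" in the oops.
fields = decode_selector_error_code(0xfff2)
print(fields)
```

Under that assumption the IDT bit is set, which would point at an interrupt gate lookup and not at a memory access made by the instruction at RIP. The report itself does not establish this.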

More info:
1) I disassembled the "Code:" part with gdb:

Dump of assembler code for function f:
0x0000000000600860 <f+0>: 76 0e jbe 0x600870 <f+16>
0x0000000000600862 <f+2>: 85 ed test %ebp,%ebp
0x0000000000600864 <f+4>: 75 e0 jne 0x600846 <data_start+6>
0x0000000000600866 <f+6>: 48 85 db test %rbx,%rbx
0x0000000000600869 <f+9>: 74 05 je 0x600870 <f+16>
0x000000000060086b <f+11>: 48 ff cb dec %rbx
0x000000000060086e <f+14>: eb d6 jmp 0x600846 <data_start+6>
0x0000000000600870 <f+16>: 9c pushfq
0x0000000000600871 <f+17>: 58 pop %rax
0x0000000000600872 <f+18>: 66 66 90 xchg %ax,%ax
0x0000000000600875 <f+21>: 66 90 xchg %ax,%ax
0x0000000000600877 <f+23>: 48 89 c3 mov %rax,%rbx
0x000000000060087a <f+26>: fa cli
0x000000000060087b <f+27>: 66 66 90 xchg %ax,%ax
0x000000000060087e <f+30>: 66 66 90 xchg %ax,%ax
0x0000000000600881 <f+33>: e8 1e 0b 00 00 callq 0x6013a4
0x0000000000600886 <f+38>: 48 89 df mov %rbx,%rdi

0x0000000000600889 <f+41>: 57 push %rdi
0x000000000060088a <f+42>: 9d popfq
0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
0x000000000060088e <f+46>: 66 90 xchg %ax,%ax

0x0000000000600890 <f+48>: 5b pop %rbx
0x0000000000600891 <f+49>: 5d pop %rbp
0x0000000000600892 <f+50>: 41 5c pop %r12
0x0000000000600894 <f+52>: c3 retq
0x0000000000600895 <f+53>: 48 83 ec 08 sub $0x8,%rsp
0x0000000000600899 <f+57>: 48 8b 05 b0 27 4b 00 mov 0x4b27b0(%rip),%rax # 0xab3050
0x00000000006008a0 <f+64>: 00 00 add %al,(%rax)
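
The same bytes can be recovered from the oops without retyping them. This is a minimal Python sketch (not from the original report) that parses the "Code:" line, finds the byte wrapped in <..>, which is the byte at RIP, and derives the address of the first dumped byte. The helper name parse_code_line is hypothetical; the byte string and RIP are copied from the oops above.

```python
import re

CODE_LINE = ("76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6 9c 58 66 66 90 "
             "66 90 48 89 c3 fa 66 66 90 66 66 90 e8 1e 0b 00 00 48 89 df 57 "
             "9d <66> 66 90 66 90 5b 5d 41 5c c3 48 83 ec 08 48 8b 05 b0 27 "
             "4b 00")
RIP = 0xffffffff810239db


def parse_code_line(text):
    """Return (raw_bytes, index_of_byte_at_rip) for an oops 'Code:' line."""
    values = []
    fault_index = None
    for i, tok in enumerate(text.split()):
        marked = re.fullmatch(r"<([0-9a-fA-F]{2})>", tok)
        if marked:
            fault_index = i          # the <..> token marks the byte at RIP
            tok = marked.group(1)
        values.append(int(tok, 16))
    return bytes(values), fault_index


code, fault_index = parse_code_line(CODE_LINE)
start = RIP - fault_index            # address of the first dumped byte
print(f"{len(code)} bytes dumped, starting at {start:#x}")
print(f"byte at RIP: {code[fault_index]:#04x} (offset {fault_index})")
```

The resulting bytes can be written to a file and disassembled with "objdump -D -b binary -m i386:x86-64 FILE" instead of wrapping them in a dummy function for gdb.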

2) I then ran "objdump -axdt /usr/lib/debug/boot/vmlinux-2.6.32-5-amd64"
to see the code on disk:

ffffffff81023955 <native_stop_other_cpus>:
ffffffff81023955: 41 54 push %r12
ffffffff81023957: 83 3d da 8d 58 00 00 cmpl $0x0,0x588dda(%rip) # ffffffff815ac738 <reboot_force>
ffffffff8102395e: 55 push %rbp
ffffffff8102395f: 89 fd mov %edi,%ebp
ffffffff81023961: 53 push %rbx
ffffffff81023962: 75 7c jne ffffffff810239e0 <native_stop_other_cpus+0x8b>
ffffffff81023964: 4c 8b 25 cd ea 2e 00 mov 0x2eeacd(%rip),%r12 # ffffffff81312438 <cpu_online_mask>
ffffffff8102396b: be 00 02 00 00 mov $0x200,%esi
ffffffff81023970: 4c 89 e7 mov %r12,%rdi
ffffffff81023973: e8 09 38 17 00 callq ffffffff81197181 <__bitmap_weight>
ffffffff81023978: 83 f8 01 cmp $0x1,%eax
ffffffff8102397b: 76 43 jbe ffffffff810239c0 <native_stop_other_cpus+0x6b>
ffffffff8102397d: 48 8b 05 1c 28 4b 00 mov 0x4b281c(%rip),%rax # ffffffff814d61a0 <apic>
ffffffff81023984: bf f8 00 00 00 mov $0xf8,%edi
ffffffff81023989: bb 40 42 0f 00 mov $0xf4240,%ebx
ffffffff8102398e: ff 90 f0 00 00 00 callq *0xf0(%rax)
ffffffff81023994: eb 0a jmp ffffffff810239a0 <native_stop_other_cpus+0x4b>
ffffffff81023996: bf c7 10 00 00 mov $0x10c7,%edi
ffffffff8102399b: e8 72 19 17 00 callq ffffffff81195312 <__const_udelay>
ffffffff810239a0: be 00 02 00 00 mov $0x200,%esi
ffffffff810239a5: 4c 89 e7 mov %r12,%rdi
ffffffff810239a8: e8 d4 37 17 00 callq ffffffff81197181 <__bitmap_weight>
ffffffff810239ad: 83 f8 01 cmp $0x1,%eax
ffffffff810239b0: 76 0e jbe ffffffff810239c0 <native_stop_other_cpus+0x6b>
ffffffff810239b2: 85 ed test %ebp,%ebp
ffffffff810239b4: 75 e0 jne ffffffff81023996 <native_stop_other_cpus+0x41>
ffffffff810239b6: 48 85 db test %rbx,%rbx
ffffffff810239b9: 74 05 je ffffffff810239c0 <native_stop_other_cpus+0x6b>
ffffffff810239bb: 48 ff cb dec %rbx
ffffffff810239be: eb d6 jmp ffffffff81023996 <native_stop_other_cpus+0x41>
ffffffff810239c0: ff 14 25 f0 69 46 81 callq *0xffffffff814669f0
ffffffff810239c7: 48 89 c3 mov %rax,%rbx
ffffffff810239ca: ff 14 25 00 6a 46 81 callq *0xffffffff81466a00
ffffffff810239d1: e8 1e 0b 00 00 callq ffffffff810244f4 <disable_local_APIC>
ffffffff810239d6: 48 89 df mov %rbx,%rdi
ffffffff810239d9: ff 14 25 f8 69 46 81 callq *0xffffffff814669f8
ffffffff810239e0: 5b pop %rbx
ffffffff810239e1: 5d pop %rbp
ffffffff810239e2: 41 5c pop %r12
ffffffff810239e4: c3 retq

3) and also looked at the source:

static void native_stop_other_cpus(int wait)
{
        unsigned long flags;
        unsigned long timeout;

        if (reboot_force)
                return;

        /*
         * Use an own vector here because smp_call_function
         * does lots of things not suitable in a panic situation.
         * On most systems we could also use an NMI here,
         * but there are a few systems around where NMI
         * is problematic so stay with an non NMI for now
         * (this implies we cannot stop CPUs spinning with irq off
         * currently)
         */
        if (num_online_cpus() > 1) {
                apic->send_IPI_allbutself(REBOOT_VECTOR);

                /*
                 * Don't wait longer than a second if the caller
                 * didn't ask us to wait.
                 */
                timeout = USEC_PER_SEC;
                while (num_online_cpus() > 1 && (wait || timeout--))
                        udelay(1);
        }

        local_irq_save(flags);
        disable_local_APIC();
        local_irq_restore(flags);
}
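
As a cross-check that the disassembly in step 2 really is this function, the immediates in the objdump listing can be matched against the constants in the source. This is a small Python sketch added for illustration; the variable names are made up, and the udelay scaling factor relies on the x86 udelay() macro passing n * 0x10c7 to __const_udelay for a constant n.

```python
USEC_PER_SEC = 1_000_000

# mov $0xf4240,%ebx                      <->  timeout = USEC_PER_SEC
timeout_imm = 0xf4240

# mov $0x10c7,%edi ; callq __const_udelay  <->  udelay(1)
# 0x10c7 is 2**32 / 10**6 rounded up, i.e. the per-microsecond scale factor.
udelay_imm = 0x10c7
udelay_scale = -(-2**32 // USEC_PER_SEC)   # ceiling division

# mov $0xf8,%edi ; callq *0xf0(%rax)     <->  apic->send_IPI_allbutself(REBOOT_VECTOR)
reboot_vector_imm = 0xf8

print(timeout_imm == USEC_PER_SEC)    # timeout constant matches
print(udelay_imm == 1 * udelay_scale) # udelay(1) argument matches
print(hex(reboot_vector_imm))         # vector passed to send_IPI_allbutself
```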

4) Observation: RIP == 0xffffffff810239db is in the middle of the

ffffffff810239d9: ff 14 25 f8 69 46 81 callq *0xffffffff814669f8

instruction! If you compare the on-disk data to the "Code:" dump you
see that two calls have been replaced with the mysterious fragment

0x0000000000600889 <f+41>: 57 push %rdi
0x000000000060088a <f+42>: 9d popfq
0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
0x000000000060088e <f+46>: 66 90 xchg %ax,%ax


Is this memory corruption? Or is linux trying to patch the calls?
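
The comparison described in step 4 can be made mechanical. The following is a minimal Python sketch, not part of the original report: it lines up the on-disk bytes from the objdump listing (starting at ffffffff810239b0) with the first 53 bytes of the "Code:" dump and reports every run of bytes that differs. The function name changed_runs is hypothetical.

```python
BASE = 0xffffffff810239b0
RIP = 0xffffffff810239db

on_disk = bytes.fromhex(
    "76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6"  # end of the wait loop
    "ff 14 25 f0 69 46 81"                             # callq *0xffffffff814669f0
    "48 89 c3"                                         # mov %rax,%rbx
    "ff 14 25 00 6a 46 81"                             # callq *0xffffffff81466a00
    "e8 1e 0b 00 00"                                   # callq disable_local_APIC
    "48 89 df"                                         # mov %rbx,%rdi
    "ff 14 25 f8 69 46 81"                             # callq *0xffffffff814669f8
    "5b 5d 41 5c c3"                                   # pops and retq
)

in_memory = bytes.fromhex(
    "76 0e 85 ed 75 e0 48 85 db 74 05 48 ff cb eb d6"
    "9c 58 66 66 90 66 90"
    "48 89 c3"
    "fa 66 66 90 66 66 90"
    "e8 1e 0b 00 00"
    "48 89 df"
    "57 9d 66 66 90 66 90"
    "5b 5d 41 5c c3"
)


def changed_runs(a, b, base):
    """Return [(address, length)] for each contiguous run where a and b differ."""
    runs, start = [], None
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i] and start is None:
            start = i
        elif a[i] == b[i] and start is not None:
            runs.append((base + start, i - start))
            start = None
    if start is not None:
        runs.append((base + start, n - start))
    return runs


runs = changed_runs(on_disk, in_memory, BASE)
for addr, length in runs:
    off = addr - BASE
    print(f"{addr:#x}: {on_disk[off:off + length].hex(' ')}"
          f" -> {in_memory[off:off + length].hex(' ')}")

# Which rewritten call site contains RIP, and how far into it?
rip_site = [(a, RIP - a) for a, n in runs if a <= RIP < a + n]
print(rip_site)
```

With these inputs the comparison reports three rewritten 7-byte call sites, each exactly covering one indirect callq, and RIP falls two bytes into the last one, directly after the popfq. Byte-for-byte replacement of whole instructions looks more like deliberate patching than random corruption, though the comparison alone cannot prove that.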


-- Package-specific info:
** Version:
Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011

** Command line:
root=/dev/md0 ro ramroot_uuid=962d307f-8f1f-4301-b07d-587e27bcfd44 ramroot_snapshot=0 panic=60

** Not tainted

** Kernel log:
[ 0.300506] IP route cache hash table entries: 4096 (order: 3, 32768 bytes)
[ 0.303863] TCP established hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.304643] TCP bind hash table entries: 16384 (order: 6, 262144 bytes)
[ 0.305220] TCP: Hash tables configured (established 16384 bind 16384)
[ 0.305305] TCP reno registered
[ 0.306306] NET: Registered protocol family 1
[ 0.306515] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[ 0.306642] pci 0000:00:01.0: PIIX3: Enabling Passive Release
[ 0.306786] pci 0000:00:01.0: Activating ISA DMA hang workarounds
[ 0.306957] pci 0000:00:02.0: Boot video device
[ 0.309300] Unpacking initramfs...
[ 1.658083] Freeing initrd memory: 9146k freed
[ 1.682391] audit: initializing netlink socket (disabled)
[ 1.682851] type=2000 audit(1296666817.680:1): initialized
[ 1.696153] HugeTLB registered 2 MB page size, pre-allocated 0 pages
[ 1.709963] VFS: Disk quotas dquot_6.5.2
[ 1.710503] Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[ 1.712382] msgmni has been set to 993
[ 1.716250] alg: No test for stdrng (krng)
[ 1.716926] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 253)
[ 1.717029] io scheduler noop registered
[ 1.717079] io scheduler anticipatory registered
[ 1.717117] io scheduler deadline registered
[ 1.717485] io scheduler cfq registered (default)
[ 1.733529] Linux agpgart interface v0.103
[ 1.733895] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[ 1.734876] serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 1.738370] 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
[ 1.739747] input: Macintosh mouse button emulation as /devices/virtual/input/input0
[ 1.740912] PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12
[ 1.742764] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 1.742925] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 1.744845] mice: PS/2 mouse device common for all mice
[ 1.746505] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[ 1.747931] rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
[ 1.748451] rtc0: alarms up to one day, 114 bytes nvram, hpet irqs
[ 1.748799] cpuidle: using governor ladder
[ 1.748874] cpuidle: using governor menu
[ 1.749048] No iBFT detected.
[ 1.750973] TCP cubic registered
[ 1.752182] NET: Registered protocol family 10
[ 1.757514] lo: Disabled Privacy Extensions
[ 1.759888] Mobile IPv6
[ 1.760110] NET: Registered protocol family 17
[ 1.761193] PM: Resume from disk failed.
[ 1.761347] registered taskstats version 1
[ 1.762421] rtc_cmos 00:01: setting system clock to 2011-02-02 17:13:38 UTC (1296666818)
[ 1.762691] Initalizing network drop monitor service
[ 1.763416] Freeing unused kernel memory: 592k freed
[ 1.773508] Write protecting the kernel read-only data: 4236k
[ 2.170642] udev[46]: starting version 164
[ 5.746908] SCSI subsystem initialized
[ 5.771328] Intel(R) PRO/1000 Network Driver - version 7.3.21-k5-NAPI
[ 5.771448] Copyright (c) 1999-2006 Intel Corporation.
[ 5.783399] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11
[ 5.784364] e1000 0000:00:03.0: PCI INT A -> Link[LNKC] -> GSI 11 (level, high) -> IRQ 11
[ 5.786657] e1000 0000:00:03.0: setting latency timer to 64
[ 6.050139] e1000: 0000:00:03.0: e1000_probe: (PCI:33MHz:32-bit) 52:54:00:12:36:03
[ 6.229570] FDC 0 is a S82078B
[ 6.259038] libata version 3.00 loaded.
[ 6.338531] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[ 6.339300] ata_piix 0000:00:01.1: version 2.13
[ 6.341377] ata_piix 0000:00:01.1: setting latency timer to 64
[ 6.358618] scsi0 : ata_piix
[ 6.366675] scsi1 : ata_piix
[ 6.369102] ata1: PATA max MWDMA2 cmd 0x1f0 ctl 0x3f6 bmdma 0xc000 irq 14
[ 6.369211] ata2: PATA max MWDMA2 cmd 0x170 ctl 0x376 bmdma 0xc008 irq 15
[ 6.534702] ata2.01: NODEV after polling detection
[ 6.535831] ata2.00: ATAPI: QEMU DVD-ROM, 0.12.5, max UDMA/100
[ 6.537083] ata1.01: NODEV after polling detection
[ 6.537564] ata1.00: ATA-7: QEMU HARDDISK, 0.12.5, max UDMA/100
[ 6.537635] ata1.00: 983040 sectors, multi 16: LBA48
[ 6.538574] ata1.00: configured for MWDMA2
[ 6.544685] ata2.00: configured for MWDMA2
[ 6.546499] scsi 0:0:0:0: Direct-Access ATA QEMU HARDDISK 0.12 PQ: 0 ANSI: 5
[ 6.554829] scsi 1:0:0:0: CD-ROM QEMU QEMU DVD-ROM 0.12 PQ: 0 ANSI: 5
[ 6.782419] sd 0:0:0:0: [sda] 983040 512-byte logical blocks: (503 MB/480 MiB)
[ 6.788555] sd 0:0:0:0: [sda] Write Protect is off
[ 6.788704] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 6.789765] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 6.817483] sda: sda1
[ 6.835651] sd 0:0:0:0: [sda] Attached SCSI disk
[ 6.890889] sr0: scsi3-mmc drive: 4x/4x xa/form2 tray
[ 6.891124] Uniform CD-ROM driver Revision: 3.20
[ 6.908515] sr 1:0:0:0: Attached scsi CD-ROM sr0
[ 7.296567] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 7.312825] sr 1:0:0:0: Attached scsi generic sg1 type 5
[ 8.223484] JFFS2 version 2.2. (NAND) (SUMMARY) © 2001-2006 Red Hat, Inc.
[ 8.317383] Software Watchdog Timer: 0.07 initialized. soft_noboot=0 soft_margin=60 sec (nowayout= 0)
[ 23.464102] udev[293]: starting version 164
[ 28.110250] input: PC Speaker as /devices/platform/pcspkr/input/input2
[ 29.292684] processor LNXCPU:00: registered as cooling_device0
[ 29.376527] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[ 29.406395] ACPI: Power Button [PWRF]
[ 29.501299] piix4_smbus 0000:00:01.3: SMBus Host Controller at 0xb100, revision 0
[ 30.832085] parport_pc 00:05: reported by Plug and Play ACPI
[ 30.861945] parport0: PC-style at 0x378, irq 7 [PCSPP,TRISTATE]
[ 31.022884] input: ImExPS/2 Generic Explorer Mouse as /devices/platform/i8042/serio1/input/input4
[ 40.033423] e1000: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
[ 50.380380] eth0: no IPv6 routers present

** Model information
sys_vendor: Bochs
product_name: Bochs
product_version:
chassis_vendor: Bochs
chassis_version:
bios_vendor: Bochs
bios_version: Bochs

** Loaded modules:
Module Size Used by
parport_pc 18855 0
psmouse 49777 0
parport 27954 1 parport_pc
mtdchar 5434 0
i2c_piix4 8328 0
button 4650 0
processor 29935 0
pcspkr 1699 0
evdev 7352 0
serio_raw 3752 0
i2c_core 15712 1 i2c_piix4
ext2 52969 0
mbcache 5050 1 ext2
softdog 2896 0
jffs2 109352 0
zlib_deflate 17746 1 jffs2
lzo_decompress 2055 1 jffs2
lzo_compress 1734 1 jffs2
mtdblock 3294 0
mtd_blkdevs 4596 1 mtdblock
mtdram 1571 0
mtd 14269 6 mtdchar,jffs2,mtd_blkdevs,mtdram
sg 18744 0
sr_mod 12602 0
sd_mod 29889 0
crc_t10dif 1276 1 sd_mod
cdrom 29415 1 sr_mod
ata_generic 3047 0
ata_piix 21124 0
thermal 11674 0
libata 133632 2 ata_generic,ata_piix
floppy 49087 0
thermal_sys 11942 2 processor,thermal
e1000 85517 0
scsi_mod 122149 4 sg,sr_mod,sd_mod,libata

** Network interface configuration:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

** Network status:
*** IP interfaces and addresses:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 52:54:00:12:36:03 brd ff:ff:ff:ff:ff:ff
inet 10.7.6.214/16 brd 10.7.255.255 scope global eth0
inet6 fe80::5054:ff:fe12:3603/64 scope link
valid_lft forever preferred_lft forever

*** Device statistics:
Inter-| Receive | Transmit
face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
lo: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
eth0:38479057 27305 0 0 0 0 0 0 519975 7069 0 0 0 0 0 0

*** Protocol statistics:
Ip:
26961 total packets received
26 with invalid addresses
0 forwarded
0 incoming packets discarded
26935 incoming packets delivered
7043 requests sent out
Icmp:
0 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
0 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
Tcp:
6 active connections openings
2 passive connection openings
0 failed connection attempts
0 connection resets received
1 connections established
25979 segments received
7009 segments send out
0 segments retransmited
0 bad segments received.
0 resets sent
Udp:
34 packets received
0 packets to unknown port received.
0 packet receive errors
34 packets sent
UdpLite:
TcpExt:
4 TCP sockets finished time wait in fast timer
5 delayed acks sent
1 packets directly queued to recvmsg prequeue.
25161 packet headers predicted
48 acknowledgments not containing data payload received
479 predicted acknowledgments
IpExt:
InMcastPkts: 9
InBcastPkts: 913
InOctets: 38081503
OutOctets: 419465
InMcastOctets: 288
InBcastOctets: 206258

*** Device features:
eth0: 0x10b89
lo: 0x13865

** PCI devices:
not available

** Sound cards:

-- System Information:
Debian Release: 6.0
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.32-5-amd64 (SMP w/1 CPU core)
Locale: LANG=C, LC_CTYPE=fi_FI (charmap=locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968)
Shell: /bin/sh linked to /bin/dash

Versions of packages linux-image-2.6.32-5-amd64 depends on:
ii debconf [debconf-2.0] 1.5.36 Debian configuration management sy
ii initramfs-tools [linux-initra 0.98.7 tools for generating an initramfs
ii linux-base 2.6.32-30 Linux image base package
ii module-init-tools 3.12-1 tools for managing Linux kernel mo

Versions of packages linux-image-2.6.32-5-amd64 recommends:
ii firmware-linux-free 2.6.32-30 Binary firmware for various driver

Versions of packages linux-image-2.6.32-5-amd64 suggests:
ii grub-legacy [grub] 0.97-64 GRand Unified Bootloader (Legacy v
pn linux-doc-2.6.32 <none> (no description available)

Versions of packages linux-image-2.6.32-5-amd64 is related to:
pn firmware-bnx2 <none> (no description available)
pn firmware-bnx2x <none> (no description available)
pn firmware-ipw2x00 <none> (no description available)
pn firmware-ivtv <none> (no description available)
pn firmware-iwlwifi <none> (no description available)
pn firmware-linux <none> (no description available)
pn firmware-linux-nonfree <none> (no description available)
pn firmware-qlogic <none> (no description available)
pn firmware-ralink <none> (no description available)
pn xen-hypervisor <none> (no description available)

-- debconf information:
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "fi_FI",
LANG = (unset)
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
linux-image-2.6.32-5-amd64/postinst/ignoring-do-bootloader-2.6.32-5-amd64:
linux-image-2.6.32-5-amd64/prerm/removing-running-kernel-2.6.32-5-amd64: true
linux-image-2.6.32-5-amd64/postinst/missing-firmware-2.6.32-5-amd64:
linux-image-2.6.32-5-amd64/postinst/depmod-error-initrd-2.6.32-5-amd64: false



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/84pqra1gfa.fsf@sauna.l.org

Ben Hutchings 02-02-2011 09:36 PM

On Wed, 2011-02-02 at 19:42 +0200, Timo Juhani Lindfors wrote:
> Package: linux-2.6
> Version: 2.6.32-30
> Severity: normal
>
> Sometimes when I use
>
> shutdown -r now
>
> under qemu I get a general protection fault:

Which version of qemu are you using in the host? If you are using
kvm-qemu, which kernel version are you using in the host?

[...]
> 4) Observation: RIP == 0xffffffff810239db is in the middle of the
>
> ffffffff810239d9: ff 14 25 f8 69 46 81 callq *0xffffffff814669f8
>
> instruction! If you compare the on-disk data to the "Code:" dump you
> see that two calls have been replaced with the mysterious fragment
>
> 0x0000000000600889 <f+41>: 57 push %rdi
> 0x000000000060088a <f+42>: 9d popfq
> 0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
> 0x000000000060088e <f+46>: 66 90 xchg %ax,%ax
>
>
> Is this memory corruption? Or is linux trying to patch the calls?
[...]

This looks like deliberate patching by the PV-alternatives mechanism.

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

Timo Juhani Lindfors 02-02-2011 09:50 PM

Ben Hutchings <ben@decadent.org.uk> writes:
> Which version of qemu are you using in the host? If you are using
> kvm-qemu, which kernel version are you using in the host?

The host is a xen domU:

lindi1:~$ qemu-system-x86_64 --version
QEMU PC emulator version 0.12.5 (Debian 0.12.5+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard
lindi1:~$ dpkg-query -W qemu
qemu 0.12.5+dfsg-3
lindi1:~$ dmesg|head -n3
[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011

>> 0x0000000000600889 <f+41>: 57 push %rdi
>> 0x000000000060088a <f+42>: 9d popfq
>> 0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
>> 0x000000000060088e <f+46>: 66 90 xchg %ax,%ax
>
> This looks like deliberate patching by the PV-alternatives mechanism.

Is this PV-alternatives a linux or qemu feature or are they both
cooperating?

I tried to look around but couldn't find the code yet.





--
Archive: http://lists.debian.org/84lj1y126i.fsf@sauna.l.org

Ben Hutchings 02-15-2011 02:10 AM

On Thu, 2011-02-03 at 00:50 +0200, Timo Juhani Lindfors wrote:
> Ben Hutchings <ben@decadent.org.uk> writes:
> > Which version of qemu are you using in the host? If you are using
> > kvm-qemu, which kernel version are you using in the host?
>
> The host is a xen domU:

So this is ordinary qemu, not using hardware virtualisation?

> lindi1:~$ qemu-system-x86_64 --version
> QEMU PC emulator version 0.12.5 (Debian 0.12.5+dfsg-3), Copyright (c) 2003-2008 Fabrice Bellard
> lindi1:~$ dpkg-query -W qemu
> qemu 0.12.5+dfsg-3
> lindi1:~$ dmesg|head -n3
> [ 0.000000] Initializing cgroup subsys cpuset
> [ 0.000000] Initializing cgroup subsys cpu
> [ 0.000000] Linux version 2.6.32-5-amd64 (Debian 2.6.32-30) (ben@decadent.org.uk) (gcc version 4.3.5 (Debian 4.3.5-4) ) #1 SMP Wed Jan 12 03:40:32 UTC 2011
>
> >> 0x0000000000600889 <f+41>: 57 push %rdi
> >> 0x000000000060088a <f+42>: 9d popfq
> >> 0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
> >> 0x000000000060088e <f+46>: 66 90 xchg %ax,%ax
> >
> > This looks like deliberate patching by the PV-alternatives mechanism.
>
> Is this PV-alternatives a linux or qemu feature or are they both
> cooperating?
>
> I tried to look around but couldn't find the code yet.

It's a kernel feature to be more efficient when running in a recognised
virtual machine implementation (PV = paravirtualisation).
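
To illustrate what such patching amounts to mechanically, here is a minimal Python sketch. It is not the kernel's implementation: it only models a 7-byte indirect call site being overwritten with a shorter inline instruction sequence and padded with NOPs to the original length. The names patch_site, NATIVE and NOPS are hypothetical, the operation names are borrowed from pv_irq_ops, and the NOP encodings are chosen to match the padding seen in the "Code:" dump (the kernel selects its NOP table per CPU).

```python
# Multi-byte NOP encodings, chosen to match the dump (an assumption).
NOPS = {
    0: b"",
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("66 66 90"),
    4: bytes.fromhex("66 66 66 90"),
    5: bytes.fromhex("66 66 90 66 90"),
    6: bytes.fromhex("66 66 90 66 66 90"),
}

# Inline replacements for the indirect calls, as seen in the dump.
NATIVE = {
    "save_fl": bytes.fromhex("9c 58"),      # pushfq; pop %rax
    "irq_disable": bytes.fromhex("fa"),     # cli
    "restore_fl": bytes.fromhex("57 9d"),   # push %rdi; popfq
}


def patch_site(op, site_len):
    """Return the bytes that replace a call site of site_len bytes."""
    insns = NATIVE[op]
    if len(insns) > site_len:
        raise ValueError("replacement does not fit into the call site")
    return insns + NOPS[site_len - len(insns)]


# local_irq_save() uses save_fl then irq_disable; local_irq_restore() uses restore_fl.
for op in ("save_fl", "irq_disable", "restore_fl"):
    print(f"{op:12s} {patch_site(op, 7).hex(' ')}")
```

The three results reproduce the byte sequences found in place of the three callq instructions, which is consistent with the patching explanation.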

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

Timo Juhani Lindfors 02-15-2011 03:52 PM

Ben Hutchings <ben@decadent.org.uk> writes:
> It's a kernel feature to be more efficient when running in a recognised
> virtual machine implementation (PV = paravirtualisation).

thanks. I think it is the following code from vmi_32.c:

/*
 * Apply patch if appropriate, return length of new instruction
 * sequence. The callee does nop padding for us.
 */
static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
                          unsigned long ip, unsigned len)
{
        switch (type) {
        case PARAVIRT_PATCH(pv_irq_ops.irq_disable):
                return patch_internal(VMI_CALL_DisableInterrupts, len,
                                      insns, ip);
        case PARAVIRT_PATCH(pv_irq_ops.irq_enable):
                return patch_internal(VMI_CALL_EnableInterrupts, len,
                                      insns, ip);
        case PARAVIRT_PATCH(pv_irq_ops.restore_fl):
                return patch_internal(VMI_CALL_SetInterruptMask, len,
                                      insns, ip);
        case PARAVIRT_PATCH(pv_irq_ops.save_fl):
                return patch_internal(VMI_CALL_GetInterruptMask, len,
                                      insns, ip);
        case PARAVIRT_PATCH(pv_cpu_ops.iret):
                return patch_internal(VMI_CALL_IRET, len, insns, ip);
        case PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit):
                return patch_internal(VMI_CALL_SYSEXIT, len, insns, ip);
        default:
                break;
        }
        return len;
}

I don't understand how the first xchg instruction in

0x0000000000600889 <f+41>: 57 push %rdi
0x000000000060088a <f+42>: 9d popfq
0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
0x000000000060088e <f+46>: 66 90 xchg %ax,%ax

can generate a general protection fault. I googled around and found

"yes - it smells like it tries to deliver vector 0, after the panic
code has deinitialized the lapic / ioapic"

which suggests a qemu bug from
http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-09/msg09652.html

Shall I reassign the bug or do you know how to investigate this more?
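
One possible reading of why the fault is reported at the xchg: the instruction just before it is popfq, which reloads the saved flags. The Python sketch below (added for illustration, with a made-up helper flag_names) decodes the flag values from the oops. If the interrupt flag is set in the value being restored, interrupts become enabled again exactly at the address RIP points to, so a pending interrupt would be taken there and the xchg itself need not be at fault.

```python
# Architectural RFLAGS bit positions (bit 1 is reserved and always set).
FLAG_BITS = {0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
             8: "TF", 9: "IF", 10: "DF", 11: "OF"}


def flag_names(value):
    """Return the names of the flags set in an RFLAGS value, lowest bit first."""
    return [name for bit, name in sorted(FLAG_BITS.items()) if value & (1 << bit)]


saved_flags = 0x246       # RBX and RDI in the oops; popfq loads the pushed RDI
eflags_at_fault = 0x246   # EFLAGS line in the oops

interrupts_reenabled = "IF" in flag_names(saved_flags)
print(flag_names(saved_flags), flag_names(eflags_at_fault))
print("interrupts enabled by popfq:", interrupts_reenabled)
```

This is consistent with the quoted suggestion about an interrupt being delivered after the local APIC was disabled, but it does not show which side (kernel or qemu) is misbehaving.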




--
Archive: http://lists.debian.org/84oc6d5jga.fsf@sauna.l.org

Ben Hutchings 03-06-2011 10:50 PM

[Excuse the duplicate; this is properly cc'd to bugs.debian.org.]

On Tue, 2011-02-15 at 18:52 +0200, Timo Juhani Lindfors wrote:
> Ben Hutchings <ben@decadent.org.uk> writes:
> > It's a kernel feature to be more efficient when running in a recognised
> > virtual machine implementation (PV = paravirtualisation).
>
> thanks. I think it is the following code from vmi_32.c:
[...]
> I don't understand how the first xchg instruction in
>
> 0x0000000000600889 <f+41>: 57 push %rdi
> 0x000000000060088a <f+42>: 9d popfq
> 0x000000000060088b <f+43>: 66 66 90 xchg %ax,%ax
> 0x000000000060088e <f+46>: 66 90 xchg %ax,%ax
>
> can generate a general protection fault. I googled around and found
>
> "yes - it smells like it tries to deliver vector 0, after the panic
> code has deinitialized the lapic / ioapic"
>
> which suggests a qemu bug from
> http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-09/msg09652.html
>
> Shall I reassign the bug or do you know how to investigate this more?

Sorry, I don't have a good idea how to investigate this further. The
message you're referring to is quite old and I would expect the bug to
have been fixed in qemu since then. Is the KVM host using an old
version?

Ben.

--
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.

