Csillag Kristof 07-10-2010 11:34 PM

repeated kernel crashes with PCI passthru
 
Hi all,

I recently upgraded one of my Debian servers
from Xen 3.2 / kernel 2.6.26
to Xen 4.0 / kernel 2.6.32.

I have configured PCI passthrough for a NIC.

Since the current Debian pvops kernel does not have the Xen PCI frontend
driver required for PCI passthrough, I am running a Xen kernel in both dom0
and domU. The actual versions are:

dom0: 2.6.32-5-xen-amd64 #1 SMP Tue Jun 1
domU: 2.6.32-5-xen-686 #1 SMP Tue Jul 6
the hypervisor is 4.0.1-rc3

(Random notes:
1. The dom0 is 64-bit; this domU is 32-bit.
2. The dom0 kernel is not the latest (-16) but the one before it (-15),
because the current one won't boot; see #588509 and #588426.
)

* * *

The system boots up as it should, but the domU sometimes crashes with messages like these:

---------------------

[27047.101954] BUG: unable to handle kernel paging request at 00d90200
[27047.101979] IP: [<c11f01aa>] skb_release_data+0x71/0x90
[27047.102000] *pdpt = 0000000001c21027 *pde = 0000000000000000
[27047.102019] Thread overran stack, or stack corrupted
[27047.102031] Oops: 0000 [#1] SMP
[27047.102047] last sysfs file: /sys/devices/virtual/net/ppp0/uevent
[27047.102060] Modules linked in: tun xt_limit nf_nat_irc nf_nat_ftp ipt_LOG ipt_MASQUERADE xt_DSCP ipt_REJECT nf_conntrack_irc nf_conntrack_ftp xt_state xt_TCPMSS xt_tcpmss xt_tcpudp pppoe pppox ppp_generic slhc sundance mii iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle iptable_filter ip_tables x_tables dm_snapshot dm_mirror dm_region_hash dm_log dm_mod loop evdev snd_pcsp snd_pcm snd_timer snd xen_netfront soundcore snd_page_alloc ext3 jbd mbcache thermal_sys xen_blkfront
[27047.102275]
[27047.102285] Pid: 0, comm: swapper Not tainted (2.6.32-5-xen-686 #1)
[27047.102298] EIP: 0061:[<c11f01aa>] EFLAGS: 00010206 CPU: 0
[27047.102310] EIP is at skb_release_data+0x71/0x90
[27047.102321] EAX: 00d90200 EBX: 00000000 ECX: c2939c10 EDX: cec6b500
[27047.102333] ESI: cf8f0a80 EDI: cf8f09c0 EBP: c13919c8 ESP: c1383eec
[27047.102346] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0069
[27047.102358] Process swapper (pid: 0, ti=c1382000 task=c13c2ba0 task.ti=c13820
[27047.102371] Stack:
[27047.102379] cf8f0a80 c293a700 c11efdfb cf8f09c0 c11f4c35 00000011 c1380000 00000002
[27047.102415] <0> 00000008 c13919c8 c103c1ec c14594b0 00000001 0000000a 00000000 00000100
[27047.102455] <0> c1380000 00000000 c13c5d18 00000000 c103c2c4 00000000 c1383f5c c103c39a
[27047.102499] Call Trace:
[27047.102512] [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.102527] [<c11f4c35>] ? net_tx_action+0x58/0xf9
[27047.102542] [<c103c1ec>] ? __do_softirq+0xaa/0x151
[27047.102557] [<c103c2c4>] ? do_softirq+0x31/0x3c
[27047.102570] [<c103c39a>] ? irq_exit+0x26/0x58
[27047.102586] [<c1198a46>] ? xen_evtchn_do_upcall+0x22/0x2c
[27047.102604] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[27047.102630] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[27047.102647] [<c1006169>] ? xen_safe_halt+0xf/0x1b
[27047.102661] [<c10042bf>] ? xen_idle+0x23/0x30
[27047.102676] [<c1008168>] ? cpu_idle+0x89/0xa5
[27047.102691] [<c13fb80d>] ? start_kernel+0x318/0x31d
[27047.102706] [<c13fd3c3>] ? xen_start_kernel+0x615/0x61c
[27047.102721] [<c1409045>] ? print_local_APIC+0x61/0x380
[27047.102732] Code: 8b 44 02 30 e8 9a 4f ea ff 8b 96 a4 00 00 00 0f b7 42 04 39 c3 7c e5 8b 96 a4 00 00 00 8b 42 1c 85 c0 74 16 c7 42 1c 00 00 00 00 <8b> 18 e8 d2 fc ff ff 85 db 74 04 89 d8 eb f1 8b 86 a8 00 00 00
[27047.102981] EIP: [<c11f01aa>] skb_release_data+0x71/0x90 SS:ESP 0069:c1383eec
[27047.103003] CR2: 0000000000d90200
[27047.103018] ---[ end trace a577dfc0e629cd07 ]---
[27047.103028] Kernel panic - not syncing: Fatal exception in interrupt
[27047.103042] Pid: 0, comm: swapper Tainted: G D 2.6.32-5-xen-686 #1
[27047.103053] Call Trace:
[27047.103065] [<c128ae0d>] ? panic+0x38/0xe4
[27047.103078] [<c128d419>] ? oops_end+0x91/0x9d
[27047.103092] [<c1021b5a>] ? no_context+0x134/0x13d
[27047.103106] [<c1021c78>] ? __bad_area_nosemaphore+0x115/0x11d
[27047.103121] [<c10067f0>] ? check_events+0x8/0xc
[27047.103135] [<c10067e7>] ? xen_restore_fl_direct_end+0x0/0x1
[27047.103155] [<d0823fdb>] ? xennet_poll+0xaeb/0xb04 [xen_netfront]
[27047.103170] [<c10211df>] ? pvclock_clocksource_read+0xf9/0x10f
[27047.103185] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[27047.103200] [<c114a00f>] ? xen_swiotlb_unmap_page+0x0/0x7
[27047.103214] [<c10067f0>] ? check_events+0x8/0xc
[27047.103227] [<c10060e8>] ? xen_force_evtchn_callback+0xc/0x10
[27047.103242] [<c128e3f4>] ? do_page_fault+0x115/0x307
[27047.103255] [<c128e2df>] ? do_page_fault+0x0/0x307
[27047.103268] [<c1021c8a>] ? bad_area_nosemaphore+0xa/0xc
[27047.103282] [<c128cb0b>] ? error_code+0x73/0x78
[27047.103295] [<c11f01aa>] ? skb_release_data+0x71/0x90
[27047.103308] [<c11efdfb>] ? __kfree_skb+0xf/0x6e
[27047.103321] [<c11f4c35>] ? net_tx_action+0x58/0xf9
[27047.103335] [<c103c1ec>] ? __do_softirq+0xaa/0x151
[27047.103348] [<c103c2c4>] ? do_softirq+0x31/0x3c
[27047.103361] [<c103c39a>] ? irq_exit+0x26/0x58
[27047.103374] [<c1198a46>] ? xen_evtchn_do_upcall+0x22/0x2c
[27047.103388] [<c1009b7f>] ? xen_do_upcall+0x7/0xc
[27047.103401] [<c10023a7>] ? hypercall_page+0x3a7/0x1001
[27047.103415] [<c1006169>] ? xen_safe_halt+0xf/0x1b
[27047.103428] [<c10042bf>] ? xen_idle+0x23/0x30
[27047.103440] [<c1008168>] ? cpu_idle+0x89/0xa5
[27047.103454] [<c13fb80d>] ? start_kernel+0x318/0x31d
[27047.103467] [<c13fd3c3>] ? xen_start_kernel+0x615/0x61c
[27047.103481] [<c1409045>] ? print_local_APIC+0x61/0x380
------------------------------------------------------------------------------------
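One detail in the trace may help narrow this down: CR2 (the faulting address, 00d90200) is the same value as EAX, and the marked instruction bytes "<8b> 18" in the Code: line appear to decode to a 32-bit load through EAX. That would suggest skb_release_data followed a bad pointer rather than hitting an unmapped but otherwise valid page. The following is a minimal sketch, not an existing tool, that makes the same comparison on any 32-bit oops text; the function name faulting_registers is made up for this example.

```python
import re

def faulting_registers(oops_text):
    """Return the names of registers whose value equals the CR2 fault address.

    Hypothetical helper for illustration; it only understands the
    32-bit register dump layout shown in the oops above.
    """
    cr2 = re.search(r"CR2:\s*([0-9a-fA-F]+)", oops_text)
    if cr2 is None:
        return []
    fault_addr = int(cr2.group(1), 16)
    # Matches pairs such as "EAX: 00d90200"; EIP and EFLAGS do not match
    # because their layout differs.
    regs = re.findall(r"\b(E[A-Z]{2}):\s*([0-9a-fA-F]{8})\b", oops_text)
    return [name for name, value in regs if int(value, 16) == fault_addr]

# Three lines copied from the oops above.
sample = """\
[27047.102321] EAX: 00d90200 EBX: 00000000 ECX: c2939c10 EDX: cec6b500
[27047.102333] ESI: cf8f0a80 EDI: cf8f09c0 EBP: c13919c8 ESP: c1383eec
[27047.103018] CR2: 0000000000d90200
"""

print(faulting_registers(sample))
```

For the lines above this reports EAX only.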

Since the card's IRQ is shared with the SATA controller,
the domU crash then takes down the whole host, which needs a hardware reset.

(This second problem sometimes also occurs when I reboot the domU normally;
see http://lists.xensource.com/archives/html/xen-devel/2009-07/msg00224.html
for the thread about the shared IRQ problem.)
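For anyone wanting to check whether a passed-through device shares its interrupt line, here is a rough sketch that scans /proc/interrupts-style text for IRQs with more than one device attached. It assumes the usual layout in which devices sharing a line are listed comma-separated at the end of the row. The function name shared_irqs and the sample rows (including the device names ahci and eth0) are invented for illustration and are not taken from this machine.

```python
def shared_irqs(interrupts_text):
    """Map IRQ number -> device names, for IRQs listing more than one device.

    Heuristic sketch: assumes the /proc/interrupts layout in which
    devices sharing a line are comma-separated at the end of the row.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or not head.strip().isdigit():
            continue  # header row, or named rows such as NMI/LOC
        if "," not in rest:
            continue  # only one device on this line
        names = [part.strip() for part in rest.split(",")]
        # The first piece still carries the counters and controller name;
        # keep only its last whitespace-separated token.
        names[0] = names[0].split()[-1]
        shared[int(head)] = names
    return shared

# Invented sample data, for illustration only.
sample = """\
           CPU0       CPU1
  1:          9          0   IO-APIC-edge      i8042
 16:      48211      47903   IO-APIC-fasteoi   ahci, eth0
NMI:          0          0   Non-maskable interrupts
"""

print(shared_irqs(sample))
```

On a live system the same function could be fed the contents of /proc/interrupts read in dom0.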

This happens once every few days, sometimes within a few hours, which makes
the whole system practically unusable.

* * *

Does anybody have any idea what could be happening here? How can I fix this?

Thank you for your help,

Kristof Csillag



Archive: http://lists.debian.org/4C390380.9000104@gmail.com

