 
05-09-2012, 11:20 PM
Jonathan Nieder

Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() (updates)

Hi Gustavo,

gustavo panizzo wrote:

> i can get the nic to work using latest linus tree
> + ancient gentoo userland (udev 124), but is running at 10Mb/s half duplex
>
> 3.4.0-rc6+
> Settings for eth0:
> Supported ports: [ TP MII ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
[...]
> Advertised auto-negotiation: Yes
> Speed: 10Mb/s
> Duplex: Half
[...]
> Auto-negotiation: off
[...]
> while 2.6.28 runs at 100Mb/s full duplex
>
> Settings for eth0:
> Supported ports: [ TP MII ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
[...]
> Advertised auto-negotiation: No
> Speed: 100Mb/s
> Duplex: Full
[...]
> Auto-negotiation: on
[...]
> i will try later with the kernel from d-i or testing, but i think this
> should go upstream

Interesting. How does a 3.2.y kernel behave with the ancient gentoo
userland? (Perhaps this is what you are planning to try later.)



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120509232056.GA7921@burratino
 
05-11-2012, 12:39 AM
gustavo panizzo

> Interesting. How does a 3.2.y kernel behave with the ancient gentoo
> userland? (Perhaps this is what you are planning to try later.)

Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: external
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes

kernel is 3.2.15 from linux-source-3.2 (installed with apt-get)
config is the same "gentoo" config

i cannot boot linux-image-3.2.0-2-sparc64_3.2.16-1_sparc because it is not able to mount the root fs
i see these errors in the kernel log

[ 52.363317] sun_esp: Unknown symbol scsi_esp_register (err 0)
[ 52.439003] sun_esp: Unknown symbol scsi_esp_intr (err 0)
[ 52.509998] sun_esp: Unknown symbol scsi_host_put (err 0)
[ 52.581304] sun_esp: Unknown symbol scsi_esp_template (err 0)
[ 52.656890] sun_esp: Unknown symbol scsi_esp_unregister (err 0)
[ 52.734804] sun_esp: Unknown symbol scsi_esp_cmd (err 0)
[ 52.804672] sun_esp: Unknown symbol scsi_host_alloc (err 0)
[ 53.004224] SCSI subsystem initialized

i will continue to experiment with this kernel (hopefully debootstrap will finish soon)

--
1AE0 322E B8F7 4717 BDEA BF1D 44BB 1BA7 9F6C 6333



Archive: http://lists.debian.org/20120511003911.GE28653@io.zumbi.com.ar
 
05-11-2012, 03:25 PM
gustavo panizzo

adding debian-boot


i've installed unstable on the box (using debootstrap) and it boots
3.2.0-2-sparc64 successfully, networking works

obp diags shows no errors

but when i boot from network using
http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012

i get the following error

Detecting link on eth0; please wait... 100%
[ 246.994391] Unable to handle kernel NULL pointer dereference
[ 247.074490] tsk->{mm,active_mm}->context = 000000000000019f
[ 247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000
[ 247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
[ 247.328648] Call Trace:
[ 247.360793] [000000000045dcd4] do_exit+0x94/0x708
[ 247.423821] [0000000000427550] die_if_kernel+0x2a0/0x2c8
[ 247.494864] [0000000000768c84] unhandled_fault+0x8c/0x98
[ 247.565915] [000000000076936c] do_sparc64_fault+0x6dc/0x780
[ 247.640377] [0000000000407880] sparc64_realfault_common+0x10/0x20
[ 247.721722] [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
[ 247.798478] [0000000000697110] net_rx_action+0x9c/0x234
[ 247.868369] [00000000004607f0] __do_softirq+0xdc/0x1c4
[ 247.937125] [000000000042a76c] do_softirq+0x54/0x80
[ 248.002442] [0000000000460a6c] irq_exit+0x38/0x94
[ 248.065474] [000000000042df38] timer_interrupt+0x90/0xa8
[ 248.136516] [00000000004209d4] tl0_irq14+0x14/0x20
[ 248.200692] [000000000049e764] touch_softlockup_watchdog+0x4/0xc
[ 248.280888] [00000000008f07e4] start_kernel+0x390/0x3a0
[ 248.350783] [0000000000750b88] tlb_fixup_done+0x80/0x88
[ 248.420672] [0000000000000000] (null)
[ 248.481416] Press Stop-A (L1-A) to return to the boot prom



--
1AE0 322E B8F7 4717 BDEA BF1D 44BB 1BA7 9F6C 6333



Archive: http://lists.debian.org/20120511152501.GB659@io.zumbi.com.ar
 
05-11-2012, 10:04 PM
Jurij Smakov

On Fri, May 11, 2012 at 12:25:01PM -0300, gustavo panizzo <gfa> wrote:
> adding debian-boot
>
>
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 successfully, networking works
>
> obp diags shows no errors
>
> but when i boot from network using
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
>
> i get the following error
>
> Detecting link on eth0; please wait... 100%
> [ 246.994391] Unable to handle kernel NULL pointer dereference
> [ 247.074490] tsk->{mm,active_mm}->context = 000000000000019f
> [ 247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000
> [ 247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [ 247.328648] Call Trace:
> [ 247.360793] [000000000045dcd4] do_exit+0x94/0x708
> [ 247.423821] [0000000000427550] die_if_kernel+0x2a0/0x2c8
> [ 247.494864] [0000000000768c84] unhandled_fault+0x8c/0x98
> [ 247.565915] [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [ 247.640377] [0000000000407880] sparc64_realfault_common+0x10/0x20
> [ 247.721722] [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
> [ 247.798478] [0000000000697110] net_rx_action+0x9c/0x234
> [ 247.868369] [00000000004607f0] __do_softirq+0xdc/0x1c4
> [ 247.937125] [000000000042a76c] do_softirq+0x54/0x80
> [ 248.002442] [0000000000460a6c] irq_exit+0x38/0x94
> [ 248.065474] [000000000042df38] timer_interrupt+0x90/0xa8
> [ 248.136516] [00000000004209d4] tl0_irq14+0x14/0x20
> [ 248.200692] [000000000049e764] touch_softlockup_watchdog+0x4/0xc
> [ 248.280888] [00000000008f07e4] start_kernel+0x390/0x3a0
> [ 248.350783] [0000000000750b88] tlb_fixup_done+0x80/0x88
> [ 248.420672] [0000000000000000] (null)
> [ 248.481416] Press Stop-A (L1-A) to return to the boot prom

Interesting, so we are doing something funky during link detection to
trip this bug. The code which does it is in netcfg:

http://anonscm.debian.org/gitweb/?p=d-i/netcfg.git;a=tree

Here's the relevant code from netcfg-common.c:

    debconf_capb(client, "progresscancel");
    debconf_subst(client, "netcfg/link_detect_progress", "interface", if_name);
    debconf_progress_start(client, 0, 100, "netcfg/link_detect_progress");
    for (count = 0; count < link_waits; count++) {
        usleep(250000);
        if (debconf_progress_set(client, 50 * count / link_waits) == 30) {
            /* User cancelled on us... bugger */
            rv = 0;
            break;
        }
        if (ethtool_lite(if_name) == 1) /* ethtool-lite's CONNECTED */ {
            if (gateway.s_addr && !is_wireless_iface(if_name)) {
                for (count = 0; count < gw_tries; count++) {
                    if (di_exec_shell_log(arping) == 0)
                        break;
                    if (debconf_progress_set(client, 50 + 50 * count / gw_tries) == 30)
                        break;
                }
            }
            rv = 1;
            break;
        }
        debconf_progress_set(client, 100);
    }

Only two non-trivial things happen here: the call to ethtool_lite(if_name)
and the invocation of arping. I would put my money on the former (defined
in ethtool-lite.c), because it uses low-level ioctls to query the
interface state.

You can test whether running it would trigger a failure on your
machine by downloading ethtool-lite.c and building it as a standalone
binary. The following commands appear to do the trick:

$ sudo apt-get build-dep netcfg
[...]
$ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
$ sudo ./ethtool-lite eth0
ethtool-lite: eth0 is connected.
$

If that triggers a null pointer exception on your machine (try it both
with and without network brought up and check dmesg afterwards), we
will be in a very good position to report it upstream for fixing.

Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC



Archive: http://lists.debian.org/20120511220421.GA10999@wooyd.org
 
05-12-2012, 02:39 AM
gustavo panizzo

Jurij Smakov <jurij@wooyd.org> wrote:

>
>If that triggers a null pointer exception on your machine (try it both
>with and without network brought up and check dmesg afterwards), we
>will be in a very good position to report it upstream for fixing.

i will be checking it next week
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.



Archive: http://lists.debian.org/815ea480-c57d-4be9-902f-d66e6270d79c@email.android.com
 
05-12-2012, 03:43 PM
Ben Hutchings

On Fri, 2012-05-11 at 12:25 -0300, gustavo panizzo wrote:
> adding debian-boot
>
>
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 successfully, networking works
>
> obp diags shows no errors
>
> but when i boot from network using
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
>
> i get the following error
>
> Detecting link on eth0; please wait... 100%
> [ 246.994391] Unable to handle kernel NULL pointer dereference
> [ 247.074490] tsk->{mm,active_mm}->context = 000000000000019f
> [ 247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000
> [ 247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler!
> [ 247.328648] Call Trace:
> [ 247.360793] [000000000045dcd4] do_exit+0x94/0x708
> [ 247.423821] [0000000000427550] die_if_kernel+0x2a0/0x2c8
> [ 247.494864] [0000000000768c84] unhandled_fault+0x8c/0x98
> [ 247.565915] [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [ 247.640377] [0000000000407880] sparc64_realfault_common+0x10/0x20
> [ 247.721722] [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
[...]

This means we crashed:

> static __inline__ void gem_tx(struct net_device *dev, struct gem *gp, u32 gem_status)
> {
>     int entry, limit;
>
>     entry = gp->tx_old;
>     limit = ((gem_status & GREG_STAT_TXNR) >> GREG_STAT_TXNR_SHIFT);
>     while (entry != limit) {
>         struct sk_buff *skb;
>         struct gem_txd *txd;
>         dma_addr_t dma_addr;
>         u32 dma_len;
>         int frag;
>
>         if (netif_msg_tx_done(gp))
>             printk(KERN_DEBUG "%s: tx done, slot %d\n",
>                    gp->dev->name, entry);
>         skb = gp->tx_skbs[entry];
>         if (skb_shinfo(skb)->nr_frags) {

right here, while evaluating skb_shinfo(skb), which probably means skb
was NULL. This *could* be due to broken hardware telling us that more
packets were sent than we actually queued, but probably not, since
'networking works' when not using netboot.
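
To make the failure mode concrete, here is a small user-space model of a TX
completion ring. It is only a sketch under stated assumptions, not the sungem
code: struct fake_skb, struct tx_ring, TX_RING_SIZE and reclaim_tx() are all
invented for the example. It shows how a completion index that runs ahead of
what was queued leads straight to an empty slot, which is where the quoted
gem_tx() would dereference a NULL skb.

```c
#include <assert.h>
#include <stddef.h>

#define TX_RING_SIZE 8   /* illustrative value, not the driver's ring size */

struct fake_skb { int len; };            /* stand-in for struct sk_buff */

struct tx_ring {
    struct fake_skb *slots[TX_RING_SIZE]; /* NULL means "nothing queued" */
    unsigned int tx_old;                  /* first slot not yet reclaimed */
};

/*
 * Reclaim slots from tx_old up to (not including) limit, the completion
 * index reported by the hardware. Returns the number of slots reclaimed,
 * or -1 if an empty slot is reached, i.e. the reported index is ahead of
 * what was actually queued.
 */
int reclaim_tx(struct tx_ring *ring, unsigned int limit)
{
    int reclaimed = 0;

    assert(ring != NULL);
    limit %= TX_RING_SIZE;

    while (ring->tx_old != limit) {
        struct fake_skb *skb = ring->slots[ring->tx_old];

        if (skb == NULL)
            return -1;   /* the quoted loop has no such check */

        ring->slots[ring->tx_old] = NULL;
        ring->tx_old = (ring->tx_old + 1) % TX_RING_SIZE;
        reclaimed++;
    }
    return reclaimed;
}
```

With two buffers queued, reclaim_tx(&ring, 2) reclaims both, while a bogus
index such as reclaim_tx(&ring, 4) afterwards hits an empty slot and returns
-1 instead of crashing.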

Is the driver successfully resetting the network controller while
net-booting? It can time-out and will then log "SW reset is ghetto" but
will *not* abort initialisation.

Ben.

--
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
- Carolyn Scheppner
 
05-22-2012, 10:26 PM
gustavo panizzo

On Fri, May 11, 2012 at 11:04:22PM +0100, Jurij Smakov wrote:
[snip]

>
> Only two non-trivial things happen here: the call to ethtool_lite(if_name)
> and the invocation of arping. I would put my money on the former (defined
> in ethtool-lite.c), because it uses low-level ioctls to query the
> interface state.
>
> You can test whether running it would trigger a failure on your
> machine by downloading ethtool-lite.c and building it as a standalone
> binary. The following commands appear to do the trick:
>
> $ sudo apt-get build-dep netcfg
> [...]
> $ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
> $ sudo ./ethtool-lite eth0
> ethtool-lite: eth0 is connected.
> $
>
> If that triggers a null pointer exception on your machine (try it both
> with and without network brought up and check dmesg afterwards), we
> will be in a very good position to report it upstream for fixing.

i cannot reproduce the issue using ethtool-lite (or arping) when booted
from disk, but i can reproduce it when booted from the network (22/05/2012
image) by running netcfg or udhcp

i can also reproduce the issue by running
~ # ip link set dev eth0 up
while the cable is plugged in, or by running the command and plugging the
cable in later

if i (after fetching the netboot image) remove the link on eth0 and plug in
eth1, the installer works fine

--
1AE0 322E B8F7 4717 BDEA BF1D 44BB 1BA7 9F6C 6333



Archive: http://lists.debian.org/20120522222622.GB22114@io.zumbi.com.ar
 
