Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
Hi Gustavo,
gustavo panizzo wrote:
> i can get the nic to work using latest linus tree
> + ancient gentoo userland (udev 124), but is running at 10Mb/s half duplex
>
> 3.4.0-rc6+
> Settings for eth0:
> Supported ports: [ TP MII ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
[...]
> Advertised auto-negotiation: Yes
> Speed: 10Mb/s
> Duplex: Half
[...]
> Auto-negotiation: off
[...]
> while 2.6.28 runs at 100Mb/s full duplex
>
> Settings for eth0:
> Supported ports: [ TP MII ]
> Supported link modes: 10baseT/Half 10baseT/Full
> 100baseT/Half 100baseT/Full
[...]
> Advertised auto-negotiation: No
> Speed: 100Mb/s
> Duplex: Full
[...]
> Auto-negotiation: on
[...]
> i will try later with kernel from d-i or testing, but i think this
> should go upstream
Interesting. How does a 3.2.y kernel behave with the ancient gentoo
userland? (Perhaps this is what you are planning to try later.)
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20120509232056.GA7921@burratino
05-11-2012, 12:39 AM
gustavo panizzo
Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
> Interesting. How does a 3.2.y kernel behave with the ancient gentoo
> userland? (Perhaps this is what you are planning to try later.)
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Advertised auto-negotiation: Yes
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: external
Auto-negotiation: on
Supports Wake-on: d
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes
kernel is 3.2.15 taken out from apt-get linux-source-3.2
config is the same "gentoo" config
i cannot boot linux-image-3.2.0-2-sparc64_3.2.16-1_sparc because it is not able to mount the root fs
i see these errors in the kernel log
[ 52.363317] sun_esp: Unknown symbol scsi_esp_register (err 0)
[ 52.439003] sun_esp: Unknown symbol scsi_esp_intr (err 0)
[ 52.509998] sun_esp: Unknown symbol scsi_host_put (err 0)
[ 52.581304] sun_esp: Unknown symbol scsi_esp_template (err 0)
[ 52.656890] sun_esp: Unknown symbol scsi_esp_unregister (err 0)
[ 52.734804] sun_esp: Unknown symbol scsi_esp_cmd (err 0)
[ 52.804672] sun_esp: Unknown symbol scsi_host_alloc (err 0)
[ 53.004224] SCSI subsystem initialized
i will continue to experiment with this kernel (hopefully debootstrap will finish soon)
Archive: http://lists.debian.org/20120511003911.GE28653@io.zumbi.com.ar
05-11-2012, 03:25 PM
gustavo panizzo
Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
adding debian-boot
i've installed unstable on the box (using debootstrap) and it boots
3.2.0-2-sparc64 successfully, networking works
obp diags shows no errors
but when i boot from network using
http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
Archive: http://lists.debian.org/20120511152501.GB659@io.zumbi.com.ar
05-11-2012, 10:04 PM
Jurij Smakov
Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
On Fri, May 11, 2012 at 12:25:01PM -0300, gustavo panizzo <gfa> wrote:
> adding debian-boot
>
>
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 sucessfully, networking works
>
> obp diags shows no errors
>
> but when i boot from network using
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
>
> i get the following error
>
> ┌─────────────── Detecting link on eth0; please wait... ├─────────────── ┐
> │ │
> │ 100% [ 246.994391] Unable to handle kernel NULL pointer dereference
> [ 247.074490] tsk->{mm,active_mm}->context = 000000000000019f │
> [ 247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000 │
> [ 247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler! │
> [ 247.328648] Call Trace: │
> [ 247.360793] [000000000045dcd4] do_exit+0x94/0x708 │
> [ 247.423821] [0000000000427550] die_if_kernel+0x2a0/0x2c8─────────────── ─┘
> [ 247.494864] [0000000000768c84] unhandled_fault+0x8c/0x98
> [ 247.565915] [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [ 247.640377] [0000000000407880] sparc64_realfault_common+0x10/0x20
> [ 247.721722] [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
> [ 247.798478] [0000000000697110] net_rx_action+0x9c/0x234
> [ 247.868369] [00000000004607f0] __do_softirq+0xdc/0x1c4
> [ 247.937125] [000000000042a76c] do_softirq+0x54/0x80
> [ 248.002442] [0000000000460a6c] irq_exit+0x38/0x94
> [ 248.065474] [000000000042df38] timer_interrupt+0x90/0xa8
> [ 248.136516] [00000000004209d4] tl0_irq14+0x14/0x20
> [ 248.200692] [000000000049e764] touch_softlockup_watchdog+0x4/0xc
> [ 248.280888] [00000000008f07e4] start_kernel+0x390/0x3a0
> [ 248.350783] [0000000000750b88] tlb_fixup_done+0x80/0x88
> [ 248.420672] [0000000000000000] (null)
> [ 248.481416] Press Stop-A (L1-A) to return to the boot prom
Interesting, so we are doing something funky during link detection to
trip this bug. The code which does it is in netcfg:
Only two non-trivial things here: execution of ethtool_lite(if_name)
and invocation of arping. I would put my money on the former (defined
in ethtool_lite.c), because it uses low-level ioctls to query the
interface state.
You can test whether running it would trigger a failure on your
machine by downloading ethtool_lite.c and building it as a standalone
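binary, the following commands appear to do the trick:
$ sudo apt-get build-dep netcfg
[...]
$ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
$ sudo ./ethtool-lite eth0
ethtool-lite: eth0 is connected.
$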
If that triggers a null pointer exception on your machine (try it both
with and without network brought up and check dmesg afterwards), we
will be in a very good position to report it upstream for fixing.
Best regards,
--
Jurij Smakov jurij@wooyd.org
Key: http://www.wooyd.org/pgpkey/ KeyID: C99E03CC
Archive: http://lists.debian.org/20120511220421.GA10999@wooyd.org
05-12-2012, 02:39 AM
gustavo panizzo
Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
Jurij Smakov <jurij@wooyd.org> wrote:
>
>If that triggers a null pointer exception on your machine (try it both
>with and without network brought up and check dmesg afterwards), we
>will be in a very good position to report it upstream for fixing.
i will be checking it next week
Archive: http://lists.debian.org/815ea480-c57d-4be9-902f-d66e6270d79c@email.android.com
05-12-2012, 03:43 PM
Ben Hutchings
Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
On Fri, 2012-05-11 at 12:25 -0300, gustavo panizzo wrote:
> adding debian-boot
>
>
> i've installed unstable on the box (using debootstrap) and it boots
> 3.2.0-2-sparc64 sucessfully, networking works
>
> obp diags shows no errors
>
> but when i boot from network using
> http://d-i.debian.org/daily-images/sparc/daily/netboot/boot.img 11-05-2012
>
> i get the following error
>
> ┌─────────────── Detecting link on eth0; please wait... ├─────────────── ┐
> │ │
> │ 100% [ 246.994391] Unable to handle kernel NULL pointer dereference
> [ 247.074490] tsk->{mm,active_mm}->context = 000000000000019f │
> [ 247.164534] tsk->{mm,active_mm}->pgd = fffff8001d48c000 │
> [ 247.240508] Kernel panic - not syncing: Aiee, killing interrupt handler! │
> [ 247.328648] Call Trace: │
> [ 247.360793] [000000000045dcd4] do_exit+0x94/0x708 │
> [ 247.423821] [0000000000427550] die_if_kernel+0x2a0/0x2c8─────────────── ─┘
> [ 247.494864] [0000000000768c84] unhandled_fault+0x8c/0x98
> [ 247.565915] [000000000076936c] do_sparc64_fault+0x6dc/0x780
> [ 247.640377] [0000000000407880] sparc64_realfault_common+0x10/0x20
> [ 247.721722] [0000000010015680] gem_poll+0x9fc/0x1328 [sungem]
[...]
right here, while evaluating skb_shinfo(skb). Which probably means skb
was null. This *could* be due to broken hardware telling us that more
packets were sent than we actually queued, but probably not, since
'networking works' when not using netboot.
Is the driver successfully resetting the network controller while
net-booting? It can time-out and will then log "SW reset is ghetto" but
will *not* abort initialisation.
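
The failure mode described above can be modelled in user space. The sketch below is not the sungem code: the ring size, struct names and function names are all hypothetical, and struct pkt merely stands in for struct sk_buff. It only shows how a completion index that runs ahead of what was actually queued makes the reclaim loop land on an empty slot, which is where an unguarded skb_shinfo(skb) would fault.

```c
#include <stdio.h>
#include <stddef.h>

#define RING_SIZE 8                      /* hypothetical; power of two */
#define NEXT(i)   (((i) + 1) & (RING_SIZE - 1))

struct pkt { int nr_frags; };            /* stand-in for struct sk_buff */

struct tx_ring {
    struct pkt *slots[RING_SIZE];        /* packets awaiting completion */
    unsigned int tx_old;                 /* oldest slot not yet reclaimed */
    unsigned int tx_new;                 /* next slot the driver will fill */
};

/* Queue one packet. Returns 0 on success, -1 if the ring is full. */
int tx_queue(struct tx_ring *r, struct pkt *p)
{
    if (NEXT(r->tx_new) == r->tx_old)
        return -1;
    r->slots[r->tx_new] = p;
    r->tx_new = NEXT(r->tx_new);
    return 0;
}

/*
 * Reclaim every slot from tx_old up to (not including) hw_done, the
 * completion index reported by the hardware. Returns the number of
 * packets reclaimed, or -1 if an empty slot is reached, i.e. the
 * hardware claims to have sent more than was queued.
 */
int tx_reclaim(struct tx_ring *r, unsigned int hw_done)
{
    unsigned int entry = r->tx_old;
    int reclaimed = 0;

    hw_done &= RING_SIZE - 1;
    while (entry != hw_done) {
        struct pkt *p = r->slots[entry];

        if (p == NULL) {                 /* without this guard: NULL deref */
            r->tx_old = entry;
            return -1;
        }
        (void)p->nr_frags;               /* a driver would read skb_shinfo() here */
        r->slots[entry] = NULL;
        entry = NEXT(entry);
        reclaimed++;
    }
    r->tx_old = entry;
    return reclaimed;
}

#ifdef STANDALONE_DEMO
int main(void)
{
    struct tx_ring ring = { { NULL }, 0, 0 };
    struct pkt a = { 0 };

    tx_queue(&ring, &a);
    /* Pretend the hardware reports three completions for one packet. */
    printf("tx_reclaim returned %d\n", tx_reclaim(&ring, 3));
    return 0;
}
#endif
```

A controller that was never properly reset could plausibly report such a stale completion index, which is why the question about the reset succeeding matters.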
Ben.
--
Ben Hutchings
Experience is directly proportional to the value of equipment destroyed.
- Carolyn Scheppner
05-22-2012, 10:26 PM
gustavo panizzo
Bug#671895: Kernel NULL pointer dereference in sungem/gem_poll() ( updates)
On Fri, May 11, 2012 at 11:04:22PM +0100, Jurij Smakov wrote:
[snip]
>
> Only two non-trivial things here: execution of ethtool_lite(if_name)
> and invocation of arping. I would put my money on the former (defined
> in ethtool_lite.c), because it uses low-level ioctls to query the
> interface state.
>
> You can test whether running it would trigger a failure on your
> machine by downloading ethtool_lite.c and building it as a standalone
> binary, the following commands appear to do the trick:
>
> $ sudo apt-get build-dep netcfg
> [...]
> $ gcc -o ethtool-lite -DTEST ethtool-lite.c -ldebconfclient -ldebian-installer
> $ sudo ./ethtool-lite eth0
> ethtool-lite: eth0 is connected.
> $
>
> If that triggers a null pointer exception on your machine (try it both
> with and without network brought up and check dmesg afterwards), we
> will be in a very good position to report it upstream for fixing.
i cannot repeat the issue using ethtool-lite (or arping) while booting
from disk, i can repeat the issue booting from network (22/05/2012
image) running netcfg or udhcp
also i can repeat the issue running
~ # ip link set dev eth0 up
while the cable is plugged in, or running the command and plugging the
cable later
if i (after getting the netimage) remove the link on eth0 and plug
eth1, installer works fine
Archive: http://lists.debian.org/20120522222622.GB22114@io.zumbi.com.ar