Linux Archive


Jonathan Nieder 11-24-2011 02:46 AM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Shannon Dealy wrote:

> A developer at Intel contacted me regarding this bug the other day (he was
> following up on a similar bug report from another source) and I am working
> with him on the problem (currently doing a debug build of the module to
> collect data on what is happening).

That's good to hear. Did it bear any fruit? (Any test results or
public mailing list messages we can look at?)



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20111124034603.GA17588@elie.hsd1.il.comcast.net

Shannon Dealy 11-25-2011 03:18 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
On Wed, 23 Nov 2011, Jonathan Nieder wrote:


> Shannon Dealy wrote:
>
>> A developer at Intel contacted me regarding this bug the other day (he was
>> following up on a similar bug report from another source) and I am working
>> with him on the problem (currently doing a debug build of the module to
>> collect data on what is happening).
>
> That's good to hear. Did it bear any fruit? (Any test results or
> public mailing list messages we can look at?)


Nothing so far; both he and I are very busy, so it is a slow process,
waiting for each of us to work the next step into our schedules.
Currently I am waiting for his response to a set of data I captured from
the driver surrounding one failure.


Shannon C. Dealy | DeaTech Research Inc.
dealy@deatech.com | - Custom Software Development -
Phone: (800) 467-5820 | - Natural Building Instruction -
or: (541) 929-4089 | www.deatech.com



Archive: http://lists.debian.org/alpine.DEB.2.00.1111250814390.11437@nashapur.deatech.com

Jonathan Nieder 02-11-2012 06:38 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Hi Juha,

Juha Jäykkä wrote:

> What is the status of the fix from Intel? I have the same problem since
> upgrading to 3.x series and it is VERY annoying - only way I can fix it is
> rebooting the kernel, so it really seems a kernel bug. Hibernating the system
> and doing a full power-off (unplug AC, remove battery, wait 60 seconds – this
> should do it) the laptop does NOT fix it, so I concur: the kernel must retain
> some bogus information somewhere since the hw has definitely lost its status
> info.

Please provide:

- steps to reproduce, assuming I had the same hardware
- expected result, actual result, and how the difference indicates a
  bug (should be simple enough in this case)
- how reproducible it is (100% of the time? 50%?)
- which kernel versions you have tested, and results with each
- full "dmesg" output from booting and reproducing the bug, as an
  attachment
- any other weird observations or workarounds

I believe the best way to move forward would be to take this report
upstream to a public mailing list. So the purpose of these questions
is to collect data on what's known so far as a starting point.

Thanks,
Jonathan



Archive: http://lists.debian.org/20120211193827.GE6944@burratino

Juha Jäykkä 02-21-2012 11:14 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Hi Jonathan!

Sorry I took a while to respond, and apologies in advance for my answers being
quite useless...

> - steps to reproduce, assuming I had the same hardware

Use the computer. My maximum time without hitting the bug has been less than
48 hours before I added the following module options: 11n_disable=1
power_save=0 wd_disable=1 which some googling suggested might solve it. They
do not, but they make it less frequent. With them, I can even get a WEEK
without seeing this.

But notice I have different hw than the original reporter: mine is X200s with
"Intel Corporation Ultimate N WiFi Link 5300" (pci-id: 8086:4236).

> - expected result, actual result, and how the difference indicates a
> bug (should be simple enough in this case)

=) Expected result: wifi keeps working unless I switch it off using rf_kill,
physical switch, unload the module, or turn off the computer.

Actual result: suddenly, out of the blue, in the middle of typing an email in
kmail, the wifi stops working; see attachment.

Bug: the wifi card has certainly not changed into "Unknown hardware type"
suddenly.

> - how reproducible it is (100% of the time? 50%?)

100% when waiting long enough between reboots.

> - which kernel versions you have tested, and results with each

There were no problems in the 2.6-series. The bug occurs at least in the
Debian kernel versions 3.2.0-1-amd64, 3.0.0-2-amd64, and 3.0.0-1-amd64.

> - full "dmesg" output from booting and reproducing the bug, as an
> attachment

I do not have it now; if really necessary, I will get it next time it occurs
(which may be a while: I am back to 2.6.39 because I need to get work done).

> - any other weird observations or workarounds

No workaround. The above module parameters alleviate the issue. This is a
regression, so I suggest someone with time (if anyone has it) bisect between
2.6 and 3.2... horrible task, I do not envy anyone doing that.

> upstream to a public mailing list. So the purpose of these questions
> is to collect data on what's known so far as a starting point.

Please CC me if you do.

Cheers,
Juha

--
-----------------------------------------------
| Juha Jäykkä, juhaj@iki.fi |
| http://www.maths.leeds.ac.uk/~juhaj |
-----------------------------------------------
[252832.820219] iwlwifi 0000:03:00.0: Error sending POWER_TABLE_CMD: time out after 2000ms.
[252832.820229] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 135
[252832.820237] iwlwifi 0000:03:00.0: set power fail, ret = -110
[252835.320320] iwlwifi 0000:03:00.0: Error sending REPLY_QOS_PARAM: time out after 2000ms.
[252835.320331] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 137
[252835.320338] iwlwifi 0000:03:00.0: Failed to update QoS
[252837.320249] iwlwifi 0000:03:00.0: Error sending REPLY_RXON: time out after 2000ms.
[252837.320259] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 140
[252837.320267] iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-110)
[252839.320130] iwlwifi 0000:03:00.0: Error sending REPLY_RXON: time out after 2000ms.
[252839.320140] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 143
[252839.320148] iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-110)
[252841.320273] iwlwifi 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
[252841.320283] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 146
[252841.320296] ieee80211 phy0: failed to remove key (0, 00:24:17:33:f4:f5) from hardware (-110)
[252843.320053] iwlwifi 0000:03:00.0: Error sending REPLY_REMOVE_STA: time out after 2000ms.
[252843.320057] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 149
[252843.320061] iwlwifi 0000:03:00.0: Error removing station 00:24:17:33:f4:f5
[252845.324086] iwlwifi 0000:03:00.0: Error sending REPLY_RXON: time out after 2000ms.
[252845.324096] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 152
[252845.324104] iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-110)
[252847.328250] iwlwifi 0000:03:00.0: fail to flush all tx fifo queues
[252849.328175] iwlwifi 0000:03:00.0: Error sending POWER_TABLE_CMD: time out after 2000ms.
[252849.328185] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 154
[252849.328192] iwlwifi 0000:03:00.0: set power fail, ret = -110
[252851.328203] iwlwifi 0000:03:00.0: Error sending REPLY_RXON: time out after 2000ms.
[252851.328214] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 155
[252851.328222] iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-110)
[252853.328234] iwlwifi 0000:03:00.0: Error sending REPLY_ADD_STA: time out after 2000ms.
[252853.328245] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 156
[252853.328258] ieee80211 phy0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-110)
[252853.328380] cfg80211: Calling CRDA to update world regulatory domain
[252855.428124] iwlwifi 0000:03:00.0: Error sending REPLY_RXON: time out after 2000ms.
[252855.428135] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 157
[252855.428142] iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-110)
[252857.428183] iwlwifi 0000:03:00.0: Error sending POWER_TABLE_CMD: time out after 2000ms.
[252857.428193] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 158
[252857.428201] iwlwifi 0000:03:00.0: set power fail, ret = -110
[252859.428260] iwlwifi 0000:03:00.0: Error sending REPLY_RXON: time out after 2000ms.
[252859.428271] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 159
[252859.428279] iwlwifi 0000:03:00.0: Error clearing ASSOC_MSK on BSS (-110)
[252861.428175] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
[252861.428186] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 160
[252864.428133] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
[252864.428144] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 161
[252867.428027] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
[252867.428032] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 162
[252870.428036] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
[252870.428041] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 163
[252873.428063] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: time out after 2000ms.
[252873.428074] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 164
[252874.429359] iwlwifi 0000:03:00.0: No space in command queue
[252874.429364] iwlwifi 0000:03:00.0: Restarting adapter queue is full
[252874.429373] iwlwifi 0000:03:00.0: Error sending REPLY_SCAN_CMD: enqueue_hcmd failed: -28
[252874.433273] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
...
[252874.792578] ieee80211 phy0: Hardware restart was requested
[252874.792628] iwlwifi 0000:03:00.0: L1 Disabled; Enabling L0S
[252874.796429] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[252874.796429] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[252874.836786] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[252874.846994] iwlwifi 0000:03:00.0: Radio type=0x0-0x2-0x0
[252874.850986] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
...
[252875.485413] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[252880.496046] iwlwifi 0000:03:00.0: Could not load the INST uCode section
[252880.496052] iwlwifi 0000:03:00.0: Failed to start RT ucode: -110
[252880.503267] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
...
[252880.845613] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[252880.859799] iwlwifi 0000:03:00.0: Unable to initialize device.
[252880.859825] iwlwifi 0000:03:00.0: Request scan called when driver not ready.
...

until I hit the kill switch, at which point syslog shows:

[253188.968042] ------------[ cut here ]------------
[253188.968056] WARNING: at /build/buildd-linux-2.6_3.2.1-2-amd64-kK3kdc/linux-2.6-3.2.1/debian/build/source_amd64_none/drivers/net/wireless/iwlwifi/iwl-core.c:1330 iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]()
[253188.968060] Hardware name: 74695KG
[253188.968062] Modules linked in: iwlwifi mac80211 cfg80211 hidp hid tun acpi_cpufreq mperf cpufreq_stats cpufreq_userspace cpufreq_powersave cpufreq_conservative rfcomm bnep parport_pc ppdev lp parport autofs4 binfmt_misc uinput fuse nfsd nfs lockd fscache auth_rpcgss nfs_acl sunrpc coretemp loop kvm_intel kvm btusb bluetooth crc16 snd_hda_codec_conexant snd_hda_intel snd_hda_codec arc4 snd_hwdep snd_pcm_oss snd_mixer_oss thinkpad_acpi snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq i915 snd_timer snd_seq_device evdev snd iTCO_wdt drm_kms_helper drm rfkill i2c_algo_bit i2c_i801 ac soundcore battery nvram tpm_tis tpm tpm_bios iTCO_vendor_support i2c_core snd_page_alloc power_supply video button psmouse serio_raw processor wmi xfs sha256_generic cryptd aes_x86_64 aes_generic cbc dm_crypt dm_mod sd_mod crc_t10dif uhci_hcd ahci libahci ehci_hcd thermal thermal_sys usbcore ata_generic libata scsi_mod e1000e usb_common [last unloaded: cfg80211]
[253188.968150] Pid: 28201, comm: kworker/0:1 Not tainted 3.2.0-1-amd64 #1
[253188.968153] Call Trace:
[253188.968160] [<ffffffff810467ed>] ? warn_slowpath_common+0x78/0x8c
[253188.968168] [<ffffffffa0390997>] ? iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]
[253188.968176] [<ffffffffa0227301>] ? rfkill_restore_states+0x7/0x47 [rfkill]
[253188.968188] [<ffffffffa061f05b>] ? ieee80211_do_stop+0x30d/0x45e [mac80211]
[253188.968192] [<ffffffff8104bcdf>] ? _local_bh_enable_ip.isra.11+0x1e/0x88
[253188.968198] [<ffffffffa022738b>] ? spin_unlock_irq+0xb/0xb [rfkill]
[253188.968208] [<ffffffffa061f1be>] ? ieee80211_stop+0x12/0x16 [mac80211]
[253188.968217] [<ffffffffa061f1ac>] ? ieee80211_do_stop+0x45e/0x45e [mac80211]
[253188.968223] [<ffffffff8127fccc>] ? __dev_close_many+0x84/0xb0
[253188.968226] [<ffffffff8127fdc7>] ? dev_close_many+0x88/0xee
[253188.968230] [<ffffffff810363ab>] ? should_resched+0x5/0x23
[253188.968234] [<ffffffff81282031>] ? dev_close+0x37/0x46
[253188.968241] [<ffffffffa0289c66>] ? cfg80211_rfkill_set_block+0x3d/0x62 [cfg80211]
[253188.968247] [<ffffffffa0226be1>] ? rfkill_set_block+0x7d/0xf0 [rfkill]
[253188.968252] [<ffffffffa0226d8f>] ? __rfkill_switch_all+0x33/0x55 [rfkill]
[253188.968258] [<ffffffffa0227266>] ? rfkill_switch_all+0x33/0x48 [rfkill]
[253188.968264] [<ffffffffa0227489>] ? rfkill_op_handler+0xfe/0x12d [rfkill]
[253188.968268] [<ffffffff8105adc1>] ? process_one_work+0x163/0x284
[253188.968272] [<ffffffff8105bd89>] ? worker_thread+0xc2/0x145
[253188.968276] [<ffffffff8105bcc7>] ? manage_workers.isra.23+0x15b/0x15b
[253188.968280] [<ffffffff8105eec5>] ? kthread+0x76/0x7e
[253188.968285] [<ffffffff813473b4>] ? kernel_thread_helper+0x4/0x10
[253188.968289] [<ffffffff8105ee4f>] ? kthread_worker_fn+0x139/0x139
[253188.968293] [<ffffffff813473b0>] ? gs_change+0x13/0x13
[253188.968295] ---[ end trace 52cc41750673642a ]---
[253188.968300] iwlwifi 0000:03:00.0: ctx->vif = (null), vif = ffff88001db5cdf0
[253188.968303] iwlwifi 0000:03:00.0: ID = 0: ctx = ffff880091cbb4b0 ctx->vif = (null)

a bit later, some more of

[253216.532067] iwlwifi 0000:03:00.0: L1 Disabled; Enabling L0S
[253216.536016] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[253216.586757] iwlwifi 0000:03:00.0: Radio type=0x0-0x2-0x0

and other old friends follow until I decide to try reloading the module:

[253268.602408] cfg80211: Calling CRDA to update world regulatory domain
[253268.617695] Intel(R) Wireless WiFi Link AGN driver for Linux, in-tree:
[253268.617698] Copyright(c) 2003-2011 Intel Corporation
[253268.617758] iwlwifi 0000:03:00.0: enabling device (0000 -> 0002)
[253268.617768] iwlwifi 0000:03:00.0: PCI INT A -> GSI 17 (level, low) -> IRQ 17
[253268.617781] iwlwifi 0000:03:00.0: setting latency timer to 64
[253268.617811] iwlwifi 0000:03:00.0: pci_resource_len = 0x00002000
[253268.617814] iwlwifi 0000:03:00.0: pci_resource_base = ffffc90005094000
[253268.617817] iwlwifi 0000:03:00.0: HW Revision ID = 0x0
[253268.617899] iwlwifi 0000:03:00.0: irq 43 for MSI/MSI-X
[253268.617960] iwlwifi 0000:03:00.0: Detected Intel(R) Ultimate N WiFi Link 5300 AGN, REV=0xFFFFFFFF
[253268.618003] iwlwifi 0000:03:00.0: Unknown hardware type
[253268.618005] iwlwifi 0000:03:00.0: Unable to init EEPROM
[253268.618040] iwlwifi 0000:03:00.0: PCI INT A disabled
[253268.618046] iwlwifi: probe of 0000:03:00.0 failed with error -2
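
A note on reading the log above: the command queue's read_ptr stays at 134 while write_ptr climbs from 135 to 164, so the device appears to have stopped consuming host commands, and the later register reads all return 0xFFFFFFFF. An all-ones value is what a read from a PCI device that no longer responds typically returns, though nothing in this thread confirms that this is what happened here. Below is a minimal Python sketch for pulling those two signals out of a saved log. It assumes dmesg lines in exactly the format shown above; the names summarize_iwl_log and queue_looks_stalled are made up for illustration and are not part of any existing tool.

```python
import re

# Patterns copied from the message formats in the log excerpt above.
PTR_RE = re.compile(r"Current CMD queue read_ptr (\d+) write_ptr (\d+)")
CSR_RE = re.compile(r"CSR_GP_CNTRL = (0x[0-9A-Fa-f]+)")


def summarize_iwl_log(lines):
    """Return (read_ptrs, write_ptrs, csr_values) found in the given lines."""
    read_ptrs, write_ptrs, csr_values = [], [], []
    for line in lines:
        m = PTR_RE.search(line)
        if m:
            read_ptrs.append(int(m.group(1)))
            write_ptrs.append(int(m.group(2)))
        m = CSR_RE.search(line)
        if m:
            csr_values.append(int(m.group(1), 16))
    return read_ptrs, write_ptrs, csr_values


def queue_looks_stalled(read_ptrs, write_ptrs):
    """True if read_ptr never moves while write_ptr keeps advancing.

    Hypothetical heuristic: it ignores pointer wrap-around, so it only
    suits short excerpts like the one in this message.
    """
    return (len(read_ptrs) > 1
            and len(set(read_ptrs)) == 1
            and write_ptrs == sorted(write_ptrs)
            and write_ptrs[-1] > write_ptrs[0])


# A few lines taken verbatim from the log above.
sample = [
    "[252832.820229] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 135",
    "[252835.320331] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 137",
    "[252837.320259] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 134 write_ptr 140",
    "[252874.433273] iwlwifi 0000:03:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF",
]

reads, writes, csrs = summarize_iwl_log(sample)
print("stalled:", queue_looks_stalled(reads, writes))
print("all-ones CSR reads:", sum(v == 0xFFFFFFFF for v in csrs))
```

To run it on a full capture, pass the lines of a saved dmesg file instead of the sample list.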

Jonathan Nieder 03-12-2012 03:07 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Hi,

Venkataraman, Meenakshi wrote:

> Hi Shannon and others,

Thanks for a helpful note. Forwarding to Shannon, Juha, and Bjørn:

> First up, my sincere apologies for not responding earlier. I've been
> swamped with other work, and have had a chance to look at this only
> now.
>
> I just got caught up with the email thread, and it appears that
> you're seeing a problem with the following configuration most
> frequently:
>
> 1) Enable power saving in the driver (power_save)
> 2) Enabling 11n
> 3) Leaving aspm at its default
> 4) wd_disable=0 (the default)
>
> Our devices are known to have issues with being in L1 (a PCIe sleep
> state), and so we use L0S by default - this is a lower latency and
> higher power state.
>
> We've also not been able to reproduce the "MAC in deep sleep"
> problem at our end, so not sure at the moment what is causing it.
>
> However, there was one issue with queue-stuck detection that we
> found and fixed very recently. The patch is available in the
> wireless-next tree, and will likely improve the situation if a stuck
> queue was the initial cause of your problem.
>
> You can get the source here:
> http://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next.git
>
> And the patch I'm talking about is this:
> git.kernel.org/?p=linux/kernel/git/linville/wireless-next.git;a=commit;h=342bbf3fee2fa9a18147e74b2e3c4229a4564912
>
> My suggestion is to load the module with power_save=0, wd_disable=1.
> Enabling 11n should not be a problem, but if it is, then please let
> us know. You should not need to use the wd_disable=1 in the upcoming
> versions of the kernel, but for now, I'd suggest using it. Since
> your problem seems to be reducing significantly by using
> pcie_aspm=off, I would appreciate it much if you could tell us what
> the behaviour is with all other parameters being the same
> (power_save=0, wd_disable=1), and just toggling the state of this
> variable.
>
> We'll try to reproduce the suspend/hibernate/resume issue in-house
> and let you know if we were able to reproduce the problem at our
> end. If not, we'd like you to try out a newer WiFi card; as the 5100
> is a fairly old device, and will likely not get any firmware updates
> (if it is some weird firmware/driver combo that produced the PCIe
> error).
>
> Thanks!
> Meenakshi Venkataraman



Archive: http://lists.debian.org/20120312160738.GA18721@burratino

Jonathan Nieder 03-12-2012 03:30 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
found 628444 linux-2.6/3.2.9-1
tags 628444 + upstream patch moreinfo
quit

Hi Dafydd,

Dafydd Harries wrote:

> I've been seeing similar problems with my "Intel Corporation Centrino
> Ultimate-N 6300".
>
> Like others, the problems seemed to start around 2.6.39.

Odd. What kernel did you use before then? (/var/log/dpkg.log might
tell.)

> Like others, the card flakes out a day or two after booting, and a reboot
> always fixes the problem. Occasionally it stays working for longer.
>
> Like others, I've added RAM. But as far as I can recall the upgrade
> happened well before any problems started appearing.

Interesting and useful.

> Any ASPM settings are at their default.
>
> I'll try wd_disable=1 as a workaround for now.
>
> Meenakshi, will the patch you mentioned be applied in 3.3?

Cc-ing her. The patch currently seems to be part of the wireless-next
tree but not davem's net tree.

> Below is a syslog excerpt from around the time of failure. It seems to
> support Meenakshi's suggestion that it's related to the queue getting
> stuck.

Well, that can be tested. Could you try the patch against current
"master"? It works like this:

0. Prerequisites:
    apt-get install git build-essential

1. Get the kernel history, if you don't already have it:
    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. Configure and build:
    cd linux
    git checkout origin/master
    cp /boot/config-$(uname -r) .config; # current configuration
    make localmodconfig; # optional: minimize configuration
    make deb-pkg; # optionally with -j<num> for parallel build
    dpkg -i ../<name of package>; # as root
    reboot

    ... test test test ...

3. Hopefully it reproduces the problem. So try the attached patch:
    git am -3sc <the patch>
    make deb-pkg; # maybe with -j4
    dpkg -i ../<name of package>; # as root
    reboot

If it works, we can pass this to Dave with information about what
happened and your test result, to get the patch fast-tracked.

Thanks,
Jonathan

> Below is a syslog excerpt from around the time of failure. It seems to
> support Meenakshi's suggestion that it's related to the queue getting
> stuck.
[...]
> iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.
> iwlwifi 0000:02:00.0: Current read_ptr 112 write_ptr 115
> iwlwifi 0000:02:00.0: On demand firmware reload
> iwlwifi 0000:02:00.0: Command REPLY_QOS_PARAM failed: FW Error
> iwlwifi 0000:02:00.0: Failed to update QoS
> iwlwifi 0000:02:00.0: fw recovery, no hcmd send
> iwlwifi 0000:02:00.0: Error sending REPLY_RXON: enqueue_hcmd failed: -5
> iwlwifi 0000:02:00.0: Error clearing ASSOC_MSK on BSS (-5)
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[...]
> ieee80211 phy0: Hardware restart was requested
> wpa_supplicant[1472]: CTRL-EVENT-DISCONNECTED bssid=00:50:7f:cb:4b:58 reason=4
> ieee80211 phy0: failed to remove key (1, ff:ff:ff:ff:ff:ff) from hardware (-2)
[....]
> iwlwifi 0000:02:00.0: Could not load the INST uCode section
> iwlwifi 0000:02:00.0: Failed to start RT ucode: -110
[...]
> iwlwifi 0000:02:00.0: MAC is in deep sleep!. CSR_GP_CNTRL = 0xFFFFFFFF
[...]
> I get some kind of OOPS but I'm guessing this is just because the driver can't
> communicate with the card when the module is being unloaded:
[...]
> WARNING: at /build/buildd-linux-2.6_3.2.9-1-amd64-KTPapN/linux-2.6-3.2.9/debian/build/source_amd64_none/drivers/net/wireless/iwlwifi/iwl-core.c:1330 iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]()
> Hardware name: 3249CTO
> Modules linked in: uvcvideo videodev v4l2_compat_ioctl32 media snd_usb_audio snd_usbmidi_lib pci_stub vboxpci(O) vboxnetadp(O) vboxnetflt(O) vboxdrv(O) acpi_cpufreq mperf cpufreq_stats cpufreq_userspace cpu
> Mar 12 13:15:04 localhost kernel: sync_memcpy async_tx raid1 raid0 multipath linear md_mod sd_mod crc_t10dif usbhid hid ahci libahci ehci_hcd libata scsi_mod usbcore thermal thermal_sys usb_common e1000e [last unloaded: scsi_wait_scan]
> Mar 12 13:15:04 localhost kernel: [48290.674508] Pid: 1405, comm: NetworkManager Tainted: G O 3.2.0-2-amd64 #1
> Mar 12 13:15:04 localhost kernel: [48290.674511] Call Trace:
> Mar 12 13:15:04 localhost kernel: [48290.674520] [<ffffffff81046879>] ? warn_slowpath_common+0x78/0x8c
> Mar 12 13:15:04 localhost kernel: [48290.674531] [<ffffffffa03ea9af>] ? iwlagn_mac_remove_interface+0x48/0xdd [iwlwifi]
[...]
> Mar 12 13:15:04 localhost kernel: [48290.674647] [<ffffffff812a35a5>] ? netlink_rcv_skb+0x36/0x7a
[...]
> iwlwifi 0000:02:00.0: ctx->vif = (null), vif = ffff8801b1c72df0
> iwlwifi 0000:02:00.0: ID = 0: ctx = ffff8801b1a834b0 ctx->vif = (null)
From: Johannes Berg <johannes.berg@intel.com>
Date: Sun, 4 Mar 2012 08:50:46 -0800
Subject: iwlwifi: always monitor for stuck queues

commit 342bbf3fee2fa9a18147e74b2e3c4229a4564912 upstream.

If we only monitor while associated, the following
can happen:
 - we're associated, and the queue stuck check
   runs, setting the queue "touch" time to X
 - we disassociate, stopping the monitoring,
   which leaves the time set to X
 - almost 2s later, we associate, and enqueue
   a frame
 - before the frame is transmitted, we monitor
   for stuck queues, and find the time set to
   X, although it is now later than X + 2000ms,
   so we decide that the queue is stuck and
   erroneously restart the device

It happens more with P2P because there we can
go between associated/unassociated frequently.

Cc: stable@vger.kernel.org
Reported-by: Ben Cahill <ben.m.cahill@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
---
drivers/net/wireless/iwlwifi/iwl-core.c | 18 ++++--------------
1 file changed, 4 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-core.c b/drivers/net/wireless/iwlwifi/iwl-core.c
index 7bcfa781e0b9..3abe9ede6990 100644
--- a/drivers/net/wireless/iwlwifi/iwl-core.c
+++ b/drivers/net/wireless/iwlwifi/iwl-core.c
@@ -1465,20 +1465,10 @@ void iwl_bg_watchdog(unsigned long data)
 	if (timeout == 0)
 		return;
 
-	/* monitor and check for stuck cmd queue */
-	if (iwl_check_stuck_queue(priv, priv->shrd->cmd_queue))
-		return;
-
-	/* monitor and check for other stuck queues */
-	if (iwl_is_any_associated(priv)) {
-		for (cnt = 0; cnt < hw_params(priv).max_txq_num; cnt++) {
-			/* skip as we already checked the command queue */
-			if (cnt == priv->shrd->cmd_queue)
-				continue;
-			if (iwl_check_stuck_queue(priv, cnt))
-				return;
-		}
-	}
+	/* monitor and check for stuck queues */
+	for (cnt = 0; cnt < hw_params(priv).max_txq_num; cnt++)
+		if (iwl_check_stuck_queue(priv, cnt))
+			return;
 
 	mod_timer(&priv->watchdog, jiffies +
 		  msecs_to_jiffies(IWL_WD_TICK(timeout)));
--
1.7.9.2
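
The race described in the commit message above can be replayed with a small timing model. This is a simplified Python sketch, not the driver's code: it assumes the stuck check refreshes a per-queue "touch" time whenever the queue is idle and otherwise compares the age of that time against the 2000 ms timeout. The body of iwl_check_stuck_queue is not shown in the patch, so that behaviour is an assumption, and the names TxQueue, check_stuck and run are made up for illustration.

```python
TIMEOUT_MS = 2000  # matches the "stuck for 2000 ms" messages in the logs


class TxQueue:
    def __init__(self):
        self.touch_ms = 0     # last time the check saw the queue idle
        self.pending = False  # True while a frame is waiting to be sent

    def check_stuck(self, now_ms):
        """Assumed model of the check: refresh the touch time when idle,
        otherwise report 'stuck' if the touch time is older than the timeout."""
        if not self.pending:
            self.touch_ms = now_ms
            return False
        return now_ms - self.touch_ms > TIMEOUT_MS


def run(always_monitor):
    """Replay the commit message's scenario; return True on a false 'stuck'."""
    q = TxQueue()
    associated = True
    events = [
        (0, "tick"),              # associated: check runs, touch time X = 0
        (100, "disassoc"),        # monitoring stops under the old logic
        (1000, "tick"),           # skipped by the old logic
        (1900, "assoc+enqueue"),  # almost 2 s after X, a frame is queued
        (2050, "tick"),           # frame not transmitted yet
    ]
    for now_ms, event in events:
        if event == "disassoc":
            associated = False
        elif event == "assoc+enqueue":
            associated = True
            q.pending = True
        elif event == "tick" and (always_monitor or associated):
            if q.check_stuck(now_ms):
                return True
    return False


print("old logic, false restart:", run(always_monitor=False))
print("patched logic, false restart:", run(always_monitor=True))
```

In this model the old logic skips the check at 1000 ms, so the frame queued at 1900 ms is judged against the stale touch time from 0 ms. With the check always running, the touch time is refreshed at 1000 ms and the queue is not flagged.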

"Venkataraman, Meenakshi" 03-12-2012 04:11 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Hi,

>Dafydd Harries wrote:
>
>> I've been seeing similar problems with my "Intel Corporation Centrino
>> Ultimate-N 6300".
>>
>> Like others, the problems seemed to start around 2.6.39.
>
>Odd. What kernel did you use before then? (/var/log/dpkg.log might
>tell.)
>
>> Like others, the card flakes out a day or two after booting, and a reboot
>> always fixes the problem. Occasionally it stays working for longer.

[MV] what platform are you using? And does your problem appear after a system hibernate?

>>
>> Like others, I've added RAM. But as far as I can recall the upgrade
>> happened well before any problems started appearing.
>
>Interesting and useful.
>
>> Any ASPM settings are at their default.

[MV] Can you try pcie_aspm=off during boot?

>> Meenakshi, will the patch you mentioned be applied in 3.3?

[MV] Yes...it should be applied to 3.3 as well (it is also slated to be backported to stable kernels); but it is a fairly recent fix, so it will take some time before it gets accepted to the other Linux trees.

>> Below is a syslog excerpt from around the time of failure. It seems to
>> support Meenakshi's suggestion that it's related to the queue getting
>> stuck.
>[...]
>> iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.

[MV] Any idea what happened before this? Did you see any error sending host commands? Did you resume from a hibernate? Can you send me the log?

Thanks for your patience,
Meenakshi



Archive: http://lists.debian.org/4595B4D22AB93C4FABBA84AAD5AA37FD0DC7CB@ORSMSX103.amr.corp.intel.com

Dafydd Harries 03-12-2012 04:32 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Ar 12/03/2012 am 17:11, ysgrifennodd Venkataraman, Meenakshi:
> Hi,
>
> >Dafydd Harries wrote:
> >
> >> I've been seeing similar problems with my "Intel Corporation Centrino
> >> Ultimate-N 6300".
> >>
> >> Like others, the problems seemed to start around 2.6.39.
> >
> >Odd. What kernel did you use before then? (/var/log/dpkg.log might
> >tell.)

Sadly, my dpkg.log only goes back to 3.0, which was installed last July
(!).

I could try installing e.g. 2.6.32 from snapshot.debian.org if other
things don't help. But that would be a lot of patches to bisect,
especially when reproduction iterations are so long...

> >> Like others, the card flakes out a day or two after booting, and a reboot
> >> always fixes the problem. Occasionally it stays working for longer.
>
> [MV] what platform are you using? And does your problem appear after a system hibernate?

Linux nia 3.2.0-2-amd64 #1 SMP Sun Mar 4 22:48:17 UTC 2012 x86_64 GNU/Linux

The system is Debian unstable.

I don't use hibernation. I do suspend regularly, but I haven't noticed any
correlation with suspend/resume.

> >>
> >> Like others, I've added RAM. But as far as I can recall the upgrade
> >> happened well before any problems started appearing.
> >
> >Interesting and useful.
> >
> >> Any ASPM settings are at their default.
>
> [MV] Can you try pcie_aspm=off during boot?

I'm currently waiting to see if wd_disable=1 helps at all. This might
take a few days, though, since I don't have any way to reproduce it.

> >> Meenakshi, will the patch you mentioned be applied in 3.3?
>
> [MV] Yes...it should be applied to 3.3 as well (it is also slated to be backported to stable kernels); but it is a fairly recent fix, so it will take some time before it gets accepted to the other Linux trees.
>
> >> Below is a syslog excerpt from around the time of failure. It seems to
> >> support Meenakshi's suggestion that it's related to the queue getting
> >> stuck.
> >[...]
> >> iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.
>
> [MV] Any idea what happened before this? Did you see any error sending host commands? Did you resume from a hibernate? Can you send me the log?

I haven't noticed any pattern. The first thing that happens is that
the NetworkManager applet seems to be trying to reconnect to the
wireless.

The log seems pretty quiet beforehand.

Mar 12 10:09:43 localhost dhclient: DHCPREQUEST on wlan0 to 192.168.1.1 port 67
Mar 12 10:09:43 localhost dhclient: DHCPACK from 192.168.1.1
Mar 12 10:09:43 localhost dhclient: bound to 192.168.1.15 -- renewal in 15209 seconds.
Mar 12 10:17:01 localhost /USR/SBIN/CRON[12402]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 12 10:19:46 localhost smartd[2460]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 161 to 157
Mar 12 10:25:51 localhost dbus[1395]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Mar 12 10:25:51 localhost dbus[1395]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Mar 12 11:17:01 localhost /USR/SBIN/CRON[12472]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 12 11:19:46 localhost smartd[2460]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 161
Mar 12 11:19:59 localhost dbus[1395]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Mar 12 11:19:59 localhost dbus[1395]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Mar 12 11:25:51 localhost dbus[1395]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Mar 12 11:25:51 localhost dbus[1395]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Mar 12 11:49:46 localhost smartd[2460]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 161 to 157
Mar 12 12:17:01 localhost /USR/SBIN/CRON[15518]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 12 12:19:20 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:19:21 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:19:21 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:19:21 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:19:22 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:19:46 localhost smartd[2460]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 157 to 144
Mar 12 12:20:15 localhost anacron[19324]: Anacron 2.3 started on 2012-03-12
Mar 12 12:20:15 localhost anacron[19324]: Normal exit (0 jobs run)
Mar 12 12:20:22 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:20:23 localhost dbus[1395]: [system] Reloaded configuration
Mar 12 12:20:23 localhost dbus[1395]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Mar 12 12:20:23 localhost dbus[1395]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Mar 12 12:34:33 localhost dbus[1395]: [system] Activating service name='org.freedesktop.PackageKit' (using servicehelper)
Mar 12 12:34:33 localhost dbus[1395]: [system] Successfully activated service 'org.freedesktop.PackageKit'
Mar 12 12:49:04 localhost crontab[21678]: (daf) LIST (daf)
Mar 12 12:49:46 localhost smartd[2460]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 144 to 141
Mar 12 13:13:21 localhost kernel: [48188.303709] iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.
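
A log excerpt like the one above can also be searched mechanically for what precedes the failure. The following is only a minimal sketch: the function name, the context size, and the regular expression are assumptions modelled on the single "Queue 4 stuck" message quoted in this thread, not on any documented driver log format.

```python
import re

# Pattern modelled on the message quoted in this thread, e.g.
# "iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms."
# (assumption: other kernel versions may word this differently)
STUCK_RE = re.compile(r"iwlwifi \S+ Queue (\d+) stuck for (\d+) ms")


def find_stuck_queues(lines, context=5):
    """Return (queue, milliseconds, preceding_lines) for each matching line."""
    hits = []
    for i, line in enumerate(lines):
        m = STUCK_RE.search(line)
        if m:
            start = max(0, i - context)
            hits.append((int(m.group(1)), int(m.group(2)), lines[start:i]))
    return hits


# Sample input copied from the log excerpt in this message.
sample = [
    "Mar 12 12:49:04 localhost crontab[21678]: (daf) LIST (daf)",
    "Mar 12 12:49:46 localhost smartd[2460]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 144 to 141",
    "Mar 12 13:13:21 localhost kernel: [48188.303709] iwlwifi 0000:02:00.0: Queue 4 stuck for 2000 ms.",
]
hits = find_stuck_queues(sample, context=2)
for queue, ms, before in hits:
    print(f"queue {queue} stuck for {ms} ms; {len(before)} preceding line(s)")
```

Running the same function over a full syslog file (for example, its lines read with `open(path).read().splitlines()`) would show whether anything wireless-related was logged shortly before each stall.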

Regards,

Dafydd



Archive: http://lists.debian.org/20120312173221.GY17493@rhydd.org

Jonathan Nieder 03-12-2012 04:44 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Dafydd Harries wrote:
> On 12/03/2012 at 17:11, Venkataraman, Meenakshi wrote:
>>> Dafydd Harries wrote:

>>>> Like others, the problems seemed to start around 2.6.39.
[...]
> Sadly, my dpkg.log only goes back to 3.0, which was installed last July
> (!).

Thanks for checking, and sorry for the lack of clarity.
/var/log/dpkg.log.1 et al might go back further.

>> what platform are you using? And does your problem appear after a
>> system hibernate?
>
> Linux nia 3.2.0-2-amd64 #1 SMP Sun Mar 4 22:48:17 UTC 2012 x86_64 GNU/Linux
>
> The system is Debian unstable.

Maybe "lspci -vvnn" output (as an attachment) and "dmesg" output from
booting if you have it could help to pin down the setup a little more.
(If I understand correctly, Meenakshi is dreaming of a reliable
reproduction recipe. ;-))

Ciao,
Jonathan



Archive: http://lists.debian.org/20120312174429.GA21040@burratino

"Venkataraman, Meenakshi" 03-12-2012 04:45 PM

Bug#628444: iwlagn - "MAC is in deep sleep", cannot restore wifi operation
 
Hi Dafydd,

>> >> I've been seeing similar problems with my "Intel Corporation Centrino
>> >> Ultimate-N 6300".
>> >>
>> >> Like others, the problems seemed to start around 2.6.39.

[MV] Hmm...this is interesting. Can you load the iwlwifi module with bt_coex_active=0 and see if it changes anything? One of the patches that went in between 2.6.38 and 2.6.39 changed the behaviour of coexistence with Bluetooth devices on some platforms, and caused users some grief; although their symptoms were different. The module parameter I mention above solved this problem for them.
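
For readers who want to try this, the sketch below shows one way to check the current setting and to build a persistent option line. It rests on assumptions that are not stated in this thread: that the parameter is exposed under /sys/module/iwlwifi/parameters/ on the running kernel, and that the option line would go into a file such as /etc/modprobe.d/iwlwifi.conf (a hypothetical file name). The helper names are illustrative only.

```python
from pathlib import Path


def read_bt_coex(sysfs_root="/sys"):
    """Return the current bt_coex_active value as text, or None if unreadable.

    Assumption: the loaded iwlwifi module exposes the parameter in sysfs.
    """
    param = Path(sysfs_root) / "module" / "iwlwifi" / "parameters" / "bt_coex_active"
    try:
        return param.read_text().strip()
    except OSError:
        return None


def modprobe_conf_line(module="iwlwifi", **params):
    """Build an 'options' line in modprobe.d syntax for the given parameters."""
    settings = " ".join(f"{name}={value}" for name, value in sorted(params.items()))
    return f"options {module} {settings}"


line = modprobe_conf_line(bt_coex_active=0)
print("current value:", read_bt_coex())
print("suggested line:", line)
```

For a one-off test without editing any file, unloading the module and reloading it with `modprobe iwlwifi bt_coex_active=0` (as root) has the same effect until the next reload.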

>> >Odd. What kernel did you use before then? (/var/log/dpkg.log might
>> >tell.)
>
>Sadly, my dpkg.log only goes back to 3.0, which was installed last July
>(!).
>
>I could try installing e.g. 2.6.32 from snapshot.debian.org if other
>things don't help. But that would be a lot of patches to bisect,
>especially when reproduction iterations are so long...

[MV] No...let's leave bisecting as the last option for this problem.

>> [MV] what platform are you using? And does your problem appear after a
>> system hibernate?
>
>Linux nia 3.2.0-2-amd64 #1 SMP Sun Mar 4 22:48:17 UTC 2012 x86_64
>GNU/Linux

[MV] Oh...I was asking about hardware...who is the manufacturer of your system?

>I don't use hibernation. I do suspend regularly, but I haven't noticed any
>correlation with suspend/resume.

[MV] Interesting as well.

>> [MV] Any idea what happened before this? Did you see any error sending
>> host commands? Did you resume from a hibernate? Can you send me the log?
>
>I haven't noticed any pattern. The first thing that happens is that
>the NetworkManager applet seems to be trying to reconnect to the
>wireless.

[MV] Hmm...the queue stuck patch could potentially help here, but only provided that a queue was stuck prior to the reconnect. Your log doesn't run that far back, so I'm not able to say. :-)

Thanks,
Meenakshi




Archive: http://lists.debian.org/4595B4D22AB93C4FABBA84AAD5AA37FD0DCE1A@ORSMSX103.amr.corp.intel.com

