06-03-2010, 08:17 AM
Holger Levsen

Bug#584314: base: System freezes at random time after Resume from Suspend (Regression)

reassign 584314 linux-2.6
thanks

On Thursday, 3 June 2010, Andreas Berger wrote:
> Package: base
> Severity: important
> Tags: squeeze
>
> Steps to Reproduce:
> 1: Suspend Laptop to RAM
> 2: Resume from Suspend
> 3: Wait and see, preferably monitoring top:
> At some random time, ranging from immediately (black screen after resume)
> to several hours later, the system becomes unresponsive. Switching to
> tty1 or killing Xorg with Alt+Print+K does not work; Alt+Print+REISUB does
> work. Each freeze is preceded by a random process (this time it was
> mandb, as I was installing something) hogging 100% of the CPU; the system
> then becomes gradually unresponsive within a minute or so (panel, metacity,
> and finally the mouse cursor freeze too). Additionally, though I don't know
> if this is related, I noticed one process using 9999% of CPU according to
> top. This bug is a regression: suspend works flawlessly on this laptop in
> Lenny. I also encountered this bug in Ubuntu 9.10 (ironically, it was the
> one that pushed me over the edge to switch to Debian); the corresponding
> bug report is here:
> https://bugs.launchpad.net/ubuntu/+bug/480850
>
> Hardware is an Acer Aspire 5610 laptop. Please advise me on what more
> specific information to gather and what else to do; I'm happy to try out
> anything you suggest.
>
> I assigned this bug to base because reportbug forced me to choose
> something, but I can only guess about the package; please reassign it.
>
> -- System Information:
> Debian Release: squeeze/sid
> APT prefers testing
> APT policy: (500, 'testing')
> Architecture: i386 (i686)
>
> Kernel: Linux 2.6.32-trunk-686 (SMP w/2 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
 
07-28-2011, 02:19 AM
Jonathan Nieder

found 584314 linux-2.6/2.6.32-32
fixed 584314 linux-2.6/2.6.38-3
tags 584314 = upstream
quit

Hi Andreas,

Andreas Berger wrote:

> i can no longer reproduce this bug with kernel 2.6.38
>
> to be sure that it's not due to some other change in testing, i did:
> -clean install of debian 6 (kernel 2.6.32-5), suspend, resume, kerneloops
> -add kernel 2.6.38-2 (from testing), suspend, resume, everything goes fine

Thanks! Quick questions:

- when you say "kernel 2.6.32-5", I assume you mean the package
linux-image-2.6.32-5-686 or linux-image-2.6.32-5-amd64. What version
did you use? (The number after the dash should be around 30; you can
get it with "dpkg -l 'linux-image-*'".)

- same question for 2.6.38.

- could you send a photo of the screen during the oops, so we can read
the backtrace?

- could you send the full output of "dmesg" after booting with a
working version?
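
For the version question, a minimal sketch that pulls only the package name and version out of "dpkg -l" output. It assumes the usual column layout (status, name, version, ...); the helper name list_kernel_versions is made up for illustration.

```shell
# Print "package version" for every installed linux-image package.
# Reads dpkg -l style output on stdin, so it also works on saved output.
list_kernel_versions() {
    # "ii" in the first column marks a fully installed package
    awk '$1 == "ii" && $2 ~ /^linux-image-/ { print $2, $3 }'
}

# Typical use on a Debian system (assumes dpkg is available):
#   dpkg -l 'linux-image-*' | list_kernel_versions
```

The version printed in the second column is the number being asked about here.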

Regards,
Jonathan



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20110728021939.GA1401@elie
 
08-30-2011, 03:10 AM
Jonathan Nieder

notfound 584314 linux-2.6/2.6.32-32
notfixed 584314 linux-2.6/2.6.38-3
found 584314 linux-2.6/2.6.32-30
fixed 584314 linux-2.6/2.6.38-5
quit

Andreas Berger wrote:

> in linux-image-2.6.32-5-686, version 2.6.32-30, the bug was still there,
> in linux-image-2.6.38-2-686, version 2.6.38-5, the bug was no longer there,
>
> in between the two, i don't know, but if it helps, i can narrow it down as
> soon as i get home to a spare hard drive.

Sure, it would help to narrow the search for the fix (but see below to
save some time).

> On Thursday, July 28, 2011 04:19:39 Jonathan Nieder wrote:

>> - could you send a photo of the screen during the oops, so we can read
>> the backtrace?
>
> i typed it off the screen and included it in my previous mail here:
> http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=22;bug=584314
>
> is that not what you mean?

Unfortunately what you typed doesn't include the call trace (or maybe
there was none). It does include the code, which when passed through
scripts/decodecode looks like this:

| kernel:[ 496.263433] Code: 04 01 00 00 00 66 83 7c 24 28 00 79 37 89 f5 31 db eb 2b ba 03 00 00 00 89 e8 e8 ee 73 fa ff b9 00 04 00 00 89 04 24 89 c7 31 c0 <f3> ab 8b 04 24 ba 03 00 00 00 43 83 c5 20 e8 20 72 fa ff 3b 5c
[...]
| 11: eb 2b jmp 0x3e
| 13: ba 03 00 00 00 mov $0x3,%edx
| 18: 89 e8 mov %ebp,%eax
| 1a: e8 ee 73 fa ff callq 0xfffffffffffa740d
| 1f: b9 00 04 00 00 mov $0x400,%ecx
| 24: 89 04 24 mov %eax,(%rsp)
| 27: 89 c7 mov %eax,%edi
| 29: 31 c0 xor %eax,%eax
| 2b:* f3 ab rep stos %eax,%es:(%rdi) <-- trapping instruction
| 2d: 8b 04 24 mov (%rsp),%eax

Building mm/page_alloc.s and comparing, we see that this is in
"clear_highpage"; the function call set up starting at offset 13 is to
kmap_atomic and the trapping rep stos is memset(page, 0, PAGE_SIZE).
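
The offset that decodecode marks with "*" can be cross-checked by hand: it is the number of bytes that precede the byte in angle brackets on the "Code:" line. A small sketch of that count follows; trap_offset is a hypothetical helper name, not part of the kernel scripts.

```shell
# Print the offset (in hex) of the trapping byte, i.e. the byte written
# as <xx> on an oops "Code:" line read from stdin.
trap_offset() {
    sed -n 's/.*Code: *//p' |
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^<.*>$/) {          # the bracketed byte
                printf "0x%x\n", i - 1    # bytes before it = its offset
                exit
            }
    }'
}

# Example: trap_offset < oops.txt
```

Fed the Code line quoted above, this prints 0x2b, which matches the line decodecode flags as the trapping instruction.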

Unwinding a little: clear_highpage is called by prep_zero_page,
which is called by prep_new_page, which is called by buffered_rmqueue,
which is called by get_page_from_freelist for each potentially
free page.

I suspect memory corruption. Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
Fix memory corruption related to swap, 2010-12-03) fixes it. Could
you test 2.6.37-rc5 and 2.6.37-rc4?

Thanks,
Jonathan



--
Archive: http://lists.debian.org/20110830031046.GC17796@elie.gateway.2wire.net
 
09-01-2011, 07:09 PM
Jonathan Nieder

Hi Andreas,

Andreas Berger wrote:
> On Tuesday, August 30, 2011 05:10:47 you wrote:

>> I suspect memory corruption. Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
>> Fix memory corruption related to swap, 2010-12-03) fixes it. Could
>> you test 2.6.37-rc5 and 2.6.37-rc4?
>
> um, maybe a stupid question, but where do i get these kernels? are they in
> some debian repository or do i have to build them?

http://snapshot.debian.org/, source package linux-2.6.
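
One way to install from the snapshot archive is a dated APT source line. The sketch below only builds that line; the timestamp in the usage comment is a made-up example rather than a known-good snapshot, and snapshot_source is a hypothetical helper name.

```shell
# Build a sources.list line for a dated snapshot of the Debian archive.
#   $1 = snapshot timestamp, format YYYYMMDDTHHMMSSZ
#   $2 = suite, e.g. experimental
snapshot_source() {
    printf 'deb http://snapshot.debian.org/archive/debian/%s/ %s main\n' "$1" "$2"
}

# Example (hypothetical timestamp; pick one from the snapshot site):
#   snapshot_source 20101201T000000Z experimental
# Old snapshots have expired Release files, so apt-get update usually
# needs: -o Acquire::Check-Valid-Until=false
```

The individual linux-image .deb files can also be downloaded from the package pages on the snapshot site and installed with dpkg -i.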

Thanks,
Jonathan



--
Archive: http://lists.debian.org/20110901190938.GB5705@elie
 
09-07-2011, 04:15 PM
Andreas Berger

On Thursday, September 01, 2011 21:09:38 Jonathan Nieder wrote:
> Hi Andreas,
>
> Andreas Berger wrote:
> > On Tuesday, August 30, 2011 05:10:47 you wrote:
> >> I suspect memory corruption. Maybe v2.6.37-rc5~3^2 (PM / Hibernate:
> >> Fix memory corruption related to swap, 2010-12-03) fixes it. Could
> >> you test 2.6.37-rc5 and 2.6.37-rc4?
> >
> > um, maybe a stupid question, but where do i get these kernels? are they
> > in some debian repository or do i have to build them?
>
> http://snapshot.debian.org/, source package linux-2.6.
>
> Thanks,
> Jonathan

ok, i narrowed it down, but it is:

found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
not found: linux-image-2.6.37-rc4-686, version 2.6.37~rc4-1~experimental.1

and this time i think i got a complete call trace, is attached


greetings,
andreas
[ 186.878224] BUG: unable to handle kernel paging request at f76ff01c
[ 186.878300] IP: [<f8617b2b>] dx_probe+0x3a/0x287 [ext3]
[ 186.878366] *pde = 00007067 *pte = f0001212
[ 186.878415] Oops: 0000 [#1] SMP
[ 186.878454] last sysfs file: /sys/devices/pci0000:00/0000:00:1c.3/0000:05:00.0/ieee80211/phy0/rfkill0/state
[ 186.878540] Modules linked in: acpi_cpufreq mperf cpufreq_conservative cpufreq_stats cpufreq_userspace cpufreq_powersave parport_pc ppdev lp parport sco bridge stp bnep rfcomm l2cap crc16 bluetooth uinput fuse loop firewire_sbp2 firewire_core crc_itu_t snd_hda_codec_realtek arc4 ecb snd_hda_intel iwl3945 snd_hda_codec iwlcore i915 snd_hwdep snd_pcm mac80211 yenta_socket snd_seq drm_kms_helper i2c_i801 snd_timer snd_seq_device cfg80211 pcmcia_rsrc drm i2c_algo_bit rng_core tpm_tis joydev rfkill tpm i2c_core tpm_bios snd shpchp container ac pci_hotplug soundcore wmi battery video pcspkr snd_page_alloc button serio_raw psmouse evdev processor output ext3 jbd mbcache sg sr_mod cdrom sd_mod crc_t10dif b44 ata_generic ssb ata_piix uhci_hcd pcmcia libata scsi_mod sdhci_pci ehci_hcd usbcore sdhci mmc_core thermal pcmcia_core led_class mii thermal_sys nls_base [last unloaded: scsi_wait_scan]
[ 186.879586]
[ 186.879604] Pid: 1643, comm: NetworkManager Not tainted 2.6.36-trunk-686 #1 Grapevine/Aspire 5610
[ 186.879604] EIP: 0060:[<f8617b2b>] EFLAGS: 00010282 CPU: 0
[ 186.879743] EIP is at dx_probe+0x3a/0x287 [ext3]
[ 186.879785] EAX: f6dee8f8 EBX: f6dbedf4 ECX: 00000000 EDX: f6dee8f8
[ 186.879840] ESI: f6dee8f8 EDI: f76ff000 EBP: f6ba9e00 ESP: f6ba9d50
[ 186.879895] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 186.879943] Process NetworkManager (pid: 1643, ti=f6ba8000 task=f68ad4a0 task.ti=f6ba8000)
[ 186.880014] Stack:
[ 186.880034] f6ba9e14 00000000 f6ba9df0 f6f902c8 c12b2ef0 f6ba9da0 0000000f f6ba9f54 f6f902a8
[ 186.880130] <0> f6dbedf4 f685f200 f8618d39 f6ba9dd8 f6ba9e00 0000000f f6ba9f54 f6e1ca80
[ 186.880233] <0> f6f90304 f6f902c8 00001000 00000000 0000003c 00000000 f6ba9e1c 00000001
[ 186.880376] Call Trace:
[ 186.880376] [<f8618d39>] ? ext3_find_entry+0x85/0x49a [ext3]
[ 186.880432] [<c10c55e8>] ? d_alloc+0x1b/0x142
[ 186.880482] [<f86198ae>] ? ext3_lookup+0x24/0xa8 [ext3]
[ 186.880535] [<c10be0f3>] ? d_alloc_and_lookup+0x3c/0x52
[ 186.880584] [<c10be1d1>] ? do_lookup+0x92/0xcb
[ 186.880627] [<c10bfa23>] ? link_path_walk+0x242/0x372
[ 186.880674] [<c10bfc15>] ? path_walk+0x4f/0xae
[ 186.880717] [<c10bfd46>] ? do_path_lookup+0x1f/0x69
[ 186.880762] [<c10c05bc>] ? user_path_at+0x37/0x5f
[ 186.880808] [<c10ba55d>] ? vfs_fstatat+0x2a/0x50
[ 186.880851] [<c10ba5c4>] ? vfs_lstat+0x13/0x15
[ 186.880892] [<c10ba5d5>] ? sys_lstat64+0xf/0x23
[ 186.880937] [<c1002f1f>] ? sysenter_do_call+0x12/0x28
[ 186.880983] Code: 24 30 8b 44 24 2c 89 4c 24 08 31 c9 c7 00 00 00 00 00 31 c0 55 6a 00 e8 50 f6 ff ff 59 5f 85 c0 89 c6 0f 84 20 02 00 00 8b 78 18 <8a> 47 1c 3c 02 76 0b 0f b6 c0 50 68 f2 33 62 f8 eb 64 8b 54 24
[ 186.881401] EIP: [<f8617b2b>] dx_probe+0x3a/0x287 [ext3] SS:ESP 0068:f6ba9d50
[ 186.881483] CR2: 00000000f76ff01c
[ 186.881645] ---[ end trace 4938385b8da477eb ]---
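
When comparing several oopses, it can help to reduce each call trace to its list of function names. A rough sketch, assuming the "[<address>] ? symbol+0xoff/0xlen" line format shown above; trace_symbols is a hypothetical helper name.

```shell
# Print one function name per call-trace line read from stdin.
# Lines that do not look like "[<addr>] ? symbol+0x..." are ignored.
trace_symbols() {
    sed -n 's/.*\] ? \([A-Za-z0-9_]*\)+0x.*/\1/p'
}

# Example: trace_symbols < oops.txt
```

For the trace above, this would list ext3_find_entry first and sysenter_do_call last, which makes it easier to see that the crash came in through a plain lstat() path lookup.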
 
09-12-2011, 12:56 AM
Jonathan Nieder

Andreas Berger wrote:

> ok, i narrowed it down, but it is:
>
> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
> not found: linux-image-2.6.37-rc4-686, version 2.6.37~rc4-1~experimental.1
>
> and this time i think i got a complete call trace, is attached

Nice. Alas, after looking at the Debian changelog and "git shortlog
v2.6.36..v2.6.37-rc4" output, no particular change jumps out as likely
to have fixed this corruption (and the places the kernel panicked
don't give any obvious clue).

Some ideas for narrowing it down:

- could you try suspending in single-user mode (i.e., kernel
parameters "single debug"), to rule out a problem in the i915
driver?

- likewise, does unloading other modules before suspend help?

- if nothing else gives a hint: can you bisect to find the fix? It
works like this:

1. Reproduce the bug with the unpatched kernel.

# apt-get install git-core build-essential
$ mkdir -p ~/src && cd ~/src; # the later steps assume ~/src/linux
$ git clone git://github.com/torvalds/linux.git; # kernel.org is down
$ cd linux
$ git checkout v2.6.36
$ make localmodconfig; # minimal configuration
$ make deb-pkg; # with -j<n> for parallel build if wanted
# dpkg -i ../<linux-image package name>
# reboot
... test test test ...

Hopefully it reproduces the bug. Otherwise, declare victory and we
can figure out how Debian-specific changes screwed it up.

2. Reproduce the fix.

$ cd ~/src/linux
$ git checkout v2.6.37-rc4
$ yes "" | make silentoldconfig; # reuse configuration
$ make deb-pkg
# dpkg -i ../<linux-image package name>
# reboot
... test test test ...

Hopefully it does _not_ reproduce the bug. If it does crash, copy
Debian's config-2.6.37-rc4-686 to ~/src/linux/.config, rebuild, and
try again. If that fixes it, declare victory and we can figure out
which configuration change fixed it; if that doesn't fix it, we
can look for a relevant Debian-specific patch.

3. Great --- so v2.6.36 reproduces the bug and v2.6.37-rc4 reproduces
the fix. Tell git:

$ cd ~/src/linux
$ git bisect start v2.6.37-rc4 v2.6.36

Git checks out a revision halfway between to test.

$ yes "" | make silentoldconfig; # reuse configuration
$ make deb-pkg
# dpkg -i ../<linux-image package name>
# reboot
... test test test ...
$ cd ~/src/linux
$ git bisect good; # if it crashes
$ git bisect bad; # if it is stable
$ git bisect skip; # if some other bug makes it hard to test

Yes, "good" means "successfully demonstrates the bug". The naming is
a little confusing because git bisect is usually used to find changes
introducing bugs rather than changes fixing them.

4. Repeat until bored:

$ yes "" | make silentoldconfig; # reuse configuration
$ make deb-pkg
# dpkg -i ../<linux-image package>
# reboot
... test test test ...
$ cd ~/src/linux
$ git bisect good / bad / skip

Eventually it will tell the "first bad commit" (i.e., the fix), which
was what was wanted. If you get bored before then, that's still
useful --- "git bisect log" will tell the results so far. (Even a
few rounds can narrow things down a lot.) If the gitk package is
installed, you can run "git bisect visualize" at any time to watch the
range of changes potentially containing the fix narrowing.
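
To get a rough idea of how many rounds a full bisection takes, here is a sketch that assumes each round halves the candidate range (skipped revisions and merges can change this a little). The function name bisect_rounds is made up for illustration.

```shell
# Estimate the number of bisection rounds for N candidate commits,
# i.e. the smallest r with 2^r >= N.
bisect_rounds() {
    n=$1
    rounds=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}

# Example, inside the kernel tree:
#   bisect_rounds "$(git rev-list --count v2.6.36..v2.6.37-rc4)"
```

For instance, a range of 10000 commits would need about 14 rounds.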

"man git-bisect" and /usr/share/doc/git-doc/git-bisect-lk2009.html
from the git-doc package have details.
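
Since the good/bad naming is inverted when hunting for a fix, a tiny wrapper can keep the mapping straight. This is only a sketch; bisect_verdict and the result words "crash" and "stable" are made up for illustration.

```shell
# Translate an observed test result into the verdict to give git bisect
# when searching for the commit that FIXED the bug.
bisect_verdict() {
    case "$1" in
        crash)  echo good ;;  # bug still present: behaves like v2.6.36
        stable) echo bad ;;   # fix present: behaves like v2.6.37-rc4
        *)      echo skip ;;  # revision could not be tested
    esac
}

# Example: git bisect "$(bisect_verdict crash)"
```

Newer git versions (2.7 and later, as far as I know) can avoid the confusion with "git bisect start --term-old=<term> --term-new=<term>", which lets you pick your own names for the two states.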

Thanks much for your help so far!
Jonathan



--
Archive: http://lists.debian.org/20110912005651.GA2548@elie.sbx02827.chicail.wayport.net
 
11-23-2011, 12:15 PM
Jonathan Nieder

Hi again,

Jonathan Nieder wrote:
> Andreas Berger wrote:

>> ok, i narrowed it down, but it is:
>>
>> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
>> not found: linux-image-2.6.37-rc4-686, version 2.6.37~rc4-1~experimental.1
>>
>> and this time i think i got a complete call trace, is attached
>
> Nice.
[...]
> - could you try suspending in single-user mode (i.e., kernel
> parameters "single debug"), to rule out a problem in the i915
> driver?

Did you get a chance to try this?

Even simpler can be to suspend from an initramfs rescue shell,
prepared as described at [1]:

echo mem >/sys/power/state

By the way, does trouble only happen after suspend (suspend to RAM),
or does hibernation (suspend to disk) trigger it, too?
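
Before writing to /sys/power/state it can be worth checking which sleep states the running kernel offers, since the file lists them when read. A minimal sketch; supports_state is a hypothetical helper, and the optional second argument exists only so it can be tried on a saved copy of the file.

```shell
# Succeed if the given sleep state is listed as supported.
#   $1 = state name, e.g. mem (suspend to RAM) or disk (hibernation)
#   $2 = file listing the states (default: /sys/power/state)
supports_state() {
    tr ' ' '\n' < "${2:-/sys/power/state}" | grep -qx "$1"
}

# Example (as root):
#   supports_state mem && echo mem > /sys/power/state
```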

[1] http://wiki.debian.org/InitramfsDebug



--
Archive: http://lists.debian.org/20111123131539.GA6595@elie.hsd1.il.comcast.net
 
03-16-2012, 10:52 PM
Jonathan Nieder

affects 584314 + xserver-xorg-video-intel
quit

Hi,

Thanks again for your work on this bug so far.

To recap, this bug is about symptoms of memory corruption after
suspending to disk on an Acer Aspire 5610, which uses (I think)
the 945GM express chipset.

Lenny and wheezy worked fine; it is only the squeeze kernel that
has this problem.

Searching through kernels from snapshot.debian.org, you found
that it was fixed between 2.6.36 and 2.6.37-rc4. (Nicely
done.)

Andreas Berger wrote:

> ok, i narrowed it down, but it is:
>
> found: linux-image-2.6.36-trunk-686, version 2.6.36-1~experimental.1
> not found: linux-image-2.6.37-rc4-686, version 2.6.37~rc4-1~experimental.1

Unfortunately there are a lot of interesting patches in that range,
so we will probably need a little more data to track this down. So
I suggested:

- trying suspend-to-disk (with

echo disk >/sys/power/state

) and seeing if that reproduces the same trouble

- suspending from single-user mode (kernel params "single debug")
or from an initramfs shell (kernel param "break=top") to see if
the same problem occurs even if the i915 driver is not loaded
yet when the suspend/hibernate happens
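
To confirm that i915 really is absent before suspending from such a shell, the module list can be checked first. A sketch, assuming the usual /proc/modules format with the module name in the first column; module_loaded is a made-up helper name.

```shell
# Succeed if the named module appears in the kernel's module list.
#   $1 = module name, e.g. i915
#   $2 = module list file (default: /proc/modules)
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' \
        "${2:-/proc/modules}"
}

# Example:
#   module_loaded i915 && echo "i915 is loaded, test is not conclusive"
```

This only covers drivers built as modules; a driver compiled into the kernel would not show up in the list.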

Other ideas would be welcome, too. I'd be happy to get this fixed in
squeeze.

Jonathan



--
Archive: http://lists.debian.org/20120316235216.GA32646@burratino
 
03-16-2012, 10:59 PM
Jonathan Nieder

Jonathan Nieder wrote:

> To recap, this bug is about symptoms of memory corruption after
> suspending to disk on an Acer Aspire 5610, which uses (I think)
> the 945GM express chipset.

This should have read "after suspending to RAM". Sorry for the
nonsense.



--
Archive: http://lists.debian.org/20120316235953.GA32730@burratino
 
