11-15-2011, 07:10 AM
Jonathan Nieder

Bug#627019: several kernel hangs before geting to login

Hi,

Will Set wrote:

> I see several segfaults and hangs during boot for each successful login.

By private email, you said:

- you have tested some 3.1.0-1-686-pae kernel (I assume
3.1.0-1~experimental.1 from experimental).
- unless you add "processor.nocst=1", it reliably hangs at boot time.
- adding "processor.nocst=1" makes it boot without hanging.
- in addition to this machine, you have another machine that has an
i865 chipset. It produces the same symptoms.
- in addition, you have a machine with an i915 chipset, which works
fine, with no need for special boot parameters.
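When comparing boots with and without the workaround, it can help to confirm that the parameter actually reached the running kernel. Below is a minimal Python sketch, assuming a Linux system that exposes the boot command line at /proc/cmdline. The helper names has_kernel_param and read_cmdline are made up for illustration and are not part of any tool mentioned in this thread.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, name: str, value: str) -> bool:
    """Return True if `name=value` appears as a whitespace-separated token."""
    return f"{name}={value}" in cmdline.split()


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    return Path(path).read_text().strip()


if __name__ == "__main__":
    try:
        active = has_kernel_param(read_cmdline(), "processor.nocst", "1")
    except OSError:
        # /proc/cmdline is missing or unreadable (e.g. not a Linux system).
        print("could not read /proc/cmdline")
    else:
        print("processor.nocst=1 is", "set" if active else "not set")
```

Matching whole tokens rather than substrings avoids a false positive from a longer value such as processor.nocst=10.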

In the bug log, I see:

- this is an Acer Aspire One AO521, board JV01-NL, BIOS v1.08
- the chipset is indeed an 82865G
- oopses are all over the place. Feels like corruption somewhere.
- with debug=3, we see that the DMI says this is board D865GRH, BIOS
BF86510A.86A.0077.P25.0508040031 --- wait, are these even the same
machine?
- the other i865 is D865PERLK.

Ok. The processor.nocst=1 workaround indicates that the ACPI tables
might be incorrect or being incorrectly parsed. For the D865GBF, such
a problem is being tracked as bug#630031 and upstream bug 38262.
Compare v2.6.22-rc1~1112^2^2 (ACPICA: clear fields reserved before
FADT r3, 2007-04-28). To move forward on that, the right thing to do
would be to get in touch with Len Brown, for example by answering his
questions from the Fedora bugtracker at
<https://bugzilla.redhat.com/show_bug.cgi?id=727865>.

For the D865PERLK, a quick web search does not show anyone but you
having this problem.

You've said you have three boards you're checking with and only two
exhibit the problem. I'm not sure where the JV01-NL fits into the
picture.

Anyway, for the future, it would be way less confusing to have one bug
per machine, unless they are identically configured or we can be
reasonably certain for some other reason that the same fix will apply
to all of them. Please provide a summary of which machines that you
use are affected and not affected, and I can clone this bug and let
you know the bug number assigned to each.

Thanks for your help and patience.

Regards,
Jonathan



--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/20111115081058.GA11164@elie.hsd1.il.comcast.net
 
11-15-2011, 10:16 AM
Will Set

Bug#627019: several kernel hangs before geting to login

Tuesday, November 15, 2011 3:10AM Jonathan Nieder wrote:
>
>Hi,
>
>Will Set wrote:
>
>- you have tested some 3.1.0-1-686-pae kernel (I assume
>  3.1.0-1~experimental.1 from experimental).

Yes, 3.1.0-1~experimental.1 from experimental

>- unless you add "processor.nocst=1", it reliably hangs at boot time.
>- adding "processor.nocst=1" makes it boot without hanging.
>- in addition to this machine, you have another machine that has an
>  i865 chipset.  It produces the same symptoms.
>- in addition, you have a machine with an i915 chipset, which works
>  fine, with no need for special boot parameters.

Yes.

>
>In the bug log, I see:
>
>- this is an Acer Aspire One AO521, board JV01-NL, BIOS v1.08
>- the chipset is indeed an 82865G
>- oopses are all over the place.  Feels like corruption somewhere.
>- with debug=3, we see that the DMI says this is board D865GRH, BIOS
>  BF86510A.86A.0077.P25.0508040031 --- wait, are these even the same
>  machine?
>- the other i865 is D865PERLK.

From what I have gathered so far from reading docs and reports,
it looks like a C-state problem.
I think there isn't a CST in this processor...
Is CST what adjusts processor voltage and stepping for energy saving when idle?
I'm thinking the legacy FADT is all this chip can use.

It's not a big deal for me to use the workaround Len Brown suggested at
https://bugzilla.redhat.com/show_bug.cgi?id=727865#c16
for 2.6.38-rc* and newer kernels.
The Debian stable 2.6.32-5-686 kernel still works fine.

And I'm still OK if upstream treats this as a "will not fix" issue.
But I would like a fix as well, if one is possible.

>
>Ok.  The processor.nocst=1 workaround indicates that the ACPI tables
>might be incorrect or being incorrectly parsed.  For the D865GBF, such
>a problem is being tracked as bug#630031 and upstream bug 38262.
>Compare v2.6.22-rc1~1112^2^2 (ACPICA: clear fields reserved before
>FADT r3, 2007-04-28).  To move forward on that, the right thing to do
>would be to get in touch with Len Brown, for example by answering his
>questions from the Fedora bugtracker at
><https://bugzilla.redhat.com/show_bug.cgi?id=727865>.

All my answers to Len Brown's questions are identical to
Adam's answers to the same questions:
https://bugzilla.redhat.com/show_bug.cgi?id=727865#c17

$ grep . /sys/devices/system/cpu/cpu0/cpuidle/*/*  --> doesn't exist.
$ /sys/firmware/acpi/tables/dynamic/*  --> doesn't exist in the filesystem.
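
For anyone repeating this check, here is a rough Python equivalent of the grep above. It is only a sketch: it assumes the usual sysfs layout of one subdirectory per idle state under the cpuidle directory, and dump_cpuidle is a hypothetical helper name. An empty result corresponds to the "doesn't exist" case reported here.

```python
from pathlib import Path


def dump_cpuidle(base: str = "/sys/devices/system/cpu/cpu0/cpuidle") -> dict:
    """Map 'stateN/attribute' to its contents, similar to `grep . base/*/*`."""
    root = Path(base)
    result = {}
    if not root.is_dir():
        return result  # no cpuidle directory: nothing to report
    for entry in sorted(root.glob("*/*")):
        if not entry.is_file():
            continue
        try:
            text = entry.read_text(errors="replace").strip()
        except OSError:
            # Some sysfs attributes cannot be read; skip them.
            continue
        result[f"{entry.parent.name}/{entry.name}"] = text
    return result


if __name__ == "__main__":
    states = dump_cpuidle()
    if not states:
        print("no cpuidle states exposed")
    for key, value in states.items():
        print(f"{key}:{value}")
```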

>
>For the D865PERLK, a quick web search does not show anyone but you
>having this problem.
>
>You've said you have three boards you're checking with and only two
>exhibit the problem.  I'm not sure where the JV01-NL fits into the
>picture.

I'm not sure how the JV01-NL got into the picture either.

>
>Anyway, for the future, it would be way less confusing to have one bug
>per machine,

Yes, I agree 100%.

>unless they are identically configured or we can be
>reasonably certain for some other reason that the same fix will apply
>to all of them.

Yes, at this preliminary stage, I think the issue is exactly the same,
or at least close enough, on my two Intel 865 chipset machines.

Even though the two mobos are not identical,
the processors, memory and disks are identical in both machines.

>Please provide a summary of which machines that you
>use are affected and not affected, and I can clone this bug and let
>you know the bug number assigned to each.

I will file a separate bug report from the other machine.


>
>Thanks for your help and patience.
>
>Regards,
>Jonathan

Best Regards,
Will
 
12-23-2011, 10:54 PM
Jonathan Nieder

Bug#627019: several kernel hangs before geting to login

Hi Will,

Will Set wrote:

> I was able to take three pictures of the boot messages by scrolling up the
> boot buffer from the login prompt, while booting 3.2.0-rc4-686-pae
> to illustrate what I did my best to explain yesterday.
>
> I'll also attach the dmesg.udev-2 and acpidump-udev-2

Thanks.

If I understand you correctly, udev 175-2 segfaults at boot. udev
175-3 does _not_ segfault, but the boot fails in some way unless you
add processor.nocst=1 to the kernel command line. Which is already
weird, since the only advertised changes in 175-3 were a fix to the
systemd service file and a fix to udev rules for Xen support. Based
on the kernel log you sent, you are not using systemd, and I assume
you're not using Xen.

This is on the machine with a D865GBF motherboard.

Anyway, you were able to take advantage of this situation to get an acpidump.

Are these results reproducible?

Hope that helps,
Jonathan



--
Archive: http://lists.debian.org/20111223235449.GB25158@elie.Belkin
 
12-24-2011, 09:55 PM
Will Set

Bug#627019: several kernel hangs before geting to login

Friday, December 23, 2011 6:54 PM Jonathan Nieder wrote:
>Hi Will,
>
>Will Set wrote:
>
>> I was able to take three pictures of the boot messages by scrolling up the
>> boot buffer from the login prompt, while booting 3.2.0-rc4-686-pae
>> to illustrate what I did my best to explain yesterday.
>>
>> I'll also attach the dmesg.udev-2 and acpidump-udev-2
>
>Thanks.
>
>If I understand you correctly, udev 175-2 segfaults at boot.

No, not always a segfault.

Sometimes udev just hangs, leaving the machine without keyboard access.
And it's way too early in the boot process to get normal network connectivity.

Other times the kernel will panic.
And when the kernel panics I'm not able to save any data from the boot buffer
other than the screenful of data that is showing once the trace output has finished.

Boot also fails in at least one other way,
where I can see a "udev settle" message and messages showing the /sys directory structure.
When this type of issue happens I am able to log in and use the system console.
But if I start the X server under these conditions I have no keyboard or mouse.

These failures have not changed much since I initially reported this.
But I have seen the failures so many times now that I'm a bit less confused by them.


> udev 175-3 does _not_ segfault,

No, udev 175-3 also segfaults, iirc,
but I have not yet "re-upgraded" udev to 175-3 to test exactly what it shows.

>but the boot fails in some way unless you
>add processor.nocst=1 to the kernel command line.

Yes,
adding processor.nocst=1 has always worked for me on all affected kernels I've tested so far.

But the boot fails consistently when using udev 175-3 from unstable with 3.2.0-rc4-686-pae
and without processor.nocst=1 added to the boot command.

>Which is already
>weird, since the only advertised changes in 175-3 were a fix to the
>systemd service file and a fix to udev rules for Xen support.  Based
>on the kernel log you sent, you are not using systemd, and I assume
>you're not using Xen.

Please understand that a failed boot always appears,
at least from what I can see here,
to have something to do with udev.

>
>This is on the machine with a D865GBF motherboard.

No,
this report is and always will be about the Intel D865GRH mobo.
My other mobo is an Intel D865PERLK.

There is another Debian user who has an Intel D865GBF mobo
with a "very" similar Debian bug report filed.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631597
[ 9.132009] Pid: 311, comm: modprobe Tainted: G D 2.6.39-2-686-pae #1 /D865GBF

And this user has also filed a bug report upstream after Ben requested he do so.



http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631597#15


>
>Anyway, you were able to take advantage of this situation to get an acpidump.
>
>Are these results reproducible?

Yes. But the failure is not consistently the same one.
I had three failed boot attempts today while testing with a clean kernel command line,
i.e. processor.nocst=1 was not added to the command line on any of my 4 boot attempts today.
The fourth time the machine booted to a usable state.

>

I hope you can find some clues in this email that will make this issue less weird to understand.
And as always I'll do my best to get timely responses back to you, even though I have been busy
elsewhere recently.
I've not had my usual amount of time to devote to testing and learning about the kernel.

Best Regards,
Will

>Hope that helps,
>Jonathan
>
 
12-25-2011, 08:24 AM
Jonathan Nieder

Bug#627019: several kernel hangs before geting to login

Will Set wrote:
> Jonathan Nieder wrote

>> but the boot fails in some way unless you
>> add processor.nocst=1 to the kernel command line.
>
> Yes,
> Adding processor.nocst=1 has always worked for me on all affected kernels I've tested so far.
[...]
>> This is on the machine with a D865GBF motherboard.
>
> No,
> This report is and always will be Intel D865GRH mobo.

Sorry for the typo, and thanks for the corrections.

Excellent --- I suspect that udev is actually a red herring and that
_any_ code executed during the early boot process is likely to
misbehave or segfault on this machine unless processor.nocst=1 is
passed.

In other words, this looks like incorrect execution or memory
corruption during boot. Which is consistent with a broken _CST table.

Unfortunately the acpidump you sent does not include a _CST table.
The log you sent does not include any complaints about lack of a _CST
table, though. Puzzling.
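
_CST is normally an object defined inside the DSDT or an SSDT rather than a table of its own, so one rough way to double-check a dump is to look for the four-byte name in the raw bytes of each dumped table. The Python sketch below assumes acpidump's text layout: a "NAME @ address" header line followed by hex rows of the form "offset: XX XX ...  ascii". That layout is an assumption about the tool's output, and tables_containing is a hypothetical helper. A miss is not conclusive, since SSDTs can also be loaded dynamically at run time.

```python
import re

# A hex row: offset, colon, then hex byte pairs each followed by one space.
HEX_LINE = re.compile(r"^\s*[0-9A-Fa-f]+:\s+((?:[0-9A-Fa-f]{2} )+)")
# A table header such as "DSDT @ 0x7fee0000".
HEADER = re.compile(r"^(\S{4}) @ ")


def tables_containing(dump_text: str, needle: bytes) -> list:
    """Return names of dumped tables whose raw bytes contain `needle`."""
    tables = []  # [name, bytearray] pairs in order of appearance
    for line in dump_text.splitlines():
        header = HEADER.match(line)
        if header:
            tables.append([header.group(1), bytearray()])
            continue
        row = HEX_LINE.match(line)
        if row and tables:
            # Rows are concatenated, so a name split across rows is still found.
            tables[-1][1] += bytes.fromhex(row.group(1))
    return [name for name, data in tables if needle in data]


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as handle:
            print(tables_containing(handle.read(), b"_CST"))
```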

I recommend keeping processor.nocst=1 on the kernel command line for
now. We should report this upstream to Len Brown and the
linux-acpi@vger.kernel.org list, but I would like to delay that until
after the holidays to avoid overwhelming them.

> There is another Debian user that has an Intel D865GBF mobo
> with a "very" similar debian bug report filed.
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631597

Does disabling hyperthreading in the BIOS avoid trouble for you, too?



--
Archive: http://lists.debian.org/20111225092458.GE10805@elie.Belkin
 
12-26-2011, 09:24 PM
Will Set

Bug#627019: several kernel hangs before geting to login

Sunday, December 25, 2011 4:24 AM Jonathan Nieder wrote:
>Will Set wrote:
>> Jonathan Nieder wrote
>
>>> but the boot fails in some way unless you
>>> add processor.nocst=1 to the kernel command line.
>>
>> Yes,
>> Adding processor.nocst=1 has always worked for me on all affected kernels I've tested so far.
>[...]
>>> This is on the machine with a D865GBF motherboard.
>>
>> No,
>> This report is and always will be Intel D865GRH mobo.
>
>Sorry for the typo, and thanks for the corrections.
>
>Excellent --- I suspect that udev is actually a red herring and that
>_any_ code executed during the early boot process is likely to
>misbehave or segfault on this machine unless processor.nocst=1 is
>passed.
>
>In other words, this looks like incorrect execution or memory
>corruption during boot.  Which is consistent with a broken _CST table.
>
>Unfortunately the acpidump you sent does not include a _CST table.
>The log you sent does not include any complaints about lack of a _CST
>table, though.  Puzzling.
>
>I recommend keeping processor.nocst=1 on the kernel command line for
>now.  We should report this upstream to Len Brown and the
>linux-acpi@vger.kernel.org list, but I would like to delay that until
>after the holidays to avoid overwhelming them.
>
>> There is another Debian user that has an Intel D865GBF mobo
>> with a "very" similar debian bug report filed.
>>
>> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=631597
>
>Does disabling hyperthreading in the BIOS avoid trouble for you, too?

Yes, when I disable hyperthreading in the BIOS the machine boots normally.

Please take my findings today with a grain of salt, because of the testing methods I used.
I had to test using 3.1.0-1-686-pae
(which I believe is an old experimental kernel and not a sid or wheezy kernel).
I rebooted 3.1.0-1-686-pae 10 times with hyperthreading enabled,
and got 10 different dmesg problems at or very near the point where
udev was populating /dev.
So I then disabled hyperthreading in the BIOS and rebooted 4 or 5 times.
With hyperthreading disabled I still see a 30-second timeout:

[    7.014167] snd_intel8x0 0000:00:1f.5: PCI INT B -> GSI 17 (level, low) -> IRQ 17
[    7.016968] snd_intel8x0 0000:00:1f.5: setting latency timer to 64
[    7.476075] intel8x0_measure_ac97_clock: measured 54401 usecs (2621 samples)
[    7.478973] intel8x0: clocking to 48000

here while udev finishes populating /dev.

But, the boot looks normal to me and I'm using it in this xsession with hyperthreading disabled.

Best Regards,
Will

attachments:
dmesg.hyper
acpidump.hyper

ps: 3.2.0-rc4-686-pae has been booting remarkably consistently both yesterday and today with any udev:
{udev and libudev0 175-1}, {udev and libudev0 175-2} and
{udev and libudev0 175-3}

I'm also holding off upgrading the 20 packages waiting in the sid upgrade queue,
until I understand a bit better what I'm looking at in the system today.
 
12-26-2011, 09:33 PM
Will Set

Bug#627019: several kernel hangs before geting to login

Monday, December 26, 2011 5:24 PM Will Set wrote:
>Sunday, December 25, 2011 4:24 AM Jonathan Nieder wrote:
>>Will Set wrote:
>>> Jonathan Nieder wrote
>>
>>>> but the boot fails in some way unless you
>>>> add processor.nocst=1 to the kernel command line.
[...]

>I had to test using 3.1.0-1-686-pae
>( which I believe is an old experimental kernel and not a sid or wheezy kernel)

Oops, sorry for the confusion: 3.1.0-1-686-pae is a sid kernel.

>I rebooted 3.1.0-1-686-pae 10 times with hyperthreading enabled,
>and got 10 different dmesg problems very near if not exactly while
>udev was populating /dev
>
Best Regards,
Will
 
