Bug#647095: CPU hyperthreading turned on after soft power-cycle
On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
> Package: linux-2.6
> Version: 2.6.39-3~bpo60+1
> Severity: normal
>
>
> When the computer is turned off using the "shutdown -h" or "halt" command,
> the hyperthreading BIOS setting is changed - even if hyperthreading is
> disabled in BIOS, the kernel detects twice as many "processors" on the
> next boot as if hyperthreading was enabled. Please see details below.
>
> I have observed the problem on several Supermicro platforms with
> various Intel Xeon processors. The particular case I report was
> observed on Supermicro X8DTT-F mainboard with two Intel Xeon E5645
> processors (6-core). The problem can be reproduced the following way:
By my understanding of how hyperthreading is controlled, this has to be
a BIOS bug, as you seem to have suspected. But if the BIOS behaviour is
kernel version-dependent, then presumably there is something the kernel
can do to work around it.
> 1. Turn on the computer, go to BIOS setup and turn "Simultaneous
> multithreading" to "Disabled". Boot Debian.
>
> 2. Check with "cat /proc/cpuinfo" that the system reports 12 CPUs (2 x
> six-core processor).
>
> 3. (optionally) Reboot the system (shutdown -r) and check that there
> are still 12 CPUs detected and reported.
>
> 4. Halt the system using "shutdown -h" or "halt", turn it on again,
> and boot Debian.
I assume from this that shutdown -h is configured to turn the system
off.
> 5. Check the number of CPUs reported - it will show you that there are
> 24 CPUs as if hyperthreading was enabled.
>
> 6. Reboot and go to BIOS setup - it still shows that "Simultaneous
> multithreading" is set to "Disabled". Do not change anything, just
> select "Save and Exit". Boot Debian and check the number of CPUs - it
> now shows 12 CPUs again.
>
> I have tested several kernel versions and it seems that this behavior
> appeared for the first time somewhere between 2.6.35.7 and 2.6.38.6
> versions (ok = does not show the described behavior, not ok = does
> show):
>
> * linux-image-2.6.32-5-amd64 official Debian - ok
> * linux-image-2.6.39-bpo.2-amd64 official Debian from backports - not
> ok
>
> * linux 2.6.35.7 - custom compiled from source - ok
> * linux 2.6.38.6 - custom compiled from source - not ok
> * linux 2.6.39.4 - custom compiled from source - not ok
> * linux 3.0.4 - custom compiled from source - not ok
That might be too large a range for developers to consider. Can you
test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?
Ben.
> I have exchanged many e-mails with a Supermicro distributor who
> apparently is in direct contact with Supermicro technicians. They more
> or less deny any responsibility for this problem and repeatedly point
> to the fact that some (older) kernels do not exhibit this behavior so
> it must be a kernel problem. Their representative writes:
>
> "I discussed this with supermicro and they informed me that the Kernel
> itself is causing the issue, that it may be sending the hyperthreading
> command code to the BIOS."
>
> Although I do not completely agree with their arguments, my knowledge
> is not deep enough to recognize where exactly the core of the problem
> is, so I report this as a bug in the hope that someone will know what
> happens when a kernel turns a computer off and what has changed in the
> kernel between the versions I mention above. I have asked the
> Supermicro distributor for more information on what they think happens
> there and what exactly they mean by "hyperthreading command code", and I
> am waiting for their response.
>
> -- Package-specific info:
> ** Version:
> Linux version 2.6.39-bpo.2-amd64 (Debian 2.6.39-3~bpo60+1) (norbert@tretkowski.de) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Tue Jul 26 10:35:23 UTC 2011
[...]
> ** Model information
> sys_vendor: Supermicro
> product_name: X8DTT
> product_version: 1234567890
> chassis_vendor: Supermicro
> chassis_version: 1234567890
> bios_vendor: American Megatrends Inc.
> bios_version: 080016
> board_vendor: Supermicro
> board_name: X8DTT
> board_version: 2.0
[...]
--
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source
10-31-2011, 12:06 PM
Clarinet
Bug#647095: CPU hyperthreading turned on after soft power-cycle
On 10/30/2011 4:25 PM, Ben Hutchings wrote:
> By my understanding of how hyperthreading is controlled, this has to be
> a BIOS bug, as you seem to have suspected. But if the BIOS behaviour is
> kernel version-dependent, then presumably there is something the kernel
> can do to work around it.
Yes, there are reasons that support my suspicion that the BIOS is not doing
its work properly. But I cannot prove it until it is clear what has
changed in the kernel.
>> 4. Halt the system using "shutdown -h" or "halt", turn it on again,
>> and boot Debian.
> I assume from this that shutdown -h is configured to turn the system
> off.
I do not know. I have mostly been using "halt" to shut the system down
and turn the server off, and I tried "shutdown -h" only a few times to
see if there is any difference. Both commands turned the computer
off, but I did not do any special "shutdown -h" configuration.
> That might be too large a range for developers to consider. Can you
> test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?
OK, after another day of testing it seems that the problem appears in
2.6.38.1, because
* linux 2.6.37.6 - custom compiled from source - ok
* linux 2.6.38.1 - custom compiled from source - not ok
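The ok/not ok results reported so far can be reduced mechanically to a regression window. A minimal sketch, assuming plain dotted upstream version strings (the Debian 2.6.32 package is entered as "2.6.32"; Debian suffixes such as "-bpo.2" are not handled) and a single regression, so that every ok version is older than every not-ok one:

```python
from __future__ import annotations


def version_key(version: str) -> tuple[int, ...]:
    """Turn "2.6.35.7" into (2, 6, 35, 7) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def regression_window(results: dict[str, bool]) -> tuple[str, str]:
    """Return (newest ok version, oldest not-ok version) from {version: ok}.

    Raises ValueError if an ok version is newer than a not-ok one, because
    such results do not describe a single regression point.
    """
    ordered = sorted(results, key=version_key)
    good = [v for v in ordered if results[v]]
    bad = [v for v in ordered if not results[v]]
    if not good or not bad:
        raise ValueError("need at least one ok and one not-ok version")
    if version_key(good[-1]) > version_key(bad[0]):
        raise ValueError("results are not monotonic")
    return good[-1], bad[0]


# Results as reported in this thread (True = ok, False = not ok).
reported = {
    "2.6.32": True,  # Debian package linux-image-2.6.32-5-amd64
    "2.6.35.7": True,
    "2.6.37.6": True,
    "2.6.38.1": False,
    "2.6.38.6": False,
    "2.6.39.4": False,
    "3.0.4": False,
}
print(regression_window(reported))  # ('2.6.37.6', '2.6.38.1')
```

The window only says where to look; narrowing it to a single commit is what git bisect is for.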
Best regards,
Jiri Polach
--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4EAE9D3A.7000108@atlas.cz
11-08-2011, 11:33 AM
Jiri Polach
Bug#647095: CPU hyperthreading turned on after soft power-cycle
On 10/31/2011 2:06 PM, Clarinet wrote:
> On 10/30/2011 4:25 PM, Ben Hutchings wrote:
>> That might be too large a range for developers to consider. Can you
>> test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?
> OK, after another day of testing it seems that the problem appears in
> 2.6.38.1, because
> * linux 2.6.37.6 - custom compiled from source - ok
> * linux 2.6.38.1 - custom compiled from source - not ok
On Ben's advice I am trying to locate the commit that causes the problem
to appear more precisely using 'git bisect'. However, too many of the
generated revisions are unbootable, so I have to use 'bisect skip'
frequently. I started with 4059 revisions to test (roughly 12 steps) and
after 15 steps I still have 2902 revisions to test (roughly 12 steps).
Is there any way to speed this process up? I tried to do the bisection
manually, but I do not understand git well enough to do it
efficiently.
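For a sense of scale: each good/bad verdict ideally halves the candidate range, so about ceil(log2(n)) verdicts isolate one commit among n, while a skip verdict says nothing about which half holds the culprit and does not shrink the range by itself. A back-of-the-envelope sketch (the helper name is ours; git computes its own "roughly N steps" figure and may round differently):

```python
import math


def estimated_steps(candidates: int) -> int:
    """Good/bad verdicts needed to isolate one commit among `candidates`.

    Each good or bad verdict halves the range; a skip verdict carries no
    information about which half holds the first bad commit, so skips are
    not counted here and do not shrink the range.
    """
    if candidates <= 1:
        return 0
    return math.ceil(math.log2(candidates))


print(estimated_steps(4059), estimated_steps(2902))  # 12 12
```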
My current bisect log is below.
Jiri Polach
---
git bisect start
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect bad 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [c8ddb2713c624f432fa5fe3c7ecffcdda46ea0d4] Linux 2.6.37-rc1
git bisect good c8ddb2713c624f432fa5fe3c7ecffcdda46ea0d4
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# skip: [ecacc6c70cf77a52a22af66c879873202522d6ce] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
git bisect skip ecacc6c70cf77a52a22af66c879873202522d6ce
# skip: [22113efd00491310da802f3b1a9a66cfcf415fac] mmc: Test bus-width for old MMC devices
git bisect skip 22113efd00491310da802f3b1a9a66cfcf415fac
# good: [233cbe5b94096f95ba7bca2162d63275b0b90b5b] OMAP2+: hwmod: Update the sysc_cache in case module context is lost
git bisect good 233cbe5b94096f95ba7bca2162d63275b0b90b5b
# skip: [443e6221e465efa8efb752a8405a759ef1161af9] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
git bisect skip 443e6221e465efa8efb752a8405a759ef1161af9
# good: [9e3be1edbe5ca57df51140b523168237b3a01f4d] Merge branch 'for-2.6.37' into HEAD
git bisect good 9e3be1edbe5ca57df51140b523168237b3a01f4d
# good: [6c869e772c72d509d0db243a56c205ef48a29baf] Merge branch 'perf/urgent' into perf/core
git bisect skip f451171c5ac829e55581c81caf2cb01e1c0a5c5f
# good: [aa5cbf8a70f57c5360ce1bfef692b357c866ae7f] [SCSI] qla2xxx: Use sg_next to fetch next sg element while walking sg list.
git bisect good c0b33bdc5b8d9c1120dece660480d4dd86b817ee
# good: [d7ae30f073a179a9cebd663e7502843ddf4ba672] mac80211: document workqueue
git bisect good d7ae30f073a179a9cebd663e7502843ddf4ba672
# skip: [949f6711b83d2809d1ccb9d830155a65fdacdff9] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect skip 949f6711b83d2809d1ccb9d830155a65fdacdff9
# good: [40e44399301b6dbd997408a184140b79b77f632d] omap2+: Add struct omap_board_data and use it for platform level serial init
git bisect good 40e44399301b6dbd997408a184140b79b77f632d
# skip: [8f9b54a35a70b604ebd2b2f2e7e04eabd0ff8a54] Decompressors: check for write errors in decompress_unlzo.c
git bisect skip c0afc916029c02a8650e533392893b3da6326d1e
# good: [9ac4e613a88d7f6a7a9651d863e9c8f63b582718] mtd: OneNAND: OMAP2/3: prevent regulator sleeping while OneNAND is in use
git bisect good 9ac4e613a88d7f6a7a9651d863e9c8f63b582718
# skip: [2b6203bb7d85e6a2ca2088b8684f30be70246ddf] qeth: enable interface setup if LAN is offline
git bisect good 0b97fee0ef9b0a0445520f90980410f905c6f9da
# skip: [a7f5a5fcd9f13afd3471a0de8c1fdaa8f989497c] ixgbe: fix for link failure on SFP+ DA cables
git bisect good ba5d1012292403c8037adf4a54c4ec50dfe846c4
# good: [b646d90053f887c1bc243191e693a9b02d09f2c2] r8169: magic.
git bisect good b646d90053f887c1bc243191e693a9b02d09f2c2
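To illustrate why frequent skips slow the search down, here is a minimal sketch of bisection over a linear list of revisions in which some cannot be tested. This is a simplification for illustration only: real `git bisect` works on the commit graph and chooses its own next commit after a skip. When every revision between the last good and the first known bad one has been skipped, the search can only return a range, much as git then reports that the first bad commit could be any of several skipped commits.

```python
from __future__ import annotations

from typing import Callable, Literal

Verdict = Literal["good", "bad", "skip"]


def bisect_with_skips(n: int, test: Callable[[int], Verdict]) -> tuple[int, int]:
    """Search revisions 0..n-1, where 0 is known good and n-1 is known bad.

    Returns (lo, hi): lo is the last known good revision and hi the first
    known bad one. If hi == lo + 1, hi is the first bad revision; otherwise
    every revision strictly between them was skipped as untestable.
    """
    lo, hi = 0, n - 1
    skipped: set[int] = set()
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Prefer the midpoint; if it was skipped, try its nearest neighbours.
        order = sorted(range(lo + 1, hi), key=lambda i: abs(i - mid))
        pick = next((i for i in order if i not in skipped), None)
        if pick is None:
            break  # only untestable revisions remain in the window
        verdict = test(pick)
        if verdict == "skip":
            skipped.add(pick)
        elif verdict == "good":
            lo = pick
        else:
            hi = pick
    return lo, hi


# Hypothetical history: 100 revisions, first bad at 37, revisions 35..40 do not boot.
def fake_test(i: int) -> Verdict:
    if 35 <= i <= 40:
        return "skip"
    return "bad" if i >= 37 else "good"


print(bisect_with_skips(100, fake_test))  # (34, 41)
```

Once the unbootable block covers the culprit, no amount of further testing inside that block helps; the only way forward is to make those revisions bootable, which is what eventually allowed bisecting without skips later in this thread.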
--
Archive: http://lists.debian.org/4EB92181.5030500@seznam.cz
11-10-2011, 12:52 AM
Jonathan Nieder
Bug#647095: CPU hyperthreading turned on after soft power-cycle
Hi Jiri,
Jiri Polach wrote:
> On Ben's advice I am trying to locate the commit that causes the problem to
> appear more precisely using 'git bisect'. However, too many of the generated
> revisions are unbootable so I have to use 'bisect skip' frequently.
Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and
this seems totally weird. Have I described the symptoms correctly below?
(Warning: I am making some guesses, especially at step 5. In case of
doubt, see the bug log just mentioned.)
1. Disable SMT in the BIOS.
2. Boot a bad kernel. /proc/cpuinfo (correctly) shows one entry
per core.
3. "shutdown -h now". Enter BIOS. SMT is still disabled.
Don't save.
4. Boot any kernel. /proc/cpuinfo shows two entries per core.
5. "shutdown -h now". Boot any kernel. /proc/cpuinfo still shows
two entries per core.
6. "shutdown -h now". Enter BIOS. SMT is still disabled. Save.
Now /proc/cpuinfo will (correctly) show one entry per core.
Reproducible for Jiri with v3.0.4.
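The "entries per core" check in the steps above can be done mechanically. A minimal sketch, assuming the x86 `/proc/cpuinfo` layout with `processor`, `physical id` and `core id` fields (other architectures or some virtual machines may omit the latter two): it counts logical CPUs and distinct (socket, core) pairs, so on this dual six-core machine 12/12 means SMT is off and 24/12 means it is on.

```python
from __future__ import annotations


def count_cpus(cpuinfo_text: str) -> tuple[int, int]:
    """Return (logical CPUs, distinct physical cores) from /proc/cpuinfo text.

    Assumes x86 field names; if "physical id"/"core id" are missing,
    the core count comes out as 0.
    """
    logical = 0
    cores: set[tuple[str, str]] = set()
    physical_id = core_id = None
    # Entries are separated by blank lines; the extra "" flushes the last one.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(cores)


def smt_enabled(cpuinfo_text: str) -> bool:
    """True when there are more logical CPUs than physical cores."""
    logical, cores = count_cpus(cpuinfo_text)
    return cores > 0 and logical > cores
```

On a running machine, pass it the contents of `/proc/cpuinfo`; comparing the result before and after a halt/power-on cycle gives a quick pass/fail test for each kernel being bisected.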
Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
many of the topic branches merged in the 2.6.38 merge window work ok.
Some other topic branches do not boot at all.
Jiri: if you have gitk installed, then "git bisect visualize" can help
get a sense of what's in the middle of the regression range.
"gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
to find mainline commits to test before finding a topic branch to delve
into.
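The suggestion above amounts to a two-level search: first find the mainline merge that introduced the problem, then bisect only inside the topic branch that merge brought in. A minimal sketch under simplifying assumptions: the data are hypothetical, `merge_is_bad` and `commit_is_bad` stand in for "build, boot, power off, power on, count CPUs", and each topic branch is treated as a straight line of commits.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(items: Sequence[T], is_bad: Callable[[T], bool]) -> int:
    """Binary search for the first bad item, assuming all good items come first.

    Returns len(items) if no item is bad.
    """
    lo, hi = 0, len(items)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(items[mid]):
            hi = mid
        else:
            lo = mid + 1
    return lo


def two_level_bisect(
    merges: Sequence[Sequence[str]],
    merge_is_bad: Callable[[int], bool],
    commit_is_bad: Callable[[str], bool],
) -> tuple[int, str]:
    """Find the bad mainline merge, then the bad commit inside its topic branch."""
    m = first_bad(range(len(merges)), merge_is_bad)
    if m == len(merges):
        raise ValueError("no mainline merge tested bad")
    topic = merges[m]
    c = first_bad(topic, commit_is_bad)
    if c == len(topic):
        raise ValueError("bad merge, but no bad commit on its topic branch")
    return m, topic[c]


# Hypothetical mainline: four merges; the second one brings in the culprit "b2".
merges = [["a1", "a2"], ["b1", "b2", "b3"], ["c1"], ["d1", "d2"]]
print(two_level_bisect(merges, lambda m: m >= 1, lambda c: c in {"b2", "b3"}))
# (1, 'b2')
```

The first-level search only touches mainline merge points, which also tend to be the better-tested (and more likely bootable) revisions. If a bad merge's topic branch tests fine on its own, the problem may come from its interaction with other merges, which this sketch does not model.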
x86 people: do the symptoms seem familiar? Any hints for tracking it
down?
Thanks and hope that helps,
Jonathan
--
Archive: http://lists.debian.org/20111110015212.GB2399@elie.gateway.2wire.net
11-11-2011, 12:50 PM
Clarinet
Bug#647095: CPU hyperthreading turned on after soft power-cycle
Hi all,
> Hi Jiri,
> Jiri Polach wrote:
>> On Ben's advice I am trying to locate the commit that causes the problem to
>> appear more precisely using 'git bisect'. However, too many of the generated
>> revisions are unbootable so I have to use 'bisect skip' frequently.
> Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and
> this seems totally weird. Have I described the symptoms correctly below?
> (Warning: I am making some guesses, especially at step 5. In case of
> doubt, see the bug log just mentioned.)
> 1. Disable SMT in the BIOS.
> 2. Boot a bad kernel. /proc/cpuinfo (correctly) shows one entry
> per core.
> 3. "shutdown -h now". Enter BIOS. SMT is still disabled.
> Don't save.
> 4. Boot any kernel. /proc/cpuinfo shows two entries per core.
> 5. "shutdown -h now". Boot any kernel. /proc/cpuinfo still shows
> two entries per core.
> 6. "shutdown -h now". Enter BIOS. SMT is still disabled. Save.
> Now /proc/cpuinfo will (correctly) show one entry per core.
> Reproducible for Jiri with v3.0.4.
Yes, this is exactly how it works. Something happens when the kernel
shuts the machine down, not when it reboots.
> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> many of the topic branches merged in the 2.6.38 merge window work ok.
> Some other topic branches do not boot at all.
> Jiri: if you have gitk installed, then "git bisect visualize" can help
> get a sense of what's in the middle of the regression range.
> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> to find mainline commits to test before finding a topic branch to delve
> into.
I have been able to narrow the interval manually a little bit from the
"top" (the bad side) and I will go on from the bottom now. However,
there seems to be a large area where kernels are unbootable for me -
they mostly stop when init is called and I do not know why.
> x86 people: do the symptoms seem familiar? Any hints for tracking it
> down?
Please! I have spent more than a month trying to resolve this. I cannot
go back to 2.6.37 kernels and I cannot live with SMT changing on
every shutdown - I have too many servers to allow such unusual behavior ...
Thank you,
Jiri Polach
--
Archive: http://lists.debian.org/4EBD2825.6050806@atlas.cz
11-16-2011, 09:49 PM
Clarinet
Bug#647095: CPU hyperthreading turned on after soft power-cycle
Hi all,
>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
>> many of the topic branches merged in the 2.6.38 merge window work ok.
>> Some other topic branches do not boot at all.
>> Jiri: if you have gitk installed, then "git bisect visualize" can help
>> get a sense of what's in the middle of the regression range.
>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
>> to find mainline commits to test before finding a topic branch to delve
>> into.
> I have been able to narrow the interval manually a little bit from the
> "top" (the bad side) and I will go on from the bottom now. However,
> there seems to be a large area where kernels are unbootable for me -
> they mostly stop when init is called and I do not know why.
Finally! After another 50+ compilations I have it! It took some time as
first I had to find out why some revisions did not boot (almost 2/3
were unbootable and the first bad commit was among them). Having solved
this, I have been able to bisect without "skipping". The result is
surprising (at least for me) - believe it or not, the first bad commit
is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
John Stultz (I am sending him a copy of this message).
I would never have expected this to be a problem, but my understanding of
this commit is very limited, so I am certainly missing the point.
However, I have tried to compile 2.6.38 (which was "bad") with the "Real
Time Clock" configuration option turned off, and it then behaves
"normally" (= is "good").
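For readers unfamiliar with the commit title: as far as we understand it, "use timerqueue for events" means the RTC core keeps its pending timed events in one queue ordered by expiry and programs the single hardware alarm for the earliest one. The toy model below only illustrates that general idea and is not the kernel code: the kernel's timerqueue is a C red-black tree, `heapq` stands in for it here, and the `hw_alarm`/`writes` attributes are hypothetical stand-ins for the RTC alarm registers and for how often they are written.

```python
from __future__ import annotations

import heapq
import itertools


class AlarmQueue:
    """Toy timer queue feeding one hypothetical hardware alarm."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []  # (expires, seq, name)
        self._seq = itertools.count()  # tie-breaker for equal expiry times
        self.hw_alarm: int | None = None  # what the "hardware" is set to
        self.writes = 0  # how many times the "hardware" was written

    def _program_alarm(self) -> None:
        head = self._heap[0][0] if self._heap else None
        if head != self.hw_alarm:  # only touch the hardware on a change
            self.hw_alarm = head
            self.writes += 1

    def add(self, expires: int, name: str) -> None:
        heapq.heappush(self._heap, (expires, next(self._seq), name))
        self._program_alarm()

    def expire(self, now: int) -> list[str]:
        """Pop every event due at or before `now`, then re-arm for the next one."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            fired.append(heapq.heappop(self._heap)[2])
        self._program_alarm()
        return fired


q = AlarmQueue()
q.add(100, "alarm")
q.add(50, "update")
print(q.hw_alarm, q.writes)  # 50 2
```

The point of the model is only that, with such a design, the hardware alarm is rewritten whenever the earliest pending event changes; whether and how that relates to the BIOS behavior seen here is exactly the open question in this thread.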
Can you please comment on this result? What does it mean? Any idea what is
"wrong" there?
Best regards,
Jiri Polach
--
Archive: http://lists.debian.org/4EC43DF7.4010902@atlas.cz
11-17-2011, 07:32 PM
John Stultz
Bug#647095: CPU hyperthreading turned on after soft power-cycle
On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
> Hi all,
>
> >> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> >> many of the topic branches merged in the 2.6.38 merge window work ok.
> >> Some other topic branches do not boot at all.
> >>
> >> Jiri: if you have gitk installed, then "git bisect visualize" can help
> >> get a sense of what's in the middle of the regression range.
> >> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> >> to find mainline commits to test before finding a topic branch to delve
> >> into.
> >
> > I have been able to narrow the interval manually a little bit from the
> > "top" (the bad side) and I will go on from the bottom now. However,
> > there seems to be a large area where kernels are unbootable for me -
> > they mostly stop when init is called and I do not know why.
>
> Finally! After another 50+ compilations I have it! It took some time as
> first I had to find a reason why some revisions did not boot (almost 2/3
> were unbootable and the first bad commit was among them). Having this
> solved I have been able to bisect without "skipping". The result is
> surprising (at least for me) - believe it or not, the first bad commit
> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
> John Stultz (I am sending him a copy of this message).
>
> I would never expect this would be a problem, but my understanding of
> this commit is very limited, so I am certainly missing the point.
> However, I have tried to compile 2.6.38 (which was "bad") with "Real
> Time Clock" configuration option turned off and it behaves "normally"
> then (= is "good").
Huh. That's *very* odd. Is your system doing anything in particular
with the RTC? I don't have a clue right off, so probably the next step
is doing a bit of instrumentation to try to figure out where exactly we
trigger the behavior. Could you checkout commit 6610e089 and apply the
patch below to see if we can't narrow it down?
--
Archive: http://lists.debian.org/1321561946.25715.16.camel@work-vm
11-17-2011, 10:42 PM
Jiri Polach
Bug#647095: CPU hyperthreading turned on after soft power-cycle
On 11/17/2011 9:32 PM, John Stultz wrote:
> Huh. That's *very* odd. Is your system doing anything in particular
> with the RTC? I don't have a clue right off, so probably the next step
Yes, it is very odd. The system does not do anything special with RTC.
It is a diskless computational workstation.
> is doing a bit of instrumentation to try to figure out where exactly we
> trigger the behavior. Could you checkout commit 6610e089 and apply the
> patch below to see if we can't narrow it down?
With the patch applied the system does not show the strange behavior (=
is "good").
> Could you also send your .config to me?
Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
off, the system behaves normally (= is "good") too.
Bug#647095: CPU hyperthreading turned on after soft power-cycle
On Fri, 2011-11-18 at 00:42 +0100, Jiri Polach wrote:
> On 11/17/2011 9:32 PM, John Stultz wrote:
> > On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
> >> Hi all,
> >>
> >>>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> >>>> many of the topic branches merged in the 2.6.38 merge window work ok.
> >>>> Some other topic branches do not boot at all.
> >>>>
> >>>> Jiri: if you have gitk installed, then "git bisect visualize" can help
> >>>> get a sense of what's in the middle of the regression range.
> >>>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> >>>> to find mainline commits to test before finding a topic branch to delve
> >>>> into.
> >>>
> >>> I have been able to narrow the interval manually a little bit from the
> >>> "top" (the bad side) and I will go on from the bottom now. However,
> >>> there seems to be a large area where kernels are unbootable for me -
> >>> they mostly stop when init is called and I do not know why.
> >>
> >> Finally! After another 50+ compilations I have it! It took some time as
> >> first I had to find a reason why some revisions did not boot (almost 2/3
> >> were unbootable and the first bad commit was among them). Having this
> >> solved I have been able to bisect without "skipping". The result is
> >> surprising (at least for me) - believe it or not, the first bad commit
> >> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
> >> John Stultz (I am sending him a copy of this message).
> >>
> >> I would never expect this would be a problem, but my understanding of
> >> this commit is very limited, so I am certainly missing the point.
> >> However, I have tried to compile 2.6.38 (which was "bad") with "Real
> >> Time Clock" configuration option turned off and it behaves "normally"
> >> then (= is "good").
> >
> > Huh. That's *very* odd. Is your system doing anything in particular
> > with the RTC? I don't have a clue right off, so probably the next step
>
> Yes, it is very odd. The system does not do anything special with RTC.
> It is a diskless computational workstation.
>
> > is doing a bit of instrumentation to try to figure out where exactly we
> > trigger the behavior. Could you checkout commit 6610e089 and apply the
> > patch below to see if we can't narrow it down?
>
> With the patch applied the system does not show the strange behavior (=
> is "good").
>
> > Could you also send your .config to me?
>
> Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
> off, the system behaves normally (= is "good") too.
Yea. My rough guess is that the BIOS is somehow sensitive to how the
CMOS RTC is touched.
Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?
thanks
-john
--
Archive: http://lists.debian.org/1321574019.25715.52.camel@work-vm
11-21-2011, 12:27 PM
Jiri Polach
Bug#647095: CPU hyperthreading turned on after soft power-cycle
> Yea. My rough guess is that the BIOS is somehow sensitive to how the
> CMOS RTC is touched.
> Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?
But how do I do it? :-)
I have not found a way to disable it in "menuconfig". If I comment it
out manually in .config, it is automatically set back to "y" as soon as
compilation starts ...
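A possible explanation, offered with caution: as far as we can tell from arch/x86/Kconfig in this period, HPET_EMULATE_RTC has no prompt. It appears to be declared as `def_bool y` with `depends on HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)`, so Kconfig recomputes it from those dependencies and ignores hand edits to .config; it could then only be turned off indirectly, by turning off what it depends on. The sketch below (plain Python; the function name is ours) just evaluates that assumed dependency expression. On this reading, if CONFIG_RTC was not set, the earlier test with CONFIG_RTC_DRV_CMOS off also turned HPET_EMULATE_RTC off, so that test did not separate the two.

```python
from __future__ import annotations


def hpet_emulate_rtc(config: dict[str, str]) -> bool:
    """Evaluate HPET_EMULATE_RTC's dependency as we read it (an assumption):

        HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)

    `config` maps symbol names without the CONFIG_ prefix to "y", "m" or "n";
    a missing symbol counts as "n", as an unset option does in Kconfig.
    Any user-supplied value for HPET_EMULATE_RTC itself is never consulted.
    """

    def value(symbol: str) -> str:
        return config.get(symbol, "n")

    rtc_driver = value("RTC") in ("y", "m") or value("RTC_DRV_CMOS") in ("y", "m")
    return value("HPET_TIMER") == "y" and rtc_driver


# Setting HPET_EMULATE_RTC to "n" by hand has no effect; only the dependencies count.
print(hpet_emulate_rtc({"HPET_TIMER": "y", "RTC_DRV_CMOS": "y", "HPET_EMULATE_RTC": "n"}))  # True
print(hpet_emulate_rtc({"HPET_TIMER": "y", "RTC_DRV_CMOS": "n"}))  # False
```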
Thanks,
Jiri
--
Archive: http://lists.debian.org/4ECA51B9.3010707@seznam.cz