10-30-2011, 02:25 PM
Ben Hutchings

Bug#647095: CPU hyperthreading turned on after soft power-cycle

On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
> Package: linux-2.6
> Version: 2.6.39-3~bpo60+1
> Severity: normal
>
>
> When the computer is turned off using the "shutdown -h" or "halt" command,
> the hyperthreading BIOS setting is changed - even if hyperthreading is
> disabled in BIOS, the kernel detects twice as many "processors" on
> next boot as if hyperthreading was enabled. Please see details below.
>
> I have observed the problem on several Supermicro platforms with
> various Intel Xeon processors. The particular case I report was
> observed on Supermicro X8DTT-F mainboard with two Intel Xeon E5645
> processors (6-core). The problem can be reproduced the following way:

By my understanding of how hyperthreading is controlled, this has to be
a BIOS bug, as you seem to have suspected. But if the BIOS behaviour is
kernel version-dependent, then presumably there is something the kernel
can do to work around it.

> 1. Turn on the computer, go to BIOS setup and turn "Simultaneous
> multithreading" to "Disabled". Boot Debian.
>
> 2. Check with "cat /proc/cpuinfo" that the system reports 12 CPUs (2 x
> six-core processor).
>
> 3. (optionally) Reboot the system (shutdown -r) and check that there
> are still 12 CPUs detected and reported.
>
> 4. Halt the system using "shutdown -h" or "halt", turn it on again,
> and boot Debian.

I assume from this that shutdown -h is configured to turn the system
off.

> 5. Check the number of CPUs reported - it will show you that there are
> 24 CPUs as if hyperthreading was enabled.
>
> 6. Reboot and go to BIOS setup - it still shows that "Simultaneous
> multithreading" is set to "Disabled". Do not change anything, just
> select "Save and Exit". Boot Debian and check the number of CPUs - it
> now shows 12 CPUs again.
>
> I have tested several kernel versions and it seems that this behavior
> appeared for the first time somewhere between 2.6.35.7 and 2.6.38.6
> versions (ok = does not show the described behavior, not ok = does
> show):
>
> * linux-image-2.6.32-5-amd64 official Debian - ok
> * linux-image-2.6.39-bpo.2-amd64 official Debian from backports - not
> ok
>
> * linux 2.6.35.7 - custom compiled from source - ok
> * linux 2.6.38.6 - custom compiled from source - not ok
> * linux 2.6.39.4 - custom compiled from source - not ok
> * linux 3.0.4 - custom compiled from source - not ok

That might be too large a range for developers to consider. Can you
test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?

Ben.

> I have exchanged many e-mails with the Supermicro distributor who
> apparently is in direct contact with Supermicro technicians. They more
> or less deny any responsibility for this problem and repeatedly point
> to the fact that some (older) kernels do not exhibit this behavior so
> it must be a kernel problem. Their representative writes:
>
> "I discussed this with supermicro and they informed me that the Kernel
> itself is causing the issue, that it may be sending the hyperthreading
> command code to the BIOS."
>
> Although I do not completely agree with their arguments, my knowledge
> is not deep enough to recognize where exactly the core of the problem
> is so I report this as a bug in the hope that someone will know what
> happens when a kernel turns a computer off and what has changed in
> kernel somewhere between the versions I mention above. I have asked
> Supermicro distributor for more information on what they think happens
> there and what exactly they mean by "hyperthreading command code" and I
> am waiting for their response.
>
> -- Package-specific info:
> ** Version:
> Linux version 2.6.39-bpo.2-amd64 (Debian 2.6.39-3~bpo60+1) (norbert@tretkowski.de) (gcc version 4.4.5 (Debian 4.4.5-8) ) #1 SMP Tue Jul 26 10:35:23 UTC 2011
[...]
> ** Model information
> sys_vendor: Supermicro
> product_name: X8DTT
> product_version: 1234567890
> chassis_vendor: Supermicro
> chassis_version: 1234567890
> bios_vendor: American Megatrends Inc.
> bios_version: 080016
> board_vendor: Supermicro
> board_name: X8DTT
> board_version: 2.0
[...]

--
Ben Hutchings
compatible: Gracefully accepts erroneous data from any source
 
10-31-2011, 12:06 PM
Clarinet

Bug#647095: CPU hyperthreading turned on after soft power-cycle

On 10/30/2011 4:25 PM, Ben Hutchings wrote:
> On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
>> Package: linux-2.6
>> Version: 2.6.39-3~bpo60+1
>> Severity: normal
>>
>> When the computer is turned off using the "shutdown -h" or "halt" command,
>> the hyperthreading BIOS setting is changed - even if hyperthreading is
>> disabled in BIOS, the kernel detects twice as many "processors" on
>> next boot as if hyperthreading was enabled. Please see details below.
>>
>> I have observed the problem on several Supermicro platforms with
>> various Intel Xeon processors. The particular case I report was
>> observed on Supermicro X8DTT-F mainboard with two Intel Xeon E5645
>> processors (6-core). The problem can be reproduced the following way:
>
> By my understanding of how hyperthreading is controlled, this has to be
> a BIOS bug, as you seem to have suspected. But if the BIOS behaviour is
> kernel version-dependent, then presumably there is something the kernel
> can do to work around it.

Yes, there are reasons that support my suspicion that BIOS is not doing
its work properly. But I cannot prove it until it is clear what has been
changed in the kernel.

>> 1. Turn on the computer, go to BIOS setup and turn "Simultaneous
>> multithreading" to "Disabled". Boot Debian.
>>
>> 2. Check with "cat /proc/cpuinfo" that the system reports 12 CPUs (2 x
>> six-core processor).
>>
>> 3. (optionally) Reboot the system (shutdown -r) and check that there
>> are still 12 CPUs detected and reported.
>>
>> 4. Halt the system using "shutdown -h" or "halt", turn it on again,
>> and boot Debian.
>
> I assume from this that shutdown -h is configured to turn the system
> off.

I do not know. I have been using mostly "halt" to shut down the system
and turn the server off and I tried "shutdown -h" only several times to
see if there is any difference. Both commands have turned the computer
off, but I did not do any special "shutdown -h" configuration.

>> 5. Check the number of CPUs reported - it will show you that there are
>> 24 CPUs as if hyperthreading was enabled.
>>
>> 6. Reboot and go to BIOS setup - it still shows that "Simultaneous
>> multithreading" is set to "Disabled". Do not change anything, just
>> select "Save and Exit". Boot Debian and check the number of CPUs - it
>> now shows 12 CPUs again.
>>
>> I have tested several kernel versions and it seems that this behavior
>> appeared for the first time somewhere between 2.6.35.7 and 2.6.38.6
>> versions (ok = does not show the described behavior, not ok = does
>> show):
>>
>> * linux-image-2.6.32-5-amd64 official Debian - ok
>> * linux-image-2.6.39-bpo.2-amd64 official Debian from backports - not
>> ok
>>
>> * linux 2.6.35.7 - custom compiled from source - ok
>> * linux 2.6.38.6 - custom compiled from source - not ok
>> * linux 2.6.39.4 - custom compiled from source - not ok
>> * linux 3.0.4 - custom compiled from source - not ok
>
> That might be too large a range for developers to consider. Can you
> test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?

OK, after another day of testing it seems that the problem appears in
2.6.38.1, because

* linux 2.6.37.6 - custom compiled from source - ok
* linux 2.6.38.1 - custom compiled from source - not ok

Best regards,

Jiri Polach






--
To UNSUBSCRIBE, email to debian-kernel-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/4EAE9D3A.7000108@atlas.cz

11-08-2011, 11:33 AM
Jiri Polach

Bug#647095: CPU hyperthreading turned on after soft power-cycle

On 10/31/2011 2:06 PM, Clarinet wrote:
> On 10/30/2011 4:25 PM, Ben Hutchings wrote:
>> On Sun, 2011-10-30 at 07:05 -0400, Jiri Polach wrote:
>>> Package: linux-2.6
>>> Version: 2.6.39-3~bpo60+1
>>> Severity: normal
>>>
>>> ...
>>
>> That might be too large a range for developers to consider. Can you
>> test some versions between 2.6.35.7 and 2.6.38.6 (bisection)?
>
> OK, after another day of testing it seems that the problem appears in
> 2.6.38.1, because
>
> * linux 2.6.37.6 - custom compiled from source - ok
> * linux 2.6.38.1 - custom compiled from source - not ok

On Ben's advice I am trying to locate the commit that causes the problem
to appear more precisely using 'git bisect'. However, too many of the
generated revisions are unbootable so I have to use 'bisect skip'
frequently. I started with 4059 revisions to test (roughly 12 steps) and
after 15 steps I still have 2902 revisions to test (roughly 12 steps).
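
For reference, the "roughly 12 steps" figures are consistent with a plain binary search, where every good/bad answer halves the candidate range. The following is only a minimal sketch of that arithmetic; the function name `bisect_steps` is hypothetical and git's own estimate may be computed slightly differently.

```python
def bisect_steps(revisions: int) -> int:
    """Estimate how many good/bad tests a binary search needs.

    Assumes every test halves the candidate range, which is the ideal
    case; a skipped revision gives no such reduction.
    """
    if revisions < 1:
        raise ValueError("need at least one candidate revision")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, without
    # relying on floating-point rounding.
    return (revisions - 1).bit_length()


if __name__ == "__main__":
    for n in (4059, 2902):
        print(n, "revisions ->", bisect_steps(n), "steps")
```

Both 4059 and 2902 lie between 2048 and 4096, which is why the estimate stays at 12: a step answered with 'bisect skip' does not halve the range the way a good/bad answer does.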


Is there any way to speed this process up? I tried to do bisection
manually but I do not understand git enough to be able to do this
efficiently.


My current bisect log is below.

Jiri Polach

---

git bisect start
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# bad: [521cb40b0c44418a4fd36dc633f575813d59a43d] Linux 2.6.38
git bisect bad 521cb40b0c44418a4fd36dc633f575813d59a43d
# bad: [c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470] Linux 2.6.38-rc1
git bisect bad c56eb8fb6dccb83d9fe62fd4dc00c834de9bc470
# good: [c8ddb2713c624f432fa5fe3c7ecffcdda46ea0d4] Linux 2.6.37-rc1
git bisect good c8ddb2713c624f432fa5fe3c7ecffcdda46ea0d4
# good: [3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5] Linux 2.6.37
git bisect good 3c0eee3fe6a3a1c745379547c7e7c904aa64f6d5
# skip: [ecacc6c70cf77a52a22af66c879873202522d6ce] Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
git bisect skip ecacc6c70cf77a52a22af66c879873202522d6ce
# skip: [22113efd00491310da802f3b1a9a66cfcf415fac] mmc: Test bus-width for old MMC devices
git bisect skip 22113efd00491310da802f3b1a9a66cfcf415fac
# good: [233cbe5b94096f95ba7bca2162d63275b0b90b5b] OMAP2+: hwmod: Update the sysc_cache in case module context is lost
git bisect good 233cbe5b94096f95ba7bca2162d63275b0b90b5b
# skip: [443e6221e465efa8efb752a8405a759ef1161af9] Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
git bisect skip 443e6221e465efa8efb752a8405a759ef1161af9
# good: [9e3be1edbe5ca57df51140b523168237b3a01f4d] Merge branch 'for-2.6.37' into HEAD
git bisect good 9e3be1edbe5ca57df51140b523168237b3a01f4d
# good: [6c869e772c72d509d0db243a56c205ef48a29baf] Merge branch 'perf/urgent' into perf/core
git bisect good 6c869e772c72d509d0db243a56c205ef48a29baf
# skip: [f451171c5ac829e55581c81caf2cb01e1c0a5c5f] i2c-algo-bit: Refactor adapter registration
git bisect skip f451171c5ac829e55581c81caf2cb01e1c0a5c5f
# good: [aa5cbf8a70f57c5360ce1bfef692b357c866ae7f] [SCSI] qla2xxx: Use sg_next to fetch next sg element while walking sg list.
git bisect good aa5cbf8a70f57c5360ce1bfef692b357c866ae7f
# skip: [9b3ffe523af895f6b969b971079da4c06c2743af] ARM: ns9xxx: irq_data conversion.
git bisect skip 9b3ffe523af895f6b969b971079da4c06c2743af
# good: [c0b33bdc5b8d9c1120dece660480d4dd86b817ee] [media] gspca-stv06xx: support bandwidth changing
git bisect good c0b33bdc5b8d9c1120dece660480d4dd86b817ee
# good: [d7ae30f073a179a9cebd663e7502843ddf4ba672] mac80211: document workqueue
git bisect good d7ae30f073a179a9cebd663e7502843ddf4ba672
# skip: [949f6711b83d2809d1ccb9d830155a65fdacdff9] Merge branch 'staging-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
git bisect skip 949f6711b83d2809d1ccb9d830155a65fdacdff9
# good: [40e44399301b6dbd997408a184140b79b77f632d] omap2+: Add struct omap_board_data and use it for platform level serial init
git bisect good 40e44399301b6dbd997408a184140b79b77f632d
# skip: [8f9b54a35a70b604ebd2b2f2e7e04eabd0ff8a54] Decompressors: check for write errors in decompress_unlzo.c
git bisect skip 8f9b54a35a70b604ebd2b2f2e7e04eabd0ff8a54
# good: [190683a9d5457e6d962c232ffbecac3ab158dddd] net: net_families __rcu annotations
git bisect good 190683a9d5457e6d962c232ffbecac3ab158dddd
# skip: [c0afc916029c02a8650e533392893b3da6326d1e] ARM: ep93xx: irq_data conversion.
git bisect skip c0afc916029c02a8650e533392893b3da6326d1e
# good: [9ac4e613a88d7f6a7a9651d863e9c8f63b582718] mtd: OneNAND: OMAP2/3: prevent regulator sleeping while OneNAND is in use
git bisect good 9ac4e613a88d7f6a7a9651d863e9c8f63b582718
# skip: [2b6203bb7d85e6a2ca2088b8684f30be70246ddf] qeth: enable interface setup if LAN is offline
git bisect skip 2b6203bb7d85e6a2ca2088b8684f30be70246ddf
# good: [8ec98fe0b4ffdedce4c1caa9fb3d550f52ad1c6b] jz4740-battery: Protect against concurrent battery readings
git bisect good 8ec98fe0b4ffdedce4c1caa9fb3d550f52ad1c6b
# good: [dc69e2e9fcd7c613eb744ea3b9c4ee9ca554e822] ceph: associate requests with opening sessions
git bisect good dc69e2e9fcd7c613eb744ea3b9c4ee9ca554e822
# good: [3e2a037c1de79af999a54581cbf1e8a5c933fd95] ARM: PL08x: fix sparse warnings
git bisect good 3e2a037c1de79af999a54581cbf1e8a5c933fd95
# good: [0b97fee0ef9b0a0445520f90980410f905c6f9da] powerpc/mm: Avoid avoidable void* pointer
git bisect good 0b97fee0ef9b0a0445520f90980410f905c6f9da
# skip: [a7f5a5fcd9f13afd3471a0de8c1fdaa8f989497c] ixgbe: fix for link failure on SFP+ DA cables
git bisect skip a7f5a5fcd9f13afd3471a0de8c1fdaa8f989497c
# good: [ba5d1012292403c8037adf4a54c4ec50dfe846c4] xen/gntdev: stop using "token" argument
git bisect good ba5d1012292403c8037adf4a54c4ec50dfe846c4
# good: [b646d90053f887c1bc243191e693a9b02d09f2c2] r8169: magic.
git bisect good b646d90053f887c1bc243191e693a9b02d09f2c2
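
A log like the one above can be tallied to see how much of the effort went into skipped revisions. This is a minimal sketch, assuming the log uses the plain "git bisect good|bad|skip <hash>" command lines shown here; the function name `summarize_bisect_log` is hypothetical.

```python
from collections import Counter


def summarize_bisect_log(text: str) -> Counter:
    """Count good/bad/skip verdicts in the text of a git bisect log.

    Only 'git bisect <verdict> <hash>' command lines are counted; the
    '# good: [...]' comment lines and 'git bisect start' are ignored.
    """
    counts = Counter()
    for line in text.splitlines():
        parts = line.split()
        if (
            len(parts) >= 3
            and parts[0] == "git"
            and parts[1] == "bisect"
            and parts[2] in ("good", "bad", "skip")
        ):
            counts[parts[2]] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < bisect.log
    for verdict, n in sorted(summarize_bisect_log(sys.stdin.read()).items()):
        print(f"{verdict}: {n}")
```

A high share of "skip" entries suggests it may be worth finding out why those revisions fail to boot before continuing.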




Archive: http://lists.debian.org/4EB92181.5030500@seznam.cz

11-10-2011, 12:52 AM
Jonathan Nieder

Bug#647095: CPU hyperthreading turned on after soft power-cycle

Hi Jiri,

Jiri Polach wrote:

> On Ben's advice I am trying to locate the commit that causes the problem to
> appear more precisely using 'git bisect'. However, too many of generated
> revisions are unbootable so I have to use 'bisect skip' frequently.

Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and
this seems totally weird. Have I described the symptoms correctly below?
(Warning: I am making some guesses, especially at step 5. In case of
doubt, see the bug log just mentioned.)

1. Disable SMT in the BIOS.

2. Boot a bad kernel. /proc/cpuinfo (correctly) shows one entry
per core.

3. "shutdown -h now". Enter BIOS. SMT is still disabled.
Don't save.

4. Boot any kernel. /proc/cpuinfo shows two entries per core.

5. "shutdown -h now". Boot any kernel. /proc/cpuinfo still shows
two entries per core.

6. "shutdown -h now". Enter BIOS. SMT is still disabled. Save.
Now /proc/cpuinfo will (correctly) show one entry per core.

Reproducible for Jiri with v3.0.4.
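
The "one entry per core" versus "two entries per core" check in the steps above can be scripted rather than read by eye. This is a minimal sketch, assuming the usual x86 /proc/cpuinfo layout in which each entry starts with a "processor" line and lists "physical id" before "core id"; the function name `count_cpus` is hypothetical.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) from /proc/cpuinfo text.

    logical_cpus counts 'processor' entries; physical_cores counts the
    distinct (physical id, core id) pairs seen across those entries.
    """
    logical = 0
    cores = set()
    package = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between entries
        key = key.strip()
        value = value.strip()
        if key == "processor":
            logical += 1
            package = None  # a new entry begins
        elif key == "physical id":
            package = value
        elif key == "core id":
            cores.add((package, value))
    return logical, len(cores)
```

Calling `count_cpus(open("/proc/cpuinfo").read())` on the machine from the report would be expected to give equal numbers (12 and 12) with SMT really off, and a logical count twice the core count (24 and 12) when the symptom is present.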

Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
many of the topic branches merged in the 2.6.38 merge window work ok.
Some other topic branches do not boot at all.

Jiri: if you have gitk installed, then "git bisect visualize" can help
get a sense of what's in the middle of the regression range.
"gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
to find mainline commits to test before finding a topic branch to delve
into.

x86 people: do the symptoms seem familiar? Any hints for tracking it
down?

Thanks and hope that helps,
Jonathan



Archive: http://lists.debian.org/20111110015212.GB2399@elie.gateway.2wire.net

11-11-2011, 12:50 PM
Clarinet

Bug#647095: CPU hyperthreading turned on after soft power-cycle

Hi all,

> Hi Jiri,
>
> Jiri Polach wrote:
>
>> On Ben's advice I am trying to locate the commit that causes the problem to
>> appear more precisely using 'git bisect'. However, too many of the generated
>> revisions are unbootable so I have to use 'bisect skip' frequently.
>
> Ok, so I've looked over the log at <http://bugs.debian.org/647095>, and
> this seems totally weird. Have I described the symptoms correctly below?
> (Warning: I am making some guesses, especially at step 5. In case of
> doubt, see the bug log just mentioned.)
>
> 1. Disable SMT in the BIOS.
>
> 2. Boot a bad kernel. /proc/cpuinfo (correctly) shows one entry
> per core.
>
> 3. "shutdown -h now". Enter BIOS. SMT is still disabled.
> Don't save.
>
> 4. Boot any kernel. /proc/cpuinfo shows two entries per core.
>
> 5. "shutdown -h now". Boot any kernel. /proc/cpuinfo still shows
> two entries per core.
>
> 6. "shutdown -h now". Enter BIOS. SMT is still disabled. Save.
> Now /proc/cpuinfo will (correctly) show one entry per core.
>
> Reproducible for Jiri with v3.0.4.

Yes, this is exactly how it works. Something happens when the kernel
shuts down, not when it reboots.

> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> many of the topic branches merged in the 2.6.38 merge window work ok.
> Some other topic branches do not boot at all.
>
> Jiri: if you have gitk installed, then "git bisect visualize" can help
> get a sense of what's in the middle of the regression range.
> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> to find mainline commits to test before finding a topic branch to delve
> into.

I have been able to narrow the interval manually a little bit from the
"top" (the bad side) and I will go on from the bottom now. However,
there seems to be a large area where kernels are unbootable for me -
they mostly stop when init is called and I do not know why.

> x86 people: do the symptoms seem familiar? Any hints for tracking it
> down?

Please! I have spent more than a month trying to resolve it. I cannot
revert back to 2.6.37 kernels and I cannot live with SMT changing on
every shutdown - I have too many servers to allow such unusual behavior ...

Thank you,

Jiri Polach

> Thanks and hope that helps,
> Jonathan





Archive: http://lists.debian.org/4EBD2825.6050806@atlas.cz

11-16-2011, 09:49 PM
Clarinet

Bug#647095: CPU hyperthreading turned on after soft power-cycle

Hi all,

>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
>> many of the topic branches merged in the 2.6.38 merge window work ok.
>> Some other topic branches do not boot at all.
>>
>> Jiri: if you have gitk installed, then "git bisect visualize" can help
>> get a sense of what's in the middle of the regression range.
>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
>> to find mainline commits to test before finding a topic branch to delve
>> into.
>
> I have been able to narrow the interval manually a little bit from the
> "top" (the bad side) and I will go on from the bottom now. However,
> there seems to be a large area where kernels are unbootable for me -
> they mostly stop when init is called and I do not know why.

Finally! After another 50+ compilations I have it! It took some time as
first I had to find a reason why some revisions did not boot (almost 2/3
were unbootable and the first bad commit was among them). Having this
solved I have been able to bisect without "skipping". The result is
surprising (at least for me) - believe it or not, the first bad commit
is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
John Stultz (I am sending him a copy of this message).

I would never expect this would be a problem, but my understanding of
this commit is very limited, so I am certainly missing the point.
However, I have tried to compile 2.6.38 (which was "bad") with "Real
Time Clock" configuration option turned off and it behaves "normally"
then (= is "good").

Can you please comment on this result? What does it mean? Any idea what is
"wrong" there?

Best regards,

Jiri Polach




Archive: http://lists.debian.org/4EC43DF7.4010902@atlas.cz

11-17-2011, 07:32 PM
John Stultz

Bug#647095: CPU hyperthreading turned on after soft power-cycle

On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
> Hi all,
>
> >> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> >> many of the topic branches merged in the 2.6.38 merge window work ok.
> >> Some other topic branches do not boot at all.
> >>
> >> Jiri: if you have gitk installed, then "git bisect visualize" can help
> >> get a sense of what's in the middle of the regression range.
> >> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> >> to find mainline commits to test before finding a topic branch to delve
> >> into.
> >
> > I have been able to narrow the interval manually a little bit from the
> > "top" (the bad side) and I will go on from the bottom now. However,
> > there seems to be a large area where kernels are unbootable for me -
> > they mostly stop when init is called and I do not know why.
>
> Finally! After another 50+ compilations I have it! It took some time as
> first I had to find a reason why some revisions did not boot (almost 2/3
> were unbootable and the first bad commit was among them). Having this
> solved I have been able to bisect without "skipping". The result is
> surprising (at least for me) - believe it or not, the first bad commit
> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
> John Stultz (I am sending him a copy of this message).
>
> I would never expect this would be a problem, but my understanding of
> this commit is very limited, so I am certainly missing the point.
> However, I have tried to compile 2.6.38 (which was "bad") with "Real
> Time Clock" configuration option turned off and it behaves "normally"
> then (= is "good").

Huh. That's *very* odd. Is your system doing anything in particular
with the RTC? I don't have a clue right off, so probably the next step
is doing a bit of instrumentation to try to figure out where exactly we
trigger the behavior. Could you checkout commit 6610e089 and apply the
patch below to see if we can't narrow it down?

Could you also send your .config to me?

thanks
-john

diff --git a/drivers/rtc/rtc-cmos.c b/drivers/rtc/rtc-cmos.c
index 5856167..d049344 100644
--- a/drivers/rtc/rtc-cmos.c
+++ b/drivers/rtc/rtc-cmos.c
@@ -497,13 +497,13 @@ static int cmos_procfs(struct device *dev, struct seq_file *seq)
 static const struct rtc_class_ops cmos_rtc_ops = {
 	.read_time = cmos_read_time,
 	.set_time = cmos_set_time,
-	.read_alarm = cmos_read_alarm,
-	.set_alarm = cmos_set_alarm,
-	.proc = cmos_procfs,
-	.irq_set_freq = cmos_irq_set_freq,
-	.irq_set_state = cmos_irq_set_state,
-	.alarm_irq_enable = cmos_alarm_irq_enable,
-	.update_irq_enable = cmos_update_irq_enable,
+//	.read_alarm = cmos_read_alarm,
+//	.set_alarm = cmos_set_alarm,
+//	.proc = cmos_procfs,
+//	.irq_set_freq = cmos_irq_set_freq,
+//	.irq_set_state = cmos_irq_set_state,
+//	.alarm_irq_enable = cmos_alarm_irq_enable,
+//	.update_irq_enable = cmos_update_irq_enable,
 };
 
 /*----------------------------------------------------------------*/






Archive: http://lists.debian.org/1321561946.25715.16.camel@work-vm

11-17-2011, 10:42 PM
Jiri Polach

Bug#647095: CPU hyperthreading turned on after soft power-cycle

On 11/17/2011 9:32 PM, John Stultz wrote:
> On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
>> Hi all,
>>
>>>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
>>>> many of the topic branches merged in the 2.6.38 merge window work ok.
>>>> Some other topic branches do not boot at all.
>>>>
>>>> Jiri: if you have gitk installed, then "git bisect visualize" can help
>>>> get a sense of what's in the middle of the regression range.
>>>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
>>>> to find mainline commits to test before finding a topic branch to delve
>>>> into.
>>>
>>> I have been able to narrow the interval manually a little bit from the
>>> "top" (the bad side) and I will go on from the bottom now. However,
>>> there seems to be a large area where kernels are unbootable for me -
>>> they mostly stop when init is called and I do not know why.
>>
>> Finally! After another 50+ compilations I have it! It took some time as
>> first I had to find a reason why some revisions did not boot (almost 2/3
>> were unbootable and the first bad commit was among them). Having this
>> solved I have been able to bisect without "skipping". The result is
>> surprising (at least for me) - believe it or not, the first bad commit
>> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
>> John Stultz (I am sending him a copy of this message).
>>
>> I would never expect this would be a problem, but my understanding of
>> this commit is very limited, so I am certainly missing the point.
>> However, I have tried to compile 2.6.38 (which was "bad") with "Real
>> Time Clock" configuration option turned off and it behaves "normally"
>> then (= is "good").
>
> Huh. That's *very* odd. Is your system doing anything in particular
> with the RTC? I don't have a clue right off, so probably the next step

Yes, it is very odd. The system does not do anything special with RTC.
It is a diskless computational workstation.

> is doing a bit of instrumentation to try to figure out where exactly we
> trigger the behavior. Could you checkout commit 6610e089 and apply the
> patch below to see if we can't narrow it down?

With the patch applied the system does not show the strange behavior (=
is "good").

> Could you also send your .config to me?

Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
off, the system behaves normally (= is "good") too.

Thank you.

Jiri Polach


 
11-17-2011, 10:53 PM
John Stultz

Bug#647095: CPU hyperthreading turned on after soft power-cycle

On Fri, 2011-11-18 at 00:42 +0100, Jiri Polach wrote:
> On 11/17/2011 9:32 PM, John Stultz wrote:
> > On Wed, 2011-11-16 at 23:49 +0100, Clarinet wrote:
> >> Hi all,
> >>
> >>>> Result of bisecting: v2.6.38-rc1 exhibits the problem. v2.6.37 and
> >>>> many of the topic branches merged in the 2.6.38 merge window work ok.
> >>>> Some other topic branches do not boot at all.
> >>>>
> >>>> Jiri: if you have gitk installed, then "git bisect visualize" can help
> >>>> get a sense of what's in the middle of the regression range.
> >>>> "gitk --bisect --first-parent v2.6.37..v2.6.38-rc1" might be a good way
> >>>> to find mainline commits to test before finding a topic branch to delve
> >>>> into.
> >>>
> >>> I have been able to narrow the interval manually a little bit from the
> >>> "top" (the bad side) and I will go on from the bottom now. However,
> >>> there seems to be a large area where kernels are unbootable for me -
> >>> they mostly stop when init is called and I do not know why.
> >>
> >> Finally! After another 50+ compilations I have it! It took some time as
> >> first I had to find a reason why some revisions did not boot (almost 2/3
> >> were unbootable and the first bad commit was among them). Having this
> >> solved I have been able to bisect without "skipping". The result is
> >> surprising (at least for me) - believe it or not, the first bad commit
> >> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
> >> John Stultz (I am sending him a copy of this message).
> >>
> >> I would never expect this would be a problem, but my understanding of
> >> this commit is very limited, so I am certainly missing the point.
> >> However, I have tried to compile 2.6.38 (which was "bad") with "Real
> >> Time Clock" configuration option turned off and it behaves "normally"
> >> then (= is "good").
> >
> > Huh. That's *very* odd. Is your system doing anything in particular
> > with the RTC? I don't have a clue right off, so probably the next step
>
> Yes, it is very odd. The system does not do anything special with RTC.
> It is a diskless computational workstation.
>
> > is doing a bit of instrumentation to try to figure out where exactly we
> > trigger the behavior. Could you checkout commit 6610e089 and apply the
> > patch below to see if we can't narrow it down?
>
> With the patch applied the system does not show the strange behavior (=
> is "good").
>
> > Could you also send your .config to me?
>
> Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
> off, the system behaves normally (= is "good") too.

Yea. My rough guess is that the BIOS is somehow sensitive to how the
CMOS RTC is touched.

Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?

thanks
-john







Archive: http://lists.debian.org/1321574019.25715.52.camel@work-vm

11-21-2011, 12:27 PM
Jiri Polach

Bug#647095: CPU hyperthreading turned on after soft power-cycle

>>>> Finally! After another 50+ compilations I have it! It took some time as
>>>> first I had to find a reason why some revisions did not boot (almost 2/3
>>>> were unbootable and the first bad commit was among them). Having this
>>>> solved I have been able to bisect without "skipping". The result is
>>>> surprising (at least for me) - believe it or not, the first bad commit
>>>> is 6610e089 "RTC: Rework RTC code to use timerqueue for events" from
>>>> John Stultz (I am sending him a copy of this message).
>>>>
>>>> I would never expect this would be a problem, but my understanding of
>>>> this commit is very limited, so I am certainly missing the point.
>>>> However, I have tried to compile 2.6.38 (which was "bad") with "Real
>>>> Time Clock" configuration option turned off and it behaves "normally"
>>>> then (= is "good").
>>>
>>> Huh. That's *very* odd. Is your system doing anything in particular
>>> with the RTC? I don't have a clue right off, so probably the next step
>>
>> Yes, it is very odd. The system does not do anything special with RTC.
>> It is a diskless computational workstation.
>>
>>> is doing a bit of instrumentation to try to figure out where exactly we
>>> trigger the behavior. Could you checkout commit 6610e089 and apply the
>>> patch below to see if we can't narrow it down?
>>
>> With the patch applied the system does not show the strange behavior (=
>> is "good").
>>
>>> Could you also send your .config to me?
>>
>> Sure. It is attached. I have found that if I turn CONFIG_RTC_DRV_CMOS
>> off, the system behaves normally (= is "good") too.
>
> Yea. My rough guess is that the BIOS is somehow sensitive to how the
> CMOS RTC is touched.
>
> Does disabling CONFIG_HPET_EMULATE_RTC change the behavior?


But how do I do it? :-)

I have not found a way to disable it in "menuconfig". If I comment it
out manually in .config, it is automatically set back to "y" as soon as
compilation starts ...


Thanks,

Jiri
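
A possible explanation for the option snapping back to "y", offered as an assumption to be checked against arch/x86/Kconfig in the tree being built: in kernels of this era HPET_EMULATE_RTC appears to be declared without a prompt (def_bool y) and to depend on HPET_TIMER together with RTC or RTC_DRV_CMOS being built in or modular. A symbol of that kind is derived from its dependencies, so it does not show up in menuconfig and a manual edit of .config is overwritten. The following is a minimal sketch that checks a .config for that combination; the function names are hypothetical and the dependency rule is the assumption just stated.

```python
def parse_config(text):
    """Map CONFIG_* symbols to their values from .config text.

    Lines such as '# CONFIG_FOO is not set' are treated as unset and
    therefore do not appear in the result.
    """
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            values[name] = value
    return values


def hpet_emulate_rtc_forced(text):
    """Return True if the assumed dependency of HPET_EMULATE_RTC holds.

    Assumed rule (verify in arch/x86/Kconfig):
    HPET_TIMER && (RTC=y || RTC=m || RTC_DRV_CMOS=m || RTC_DRV_CMOS=y)
    """
    cfg = parse_config(text)
    hpet = cfg.get("CONFIG_HPET_TIMER") == "y"
    rtc = any(
        cfg.get(symbol) in ("y", "m")
        for symbol in ("CONFIG_RTC", "CONFIG_RTC_DRV_CMOS")
    )
    return hpet and rtc
```

If `hpet_emulate_rtc_forced(open(".config").read())` returns True, then under that assumption the option can only be turned off by disabling HPET_TIMER, by disabling both RTC and RTC_DRV_CMOS, or by patching the Kconfig entry for the test build.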





Archive: http://lists.debian.org/4ECA51B9.3010707@seznam.cz
 
