On 11/10/2011 12:56 PM, David J. Haines wrote:
On Thu, Nov 10, 2011 at 1:44 PM, Richard Schütz<firstname.lastname@example.org> wrote:
Am 10.11.2011 18:47, schrieb David C. Rankin:
Upgraded 5 i686 boxes and 2 x86_64 boxes to linux 3.1-4 last night.
This morning one i686 server is dead, and another i686 box responded in an
xterm (to a return keypress) and then locked (the ssh connection was left up
after login to confirm the reboot). Two other i686 boxes (under no load) are
still running. The boxes are remote. I'll pull the logs when I get to the site
and send them. Is anybody else seeing this with linux 3.1-4?
I had lockups on my notebook and netbook during normal usage. Both
have an Intel processor. The AMD-based desktop machine has had no problems so
far. All systems are running linux 3.1-4 x86_64.
I'm getting lockups on an i5 box with Intel graphics running x86_64
while I'm using it. This has been happening since 3.0.7-1; 3.0.6-2,
however, seemed perfectly fine.
David J. Haines
Hmm... absolutely no help from the logs on the box that locked:
Nov 10 03:20:04 phoenix -- MARK --
Nov 10 03:25:34 phoenix dhcpd: DHCPREQUEST for 192.168.7.124 from 00:11:43:22:50:08 via eth0
Nov 10 03:25:34 phoenix dhcpd: DHCPACK on 192.168.7.124 to 00:11:43:22:50:08 via
Nov 10 12:44:33 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpuset
Nov 10 12:44:33 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpu
Obviously something occurred after 03:25:34, but there is no indication of what.
The second box I lost, and thought was locked, wasn't locked; I just had the
uncanny coincidence of trying it during one of its spontaneous reboots due to
hwclock drift (I'll create a cron job to update this). The boxes are on the same
LAN subnet. The only SWAG I have is that once the box with the drifting clock got
far enough out of time, any network communication with the box that locked may
have caused it to panic over the time sync issue.
(But that is wrong, because once the system is running, the sysclock is the only
clock that matters, right? But it can't be all wrong, otherwise there is no
explanation for the spontaneous reboot due to clock drift. A digital paradox, so
to speak.)
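For the cron job mentioned above, a single root crontab entry would probably do. This is only a sketch under my own assumptions: that the system clock itself is kept correct (for example by ntpd), and that hwclock lives at /sbin/hwclock, so check the path on your box. "hwclock --systohc" is the long form of "hwclock -w".

```
# Root crontab entry (edit with "crontab -e" as root).
# Every hour, copy the system clock to the hardware clock (same as hwclock -w).
# Assumes the system clock is already correct, e.g. kept in sync by ntpd.
0 * * * * /sbin/hwclock --systohc
```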
Richard, David: check your hardware clock with "# hwclock -r" and compare it to
the time returned by "# date". If they are hours apart, make sure your sysclock
is correct and then set the hardware clock from your sysclock with
"# hwclock -w". It's worth checking regardless. I know this used to be done on
boot or shutdown, and I don't know why it isn't anymore. I'll do some more digging.
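If you want to script the hwclock/date comparison, something like the following sketch should work. The details are my assumptions, not anything from this thread: it reads the RTC through the kernel's sysfs attribute /sys/class/rtc/rtc0/since_epoch instead of parsing "hwclock -r" (whose output format differs between util-linux versions), it assumes the RTC is kept in UTC (if it is kept in local time, the reported drift includes your UTC offset), and the 60-second default threshold is arbitrary.

```shell
#!/bin/sh
# Sketch: compare the hardware clock (RTC) with the system clock.
# Assumptions: the RTC is exposed as /sys/class/rtc/rtc0 and is kept in UTC.

# Print the absolute difference, in seconds, between two epoch timestamps.
abs_drift() {
    if [ "$1" -ge "$2" ]; then
        echo $(( $1 - $2 ))
    else
        echo $(( $2 - $1 ))
    fi
}

# Report RTC vs. system time; return 1 and warn on stderr if they differ
# by more than $1 seconds (default 60, an arbitrary illustrative value).
# Returns 2 if the RTC cannot be read.
check_rtc_drift() {
    limit=${1:-60}
    rtc=$(cat /sys/class/rtc/rtc0/since_epoch) || return 2
    sys=$(date +%s)
    drift=$(abs_drift "$sys" "$rtc")
    echo "system: $sys  rtc: $rtc  drift: ${drift}s"
    if [ "$drift" -gt "$limit" ]; then
        echo "RTC is off; once the system clock is correct, run: hwclock -w" >&2
        return 1
    fi
    return 0
}
```

Source it and run "check_rtc_drift 300" to be warned when the two clocks are more than five minutes apart. It only reports; the actual "hwclock -w" is left to you, since writing the RTC from a wrong sysclock would just copy the error.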
David C. Rankin, J.D.,P.E.