11-10-2011, 06:40 PM
"David J. Haines"

linux 3.1-4 - two i686 lockups after ~ 5 hours of operations. two x86_64 seem OK

On Thu, Nov 10, 2011 at 2:28 PM, C Anthony Risinger <anthony@xtfx.me> wrote:
> On Thu, Nov 10, 2011 at 1:16 PM, David C. Rankin
> <drankinatty@suddenlinkmail.com> wrote:
>>
>> Richard, David - check your hardware clock "# hwclock -r" and compare that
>> to the time returned by "# date". If they are hours apart, then make sure
>> your sysclock is correct and set the hardware clock to your sysclock with "#
>> hwclock -w". Worth checking regardless. I know this used to be done on boot
>> or shutdown and I don't know why it isn't anymore. I'll do some more
>> digging.
>
> your machine reboots because of a drifting clock? i don't understand.
>
> aren't you running ntpd (not openntpd)? <---- *HINT* *HINT*, if not ;-)
>
> --
>
> C Anthony
>

My clock is fine, as far as I can tell.

David J. Haines
dhaines@gmail.com
 
11-10-2011, 06:44 PM
Leonid Isaev

On (11/10/11 13:16), David C. Rankin wrote:
-~> Hmm.. Absolutely no help from the logs on the box that locked:
-~>
-~> Nov 10 03:20:04 phoenix -- MARK --
-~> Nov 10 03:25:34 phoenix dhcpd: DHCPREQUEST for 192.168.7.124 from
-~> 00:11:43:22:50:08 via eth0
-~> Nov 10 03:25:34 phoenix dhcpd: DHCPACK on 192.168.7.124 to
-~> 00:11:43:22:50:08 via eth0
-~> Nov 10 12:44:33 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpuset
-~> Nov 10 12:44:33 phoenix kernel: [ 0.000000] Initializing cgroup subsys cpu
-~>
-~> Obviously something occurred after 03:25:34, but no indication
-~> of what. The second box I lost and thought was locked, wasn't
-~> locked, I just had the uncanny coincidence of trying it during one
-~> of its spontaneous reboots due to hwclock drift (I'll create a
-~> cron job to update this). The boxes are on the same LAN subnet.
-~> The only SWAG I have is that once the box with the drifting clock
-~> got far enough out of time any net communications with the box
-~> that locked may have caused it to panic over the time sync issue.
-~>
-~> (but that is wrong because once running, the sysclock is the only
-~> clock that matters - right? But that can't be all wrong, otherwise
-~> there is no explanation for the spontaneous reboot due to clock
-~> drift. A digital paradox, so to speak.)
-~>
-~> Richard, David - check your hardware clock "# hwclock -r" and
-~> compare that to the time returned by "# date". If they are hours
-~> apart, then make sure your sysclock is correct and set the
-~> hardware clock to your sysclock with "# hwclock -w". Worth
-~> checking regardless. I know this used to be done on boot or
-~> shutdown and I don't know why it isn't anymore. I'll do some more
-~> digging.
-~>
-~> --
-~> David C. Rankin, J.D.,P.E.

Regarding logs, I would look into dmesg.log and probably append loglevel=7 (or
debug) to the kernel command line.

If your machine crashes because of rtc, your motherboard is junk -- get rid of
it.

--
Leonid Isaev
GnuPG key ID: 164B5A6D
Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
 
11-10-2011, 06:45 PM
Richard Schütz

On 10.11.2011 20:16, David C. Rankin wrote:

> Richard, David - check your hardware clock "# hwclock -r" and compare
> that to the time returned by "# date". If they are hours apart, then
> make sure your sysclock is correct and set the hardware clock to your
> sysclock with "# hwclock -w". Worth checking regardless. I know this
> used to be done on boot or shutdown and I don't know why it isn't
> anymore. I'll do some more digging.


I'm running ntpd on all machines, so sysclock and hwclock are almost
perfectly in sync. If the issue is clock-related, it is more likely the
fault of ntpd and adjtimex slewing the time than of any difference between
sysclock and hwclock.


--
Regards,
Richard Schütz
 
11-10-2011, 06:55 PM
Mauro Santos

On 10-11-2011 19:16, David C. Rankin wrote:


> Richard, David - check your hardware clock "# hwclock -r" and compare
> that to the time returned by "# date". If they are hours apart, then
> make sure your sysclock is correct and set the hardware clock to your
> sysclock with "# hwclock -w". Worth checking regardless. I know this
> used to be done on boot or shutdown and I don't know why it isn't
> anymore. I'll do some more digging.


You should take into account that 'hwclock -r' and 'date' might return
different times and things will still be ok; it all depends on whether you
have the clock set to UTC or localtime, and on your timezone. The man page
says there is some autodetection logic, but as with all things it can fail.


--
Mauro Santos
 
11-10-2011, 07:19 PM
"David C. Rankin"

On 11/10/2011 01:28 PM, C Anthony Risinger wrote:

> On Thu, Nov 10, 2011 at 1:16 PM, David C. Rankin
> <drankinatty@suddenlinkmail.com> wrote:
>>
>> Richard, David - check your hardware clock "# hwclock -r" and compare that
>> to the time returned by "# date". If they are hours apart, then make sure
>> your sysclock is correct and set the hardware clock to your sysclock with "#
>> hwclock -w". Worth checking regardless. I know this used to be done on boot
>> or shutdown and I don't know why it isn't anymore. I'll do some more
>> digging.
>
> your machine reboots because of a drifting clock? i don't understand.
>
> aren't you running ntpd (not openntpd)? <---- *HINT* *HINT*, if not ;-)



Yes, I'm running ntpd, and yes, I'm saying that my box reboots due to clock
drift. Check out this bizarre log entry. Yes, this is the actual order of the log:


Nov 10 05:12:41 providence kernel: [ 1.649918] rtc_cmos 00:05: setting system clock to 2011-11-10 11:12:27 UTC (1320923547)

<snip>
Nov 10 05:12:55 providence ntpd[829]: ntpd 4.2.6p4@1.2324-o Sun Nov 6 05:50:06 UTC 2011 (1)
Nov 10 05:12:56 providence ntpd[864]: proto: precision = 0.832 usec
Nov 10 05:12:56 providence kernel: [ 30.360065] NET: Registered protocol family 10
Nov 10 05:12:56 providence ntpd[864]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Nov 10 05:12:56 providence ntpd[864]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Nov 10 05:12:56 providence ntpd[864]: Listen and drop on 1 v6wildcard :: UDP 123
Nov 10 05:12:56 providence ntpd[864]: Listen normally on 2 lo 127.0.0.1 UDP 123
Nov 10 05:12:56 providence ntpd[864]: Listen normally on 3 eth0 192.168.7.124 UDP 123
Nov 10 05:12:56 providence ntpd[864]: Listen normally on 4 lo ::1 UDP 123
Nov 10 05:12:56 providence ntpd[864]: peers refreshed
Nov 10 05:12:56 providence ntpd[864]: Listening on routing socket on fd #21 for interface updates
Nov 10 05:12:57 providence apcupsd[867]: apcupsd 3.14.10 (13 September 2011) unknown startup succeeded
Nov 10 05:12:57 providence apcupsd[867]: NIS server startup succeeded
Nov 10 05:12:58 providence ntpd[864]: Listen normally on 5 eth0 fe80::211:43ff:fe22:5008 UDP 123
Nov 10 05:12:58 providence ntpd[864]: peers refreshed
Nov 10 05:12:58 providence ntpd[864]: new interface(s) found: waking up resolver

<snip>
Nov 10 05:14:02 providence dbus[717]: [system] Successfully activated service 'org.freedesktop.PolicyKit1'
Nov 10 05:14:02 providence dbus[717]: [system] Successfully activated service 'org.freedesktop.ConsoleKit'
Nov 9 15:29:01 providence crond[859]: time disparity of -827 minutes detected
Nov 9 15:32:24 providence crond[19989]: mailing cron output for user root job sys-daily


Huh?? The system jumped backwards? Whatever is causing this to occur is causing
the spontaneous reboot. Taking a Linux system forward in time is OK, but taking
it backwards in time really, really causes things to go haywire. The hwclock
doesn't seem to drift that much, so I don't know what the issue is. I set the
thing about 3 hours ago and there is no drift:


[14:16 providence:/home/david/tmp] # hwclock -r; date
Thu 10 Nov 2011 02:17:44 PM CST -0.125494 seconds
Thu Nov 10 14:17:44 CST 2011

Something is up though, but I can't explain it.
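
A backward jump like the one in the log excerpt above can be spotted mechanically. The following is only a minimal sketch, assuming classic syslog timestamps without a year (2011 is supplied by hand); `parse_syslog_time` and `find_backward_jumps` are hypothetical helper names, not part of any existing tool. The sample lines are copied from the excerpt with the wrapped lines rejoined.

```python
from datetime import datetime, timedelta

# Sample lines taken from the log excerpt in this post (wrapped lines rejoined).
LOG_LINES = [
    "Nov 10 05:14:02 providence dbus[717]: [system] Successfully activated service 'org.freedesktop.ConsoleKit'",
    "Nov 9 15:29:01 providence crond[859]: time disparity of -827 minutes detected",
    "Nov 9 15:32:24 providence crond[19989]: mailing cron output for user root job sys-daily",
]


def parse_syslog_time(line, year=2011):
    """Parse the 'Mon DD HH:MM:SS' prefix of a classic syslog line.

    Syslog omits the year, so it has to be passed in (assumed: 2011).
    """
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def find_backward_jumps(lines, tolerance=timedelta(seconds=1)):
    """Yield (previous, current, delta) wherever a timestamp precedes the one before it."""
    previous = None
    for line in lines:
        current = parse_syslog_time(line)
        if previous is not None and previous - current > tolerance:
            yield previous, current, previous - current
        previous = current


for before, after, delta in find_backward_jumps(LOG_LINES):
    minutes = delta.total_seconds() / 60
    print(f"{before} -> {after}: jumped back {minutes:.1f} minutes")
```

On these three lines the gap between adjacent entries comes out at 13 hours 45 minutes, close to the -827 minutes crond reports; the small difference is presumably because crond measures from its own last wake-up rather than from the previous log line.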

--
David C. Rankin, J.D.,P.E.
 
11-10-2011, 07:44 PM
"David C. Rankin"

On 11/10/2011 01:55 PM, Mauro Santos wrote:

> On 10-11-2011 19:16, David C. Rankin wrote:
>>
>> Richard, David - check your hardware clock "# hwclock -r" and compare
>> that to the time returned by "# date". If they are hours apart, then
>> make sure your sysclock is correct and set the hardware clock to your
>> sysclock with "# hwclock -w". Worth checking regardless. I know this
>> used to be done on boot or shutdown and I don't know why it isn't
>> anymore. I'll do some more digging.
>
> You should take into account that 'hwclock -r' and 'date' might return different
> times and things will still be ok, it all depends on if you have the clock set
> to UTC or localtime and your timezone. The man page says there is some
> autodetection logic but as with all things it can fail.



True, hwclock always returns time in 'localtime' as does 'date'. Both also
provide the '-u' option to return UTC. This box has the hwclock set to localtime
because it dual-boots with M$. Come to think of it, it is one of my only
dual-boot boxes. I wonder if having the RTC set to localtime may be uncovering
a regression that is causing this strange behavior, because honestly I can't
explain jumping backwards in time over 13.75 hours with ntp running.


--
David C. Rankin, J.D.,P.E.
 
11-10-2011, 07:47 PM
Leonid Isaev

On (11/10/11 14:19), David C. Rankin wrote:
-~> On 11/10/2011 01:28 PM, C Anthony Risinger wrote:
-~> >On Thu, Nov 10, 2011 at 1:16 PM, David C. Rankin
-~> ><drankinatty@suddenlinkmail.com> wrote:
-~> >>
-~> >> Richard, David - check your hardware clock "# hwclock -r" and compare that
-~> >>to the time returned by "# date". If they are hours apart, then make sure
-~> >>your sysclock is correct and set the hardware clock to your sysclock with "#
-~> >>hwclock -w". Worth checking regardless. I know this used to be done on boot
-~> >>or shutdown and I don't know why it isn't anymore. I'll do some more
-~> >>digging.
-~> >
-~> >your machine reboots because of a drifting clock? i don't understand.
-~> >
-~> >aren't you running ntpd (not openntpd)?<---- *HINT* *HINT*, if not ;-)
-~> >
-~>
-~> Yes, I'm running ntpd, and yes, I'm saying that my box reboots due
-~> to clock drift. Check out this bizarre log entry. Yes, this is the
-~> actual order of the log:
-~>
-~> Nov 10 05:12:41 providence kernel: [ 1.649918] rtc_cmos 00:05:
-~> setting system clock to 2011-11-10 11:12:27 UTC (1320923547)
-~>
-~> <snip>
-~> Nov 10 05:12:55 providence ntpd[829]: ntpd 4.2.6p4@1.2324-o Sun
-~> Nov 6 05:50:06 UTC 2011 (1)
-~> Nov 10 05:12:56 providence ntpd[864]: proto: precision = 0.832 usec
-~> Nov 10 05:12:56 providence kernel: [ 30.360065] NET: Registered protocol family 10
-~> Nov 10 05:12:56 providence ntpd[864]: ntp_io: estimated max
-~> descriptors: 1024, initial socket boundary: 16
-~> Nov 10 05:12:56 providence ntpd[864]: Listen and drop on 0
-~> v4wildcard 0.0.0.0 UDP 123
-~> Nov 10 05:12:56 providence ntpd[864]: Listen and drop on 1 v6wildcard :: UDP 123
-~> Nov 10 05:12:56 providence ntpd[864]: Listen normally on 2 lo 127.0.0.1 UDP 123
-~> Nov 10 05:12:56 providence ntpd[864]: Listen normally on 3 eth0
-~> 192.168.7.124 UDP 123
-~> Nov 10 05:12:56 providence ntpd[864]: Listen normally on 4 lo ::1 UDP 123
-~> Nov 10 05:12:56 providence ntpd[864]: peers refreshed
-~> Nov 10 05:12:56 providence ntpd[864]: Listening on routing socket
-~> on fd #21 for interface updates
-~> Nov 10 05:12:57 providence apcupsd[867]: apcupsd 3.14.10 (13
-~> September 2011) unknown startup succeeded
-~> Nov 10 05:12:57 providence apcupsd[867]: NIS server startup succeeded
-~> Nov 10 05:12:58 providence ntpd[864]: Listen normally on 5 eth0
-~> fe80::211:43ff:fe22:5008 UDP 123
-~> Nov 10 05:12:58 providence ntpd[864]: peers refreshed
-~> Nov 10 05:12:58 providence ntpd[864]: new interface(s) found: waking up resolver
-~>
-~> <snip>
-~> Nov 10 05:14:02 providence dbus[717]: [system] Successfully
-~> activated service 'org.freedesktop.PolicyKit1'
-~> Nov 10 05:14:02 providence dbus[717]: [system] Successfully
-~> activated service 'org.freedesktop.ConsoleKit'
-~> Nov 9 15:29:01 providence crond[859]: time disparity of -827 minutes detected
-~> Nov 9 15:32:24 providence crond[19989]: mailing cron output for
-~> user root job sys-daily
-~>
-~> Huh?? The system jumped backwards? Whatever is causing this to
-~> occur is causing the spontaneous reboot. Taking a linux system
-~> forward in time is OK, but taking it backwards in time really
-~> really causes things to go haywire. The hwclock doesn't seem to
-~> drift that much, so I don't know what the issue is. I set the
-~> thing about 3 hours ago and there is no drift:
-~>
-~> [14:16 providence:/home/david/tmp] # hwclock -r; date
-~> Thu 10 Nov 2011 02:17:44 PM CST -0.125494 seconds
-~> Thu Nov 10 14:17:44 CST 2011
-~>
-~> Something is up though, but I can't explain it.
-~>
-~> --
-~> David C. Rankin, J.D.,P.E.

OK. Off the top of my head, I would suggest:
1. Play with clocksource (see kernel-parameters.txt).
2. Add "-ddd" to /etc/conf.d/ntpd.conf's NTPD_ARGS variable.
3. See this http://twiki.ntp.org/bin/view/Support/KnownHardwareIssues (might
need to disable ntpd).
4. Try community/chrony.

--
Leonid Isaev
GnuPG key ID: 164B5A6D
Key fingerprint: C0DF 20D0 C075 C3F1 E1BE 775A A7AE F6CB 164B 5A6D
 
11-10-2011, 07:58 PM
Mauro Santos

On 10-11-2011 20:44, David C. Rankin wrote:

> On 11/10/2011 01:55 PM, Mauro Santos wrote:
>> On 10-11-2011 19:16, David C. Rankin wrote:
>>>
>>> Richard, David - check your hardware clock "# hwclock -r" and compare
>>> that to the time returned by "# date". If they are hours apart, then
>>> make sure your sysclock is correct and set the hardware clock to your
>>> sysclock with "# hwclock -w". Worth checking regardless. I know this
>>> used to be done on boot or shutdown and I don't know why it isn't
>>> anymore. I'll do some more digging.
>>
>> You should take into account that 'hwclock -r' and 'date' might return
>> different times and things will still be ok, it all depends on if you
>> have the clock set to UTC or localtime and your timezone. The man page
>> says there is some autodetection logic but as with all things it can fail.
>
> True, hwclock always returns time in 'localtime' as does 'date'. Both
> also provide the '-u' option to return UTC. This box has the hwclock set
> to localtime because it dual-boots with M$. Come to think about it, it
> is one of my only boxes that is dual-boot. I wonder if the rtc set to
> localtime may be uncovering a regression that is causing this strange
> behavior, because honestly I can't explain jumping backwards in time
> over 13.75 hours with ntp running??



I thought hwclock would return the time set in the CMOS clock, which
should be set to UTC (if you set HARDWARECLOCK="UTC" in /etc/rc.conf),
and date would return localtime because it takes the timezone setting into
account. That is why I said they could be different, but maybe I'm
looking at it the wrong way. If everything is set to localtime, then
both hwclock and date should return the same time.
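
The UTC-versus-localtime point can be illustrated with a little arithmetic. This is only a sketch under stated assumptions: a fixed UTC-6 offset stands in for CST (the zone shown earlier in the thread), the raw RTC value is made up, and `system_time_from_rtc` is a hypothetical helper that models how one stored value is interpreted, not what hwclock actually does internally.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-6 offset stands in for CST, the zone shown in the thread.
CST = timezone(timedelta(hours=-6), "CST")


def system_time_from_rtc(rtc_reading, rtc_is_utc, local_tz=CST):
    """Interpret a naive RTC reading and return the resulting local system time."""
    source_tz = timezone.utc if rtc_is_utc else local_tz
    return rtc_reading.replace(tzinfo=source_tz).astimezone(local_tz)


# Hypothetical raw value stored in the CMOS clock.
rtc = datetime(2011, 11, 10, 14, 17, 44)

as_utc = system_time_from_rtc(rtc, rtc_is_utc=True)
as_local = system_time_from_rtc(rtc, rtc_is_utc=False)

print("RTC treated as UTC:      ", as_utc)
print("RTC treated as localtime:", as_local)
print("difference:              ", as_local - as_utc)
```

The same stored value lands six hours apart depending on the interpretation, which is exactly the timezone offset. A single UTC/localtime mix-up in this zone would therefore shift the clock by six hours, not by the roughly 13.75 hours seen in the log.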


--
Mauro Santos
 
11-10-2011, 07:59 PM
C Anthony Risinger

On Nov 10, 2011 2:44 PM, "David C. Rankin" <drankinatty@suddenlinkmail.com>
wrote:
>
> On 11/10/2011 01:55 PM, Mauro Santos wrote:
>>
>> On 10-11-2011 19:16, David C. Rankin wrote:
>>>
>>>
>>> Richard, David - check your hardware clock "# hwclock -r" and compare
>>> that to the time returned by "# date". If they are hours apart, then
>>> make sure your sysclock is correct and set the hardware clock to your
>>> sysclock with "# hwclock -w". Worth checking regardless. I know this
>>> used to be done on boot or shutdown and I don't know why it isn't
>>> anymore. I'll do some more digging.
>>
>>
>> You should take into account that 'hwclock -r' and 'date' might return
>> different times and things will still be ok, it all depends on if you
>> have the clock set to UTC or localtime and your timezone. The man page
>> says there is some autodetection logic but as with all things it can fail.
>>
>
> True, hwclock always returns time in 'localtime' as does 'date'. Both
> also provide the '-u' option to return UTC. This box has the hwclock set
> to localtime because it dual-boots with M$. Come to think about it, it
> is one of my only boxes that is dual-boot. I wonder if the rtc set to
> localtime may be uncovering a regression that is causing this strange
> behavior, because honestly I can't explain jumping backwards in time
> over 13.75 hours with ntp running??

Yeah, I'm really not sure about the jump. ntp should be logging any changes
anyway (though I believe it will not change the time if the offset is greater
than some threshold).

... however, some time ago it was announced that localtime is no longer
supported for a variety of well-known and good reasons. Windows just needs
a small tweak (via registry IIRC) and it will behave. I would recommend
switching to a UTC hardware clock and making said change.
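
On the threshold mentioned above: as far as the ntpd documentation goes, the defaults are a step threshold of 0.128 s and a panic threshold of 1000 s, both adjustable with `tinker` in ntp.conf. The sketch below is a deliberately simplified model of that behaviour, not ntpd's actual algorithm (it ignores, for instance, how long an offset must persist before a step, and the `-g` option, which allows one initial correction of any size). `ntpd_reaction` is a hypothetical name.

```python
# Assumed defaults from the ntpd documentation: step threshold 0.128 s,
# panic threshold 1000 s. Both can be changed with "tinker" in ntp.conf.
STEP_THRESHOLD_S = 0.128
PANIC_THRESHOLD_S = 1000.0


def ntpd_reaction(offset_seconds, step=STEP_THRESHOLD_S, panic=PANIC_THRESHOLD_S):
    """Roughly classify how ntpd treats a clock offset of the given size."""
    magnitude = abs(offset_seconds)
    if magnitude > panic:
        return "panic"  # ntpd logs the problem and exits instead of correcting
    if magnitude > step:
        return "step"   # clock is set in one jump
    return "slew"       # clock rate is adjusted gradually


for label, offset in [("50 ms", 0.05), ("5 s", 5.0), ("-827 min", -827 * 60)]:
    print(f"{label:>9}: {ntpd_reaction(offset)}")
```

Under these assumptions an offset of -827 minutes is far beyond the panic threshold, so a running ntpd would be expected to give up rather than move the clock by that much, which fits the suspicion that something other than ntpd made the jump.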

C Anthony [mobile]
 
11-10-2011, 08:30 PM
"David C. Rankin"

On 11/10/2011 02:47 PM, Leonid Isaev wrote:

> OK. Off the top of my head, I would suggest:
> 1. Play with clocksource (see kernel-parameters.txt).
> 2. Add "-ddd" to /etc/conf.d/ntpd.conf's NTPD_ARGS variable.
> 3. See this http://twiki.ntp.org/bin/view/Support/KnownHardwareIssues (might
> need to disable ntpd).
> 4. Try community/chrony.


Thanks a lot, Leonid. That will get me started.

--
David C. Rankin, J.D.,P.E.
 

All times are GMT.