05-30-2008, 06:27 AM
Volkan YAZICI

Unknown Server Failure, Logs and openntpd

Hi,

This morning one of our R&D servers stopped responding (no ssh, http),
and because some tests were urgent I had to hardware-reset it. After
the machine came back up, I first checked /var/log/messages:

May 30 06:25:05 arge syslogd 1.4.1#18: restart.
May 30 06:49:46 arge -- MARK --
May 30 07:09:46 arge -- MARK --
May 30 07:29:47 arge -- MARK --
May 30 07:49:47 arge -- MARK --
May 30 08:09:47 arge -- MARK --
May 30 08:29:47 arge -- MARK --
May 30 08:44:36 arge kernel: e100: eth1: e100_watchdog: link down
May 30 08:44:38 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
May 30 08:44:40 arge kernel: e100: eth1: e100_watchdog: link down
May 30 08:44:42 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
May 30 08:38:11 arge syslogd 1.4.1#18: restart.
May 30 08:38:11 arge kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
May 30 08:38:11 arge kernel: Linux version 2.6.18-6-686 (Debian 2.6.18.dfsg.1-18etch5) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Sat May 24 10:24:42 UTC 2008

As the "kernel: e100: eth1: ..." lines show, I first suspected a
connection failure and tried fiddling with the network cable socket.
But the logs show that wasn't the problem. Moreover, the system seems
to have been working properly just before 08:44:36, judging by
/var/log/syslog:

May 30 08:40:01 arge /USR/SBIN/CRON[6611]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:40:01 arge /USR/SBIN/CRON[6614]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:41:01 arge /USR/SBIN/CRON[6630]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:41:01 arge /USR/SBIN/CRON[6632]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:42:01 arge /USR/SBIN/CRON[6654]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:42:01 arge /USR/SBIN/CRON[6655]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:43:01 arge /USR/SBIN/CRON[7039]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:43:01 arge /USR/SBIN/CRON[7040]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:44:01 arge /USR/SBIN/CRON[7417]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:44:01 arge /USR/SBIN/CRON[7420]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)

I checked every log file under /var/log for entries between 08:00:00
and 08:38:00, but found nothing useful. OTOH, consider these lines from
the /var/log/messages output:

May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
May 30 08:38:11 arge syslogd 1.4.1#18: restart.

It seems that openntpd somehow failed to synchronize the hardware clock
with the time it got from the NTP servers, and after the reboot the
clock fell back to a past time. Is this expected? If not, how can I fix
it?
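
A backward jump like the one above can be spotted mechanically rather than by eye. The following is a minimal sketch, not part of the original report: the function names are hypothetical, the year is assumed to be 2008 (syslog lines of this format carry no year), and month names are assumed to parse in a C/English locale.

```python
from datetime import datetime

def parse_stamp(line, year=2008):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp.

    The year is an assumption supplied by the caller, since it is not logged.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def backward_jumps(lines, year=2008):
    """Yield (previous_line, line, seconds_back) wherever the timestamp goes backwards."""
    prev_line, prev_time = None, None
    for line in lines:
        line = line.rstrip("\n")
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev_time is not None and stamp < prev_time:
            yield prev_line, line, (prev_time - stamp).total_seconds()
        prev_line, prev_time = line, stamp

# The two lines quoted from /var/log/messages in this post.
sample = [
    "May 30 08:45:14 arge shutdown[7450]: shutting down for system halt",
    "May 30 08:38:11 arge syslogd 1.4.1#18: restart.",
]
for before, after, secs in backward_jumps(sample):
    print(f"clock went back {secs:.0f}s between:\n  {before}\n  {after}")
```

For the two sample lines the reported gap is 423 seconds (08:45:14 minus 08:38:11). That is only a lower bound on how far the clock fell behind, because some real time passed during the reboot.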

To summarize, what else should I check to find the cause of this
problem? (I'll try to log in from a terminal the next time such a
failure occurs.)
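
If the failure recurs, the search of /var/log for a time window described earlier could be repeated with a small script. This is a rough sketch under stated assumptions: the helper names are hypothetical, only plain-text logs in the "Mon DD HH:MM:SS" syslog format are handled, and compressed (.gz) rotations are skipped.

```python
from datetime import datetime, time as dtime
from pathlib import Path

def lines_in_window(path, start, end, day="May 30"):
    """Yield lines of one log file stamped between start and end (inclusive) on the given day."""
    with open(path, errors="replace") as handle:
        for line in handle:
            if not line.startswith(day + " "):
                continue
            try:
                # Characters 7..14 hold HH:MM:SS in 'Mon DD HH:MM:SS'.
                stamp = datetime.strptime(line[7:15], "%H:%M:%S").time()
            except ValueError:
                continue  # not a syslog-style line
            if start <= stamp <= end:
                yield line.rstrip("\n")

def scan_logs(root, start, end, day="May 30"):
    """Yield (path, line) for every matching line in plain-text files under root."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix == ".gz":
            continue
        try:
            for line in lines_in_window(path, start, end, day):
                yield path, line
        except OSError:
            continue  # unreadable file, e.g. insufficient permissions

# Example call (not executed here):
#   for path, line in scan_logs("/var/log", dtime(8, 0, 0), dtime(8, 38, 0)):
#       print(path, line)
```

The time window is passed in as an argument so the same scan can be rerun for whatever interval precedes the next failure.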


Regards.


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
 
05-31-2008, 02:52 AM
"Douglas A. Tutty"

On Fri, May 30, 2008 at 09:27:51AM +0300, Volkan YAZICI wrote:
> This morning one of our R&D servers stopped responding (no ssh, http),
> and because some tests were urgent I had to hardware-reset it. After
> the machine came back up, I first checked /var/log/messages:
>
[snip most]
> May 30 08:09:47 arge -- MARK --
> May 30 08:29:47 arge -- MARK --
> May 30 08:44:36 arge kernel: e100: eth1: e100_watchdog: link down
> May 30 08:44:38 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
> May 30 08:44:42 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex

> May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
> May 30 08:38:11 arge syslogd 1.4.1#18: restart.
>
> As the "kernel: e100: eth1: ..." lines show, I first suspected a
> connection failure and tried fiddling with the network cable socket.
> But the logs show that wasn't the problem. Moreover, the system seems
> to have been working properly just before 08:44:36, judging by
> /var/log/syslog:
>

[snip]
> I checked every log file under /var/log for entries between 08:00:00
> and 08:38:00, but found nothing useful. OTOH, consider these lines from
> the /var/log/messages output:
>
> May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
> May 30 08:38:11 arge syslogd 1.4.1#18: restart.
>
> It seems that openntpd somehow failed to synchronize the hardware clock
> with the time it got from the NTP servers, and after the reboot the
> clock fell back to a past time. Is this expected? If not, how can I fix
> it?
>
> To summarize, what else should I check to find the cause of this
> problem? (I'll try to log in from a terminal the next time such a
> failure occurs.)

I don't know what caused the freeze. The hard reset would have kept the
shutdown scripts from saving the system time to the hardware clock. On
restart, did the ntpd eventually get a network connection and fix the
time?

It may not have been a freeze at all, just a networking problem that
wasn't found by futzing with the cable.

Logging in from a VT or serial terminal would have been helpful. If you
are concerned that this may happen again, you may even want to connect
up a serial console to another box (or a real serial VT) and watch that
as well.
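
On the watching box, the console output could be logged with the watcher's own timestamps, so the last messages before a freeze survive with a trustworthy time. This is a minimal sketch, not a definitive setup: the function name is hypothetical, and the demo feeds it two kernel lines from this thread through an in-memory stream instead of a real serial port.

```python
import io
import time

def timestamp_lines(stream, out, clock=time.time):
    """Copy lines from stream to out, prefixing each with the watcher's local time."""
    for line in stream:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        out.write(f"{stamp} {line.rstrip()}\n")
        out.flush()  # keep the log current in case the watcher is interrupted

# Demo with an in-memory stream standing in for the serial device.
fake_console = io.StringIO(
    "e100: eth1: e100_watchdog: link down\n"
    "e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex\n"
)
captured = io.StringIO()
timestamp_lines(fake_console, captured)
print(captured.getvalue(), end="")
```

In real use, the stream would be the serial device opened for reading (for example `/dev/ttyS0`, an assumed path) and `out` a log file opened in append mode. This also assumes the server sends its console to the serial port (for instance via a `console=ttyS0` kernel parameter) and that the port speed has already been set on both ends.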

Doug.


 
06-02-2008, 10:11 AM
Volkan YAZICI

On Fri, 30 May 2008, "Douglas A. Tutty" <dtutty@porchlight.ca> writes:
> I don't know what caused the freeze. The hard reset would have kept the
> shutdown scripts from saving the system time to the hardware clock. On
> restart, did the ntpd eventually get a network connection and fix the
> time?

Yes. As soon as the network interface came up, openntpd started to
connect to its NTP peers and fixed the time.

> It may not have been a freeze at all, just a networking problem that
> wasn't found by futzing with the cable.
>
> Logging in from a VT or serial terminal would have been helpful. If you
> are concerned that this may happen again, you may even want to connect
> up a serial console to another box (or a real serial VT) and watch that
> as well.

That's exactly what I did. It seems to be the only feasible solution at
this time.


Regards.


 

All times are GMT.