Linux Archive


Henri Cook 09-06-2008 10:40 AM

OS Reboot on NetworkFailure
 
Hi guys,

I'm using DRBD to provide a RAID1-over-network style drive, mounted at /shared.

If one of the machines, say A, is using the shared drive (i.e. writing to
it) and B gets rebooted, A registers a 'networkfailure' and reboots
itself, so the entire cluster is suddenly down.

I can't find any log entries at all describing the reboot.

What I'm wondering is: could DRBD somehow be passing this networkfailure
event up to the kernel and triggering a reboot? Does a machine ever
auto-reboot on network failure? At the moment I'm treating it as a DRBD
problem, but I didn't want to narrow my scope too early.

Thanks,

Henri

--
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam

Ante Karamatic 09-06-2008 02:08 PM

On Sat, 06 Sep 2008 11:40:37 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> What I'm wondering is - could DRBD be passing this networkfailure
> event up to the kernel somehow and triggering a reboot - does a
> machine ever auto-reboot on network failure? At the moment I'm
> looking at it as a DRBD problem but didn't want to narrow my scope
> too early.

DRBD doesn't do that. DRBD can detect that the other machine is down, but
it doesn't do reboots. The whole purpose of DRBD is to keep the filesystem
going on the other node :)

Unless you configured it to do reboots on network failure. Check
your /etc/drbd.conf.
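
One quick way to check /etc/drbd.conf is to list every quoted handler command that could take the node down. The Python sketch below is a rough heuristic, not a DRBD tool: the keyword list (reboot, halt, poweroff, shutdown, sysrq-trigger) and the function name find_reboot_handlers are assumptions for illustration, and the parser only understands one-line entries of the form name "command"; as used in the handlers section of DRBD 8 configs.

```python
import re

# Hypothetical keyword list: commands that look like they reboot, halt
# or power off the node. Extend it to match your own setup.
SUSPICIOUS = re.compile(r"\b(reboot|halt|poweroff|shutdown)\b|sysrq-trigger")

# One-line entries such as:
#   pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
HANDLER = re.compile(r'^\s*([A-Za-z0-9_-]+)\s+"([^"]*)"\s*;')


def find_reboot_handlers(conf_text):
    """Return (line_number, name, command) for quoted commands that
    match the suspicious keywords."""
    hits = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        # Naive comment stripping: ignores '#' characters inside quotes.
        line = raw.split("#", 1)[0]
        m = HANDLER.match(line)
        if m and SUSPICIOUS.search(m.group(2)):
            hits.append((lineno, m.group(1), m.group(2)))
    return hits
```

On the node, something like find_reboot_handlers(open("/etc/drbd.conf").read()) lists the matching lines. An empty list only means no quoted command matched those keywords.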


Henri Cook 09-06-2008 08:16 PM

Dear Ante,

That's what I thought, which is why I asked whether passing the event up
to the OS somehow could trigger the reboot. In that case, is there any way
you'd suggest I remotely debug the cause of this reboot, which I can only
recreate in the circumstances described?

Thanks,

Henri


Ante Karamatic 09-06-2008 08:35 PM

On Sat, 06 Sep 2008 21:16:54 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> That's what I thought; hence the reason I ask if somehow passing it
> up to the OS could trigger the reboot. In that case, is there any way
> you suggest I could remotely debug the cause of this reboot that I
> can only recreate in the circumstances described?

Did you check /etc/drbd.conf? Is it a normal reboot or a hard reset? If
it's a normal reboot, looking at syslog could help. Hard resets are
usually triggered by hardware problems.
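
Part of that log check can be scripted: before a normal reboot the logs should typically show an orderly shutdown, while after a hard reset they tend to just stop. A minimal Python sketch, assuming printk timestamps are enabled (the "[ 8556.893048]" uptime field seen in the kern.log excerpt in this thread): the uptime counter restarts near zero at boot, so a drop in it marks a reboot, and the timestamped lines just before the drop show what the machine was doing. The function name last_lines_before_boots is hypothetical.

```python
import re

# Assumption: kernel lines carry the uptime in brackets, e.g.
# "kernel: [ 8556.893048] o2net: ...". A drop in this value marks a boot.
UPTIME = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")


def last_lines_before_boots(lines, context=3):
    """For each detected boot, return the last `context` timestamped
    kernel lines logged before the uptime counter reset."""
    boots = []
    stamped = []  # lines carrying an uptime stamp, oldest first
    last_uptime = None
    for line in lines:
        m = UPTIME.search(line)
        if not m:
            # Skip unstamped lines such as klogd's "Inspecting ..." banner,
            # which is written after the boot but before the kernel dump.
            continue
        uptime = float(m.group(1))
        if last_uptime is not None and uptime < last_uptime:
            boots.append(stamped[-context:])
        stamped.append(line)
        last_uptime = uptime
    return boots
```

Fed the lines of /var/log/kern.log, it returns the final messages before each boot; if those are ordinary traffic rather than shutdown messages, that leans towards a hard reset.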


Henri Cook 09-06-2008 08:42 PM

It appears at the moment to be a hard reset; there's nothing in syslog
indicating that any component triggered a reboot. I don't know for sure
that it's a hard reset, though. Can you think of a way for me to determine
whether the system is being rebooted gracefully?

Kern.log simply goes:

Sep  6 21:19:20 torvil kernel: [ 8556.893048] ocfs2_dlm: Node 1 leaves domain 2377CC24AA29499C9D058EF3610B5B97
Sep  6 21:19:20 torvil kernel: [ 8556.893054] ocfs2_dlm: Nodes in domain ("2377CC24AA29499C9D058EF3610B5B97"): 0
Sep  6 21:19:20 torvil kernel: [ 8556.910512] o2net: no longer connected to node Dean (num 1) at 10.0.0.3:7777
<bootup>
Sep  6 21:20:43 torvil kernel: Inspecting /boot/System.map-2.6.24-19-server
Sep  6 21:20:43 torvil kernel: Loaded 28743 symbols from /boot/System.map-2.6.24-19-server.
Sep  6 21:20:43 torvil kernel: Symbols match kernel version 2.6.24.

Syslog goes:

Sep  6 21:19:36 torvil pengine: [6116]: debug: native_assign_node: All nodes for resource FTP:0 are unavailable, unclean or shutting down
Sep  6 21:19:36 torvil pengine: [6116]: WARN: native_color: Resource FTP:0 cannot run anywhere
Sep  6 21:19:36 torvil pengine: [6116]: debug: clone_color: Allocated 1 ProFTPd instances of a possible 2
Sep  6 21:19:36 torvil pengine: [6116]: notice: NoRoleChange: Leave resource FTP:1^I(torvil)
Sep  6 21:20:43 torvil syslogd 1.5.0#1ubuntu1: restart.

All of which looks fairly standard.

It shows up in last as a 'crash':

root     pts/0        85-191-213-65.be Sat Sep  6 21:24   still logged in
reboot   system boot  2.6.24-19-server Sat Sep  6 21:20 - 21:41  (00:20)
pg       ftpd17285    85-191-213-65.be Sat Sep  6 21:13 - crash  (00:07)
pg       ftpd16973    85-191-213-65.be Sat Sep  6 21:12 - crash  (00:08)



- Does this mean it's a kernel issue?

Thanks,

Henri


Ante Karamatic 09-07-2008 07:09 AM

On Sat, 06 Sep 2008 21:42:25 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> - Does this mean it's a kernel issue?

It doesn't look like it. I'd guess you have some cluster management
software running, and it's probably misconfigured or configured to reboot
on node failure. Pengine (the Heartbeat/Pacemaker policy engine) warns that
all nodes for resource FTP:0 are unavailable, unclean or shutting down.

Maybe some other node is killing your node (STONITH, "Shoot The Other
Node In The Head"). Whatever it is, it's not related to DRBD, and I really
doubt it's related to the kernel.
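
The STONITH theory would fit the symptoms: a fencing device typically resets or powers off the node, so nothing gets a chance to log a shutdown. If the cluster manager is Heartbeat 2 or Pacemaker (the pengine messages suggest so, but that is an assumption), its configuration lives in an XML CIB that cibadmin -Q can dump. The Python sketch below, with the hypothetical name stonith_summary, reports the stonith-enabled cluster property and any primitives of class "stonith"; also accepting the underscore spelling stonith_enabled is an assumption about older Heartbeat releases.

```python
import xml.etree.ElementTree as ET


def stonith_summary(cib_xml):
    """Summarise STONITH-related settings in a CIB XML dump.

    Returns (enabled, devices): `enabled` is the value of the
    stonith-enabled property (None if unset) and `devices` lists the ids
    of primitives whose class is "stonith".
    """
    root = ET.fromstring(cib_xml)
    enabled = None
    for nv in root.iter("nvpair"):
        # Assumption: older Heartbeat 2.x used underscores in option names.
        if nv.get("name") in ("stonith-enabled", "stonith_enabled"):
            enabled = nv.get("value")
    devices = [p.get("id") for p in root.iter("primitive")
               if p.get("class") == "stonith"]
    return enabled, devices
```

If this reports STONITH enabled with a device able to reach torvil, the peer fencing the node on link loss is worth ruling out first.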



All times are GMT.