Old 09-06-2008, 10:40 AM
Henri Cook
 
Default OS Reboot on NetworkFailure

Hi guys,

I'm using DRBD to mount a RAID1-over-net type drive, mounted at /shared.

If one machine, say A, is using the shared drive (i.e. writing to it)
and B gets rebooted, A registers a 'networkfailure' and reboots
itself, so the entire cluster is suddenly down.

There are no log entries describing the reboot that I can find at all.

What I'm wondering is: could DRBD be passing this networkfailure event
up to the kernel somehow and triggering a reboot? Does a machine ever
auto-reboot on network failure? At the moment I'm looking at it as a
DRBD problem but didn't want to narrow my scope too early.

Thanks,

Henri

--
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
 
Old 09-06-2008, 02:08 PM
Ante Karamatic
 
Default OS Reboot on NetworkFailure

On Sat, 06 Sep 2008 11:40:37 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> What i'm wondering is - could DRBD be passing this networkfailure
> event up to the kernel somehow and triggering a reboot - does a
> machine ever auto-reboot on network failure? At the moment i'm
> looking at it as a DRBD problem but didn't want to narrow my scope
> too early.

DRBD doesn't do that. DRBD can detect that the other machine is down,
but it doesn't do reboots. The whole purpose of DRBD is to keep the
filesystem going on the other node.

Unless you configured it to do reboots on network failure. Check
your /etc/drbd.conf.
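
One way to act on that suggestion is to scan the config for handler commands that reset or halt the machine. The following is only a rough sketch: the function name find_reboot_handlers, the keyword list and the sample config text are assumptions made for illustration, not something taken from this thread. The sample is modeled on the handlers section found in DRBD 8 example configs, where entries such as pri-on-incon-degr write to /proc/sysrq-trigger.

```python
import re

# Assumed keywords that suggest a handler resets, halts or powers off the node.
SUSPECT = re.compile(r"sysrq-trigger|\breboot\b|\bhalt\b|\bpoweroff\b")


def find_reboot_handlers(conf_text):
    """Return (line_number, line) pairs for uncommented lines that look like reboot handlers."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        code = line.split("#", 1)[0].strip()  # ignore comments
        if code and SUSPECT.search(code):
            hits.append((number, code))
    return hits


# Hypothetical config fragment, not the poster's actual file.
SAMPLE = '''
resource r0 {
  handlers {
    pri-on-incon-degr "echo b > /proc/sysrq-trigger ; reboot -f";
    # local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
  }
}
'''

if __name__ == "__main__":
    for number, code in find_reboot_handlers(SAMPLE):
        print(number, code)
```

To run it against a real file, pass the file contents instead of SAMPLE, for example find_reboot_handlers(open("/etc/drbd.conf").read()). An empty result only means none of the assumed keywords appear; it does not prove that no handler can reset the node.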

 
Old 09-06-2008, 08:16 PM
Henri Cook
 
Default OS Reboot on NetworkFailure

Dear Ante,

That's what I thought, which is why I asked whether passing it up to
the OS could somehow trigger the reboot. In that case, is there any way
you would suggest to remotely debug the cause of this reboot, which I
can only recreate in the circumstances described?

Thanks,

Henri



 
Old 09-06-2008, 08:35 PM
Ante Karamatic
 
Default OS Reboot on NetworkFailure

On Sat, 06 Sep 2008 21:16:54 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> That's what I thought; hence the reason I ask if some how passing it
> up to the OS could trigger the reboot. In that case, is there any way
> you suggest I could remotely debug the cause of this reboot that I
> can only recreate in the circumstances described??

Did you check /etc/drbd.conf? Is it a normal reboot or a hard reset? If
it's a normal reboot, looking at syslog could help. Hard resets are
usually triggered by hardware problems.

 
Old 09-06-2008, 08:42 PM
Henri Cook
 
Default OS Reboot on NetworkFailure

At the moment it appears to be a hard reset: there's nothing in syslog
that indicates a reboot has been triggered by any component. I don't
know for sure that it's a hard reset, though. Is there a way for me to
determine whether the system is being gracefully rebooted that you can
think of?



Kern.log simply goes:

Sep  6 21:19:20 torvil kernel: [ 8556.893048] ocfs2_dlm: Node 1 leaves domain 2377CC24AA29499C9D058EF3610B5B97
Sep  6 21:19:20 torvil kernel: [ 8556.893054] ocfs2_dlm: Nodes in domain ("2377CC24AA29499C9D058EF3610B5B97"): 0
Sep  6 21:19:20 torvil kernel: [ 8556.910512] o2net: no longer connected to node Dean (num 1) at 10.0.0.3:7777
<bootup>
Sep  6 21:20:43 torvil kernel: Inspecting /boot/System.map-2.6.24-19-server
Sep  6 21:20:43 torvil kernel: Loaded 28743 symbols from /boot/System.map-2.6.24-19-server.
Sep  6 21:20:43 torvil kernel: Symbols match kernel version 2.6.24.



Syslog goes:

Sep  6 21:19:36 torvil pengine: [6116]: debug: native_assign_node: All nodes for resource FTP:0 are unavailable, unclean or shutting down
Sep  6 21:19:36 torvil pengine: [6116]: WARN: native_color: Resource FTP:0 cannot run anywhere
Sep  6 21:19:36 torvil pengine: [6116]: debug: clone_color: Allocated 1 ProFTPd instances of a possible 2
Sep  6 21:19:36 torvil pengine: [6116]: notice: NoRoleChange: Leave resource FTP:1^I(torvil)
Sep  6 21:20:43 torvil syslogd 1.5.0#1ubuntu1: restart.
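
The question of graceful versus hard reboot can be approached from the syslog excerpt above: on a clean shutdown the syslog daemon normally writes an exit record before it stops. The sketch below checks whether each syslogd "restart." line is directly preceded by such a record. It is an illustration under assumptions: the function name classify_boots is made up here, and the "exiting on signal" marker is assumed to be what this syslogd version writes on a clean stop. A "restart." line may also appear when syslogd is merely reloaded (for example during log rotation), so the result should be cross-checked against the kernel boot messages.

```python
def classify_boots(lines):
    """For each syslogd 'restart.' line, report whether the line before it was a clean syslogd exit."""
    results = []
    previous = None
    for line in lines:
        if "syslogd" in line and line.rstrip().endswith("restart."):
            # Assumption: a clean stop leaves an 'exiting on signal' record just before.
            clean = previous is not None and "exiting on signal" in previous
            label = "clean shutdown" if clean else "no shutdown record"
            results.append((line[:15], label))  # first 15 characters hold the timestamp
        previous = line
    return results


# The last two syslog lines quoted in the post.
SAMPLE = [
    "Sep  6 21:19:36 torvil pengine: [6116]: notice: NoRoleChange: Leave resource FTP:1^I(torvil)",
    "Sep  6 21:20:43 torvil syslogd 1.5.0#1ubuntu1: restart.",
]

if __name__ == "__main__":
    for stamp, label in classify_boots(SAMPLE):
        print(stamp, label)
```

For the lines quoted in the post this reports no shutdown record before the 21:20:43 restart, which fits an unclean reset rather than an orderly reboot.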



All of which looks fairly standard.



It registers in last as a 'crash':

root     pts/0        85-191-213-65.be Sat Sep  6 21:24   still logged in
reboot   system boot  2.6.24-19-server Sat Sep  6 21:20 - 21:41  (00:20)
pg       ftpd17285    85-191-213-65.be Sat Sep  6 21:13 - crash  (00:07)
pg       ftpd16973    85-191-213-65.be Sat Sep  6 21:12 - crash  (00:08)
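
In last output, a session that ends in "crash" has no logout record before the next boot, while "down" marks a session closed by an orderly shutdown. A small sketch for tallying this from captured output follows; the function name session_endings and the three category labels are assumptions chosen for illustration.

```python
def session_endings(last_output):
    """Count how sessions in `last` output ended: 'crash', 'down', or 'other'."""
    counts = {"crash": 0, "down": 0, "other": 0}
    for line in last_output.splitlines():
        fields = line.split()
        # Skip blank lines, the reboot pseudo-user and the trailing 'wtmp begins' line.
        if not fields or fields[0] in ("reboot", "wtmp"):
            continue
        if "crash" in fields:
            counts["crash"] += 1
        elif "down" in fields:
            counts["down"] += 1
        else:
            counts["other"] += 1
    return counts


# The output quoted in the post.
SAMPLE = """\
root     pts/0        85-191-213-65.be Sat Sep  6 21:24   still logged in
reboot   system boot  2.6.24-19-server Sat Sep  6 21:20 - 21:41  (00:20)
pg       ftpd17285    85-191-213-65.be Sat Sep  6 21:13 - crash  (00:07)
pg       ftpd16973    85-191-213-65.be Sat Sep  6 21:12 - crash  (00:08)
"""

if __name__ == "__main__":
    print(session_endings(SAMPLE))
```

Both sessions that were open before the 21:20 boot ended in "crash", so the machine went down without those sessions being logged out first. That says the shutdown was not orderly; it does not by itself say what triggered it.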



- Does this mean it's a kernel issue?



Thanks,



Henri



 
Old 09-07-2008, 07:09 AM
Ante Karamatic
 
Default OS Reboot on NetworkFailure

On Sat, 06 Sep 2008 21:42:25 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> - Does this mean it's a kernel issue?

It doesn't look like it. I guess you have some cluster management
software. It's probably misconfigured, or configured to reboot on node
failure. Pengine, whatever that is, warns that all nodes for a resource
are unavailable, unclean or shutting down.

Maybe some other node is killing your node (STONITH). Whatever it is,
it's not related to DRBD and I really doubt it's related to the kernel.
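
If the cluster manager keeps its configuration in a Heartbeat/Pacemaker-style CIB XML document, the STONITH theory could be checked by looking for the corresponding cluster property. This is a sketch under assumptions the thread does not confirm: that the setup uses such a CIB (only the pengine log lines hint at it), that the property is named stonith-enabled (older releases may spell it with an underscore, which the code tolerates), and the function name stonith_enabled and the sample XML are invented for illustration.

```python
import xml.etree.ElementTree as ET


def stonith_enabled(cib_xml):
    """Return the value of the stonith-enabled cluster property, or None if it is not set."""
    root = ET.fromstring(cib_xml)
    for nvpair in root.iter("nvpair"):
        # Accept both dash and underscore spellings of the property name.
        name = nvpair.get("name", "").replace("_", "-")
        if name == "stonith-enabled":
            return nvpair.get("value")
    return None


# Hypothetical CIB fragment, not taken from the poster's cluster.
SAMPLE = """
<cib>
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <attributes>
          <nvpair id="opt-stonith" name="stonith-enabled" value="true"/>
        </attributes>
      </cluster_property_set>
    </crm_config>
  </configuration>
</cib>
"""

if __name__ == "__main__":
    print(stonith_enabled(SAMPLE))
```

A result of None only means the property is not set explicitly in the document passed in; the effective default depends on the cluster software version.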

 
