OS Reboot on NetworkFailure
Hi guys,
I'm using DRBD to provide a RAID1-over-network style drive, mounted at /shared. If one of the machines, say A, is using the shared drive (i.e. writing to it) and B gets rebooted, A registers a 'networkfailure' and reboots itself, so the entire cluster is suddenly down. I can't find any log entries describing the reboot at all.

What I'm wondering is: could DRBD be passing this networkfailure event up to the kernel somehow and triggering a reboot? Does a machine ever auto-reboot on network failure? At the moment I'm looking at it as a DRBD problem, but I didn't want to narrow my scope too early.

Thanks,
Henri

--
ubuntu-server mailing list
ubuntu-server@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server
More info: https://wiki.ubuntu.com/ServerTeam
On Sat, 06 Sep 2008 11:40:37 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> What I'm wondering is - could DRBD be passing this networkfailure
> event up to the kernel somehow and triggering a reboot - does a
> machine ever auto-reboot on network failure? At the moment I'm
> looking at it as a DRBD problem but didn't want to narrow my scope
> too early.

DRBD doesn't do that. DRBD can detect that the other machine is down, but it doesn't do reboots. The whole purpose of DRBD is to keep the filesystem going on the other node :)

Unless you configured it to reboot on network failure. Check your /etc/drbd.conf.
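One way to act on the "check your /etc/drbd.conf" advice is to scan the file for any handler command that would take the node down. The sketch below is only an illustration: the keyword list is an assumption, the function name `find_reboot_lines` is made up for this example, and `SAMPLE_CONF` is a hypothetical fragment rather than anyone's real configuration.

```python
import re

# Commands that would take the node down if a DRBD handler ran them.
# This keyword list is an assumption for illustration, not an exhaustive set.
SUSPICIOUS = re.compile(r"\b(reboot|halt|poweroff|shutdown)\b|sysrq-trigger")


def find_reboot_lines(conf_text):
    """Return (line_number, stripped_line) pairs that mention a reboot-like command."""
    hits = []
    for number, line in enumerate(conf_text.splitlines(), start=1):
        code = line.split("#", 1)[0]  # drop comments (naive: also cuts at '#' inside quotes)
        if SUSPICIOUS.search(code):
            hits.append((number, line.strip()))
    return hits


# Hypothetical drbd.conf fragment, used only to exercise the function.
SAMPLE_CONF = """\
resource r0 {
  handlers {
    # halt the node if it becomes primary on inconsistent data
    pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
  }
  net {
    timeout 60;
  }
}
"""

if __name__ == "__main__":
    for number, line in find_reboot_lines(SAMPLE_CONF):
        print(f"line {number}: {line}")
```

To check a real node, pass in the text of the actual file, for example `find_reboot_lines(open("/etc/drbd.conf").read())`. An empty result only means none of the assumed keywords appear; it does not prove that no handler can reset the machine.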
Dear Ante,
That's what I thought, which is why I asked whether passing it up to the OS somehow could trigger the reboot. In that case, can you suggest any way I could remotely debug the cause of this reboot, which I can only recreate in the circumstances described?

Thanks,
Henri
On Sat, 06 Sep 2008 21:16:54 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> That's what I thought; hence the reason I ask if some how passing it
> up to the OS could trigger the reboot. In that case, is there any way
> you suggest I could remotely debug the cause of this reboot that I
> can only recreate in the circumstances described??

Did you check /etc/drbd.conf? Is it a normal reboot or a hard reset? If it's a normal reboot, looking at syslog could help. Hard resets are usually triggered by hardware problems.
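A rough way to tell a normal reboot from a hard reset using syslog is to look at what was logged just before each syslogd restart line. The sketch below assumes that an orderly shutdown leaves messages such as "exiting on signal 15" from syslogd. That marker list is an assumption and should be adjusted to whatever the local syslog daemon actually writes. The function name `classify_restarts` and the sample lines are illustrative only.

```python
# Substrings assumed to be logged during an orderly shutdown.
# These are assumptions; adjust them to your syslog daemon and init scripts.
CLEAN_MARKERS = ("exiting on signal 15", "Kernel logging (proc) stopped")


def classify_restarts(lines, tail=20):
    """For each syslogd restart line, report 'clean' if a shutdown marker
    appears in the `tail` lines before it, otherwise 'unclean'."""
    results = []
    for index, line in enumerate(lines):
        if "syslogd" in line and line.rstrip().endswith("restart."):
            window = lines[max(0, index - tail):index]
            clean = any(marker in previous
                        for previous in window
                        for marker in CLEAN_MARKERS)
            results.append((line.strip(), "clean" if clean else "unclean"))
    return results


# Illustrative syslog excerpt in the usual "Mon  D HH:MM:SS host tag: msg" layout.
SAMPLE_SYSLOG = [
    "Sep  6 21:19:36 torvil pengine: [6116]: WARN: native_color: Resource FTP:0 cannot run anywhere",
    "Sep  6 21:19:36 torvil pengine: [6116]: notice: NoRoleChange: Leave resource FTP:1\t(torvil)",
    "Sep  6 21:20:43 torvil syslogd 1.5.0#1ubuntu1: restart.",
]

if __name__ == "__main__":
    for restart_line, verdict in classify_restarts(SAMPLE_SYSLOG):
        print(verdict, "|", restart_line)
```

An "unclean" verdict is only a hint: it says no assumed shutdown marker was found, which is consistent with a hard reset or a panic but does not identify the cause.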
At the moment it appears to be a hard reset; there's nothing in syslog that indicates a reboot has been triggered by any component. I don't know for sure that it's a hard reset, though. Can you think of a way for me to determine whether the system is being rebooted gracefully?

Kern.log simply goes:

Sep  6 21:19:20 torvil kernel: [ 8556.893048] ocfs2_dlm: Node 1 leaves domain 2377CC24AA29499C9D058EF3610B5B97
Sep  6 21:19:20 torvil kernel: [ 8556.893054] ocfs2_dlm: Nodes in domain ("2377CC24AA29499C9D058EF3610B5B97"): 0
Sep  6 21:19:20 torvil kernel: [ 8556.910512] o2net: no longer connected to node Dean (num 1) at 10.0.0.3:7777
<bootup>
Sep  6 21:20:43 torvil kernel: Inspecting /boot/System.map-2.6.24-19-server
Sep  6 21:20:43 torvil kernel: Loaded 28743 symbols from /boot/System.map-2.6.24-19-server.
Sep  6 21:20:43 torvil kernel: Symbols match kernel version 2.6.24.

Syslog goes:

Sep  6 21:19:36 torvil pengine: [6116]: debug: native_assign_node: All nodes for resource FTP:0 are unavailable, unclean or shutting down
Sep  6 21:19:36 torvil pengine: [6116]: WARN: native_color: Resource FTP:0 cannot run anywhere
Sep  6 21:19:36 torvil pengine: [6116]: debug: clone_color: Allocated 1 ProFTPd instances of a possible 2
Sep  6 21:19:36 torvil pengine: [6116]: notice: NoRoleChange: Leave resource FTP:1 (torvil)
Sep  6 21:20:43 torvil syslogd 1.5.0#1ubuntu1: restart.

All of which looks fairly standard. It registers in last as a 'crash':

root     pts/0        85-191-213-65.be Sat Sep  6 21:24   still logged in
reboot   system boot  2.6.24-19-server Sat Sep  6 21:20 - 21:41  (00:20)
pg       ftpd17285    85-191-213-65.be Sat Sep  6 21:13 - crash  (00:07)
pg       ftpd16973    85-191-213-65.be Sat Sep  6 21:12 - crash  (00:08)

Does this mean it's a kernel issue?

Thanks,
Henri
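The `last` output above can be summarised mechanically. In `last`, a session that ends in "crash" has no logout record before the next boot, while "down" marks a session ended by a system shutdown. So "crash" is consistent with an abrupt reset, but does not by itself point at the kernel. The sketch below counts both kinds of ending; the function name `session_endings` is made up for this example, and the sample is the output quoted in this message.

```python
import re

# Matches the "- crash" / "- down" column that `last` prints for ended sessions.
ENDING = re.compile(r"-\s+(crash|down)\b")


def session_endings(last_output):
    """Count how user sessions in `last` output ended: crash, down, or other."""
    counts = {"crash": 0, "down": 0, "other": 0}
    for line in last_output.splitlines():
        fields = line.split()
        # Skip blank lines, the "reboot" pseudo-user and the "wtmp begins" footer.
        if not fields or fields[0] in ("reboot", "wtmp"):
            continue
        match = ENDING.search(line)
        counts[match.group(1) if match else "other"] += 1
    return counts


# The `last` output quoted in this thread.
SAMPLE_LAST = """\
root     pts/0        85-191-213-65.be Sat Sep  6 21:24   still logged in
reboot   system boot  2.6.24-19-server Sat Sep  6 21:20 - 21:41  (00:20)
pg       ftpd17285    85-191-213-65.be Sat Sep  6 21:13 - crash  (00:07)
pg       ftpd16973    85-191-213-65.be Sat Sep  6 21:12 - crash  (00:08)
"""

if __name__ == "__main__":
    print(session_endings(SAMPLE_LAST))
```

If a deliberate `reboot` on the same machine produces "down" entries while the failure case produces "crash" entries, that supports the hard-reset reading.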
On Sat, 06 Sep 2008 21:42:25 +0100
Henri Cook <ubuntu-server@theplayboymansion.net> wrote:

> - Does this mean it's a kernel issue?

It doesn't look like it. I guess you have some cluster management software. It's probably misconfigured, or configured to reboot on node failure. Pengine, whatever that is, warns that all nodes for a resource are unavailable, unclean or shutting down. Maybe some other node is killing your node (STONITH).

Whatever it is, it's not related to DRBD, and I really doubt it's related to the kernel.
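To follow up on the cluster-software theory, it can help to pull every message from the cluster components into one timeline covering the seconds before the reset. The sketch below filters log lines by component name. The component list is an assumption (extend it with whatever cluster software is actually installed), `component_timeline` is a made-up name, and the sample lines are taken from the logs quoted earlier in the thread.

```python
import re

# Component names to look for; an assumed list, not a definitive one.
COMPONENTS = ("stonith", "pengine", "o2net", "ocfs2_dlm", "drbd")

# "Sep  6 21:19:20 host rest-of-message" as written by classic syslog.
LINE = re.compile(r"^(\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+(.*)$")


def component_timeline(lines):
    """Return (timestamp, component, message) for lines naming a cluster component."""
    events = []
    for line in lines:
        match = LINE.match(line)
        if not match:
            continue
        stamp, rest = match.groups()
        lowered = rest.lower()
        for name in COMPONENTS:
            if name in lowered:
                events.append((stamp, name, rest))
                break  # one event per line, first matching component wins
    return events


# Lines copied from the kern.log and syslog excerpts in this thread.
SAMPLE_LINES = [
    "Sep  6 21:19:20 torvil kernel: [ 8556.893048] ocfs2_dlm: Node 1 leaves domain 2377CC24AA29499C9D058EF3610B5B97",
    "Sep  6 21:19:20 torvil kernel: [ 8556.910512] o2net: no longer connected to node Dean (num 1) at 10.0.0.3:7777",
    "Sep  6 21:19:36 torvil pengine: [6116]: WARN: native_color: Resource FTP:0 cannot run anywhere",
    "Sep  6 21:20:43 torvil syslogd 1.5.0#1ubuntu1: restart.",
]

if __name__ == "__main__":
    for stamp, name, message in component_timeline(SAMPLE_LINES):
        print(stamp, name, message, sep=" | ")
```

Running it over the merged kern.log and syslog of both nodes shows which component last reported before the gap. That narrows where to look for a reboot-on-failure or fencing setting, without proving which one fired.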