RHEL 5.5 Oracle RAC cluster rebooted due to processor hung!!
Hi,
I have raised this question with Red Hat support as well, but I would like to collect your thoughts on the issue below.

----
*Platform: RHEL 5.5*
*Arch: 64-bit, running Oracle RAC 11gR2 (2-node cluster)*
*Problem description: Node 2 of the cluster was rebooted. The reboot was initiated by Oracle for unknown reasons. /var/log/messages shows that the processor was hung for 10 seconds (please see the logs below). What could be the cause of this?*

Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955
Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: making interface eth2 the new active one.
Jun 10 19:22:34 prddbs02 kernel: device eth2 entered promiscuous mode
Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]
Jun 10 19:22:46 prddbs02 kernel: CPU 2:
Jun 10 19:22:46 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
Jun 10 19:22:46 prddbs02 kernel: RAX: 0000000000000006 RBX: 0000000000000007 RCX: 0000000000000000
Jun 10 19:22:46 prddbs02 kernel: RDX: 00000000000000ff RSI: 00000000000000ff RDI: 00000000000000c0
Jun 10 19:22:46 prddbs02 kernel: RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000038
Jun 10 19:22:46 prddbs02 kernel: R10: ffff8108e79a5b98 R11: 0000000000000000 R12: ffffffff80143e16
Jun 10 19:22:46 prddbs02 kernel: R13: 0000000000000003 R14: ffff810366ec2c58 R15: ffff81093da13340
Jun 10 19:22:46 prddbs02 kernel: FS: 000000004189d940(0063) GS:ffff81012071cec0(0000) knlGS:0000000000000000
Jun 10 19:22:46 prddbs02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 10 19:22:46 prddbs02 kernel: CR2: 00002aaaac004000 CR3: 0000000928447000 CR4: 00000000000006e0
Jun 10 19:22:46 prddbs02 kernel:
Jun 10 19:22:46 prddbs02 kernel: Call Trace:
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff80077778>] smp_call_function_many+0x38/0x4c
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff80077869>] smp_call_function+0x4e/0x5e
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff881fcb28>] :dm_mod:dev_status+0x0/0x38
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800958c1>] on_each_cpu+0x10/0x22
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800d2017>] __remove_vm_area+0x2b/0x42
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800d2046>] remove_vm_area+0x18/0x25
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800d209a>] __vunmap+0x47/0xed
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff881fdeff>] :dm_mod:ctl_ioctl+0x237/0x25b
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800424bd>] do_ioctl+0x55/0x6b
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800304d6>] vfs_ioctl+0x457/0x4b9
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8000d3e9>] dput+0x2c/0x114
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8004cbb7>] sys_ioctl+0x59/0x78
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8005e116>] system_call+0x7e/0x83
Jun 10 19:22:46 prddbs02 kernel:
Jun 10 19:23:04 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]
Jun 10 19:23:04 prddbs02 kernel: CPU 4:
Jun 10 19:23:04 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix li:
Jun 10 19:23:04 prddbs02 kernel: Pid: 8758, comm: eecd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:23:04 prddbs02 kernel: RIP: 0010:[<ffffffff80065bfc>] [<ffffffff80065bfc>] .text.lock.spinlock+0x2/0x30
Jun 10 19:23:04 prddbs02 kernel: RSP: 0018:ffff8108997d1bc0 EFLAGS: 00000286
Jun 10 19:23:04 prddbs02 kernel: RAX: 0000000000000000 RBX: 00000000d2a03d30 RCX: 0000000000000001
Jun 10 19:23:04 prddbs02 kernel: RDX: ffff8108997d1d98 RSI: ffffffff885dd304 RDI: ffffffff8030e6c8
Jun 10 19:23:04 prddbs02 kernel: RBP: ffff8102f1aa8c10 R08: 0000000000000001 R09: ffff8108997d1bf8
Jun 10 19:23:04 prddbs02 kernel: R10: ffff81089d5285c0 R11: 0000000000000000 R12: 0000000000000000
Jun 10 19:23:04 prddbs02 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000000fb
Jun 10 19:23:04 prddbs02 kernel: FS: 0000000000000000(0000) GS:ffff81012077dd40(0063) knlGS:00000000d2a04b90
Jun 10 19:23:04 prddbs02 kernel: CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Jun 10 19:23:04 prddbs02 kernel: CR2: 00000000d2a02ddc CR3: 00000008f0781000 CR4: 00000000000006e0
Jun 10 19:23:04 prddbs02 kernel:
Jun 10 19:23:04 prddbs02 kernel: Call Trace:
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff80077764>] smp_call_function_many+0x24/0x4c
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff885dd304>] :smbus:smbus_GetCpuError_callback+0x0/0x14
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff80077869>] smp_call_function+0x4e/0x5e
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff885e4fcd>] :smbus:smbus_ioctl+0x2880/0x2f74
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80063ff8>] thread_return+0x62/0xfe
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff8002b379>] flush_tlb_page+0xac/0xda
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80011149>] do_wp_page+0x3fd/0x902
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80009677>] __handle_mm_fault+0xee5/0xfaa
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80022127>] __up_read+0x19/0x7f
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff885e56d7>] :smbus:smbus_ioctl_compat+0x16/0x1d
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff800fb8d4>] compat_sys_ioctl+0xc5/0x2b2
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff8006249d>] sysenter_do_call+0x1e/0x76
Jun 10 19:23:05 prddbs02 kernel:
Jun 10 19:23:14 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]
Jun 10 19:23:14 prddbs02 kernel: CPU 4:
Jun 10 19:23:14 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Jun 10 19:23:14 prddbs02 kernel: Pid: 8758, comm: eecd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:23:14 prddbs02 kernel: RIP: 0010:[<ffffffff80065bfc>] [<ffffffff80065bfc>]

Thanks in advance for any help :)

Regards,
Raj

--
redhat-list mailing list
unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list
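For anyone triaging a similar log, the soft lockup reports in the post above can be summarized mechanically. What follows is only a minimal Python sketch and not part of the original post: the function name find_soft_lockups and the regular expression are illustrative assumptions derived solely from the message format visible in this thread, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Assumed message format, taken from the log in this thread:
# "Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]"
LOCKUP_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s+"
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(text):
    """Return one dict per soft lockup report found anywhere in `text`."""
    events = []
    # finditer scans the whole text, so it also copes with log entries
    # that were pasted on a single line or wrapped mid-entry.
    for m in LOCKUP_RE.finditer(text):
        events.append({
            "time": m.group("stamp"),
            "host": m.group("host"),
            "cpu": int(m.group("cpu")),
            "seconds": int(m.group("secs")),
            "comm": m.group("comm"),
            "pid": int(m.group("pid")),
        })
    return events


if __name__ == "__main__":
    sample = (
        "Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]\n"
        "Jun 10 19:23:04 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]\n"
        "Jun 10 19:23:14 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]\n"
    )
    events = find_soft_lockups(sample)
    for e in events:
        print(e["time"], "CPU", e["cpu"], e["comm"], e["pid"])
    # Count reports per process name to see which task is stuck most often.
    print(Counter(e["comm"] for e in events))
```

Counting by process name makes it easier to see whether one task (here multipathd or eecd) keeps reappearing, which can help narrow down which driver's call trace to read first.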
RHEL 5.5 Oracle RAC cluster rebooted due to processor hung!!
On 06/18/2012 08:44 AM, raj sourabh wrote:
> Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955
> Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: making interface eth2 the new active one.
> Jun 10 19:22:34 prddbs02 kernel: device eth2 entered promiscuous mode

Before the soft lockup, what exactly caused the NETDEV WATCHDOG to lose eth0?

For the __smp_call_function_many lockup, there were many fixes between 5.5 and 5.6 related to multipath and other third-party drivers that caused similar lockups. Why are you on 5.5 and not at least 5.6, and which kernel are you running?

Best regards,

--
George Magklaras PhD
RHCE no: 805008309135525
Senior Systems Engineer/IT Manager
Biotechnology Center of Oslo and the Norwegian Center for Molecular Medicine
EMBnet TMPC Chair
http://folk.uio.no/georgios

> Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]
> ...
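On the question of which kernel is running: the "Pid:" line in the trace from the original post carries both the kernel release and the taint flags. The sketch below pulls those two fields out of such a line. It is an illustration only: the helper name describe_pid_line is hypothetical, and the flag meanings are taken from the mainline kernel's oops-tracing documentation, on the assumption that the RHEL 5 kernel uses the same letters.

```python
import re

# Taint letters as described in the mainline kernel documentation.
# Assumption: the RHEL 5 kernel assigns the same meanings to these letters.
TAINT_MEANINGS = {
    "P": "a proprietary module has been loaded",
    "F": "a module was force-loaded",
    "S": "SMP kernel on hardware not certified as SMP-safe",
    "R": "a module was force-unloaded",
    "M": "a processor reported a machine check exception",
    "B": "a bad page reference or unexpected page flags were found",
}

# Matches e.g. "Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1".
# An untainted kernel prints "Not tainted" instead and is not matched here.
PID_LINE_RE = re.compile(
    r"Pid: (?P<pid>\d+), comm: (?P<comm>\S+) "
    r"Tainted: (?P<flags>[A-Z ]+?) +(?P<release>\d\S*) #"
)


def describe_pid_line(line):
    """Extract pid, command, kernel release and decoded taint flags, or None."""
    m = PID_LINE_RE.search(line)
    if m is None:
        return None
    letters = [ch for ch in m.group("flags") if ch != " "]
    return {
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "release": m.group("release"),
        "taint": [
            (ch, TAINT_MEANINGS.get(ch, "meaning not covered in this sketch"))
            for ch in letters
        ],
    }


if __name__ == "__main__":
    line = ("Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd "
            "Tainted: PF M 2.6.18-194.el5 #1")
    info = describe_pid_line(line)
    print("kernel release:", info["release"])
    for letter, meaning in info["taint"]:
        print(letter, "-", meaning)
```

If the letter meanings do carry over, the M flag would be worth raising with the hardware vendor, since it points at a machine check having been reported at some point since boot rather than at a purely software problem.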
RHEL 5.5 Oracle RAC cluster rebooted due to processor hung!!
Actually, the cluster checks are done via the private network, so the loss of eth0 should not have crashed the server.

Do you see any logs in /var/crash? Is kdump/netdump set up? Can you post the ocssd logs (they should be under the grid directory) for the 10-15 minutes before the crash? Please also post /var/log/messages for the 10-15 minutes prior to the crash.

On Thu, Jun 21, 2012 at 1:04 AM, Georgios Magklaras <georgios@biotek.uio.no> wrote:
> Before the soft lockup, what exactly caused the NETDEV WATCHDOG to lose eth0?
> For the __smp_call_function_many lockup, there were many fixes between 5.5
> and 5.6 related to multipath and other third-party drivers that caused
> similar lockups. Why are you on 5.5 and not at least 5.6, and which kernel
> are you running?
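Pulling out "the 10-15 minutes before the crash" from /var/log/messages can be done by hand, but a small filter makes it repeatable. The following Python sketch is an assumption-laden illustration rather than anything posted in the thread: the function names are made up, syslog timestamps carry no year so one has to be supplied, month names are assumed to be in English, and the reference time used in the example (the first soft lockup report) is only a stand-in for the real reboot time, which the thread does not state.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse a leading syslog timestamp such as 'Jun 10 19:22:34'; None if absent."""
    parts = line.split(None, 3)
    if len(parts) < 3:
        return None
    try:
        # %b expects English month abbreviations under the default C locale.
        return datetime.strptime(
            f"{year} {parts[0]} {parts[1]} {parts[2]}", "%Y %b %d %H:%M:%S"
        )
    except ValueError:
        return None


def window_before(lines, crash_time, minutes=15):
    """Keep the lines stamped within `minutes` before `crash_time` (inclusive)."""
    start = crash_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        stamp = parse_stamp(line, crash_time.year)
        if stamp is not None and start <= stamp <= crash_time:
            selected.append(line.rstrip("\n"))
    return selected


if __name__ == "__main__":
    # In practice `lines` would come from open("/var/log/messages");
    # an in-memory sample keeps this sketch self-contained.
    lines = [
        "Jun 10 19:01:10 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955",
        "Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out",
        "Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]",
    ]
    # Hypothetical reference time: the first soft lockup report.
    reference = datetime(2012, 6, 10, 19, 22, 46)
    for line in window_before(lines, reference, minutes=15):
        print(line)
```

The same window can be applied to the ocssd log if its timestamps are first converted to datetime values, though that log uses a different timestamp layout that this sketch does not attempt to parse.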