11-12-2008, 12:47 PM
lingu

Cluster Broken Pipe error and Heartbeat configuration

Hi,

I am running a two-node active/passive cluster on the RHEL3U8 64-bit
operating system for my Oracle database. Both nodes are connected
to HP MSA-500 storage (SCSI, not Fibre Channel). My hardware
and clumanager version details are below. The cluster ran fine and stable
for the last two years, but for the past month I have been getting
the errors below in syslog, and the cluster restarts locally.

Server Hardware: HP ProLiant DL580 G4
OS: RHEL3U8-64BIT INTEL EMT
Kernel : 2.4.21-47.EL
Storage : HP MSA-500 storage (SCSI channel)

Cluster Version:
clumanager-1.2.26.1-1
redhat-config-cluster-1.0.7-1

NODE1 ip: 20.2.135.161 (network bonding configured)
NODE2 ip: 20.2.135.162 (network bonding configured)
VIP : 20.2.135.35

Syslog errors

cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
clulockd[1996]: <warning> Potential recursive lock #0 grant to member #1, PID1962
clulockd[1996]: <warning> Denied 20.1.135.162: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
clulockd[1996]: <warning> Denied 20.1.135.162: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
clulockd[1996]: <warning> Denied 20.1.135.161: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
cluquorumd[2100]: <err> VF: Abort: Invalid header in reply from member #0
cluquorumd[1934]: <err> __msg_send: Incomplete write to 13. Error: Connection reset by peer
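
When triaging messages like these, it can help to see which daemon is logging what, and how often. The following is only a minimal Python sketch: it assumes each message has the form `daemon[pid]: <severity> text` as in the excerpt above, and the `tally` helper and `SAMPLE` text are illustrative names, not part of clumanager.

```python
import re
from collections import Counter

# A few lines copied from the syslog excerpt above.
SAMPLE = """\
cluquorumd[1921]: <warning> Disk-TB: Detected I/O Hang!
clulockd[1996]: <warning> Denied 20.1.135.162: Broken pipe
clulockd[1996]: <err> select error: Broken pipe
clusvcmgrd[2011]: <err> Unable to obtain cluster lock: Connection timed out
"""

# Assumed message shape: daemon[pid]: <severity> message
LINE_RE = re.compile(
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: <(?P<severity>\w+)> (?P<message>.*)"
)


def tally(lines):
    """Count messages per (daemon, severity) pair; unmatched lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match["daemon"], match["severity"])] += 1
    return counts


if __name__ == "__main__":
    for (daemon, severity), n in sorted(tally(SAMPLE.splitlines()).items()):
        print(f"{daemon:12} {severity:8} {n}")
```

Because the pattern is applied with `search`, a leading timestamp and hostname on real syslog lines should not get in the way.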

Can anyone tell me what the above errors indicate and how to
troubleshoot them? After a long Google search I found the link below from
Red Hat, which matches my scenario. Can I follow it, given that this
is a very critical production server?

https://bugzilla.redhat.com/show_bug.cgi?id=185484


Also, can anyone help me configure a dedicated LAN (for example eth3)
as the heartbeat (a private point-to-point crossover cable network for
cluster communications)? I don't want the heartbeat on the public LAN
because of heavy network saturation.

For the above heartbeat configuration I did not find any suitable
document for RHEL. Can anyone provide a suitable link, or tell
me what changes I have to make in my existing cluster.xml
file for this private heartbeat configuration to work?

Waiting for a reply; this is urgent for me.

Regards,
Lingu
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
11-12-2008, 03:41 PM
"nate"

Cluster Broken Pipe error and Heartbeat configuration

lingu wrote:

> Can any one guide me what is this above error indicates and how to
> troubleshoot.After a long google search i found the below link from
> redhat that is matching my scenario.Can i follow the same because it
> is my very critical production server.

I suggest you contact Red Hat support for this issue if it's
such a critical server; it sounds like a pretty fragile situation.
That's what they are there for. And you're running a really old version
of RH.

If it were me I would upgrade the system to Fibre Channel instead
of SCSI, and update to all the latest patches for your version of
RH. The bug mentions that SCSI-attached storage as the shared
storage medium is not at all proven reliable. For at least some MSAs
out there you can get a Fibre Channel head unit and a few HBAs, and
perhaps a switch, and hook things up without too much downtime and
have a better system as a result.

nate


 
11-13-2008, 12:09 AM
Ian Forde

Cluster Broken Pipe error and Heartbeat configuration

On Wed, 2008-11-12 at 08:41 -0800, nate wrote:
> lingu wrote:
>
> > Can any one guide me what is this above error indicates and how to
> > troubleshoot.After a long google search i found the below link from
> > redhat that is matching my scenario.Can i follow the same because it
> > is my very critical production server.
>
> I suggest you contact Red Hat support for this issue if it's
> such a critical server and sounds like a pretty fragile situation.
> That's what they are there for. And your running a really old version
> of RH.

I'm inclined to agree that RH is probably the fastest way to get this
resolved. That isn't such an old version of RHEL, btw... the current RHEL3
version is 3.9, but RH recommends sticking with particular versions when
using RHCS (Red Hat Cluster Suite), as Cluster can often come with
replacement versions of stock rpms (including the kernel)... From
checking http://www.redhat.com/docs/manuals/csgfs/ we can see that RH
didn't update RHCS for 3.9, so RHEL 3.8 is the current version supported
for RHCS...

> If it were me I would upgrade the system to be fiber channel instead
> of SCSI, and update to all the latest patches for your version of
> RH. The bug mentions how using SCSI attached storage as your shared
> storage medium is not at all proven reliable. At least some MSAs
> out there you can get a fiber channel head unit and a few HBAs, and
> perhaps a switch and hook things up without too much downtime and
> have a better system as a result.

I wouldn't do that... not right away anyway... SCSI has proven itself
reliable over the years for clustering just fine in my experience.
However, it's how you've got it configured that may cause headaches...

You should definitely configure a private LAN for the heartbeat. It's
as simple as editing /etc/sysconfig/network-scripts/ifcfg-eth3 on each
box and setting up the IP addresses.
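
As a rough illustration of what those per-node files might contain, here is a small Python sketch that prints a static ifcfg-eth3 for each node. The 192.168.100.x addresses, the netmask, and the `render_ifcfg` helper are assumptions made up for the example; pick addresses on a subnet that is not used anywhere else. It covers only the interface setup, not any change to cluster.xml.

```python
def render_ifcfg(device, ipaddr, netmask="255.255.255.0"):
    """Return the text of a static ifcfg-<device> file."""
    return "\n".join([
        f"DEVICE={device}",
        "BOOTPROTO=static",
        f"IPADDR={ipaddr}",
        f"NETMASK={netmask}",
        "ONBOOT=yes",
    ]) + "\n"


# Hypothetical private heartbeat addresses, one per node.
NODES = {"node1": "192.168.100.1", "node2": "192.168.100.2"}

if __name__ == "__main__":
    for node, ip in NODES.items():
        print(f"# {node}: /etc/sysconfig/network-scripts/ifcfg-eth3")
        print(render_ifcfg("eth3", ip))
```

After writing the file on each box, the interface still has to be brought up (for example with `ifup eth3`) and tested with a ping between the two private addresses.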

But I wouldn't use a crossover cable for this - create a 2-port vlan on
your switch for it (or use a cheap switch or whatever else will work).
If you use a crossover, if either NIC or the cable fails, link state
will be down on both nodes and both nodes will attempt (and possibly
succeed) to fence each other. I'm not making this up - I've seen it
happen on cluster deployments. (In a past life I used to deploy RHCS for
RH.)
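
The reasoning above can be put into a toy Python model. This is a deliberately simplified sketch, not how clumanager decides anything: it only models link state for a single failed NIC or cable, ignores a failure of the switch itself, and all names in it are made up for illustration.

```python
def link_states(topology, failed):
    """Return {node: link_up} after one component fails.

    topology: "crossover" or "switch"
    failed:   "node1" or "node2" (that node's NIC or its cable)
    """
    if topology == "crossover":
        # One cable joins both NICs, so both ends lose link together.
        return {"node1": False, "node2": False}
    if topology == "switch":
        # Each node has its own link to the switch; only the failed side drops.
        return {node: node != failed for node in ("node1", "node2")}
    raise ValueError(f"unknown topology: {topology}")


def failure_is_attributable(states):
    """True when exactly one node lost link, hinting at which side broke."""
    return sum(not up for up in states.values()) == 1


if __name__ == "__main__":
    for topology in ("crossover", "switch"):
        states = link_states(topology, failed="node1")
        print(topology, states, failure_is_attributable(states))
```

With a crossover cable both nodes see the same thing (link down), so neither can tell which side is at fault; with a switch in between, only the failing node loses link.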

Best advice? Call Red Hat and speak to them... they'll give you the
recommended config... also, check out the RHCS docs...
http://www.redhat.com/docs/manuals/csgfs/browse/rh-cs-en-3/index.html

I'd also recommend taking the RHCS class. It's um... enlightening...

-I
