Old 10-30-2008, 05:00 PM
"Stackpole, Chris"
 
Question regarding bonding of multiple eth's

I seem to be having a problem with bonding under Debian
Lenny, but I am not sure exactly what the problem is.




I have two servers, and each server has two gigabit network cards. We have two
gigabit switches so that we have failover should one die. I matched both eth0s
to switch0 and both eth1s to switch1, then bonded the eths together on both
servers. I posted how I did it below just in case I screwed something up. Once
I did the bonding, everything looks to be OK. I can ping out and I can ping the
hosts from other systems. I pulled the network plug from one of the cards and
watched the failover work as it should, then plugged it back in and removed the
other. Everything worked as I thought it should; I am not an expert at bonding,
but I have used the same method a few times now without problems.
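
A sketch of one extra check during a failover test like this, assuming the
bonding driver is loaded and exposing its usual status file:

# cat /proc/net/bonding/bond0

The output lists the bonding mode, the slaves that have been enslaved, and each
slave's MII status, which is a handy cross-check while a cable is pulled.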




Well, I went on about my business, and soon complaints began to come in that
one server was much slower than the other. :-/




I began investigating and, sure enough, one system is slower. Transferring a
1GB file across the network, I easily maintain ~38-40MB/s on the first host but
usually top out around 15-18MB/s on the other. Ifconfig shows the same settings
on both cards (txqueuelen:1000), but the slow one isn't behaving like it should.
Worse, when I run watch or htop or something else that updates, I can notice
the lag. For example, I have ssh'd into the system and have htop running right
now; it is supposed to update every 2 seconds. It works like it should for a
short time, but every once in a while the screen freezes for about 10 seconds,
then everything updates all at once and continues its 2-second update interval.
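
As an aside, txqueuelen in ifconfig is just the transmit queue length, not the
link speed. A sketch of two quick checks, assuming the ethtool and iperf
packages are installed on both hosts (the 10.3.45.3 address is simply the one
from the config below): read the negotiated speed/duplex straight from the
driver, and measure raw TCP throughput with the disks out of the picture.

# ethtool eth0 | grep -E 'Speed|Duplex'
# iperf -s                        (on the box being tested)
# iperf -c 10.3.45.3 -t 30        (from the other host)

On a healthy gigabit link, ethtool should report Speed: 1000Mb/s and
Duplex: Full, and iperf usually lands somewhere north of 900 Mbits/sec.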




I thought it was the network cards, so I disabled the
bonding and tested each of them. I get gigabit speeds individually. Rebonded
the cards and I am back to the slow speeds. I turned off the system to see if
there was physical damage or something (found nothing) and when I brought it
back up I saw this in the logs:




Oct 30 11:53:04 Hostname kernel: [   10.167568] bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
Oct 30 11:53:04 Hostname kernel: [   10.167568] bonding: bond0: enslaving eth0 as an active interface with an up link.
Oct 30 11:53:04 Hostname kernel: [   10.264691] bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
Oct 30 11:53:04 Hostname kernel: [   10.264691] bonding: bond0: enslaving eth1 as an active interface with an up link.
Oct 30 11:53:04 Hostname kernel: [   10.578052] NET: Registered protocol family 10
Oct 30 11:53:04 Hostname kernel: [   10.579606] lo: Disabled Privacy Extensions
Oct 30 11:53:05 Hostname kernel: [   12.884391] tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct 30 11:53:05 Hostname kernel: [   12.884391] tg3: eth0: Flow control is off for TX and off for RX.
Oct 30 11:53:06 Hostname kernel: [   13.012292] tg3: eth1: Link is up at 1000 Mbps, full duplex.
Oct 30 11:53:06 Hostname kernel: [   13.012292] tg3: eth1: Flow control is off for TX and off for RX.




I see the tg3 messages on the first server, but I don't see the bonding
warnings there. My guess is that the bonding is somehow screwed up and stuck at
100Mb/sec, and doesn't update when the cards come up at 1000Mb/sec. I tried to
find an answer via Google but did not find anything that seemed useful to me. I
see others have had this problem, but I found no solution that helped me.
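
The timestamps do fit that guess: bonding enslaves eth0 and eth1 at about 10
seconds, while tg3 only reports the 1000 Mbps links at 12-13 seconds, and with
no link monitoring configured the bonding driver apparently never picks up the
real speed. A sketch of one way to test that theory, assuming the interfaces
can be taken down briefly, is to reload the bonding module with MII monitoring
enabled so it keeps polling the slaves:

# ifdown bond0
# modprobe -r bonding
# modprobe bonding mode=1 miimon=100 downdelay=200 updelay=200
# ifup bond0

Here mode=1 is plain active-backup failover, miimon=100 polls each slave's link
every 100 ms, and downdelay/updelay wait 200 ms before disabling or re-enabling
a slave.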




I don't know why one works and the other doesn't. They should be pretty similar
in setup and configuration, as I didn't do anything drastically different when
I built them.




Any help would be appreciated.




Thanks!


Chris Stackpole




How I did the bonding:


# apt-get install ifenslave


# vi /etc/network/interfaces


auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
        address 10.3.45.3
        netmask 255.255.255.0
        network 10.3.45.0
        broadcast 10.3.45.255
        gateway 10.3.45.251
        dns-nameservers 10.1.1.5 10.1.1.6
        dns-search mydomain.com
        up /sbin/ifenslave bond0 eth0 eth1
        down /sbin/ifenslave -d bond0 eth0 eth1




Then I restarted (yeah, I know I could have just reset the network, but I
restarted).

When it came back up, ifconfig showed bond0, eth0, eth1, and lo all correctly.
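
For reference, a sketch of bouncing just the network on a Lenny box instead of
rebooting, assuming the stock init scripts and nothing else holding the
interfaces:

# ifdown bond0 && ifup bond0

or, more heavy-handed:

# /etc/init.d/networking restart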
 
Old 11-03-2008, 07:28 PM
"Stackpole, Chris"
 
Question regarding bonding of multiple eth's

I finally figured it out and thought I
would share in case someone else stumbles upon this problem.




After doing a lot of research I found
that I had to add the following line to my /etc/modules:


bonding mode=1 miimon=100 downdelay=200 updelay=200




It seems to be working perfectly now.
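
A sketch of double-checking that those module options actually took effect
after a reboot, assuming the bonding driver's standard proc and sysfs files:

# cat /proc/net/bonding/bond0
# cat /sys/class/net/bond0/bonding/miimon

The first should now report the active-backup mode and an MII polling interval
of 100 ms; the second should simply print 100.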




Chris Stackpole


 
