Question regarding bonding of multiple eth's
I seem to be having a problem with bonding under Debian Lenny, but I am not sure exactly what the problem is.

I have two servers and each server has two gigabit network cards. We have two gigabit switches that we use so that we have failover should one die. I matched both eth0’s to switch0 and both eth1’s to switch1. I then bonded the eth’s together on both servers. I posted how I did it below just in case I screwed something up. Once I did the bonding, everything looked OK. I can ping out and I can ping the hosts from other systems. I pulled the network plug from one of the cards and watched that the failover worked as it should. Then I plugged it back in and removed the other. Everything worked as I thought it should; I am not an expert at bonding but I have used the same method a few times now without problem.

Well, I went on about my business and soon complaints began to come in that one server was much slower than the other. :-/

I began investigating and sure enough, one system is slower. Transferring a 1GB file across the network, I easily maintain ~38-40MB/s on the first host and I usually top out around 15-18MB/s on the other. Ifconfig shows that both cards are set to the proper speed (txqueuelen:1000) but it isn’t behaving like it should. Worse, when I run watch or htop or something else that updates, I can notice the lag. For example, I have ssh’d into the system and have htop running right now; it is supposed to update every 2 seconds. It works like it should for a short time, but then every once in a while the screen freezes for about 10 seconds, then everything updates all at once and continues its 2 second update interval.

I thought it was the network cards, so I disabled the bonding and tested each of them. I get gigabit speeds individually. Rebonded the cards and I am back to the slow speeds.
I turned off the system to see if there was physical damage or something (found nothing) and when I brought it back up I saw this in the logs:

Oct 30 11:53:04 Hostname kernel: [   10.167568] bonding: bond0: Warning: failed to get speed and duplex from eth0, assumed to be 100Mb/sec and Full.
Oct 30 11:53:04 Hostname kernel: [   10.167568] bonding: bond0: enslaving eth0 as an active interface with an up link.
Oct 30 11:53:04 Hostname kernel: [   10.264691] bonding: bond0: Warning: failed to get speed and duplex from eth1, assumed to be 100Mb/sec and Full.
Oct 30 11:53:04 Hostname kernel: [   10.264691] bonding: bond0: enslaving eth1 as an active interface with an up link.
Oct 30 11:53:04 Hostname kernel: [   10.578052] NET: Registered protocol family 10
Oct 30 11:53:04 Hostname kernel: [   10.579606] lo: Disabled Privacy Extensions
Oct 30 11:53:05 Hostname kernel: [   12.884391] tg3: eth0: Link is up at 1000 Mbps, full duplex.
Oct 30 11:53:05 Hostname kernel: [   12.884391] tg3: eth0: Flow control is off for TX and off for RX.
Oct 30 11:53:06 Hostname kernel: [   13.012292] tg3: eth1: Link is up at 1000 Mbps, full duplex.
Oct 30 11:53:06 Hostname kernel: [   13.012292] tg3: eth1: Flow control is off for TX and off for RX.

I see the tg3 messages on the first server, but I don’t see the bonding warnings there. My guess is that the bonding is somehow screwed up and stuck on 100Mb/sec and doesn’t update when the cards go to 1000Mb/sec. I tried to find an answer via Google but did not find anything that seemed useful to me. I see others have had this problem, but I found no solution that helped me.

I don’t know why one works and the other doesn’t. They should be pretty similar in setup and configuration as I didn’t do anything drastically different when I built them.

Any help would be appreciated.

Thanks!
Chris Stackpole

How I did the bonding:

# apt-get install ifenslave
# vi /etc/network/interfaces

auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
        address 10.3.45.3
        netmask 255.255.255.0
        network 10.3.45.0
        broadcast 10.3.45.255
        gateway 10.3.45.251
        dns-nameservers 10.1.1.5 10.1.1.6
        dns-search mydomain.com
        up /sbin/ifenslave bond0 eth0 eth1
        down /sbin/ifenslave -d bond0 eth0 eth1

Then I restarted (yeah, I know I could have just reset the network, but I restarted). When it was back up, ifconfig showed bond0, eth0, eth1, and lo all correctly.
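For anyone retracing the diagnosis in this post: the txqueuelen value that ifconfig prints is the transmit queue length, not the link speed, so it cannot confirm that a card negotiated gigabit. The negotiated speed and duplex are normally reported by ethtool. The following is only a minimal sketch, assuming the ethtool package is installed; link_speed is a hypothetical helper name, not something from the original post.

```shell
#!/bin/sh
# Reduce `ethtool <iface>` output (read from stdin) to its Speed and Duplex lines.
# link_speed is a hypothetical helper name used only for this illustration.
link_speed() {
    # ethtool indents its fields; strip the leading whitespace and print
    # only the two fields of interest, in the order they appear.
    sed -n \
        -e 's/^[[:space:]]*\(Speed: .*\)$/\1/p' \
        -e 's/^[[:space:]]*\(Duplex: .*\)$/\1/p'
}

# Usage on the affected host (usually needs root):
#   for nic in eth0 eth1; do echo "$nic"; ethtool "$nic" | link_speed; done
```

Running the loop on both servers and comparing the results shows whether each slave really negotiated the same speed on each machine, independent of what the bonding driver assumed at boot.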
Question regarding bonding of multiple eth's
I finally figured it out and thought I would share in case someone else stumbles upon this problem.

After doing a lot of research I found that I had to add the following line to my /etc/modules:

bonding mode=1 miimon=100 downdelay=200 updelay=200

It seems to be working perfectly now.

Chris Stackpole

From: Stackpole, Chris [mailto:CStackpole@barbnet.com]
Sent: Thursday, October 30, 2008 1:01 PM
To: debian-user@lists.debian.org
Subject: Question regarding bonding of multiple eth's
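A way to confirm that the module options from the fix actually took effect is to read the bonding driver's status file after a reboot. In the kernel bonding driver, mode=1 is active-backup, miimon is the link-monitoring interval in milliseconds, and updelay/downdelay are delays in milliseconds that should be multiples of miimon. When loaded with no options the driver defaults to round-robin (mode 0) with miimon 0, which may explain the poor throughput across two separate switches, though the thread does not confirm that. The sketch below assumes the bond is named bond0; bond_summary is a hypothetical helper name.

```shell
#!/bin/sh
# Show the mode and link-monitoring settings the bonding driver is really using.
# bond_summary is a hypothetical helper name; it reads /proc/net/bonding/bond0
# unless another path (for example a saved copy of that file) is given.
bond_summary() {
    status_file="${1:-/proc/net/bonding/bond0}"
    # The status file is a list of "Name: value" lines; keep the four that
    # correspond to the mode, miimon, updelay and downdelay options.
    grep -E '^(Bonding Mode|MII Polling Interval \(ms\)|Up Delay \(ms\)|Down Delay \(ms\)):' "$status_file"
}

# Usage on the bonded host:
#   bond_summary
```

With the options from the fix loaded, the mode line should read "fault-tolerance (active-backup)" and the polling interval should be 100; a polling interval of 0 would suggest the options were not picked up.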