12-14-2009, 09:50 PM
"nate"

bnx2 losing connectivity

Hoping someone else has seen this before.

I have a few dozen Dell R610 systems with CentOS 5.2 that are
using kernels from 5.3 and 5.4 (2.6.18-128.1.10.el5 & 2.6.18-164.6.1.el5),
that at random lose layer 2 network connectivity either partially
or totally. Running tcpdump on the interface reveals only ARP
broadcasts, no responses. Switch reports no packets being
received on the interface.

Systems can run for days, weeks, or even months without an issue and
then drop off the network. At first I thought it was the Dell switches,
which we had lots of problems with, but it has happened on two other
brands of switches as well (Cisco and Extreme), so I no longer believe
it's the switches but rather the systems.

The workaround is to restart the network on the system. I even
configured the bonding driver to send ARP requests and fail over to
the backup link when they go unanswered, but that didn't help
either: both links can go down, and/or the system can go into a
"degraded" state where it can reach some systems but not others.
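
For anyone who wants to automate the restart workaround described above, here is a minimal sketch of a watchdog. It is only an illustration, not something from the original post: it assumes a Python 3 interpreter is available (CentOS 5 itself ships Python 2.4), that `ping` and `service network restart` behave as usual on the host, and that it runs as root. The function names and the idea of probing several targets are my own choices; probing more than one host is an attempt to catch the "degraded" state where only some peers are reachable.

```python
import subprocess
import time


def reachable(host):
    """Return True if a single ping to `host` gets a reply within 2 seconds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def restart_network():
    """Restart networking via the init script (needs root)."""
    subprocess.run(["service", "network", "restart"], check=False)


def watchdog_step(targets, check=reachable, restart=restart_network):
    """Restart networking if any target is unreachable.

    Checking several targets is meant to catch the "degraded" state,
    where some hosts still answer. Returns the list of failed targets.
    """
    failed = [t for t in targets if not check(t)]
    if failed:
        restart()
    return failed


def run_forever(targets, interval=60):
    """Call watchdog_step every `interval` seconds until interrupted."""
    while True:
        failed = watchdog_step(targets)
        if failed:
            print("restarted network; unreachable:", ", ".join(failed))
        time.sleep(interval)
```

It could be started with something like `run_forever(["192.0.2.1", "192.0.2.2"])`, where the addresses are placeholders for a gateway and one or two other hosts on the same segment. Note that a single lost ping triggers a restart in this sketch, so a real deployment would probably want retries before acting.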

I have ESXi systems running on the same hardware and to date have not
seen any of them drop off the same way.

Systems can be under high traffic load at the time or completely
idle; it doesn't seem to make a difference. There are no log entries
indicating what might be going on.

I have a case open with Dell but am not expecting a whole lot from
them; maybe I'll get lucky though. They asked me to upgrade the NIC
firmware, which I did on a batch of systems to no avail (the release
notes for the firmware said nothing about any fixes that sounded
like my issue).

Driver versions:
ESXi (vSphere):
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.6.9 (December 8, 2007)

Most Linux systems (5.3 kernel):
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.7.9-1 (July 18, 2008)

Some Linux systems (5.4 kernel):
Broadcom NetXtreme II Gigabit Ethernet Driver bnx2 v1.9.3 (March 17, 2009)
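
When comparing driver versions across many hosts like this, it can help to collect them programmatically. The following is a rough sketch, assuming Python 3 and that `ethtool -i <interface>` prints its usual "key: value" lines (driver, version, firmware-version, bus-info). The function names are mine, not part of any existing tool.

```python
import subprocess


def parse_ethtool_info(text):
    """Parse `ethtool -i` output ("key: value" per line) into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI
        # addresses (which contain colons) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_info(interface):
    """Run `ethtool -i <interface>` and return the parsed fields."""
    out = subprocess.run(
        ["ethtool", "-i", interface],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    ).stdout
    return parse_ethtool_info(out)
```

Calling `driver_info("eth0")` on each host (the interface name is an assumption) and logging the `driver`, `version`, and `firmware-version` fields would make it easier to see whether the drop-outs correlate with a particular driver or firmware release.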

Happens across at least a dozen systems spread over 4 data centers.

Never seen this sort of behavior before in the hundreds and hundreds
of systems I've run. These systems are all new, the R610 hardware
was released around May 2009, and we've been having issues since
day 1, but only recently have been able to rule the switches out as
the cause.

The latest driver on Broadcom's site is 1.9.20b, which seems odd since
CentOS 5.4 seems to come with 1.9.3 (the date on the Broadcom site is
more recent than the date on the Linux kernel driver in 5.4). Most of
the fixes in the recent driver versions seem to focus on iSCSI,
which I'm not using.

lspci says:
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
    Subsystem: Dell Unknown device 0236
    Flags: bus master, fast devsel, latency 0, IRQ 114
    Memory at dc000000 (64-bit, non-prefetchable) [size=32M]
    Capabilities: [48] Power Management version 3
    Capabilities: [50] Vital Product Data
    Capabilities: [58] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
    Capabilities: [a0] MSI-X: Enable- Mask- TabSize=9
    Capabilities: [ac] Express Endpoint IRQ 0
    Capabilities: [100] Device Serial Number c9-dc-93-fe-ff-9b-21-00
    Capabilities: [110] Advanced Error Reporting
    Capabilities: [150] Power Budgeting
    Capabilities: [160] Virtual Channel
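
The MSI and MSI-X capability lines in output like the above carry an `Enable+` or `Enable-` flag. As a small sketch (assuming Python 3, and assuming the capability lines look like either the older "Message Signalled Interrupts:" form shown here or the newer "MSI:" form), the flags could be pulled out of saved `lspci -v` text like this. The function name is made up for illustration.

```python
import re

# Match the MSI capability in either spelling, then the first Enable flag
# before the next "[xx]" capability marker.
MSI_RE = re.compile(r"(?:Message Signalled Interrupts|MSI):[^\[]*?Enable([+-])")
MSIX_RE = re.compile(r"MSI-X:[^\[]*?Enable([+-])")


def msi_state(lspci_text):
    """Return {"msi": bool or None, "msix": bool or None} for one device.

    None means the capability line was not found. Whitespace is collapsed
    first so that lines wrapped by a mail client still match.
    """
    text = " ".join(lspci_text.split())
    state = {}
    for name, pattern in (("msi", MSI_RE), ("msix", MSIX_RE)):
        match = pattern.search(text)
        state[name] = (match.group(1) == "+") if match else None
    return state
```

This only reads the flags as lspci printed them for a single device block; it says nothing about why a given mode is or is not in use.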

I suppose I could go build the latest driver from their site and see
how it goes..

thanks

nate

_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
 
12-15-2009, 11:02 AM
James Pearson

bnx2 losing connectivity

nate wrote:
> Hoping someone else has seen this before.
>
> I have a few dozen Dell R610 systems with CentOS 5.2 that are
> using kernels from 5.3 and 5.4 (2.6.18-128.1.10.el5 & 2.6.18-164.6.1.el5),
> that at random lose layer 2 network connectivity either partially
> or totally. Running tcpdump on the interface reveals only ARP
> broadcasts, no responses. Switch reports no packets being
> received on the interface.

A colleague of mine has seen the exact same issue with Dell R610 systems
running CentOS 5.x

It looks like this might be the same issue as:

<https://bugzilla.redhat.com/show_bug.cgi?id=520888>

Which seems to suggest disabling MSI - i.e. load the bnx2 module with
"disable_msi=1"

We haven't tried this yet (as we went back to using CentOS 4 on these
boxes - which works OK)
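
For reference, on CentOS 5 module options normally go in /etc/modprobe.conf, so the setting would be a line reading `options bnx2 disable_msi=1`, which takes effect the next time the module is loaded. Below is a minimal sketch of adding that line idempotently, assuming Python 3. The helper name is hypothetical, and it works on the file's text rather than touching the file, so it can be tried safely first.

```python
OPTION = "disable_msi=1"


def ensure_msi_disabled(conf_text):
    """Return modprobe config text with `options bnx2 disable_msi=1` set."""
    lines = conf_text.splitlines()
    for i, line in enumerate(lines):
        fields = line.split()
        if fields[:2] == ["options", "bnx2"]:
            # Drop any existing disable_msi setting, keep other options,
            # then add ours.
            kept = [f for f in fields[2:] if not f.startswith("disable_msi=")]
            lines[i] = " ".join(["options", "bnx2"] + kept + [OPTION])
            break
    else:
        # No existing "options bnx2" line: append a new one.
        lines.append("options bnx2 " + OPTION)
    return "\n".join(lines) + "\n"
```

One way to use it would be to read /etc/modprobe.conf, pass the contents through `ensure_msi_disabled`, write the result back, and then reload the bnx2 module or reboot. Whether this actually cures the drop-outs is exactly what the bug report is trying to establish, so treat it as an experiment.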

James Pearson
 
12-15-2009, 04:24 PM
"nate"

bnx2 losing connectivity

James Pearson wrote:

> It looks like this might be the same issue as:
>
> <https://bugzilla.redhat.com/show_bug.cgi?id=520888>
>
> Which seems to suggest disabling MSI - i.e. load the bnx2 module with
> "disable_msi=1"

Wow! that looks interesting, will try it! thanks!

nate


 
12-15-2009, 04:45 PM
Akemi Yagi

bnx2 losing connectivity

On Tue, Dec 15, 2009 at 9:24 AM, nate <centos@linuxpowered.net> wrote:
> James Pearson wrote:
>
>> It looks like this might be the same issue as:
>>
>> <https://bugzilla.redhat.com/show_bug.cgi?id=520888>
>>
>> Which seems to suggest disabling MSI - i.e. load the bnx2 module with
>> "disable_msi=1"
>
> Wow! that looks interesting, will try it! thanks!

This is also being tracked in the CentOS bug tracker:

http://bugs.centos.org/view.php?id=3832

Akemi
 
12-16-2009, 09:47 AM
James Pearson

bnx2 losing connectivity

nate wrote:
> James Pearson wrote:
>
>>It looks like this might be the same issue as:
>>
>><https://bugzilla.redhat.com/show_bug.cgi?id=520888>
>>
>>Which seems to suggest disabling MSI - i.e. load the bnx2 module with
>>"disable_msi=1"
>
> Wow! that looks interesting, will try it! thanks!

Also, the changelog for the pre-5.5 kernel at
<http://people.redhat.com/dzickus/el5/181.el5> includes "bnx2: update to
version 2.0.2". I have no idea whether this helps in this case.

James Pearson