 
05-28-2012, 05:31 PM
Anatoly Rybalchenko

Strange problem with network, netperf CRR test fails

Hello,
I have 6 identical physical machines in one cluster, all running Debian 6.0.
Initially they were used to run Cassandra nodes, but these nodes started
to go down randomly after several hours of work, with hung connections
stuck in the CLOSE_WAIT state. Typically, CLOSE_WAIT is an indicator of
incorrect application behavior, but I've reproduced similar symptoms with
the netperf CRR test, even with localhost as the host:
'netperf -H localhost -t TCP_CRR -l -5' results in

'TCP Connect/Request/Response TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to localhost (127.0.0.1) port 0 AF_INET : demo
send_tcp_conn_rr: data recv error: Connection reset by peer'

And connections hang up in CLOSE_WAIT with a strange 1 byte in Recv-Q:

'tcp 1 0 127.0.0.1:12865 127.0.0.1:39664 CLOSE_WAIT'
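That combination — CLOSE_WAIT plus 1 byte sitting in Recv-Q — means the peer sent one last byte and a FIN, and the local side has neither read the byte nor called close(). A minimal sketch (not the netperf internals; ports and buffer sizes here are arbitrary) that reproduces exactly that socket state:

```python
import socket
import time

# Sketch only: put a socket into the state the netstat line shows,
# i.e. CLOSE_WAIT with one unread byte in Recv-Q. The peer sends a
# single byte and closes; our side receives the FIN but neither reads
# the byte nor closes -- which is precisely CLOSE_WAIT.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # ephemeral port; not the 12865 from the post
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
peer, _ = srv.accept()

peer.sendall(b"x")              # the stray 1 byte that shows up in Recv-Q
peer.close()                    # FIN arrives -> cli sits in CLOSE_WAIT

time.sleep(0.2)                 # let the byte and the FIN be delivered

# Peek without draining: the byte is still queued (Recv-Q = 1).
pending = cli.recv(16, socket.MSG_PEEK)
print(pending)                  # b'x'
```

In netperf's case the implication would be that netserver's final response byte is never consumed before the connection is torn down, which matches the 'data recv error: Connection reset by peer' on the other side.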

However, if I set the test duration in seconds (e.g. -l 5; a negative
value such as -l -5 gives the length in transactions rather than
seconds), it works correctly, and TCP_RR works correctly all the time.
I've also made a tcpdump of the conversation between two nodes in a
similar TCP_CRR test, and it also looks strange: the nodes open the
connection correctly, the 'client' sends its data, and then the 'server'
side simply resets the connection.

'netstat -s' after 40 minutes of uptime (reboot, test, and writing this
message) shows a suspicious '6 TCP data loss events' and '11
connections reset due to early user close':

Ip:
2645347 total packets received
76 with invalid addresses
0 forwarded
0 incoming packets discarded
2645271 incoming packets delivered
2636980 requests sent out
Icmp:
22 ICMP messages received
0 input ICMP message failed.
ICMP input histogram:
destination unreachable: 22
22 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 22
IcmpMsg:
InType3: 22
OutType3: 22
Tcp:
263419 active connections openings
263458 passive connection openings
0 failed connection attempts
62 connection resets received
1 connections established
2636459 segments received
2636437 segments send out
8 segments retransmited
0 bad segments received.
21 resets sent
Udp:
531 packets received
2 packets to unknown port received.
0 packet receive errors
553 packets sent
UdpLite:
TcpExt:
9 invalid SYN cookies received
264883 TCP sockets finished time wait in fast timer
3 time wait sockets recycled by time stamp
20 delayed acks sent
Quick ack mode was activated 1 times
264978 packets directly queued to recvmsg prequeue.
473 bytes directly in process context from backlog
265473 bytes directly received in process context from prequeue
69 packet headers predicted
1573 packets header predicted and directly queued to user
1055284 acknowledgments not containing data payload received
193 predicted acknowledgments
6 TCP data loss events
1 timeouts in loss state
5 retransmits in slow start
2 other TCP timeouts
2 DSACKs sent for old packets
11 connections reset due to early user close
TCPSackMerged: 7
TCPSackShiftFallback: 13
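To watch whether those two counters keep climbing across test runs, the relevant lines can be pulled out of `netstat -s` output programmatically. A hypothetical helper (the function name and the captured sample are mine, not part of netstat):

```python
import re

# Sample of `netstat -s` output as captured in the post; in practice
# this string would come from subprocess.run(["netstat", "-s"], ...).
SAMPLE = """\
TcpExt:
    6 TCP data loss events
    11 connections reset due to early user close
"""

def suspicious_counters(text):
    """Extract the counters flagged as suspicious from netstat -s text."""
    wanted = ("TCP data loss events",
              "connections reset due to early user close")
    out = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+)\s+(.*)", line)
        if m and m.group(2) in wanted:
            out[m.group(2)] = int(m.group(1))
    return out

print(suspicious_counters(SAMPLE))
```

Sampling these before and after a single TCP_CRR run would show whether each failed run accounts for one 'early user close' reset.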

I've already upgraded the 'ixgbe' driver to the latest 3.9-NAPI, but the
problem still persists, and I cannot even find its source.

Best regards,
Anatoly Rybalchenko


--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
Archive: http://lists.debian.org/681E33E82BBF69408861EB4178B371BB0542A7@smc-ex1.Enkata.com
 