Weird routing / arp / ppp problem - low upload after debian upgrade
Hi,
After upgrading from an old patched etch, my clients cannot browse the internet anymore (upload is OK, but download is no faster than a few kbps). The problem occurs randomly; other services that use small packets, like VoIP, work perfectly.

Here is my problem in detail. I have a PPPoE concentrator serving several hundred computers. Every user can have either a public IP (directly on the PPPoE tunnel, without SNAT/DNAT) or a SNATed private IP. Clients with a public IP are proxy_arp'ed so the world can see them. Incoming traffic goes through imq0 and outgoing traffic through eth0; traffic shaping looks fine.

This is a typical example of the firewall rules generated for a public IP:

iptables:

    iptables -t filter -A FORWARD -i ppp+ -s 217.17.10.250 -j ACCEPT
    iptables -t filter -A FORWARD -d 217.17.10.250 -j ACCEPT

shaping:

    iptables -t mangle -A UPLOAD -p all -o eth0 -s 217.17.10.250 -j CLASSIFY --set-class 2:246
    tc filter add dev imq0 parent 1: protocol ip u32 match ip dst 217.17.38.250 flowid 1:246
    tc class add dev imq0 parent 1:2 classid 1:246 htb rate 128kbit ceil 4096kbit burst 4096kbit prio 5 quantum 8
    tc qdisc add dev imq0 parent 1:246 handle 246:0 sfq perturb 10
    tc class add dev eth0 parent 2:2 classid 2:246 htb rate 128kbit ceil 4096kbit burst 4096kbit prio 5 quantum 8
    tc qdisc add dev eth0 parent 2:246 handle 246:0 sfq perturb 10

For a private IP we have almost the same, but there is SNAT in the iptables part. Every client gets the same formula for generating the iptables firewall.

My problem is the following: totally random clients are having problems with download. If I use the Mikrotik bandwidth tester from the internet to their computer, it shows X Mbit/s upload (from their side) but only 10-15 kbps in the direction of the client. The problem ONLY occurs when they are behind their client router. If they connect via PPPoE directly to my server, the problem disappears. The bandwidth tester uses big packets, so they are fragmented. If I use small packets like ping, they get a nice transfer and everything is reachable from them.

On the client side, it looks like they can browse the internet, but google.com takes something like 20 minutes to load, while VoIP works fine. Moreover, if I take the client router and place it somewhere else in my "LAN" (my LAN is 100% bridged with Mikrotik), it usually works.

What I've triple-checked:
- generation of iptables/tc rules
- PPPoE MTU (1480 or 1492, both working)
- MSS
- path MTU discovery packets are not blocked, everything looks fine

What I suspect:
- some ARP problem, maybe?

The problem began right after I changed my old Etch server on 2.6.15, with iptables and kernel patched with patch-o-matic, to a clean 2.6.32 squeeze with everything from apt. My sysctl.conf, along with pppoe-server-options, is attached at the end of this message. I have also sniffed the clients' interface with tcpdump many times and nothing caught my attention.

Typical ARP entries for a SNATed IP are:

    ? (10.100.0.25) at <incomplete> on eth1
    ? (10.100.0.25) at <from_interface> PERM PUB on eth1

A typical ARP entry for a public IP looks like:

    ? (217.17.10.250) at <from_interface> PERM PUB on eth0

Any clues will be VERY appreciated.

debian-firewall: please Cc me, as I'm not subscribed.
Regards
WZ

------------sysctl.conf---------------
kernel.panic = 3
net.core.rmem_max = 131071
net.core.wmem_max = 131071
net.ipv4.conf.all.arp_announce = 0
net.ipv4.conf.all.arp_ignore = 0
net.ipv4.conf.all.proxy_arp = 1
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.arp_announce = 0
net.ipv4.conf.default.arp_filter = 0
net.ipv4.conf.default.arp_ignore = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.eth0.arp_announce = 0
net.ipv4.conf.eth0.arp_ignore = 0
net.ipv4.conf.eth0.proxy_arp = 1
net.ipv4.conf.eth1.arp_announce = 0
net.ipv4.conf.eth1.arp_ignore = 0
net.ipv4.conf.eth1.proxy_arp = 0
net.ipv4.conf.eth2.proxy_arp = 0
net.ipv4.ip_forward = 1
net.ipv4.ip_local_port_range = 1024 4999
net.ipv4.neigh.default.base_reachable_time = 1036800
net.ipv4.neigh.default.gc_thresh1 = 1024
net.ipv4.neigh.default.gc_thresh2 = 8192
net.ipv4.neigh.default.gc_thresh3 = 32768
net.ipv4.neigh.default.ucast_solicit = 4
net.ipv4.neigh.eth0.base_reachable_time = 1036800
net.ipv4.neigh.eth0.ucast_solicit = 4
net.ipv4.neigh.eth1.ucast_solicit = 4
net.ipv4.neigh.eth2.ucast_solicit = 4
net.ipv4.neigh.imq0.ucast_solicit = 4
net.ipv4.neigh.imq1.ucast_solicit = 4
net.ipv4.neigh.lo.ucast_solicit = 4
net.ipv4.netfilter.ip_conntrack_max = 132760
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 5
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 43200
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_last_ack = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 20
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_ecn = 0
net.ipv4.tcp_fack = 1
net.ipv4.tcp_mem = 393216 524288 786432
net.ipv4.tcp_rmem = 4096 87380 174760
net.ipv4.tcp_sack = 0
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_window_scaling = 1
net.ipv4.tcp_wmem = 4096 16384 131072

------------tcpdump of client (myptr is the client's public IP)------------
23:11:45.110682 IP ew-in-f104.1e100.net.www > 190.myptr.com.33773: Flags [.], seq 966809112:966810542, ack 3627825205, win 122, length 1430
23:11:49.509890 IP 190.myptr.com.32876 > sip.voice.gtsenergis.pl.sip: SIP, length: 369
23:11:49.512774 IP sip.voice.gtsenergis.pl.sip > 190.myptr.com.32876: SIP, length: 366
23:11:53.811644 IP 190.myptr.com.49186 > 94.245.115.184.3544: UDP, length 61
23:11:53.853959 IP 94.245.115.184.3544 > 190.myptr.com.49186: UDP, length 109
23:11:55.110798 IP ew-in-f104.1e100.net.www > 190.myptr.com.33773: Flags [.], seq 0:1430, ack 1, win 122, length 1430
23:11:56.383790 IP 151.59.26.182.46119 > 190.myptr.com.33260: Flags [F.], seq 3286477939, ack 3609663927, win 65364, length 0
23:11:56.622196 IP 158.129.20.136.35137 > 190.myptr.com.32923: Flags [F.], seq 840311840, ack 4022529708, win 65373, length 0
23:12:00.277498 IP 190.myptr.com.isakmp > ip-89.171.11.42.static.crowley.pl.isakmp: isakmp: phase 1 I ident
23:12:04.529779 IP 190.myptr.com.32876 > sip.voice.gtsenergis.pl.sip: SIP, length: 369
23:12:04.532416 IP sip.voice.gtsenergis.pl.sip > 190.myptr.com.32876: SIP, length: 366
23:12:05.111138 IP ew-in-f104.1e100.net.www > 190.myptr.com.33773: Flags [.], seq 0:1430, ack 1, win 122, length 1430
23:12:05.659030 IP 130pc240.sshunet.nl.https > 190.myptr.com.32786: Flags [R.], seq 3389053229, ack 421162466, win 0, length 0
23:12:09.608646 IP 190.myptr.com.isakmp > ip-89.171.11.42.static.crowley.pl.isakmp: isakmp: phase 1 I ident
23:12:10.174398 IP 190.myptr.com.33580 > 10.10.123.30.www: Flags [S], seq 3710035758, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
23:12:13.187047 IP 190.myptr.com.33580 > 10.10.123.30.www: Flags [S], seq 3710035758, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
23:12:15.111714 IP ew-in-f104.1e100.net.www > 190.myptr.com.33773: Flags [.], seq 0:1430, ack 1, win 122, length 1430
23:12:19.548671 IP 190.myptr.com.32876 > sip.voice.gtsenergis.pl.sip: SIP, length: 369
23:12:19.551750 IP sip.voice.gtsenergis.pl.sip > 190.myptr.com.32876: SIP, length: 366
23:12:23.548358 IP 190.myptr.com.33568 > 10.10.123.30.www: Flags [S], seq 2757323080, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
23:12:25.112024 IP ew-in-f104.1e100.net.www > 190.myptr.com.33773: Flags [.], seq 0:1430, ack 1, win 122, length 1430
23:12:26.548632 IP 190.myptr.com.33568 > 10.10.123.30.www: Flags [S], seq 2757323080, win 8192, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
23:12:32.191530 IP 139.91.70.35.19609 > 190.myptr.com.33196: Flags [F.], seq 604234015, ack 3123515410, win 17520, length 0

------------pppoe server options---------------
plugin radius.so
plugin radattr.so
auth
require-chap
lcp-echo-interval 10
lcp-echo-failure 5
ms-dns 217.17.10.208
ms-dns 217.17.10.10
proxyarp
noipx
mtu 1460
mru 1460

--
Wojciech Ziniewicz
http://www.rfc-editor.org/rfc/rfc2324.txt
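As a side check on the MTU and MSS figures quoted in this post, here is a minimal sketch of the arithmetic. It assumes plain IPv4 and TCP headers of 20 bytes each with no options (TCP timestamps are disabled in the sysctl.conf above); the function names are hypothetical and only for illustration.

```python
# Assumption: IPv4 and TCP headers carry no options, so each is 20 bytes.
IPV4_HEADER = 20
TCP_HEADER = 20


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one IP packet of the given MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


def ip_packet_size(tcp_payload: int) -> int:
    """Size of the IP packet that carries a TCP segment with this payload."""
    return tcp_payload + IPV4_HEADER + TCP_HEADER


def fits(tcp_payload: int, mtu: int) -> bool:
    """True if the segment can cross a link with this MTU unfragmented."""
    return ip_packet_size(tcp_payload) <= mtu


if __name__ == "__main__":
    for mtu in (1460, 1480, 1492):
        print(f"MTU {mtu}: MSS {mss_for_mtu(mtu)}, "
              f"1430-byte segment fits: {fits(1430, mtu)}")
```

Under these assumptions, the `mss 1452` advertised in the SYN packets corresponds to an MTU of 1492, while the `mtu 1460` in the pppoe-server options would correspond to an MSS of 1420, and the 1430-byte segments in the trace travel in 1470-byte IP packets. Whether that difference has anything to do with the problem is not established by this thread.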
Weird routing / arp / ppp problem - low upload after debian upgrade
On 08 Dec 16:26, Wojciech Ziniewicz wrote:
> Hi,
> After upgrade from old patched etch, my clients cannot browse internet
> anymore (upload is ok but download not bigger than few kbps ) - problem
> occurs randomly - other services that use small packets like voip work
> perfectly.

Between that system and the outside world, is there another router/firewall? My initial guess would be that you've hit the TCP window scale problem. You can (quickly) check this by doing:

    sysctl net.ipv4.tcp_window_scaling=0

on the box that they're going through. If that works, then you've got a box between that and the internet that doesn't watch the window scaling flag as it goes past, and therefore mangles packets later on because it doesn't know that they can get through.

Hope that helps,
--
Brett Parker http://www.sommitrealweird.co.uk/
PGP Fingerprint 1A9E C066 EDEE 6746 36CB BD7F 479E C24F 95C7 1D61

Archive: http://lists.debian.org/20101208164714.GD4830@sommitrealweird.co.uk
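To make the window scaling hypothesis above concrete, here is a rough sketch of how the 16-bit window field is interpreted. It assumes the standard TCP window scale rule: the shift applies only when both sides sent the option in their SYN, and the shift count is at most 14. The function name and the example numbers are hypothetical.

```python
MAX_SHIFT = 14  # largest shift count allowed for the TCP window scale option


def effective_window(raw_window: int, shift: int, negotiated: bool) -> int:
    """Receive window in bytes, as the endpoints understand it.

    raw_window -- the 16-bit value carried in the TCP header
    shift      -- the wscale value the advertising side sent in its SYN
    negotiated -- True only if BOTH SYNs carried the window scale option
    """
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("window field is 16 bits")
    if not 0 <= shift <= MAX_SHIFT:
        raise ValueError("shift count must be between 0 and 14")
    return raw_window << shift if negotiated else raw_window


if __name__ == "__main__":
    # Hypothetical numbers: the endpoints agree on a shift of 8 ...
    print(effective_window(122, 8, negotiated=True))
    # ... but a middlebox that ignored the option reads the raw field.
    print(effective_window(122, 8, negotiated=False))
```

A device in the path that tracks TCP windows but missed the option would work with the second, much smaller figure, which is the kind of mismatch the `tcp_window_scaling=0` test is meant to rule in or out.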
Weird routing / arp / ppp problem - low upload after debian upgrade
2010/12/8 Brett Parker <iDunno@sommitrealweird.co.uk>
> On 08 Dec 16:26, Wojciech Ziniewicz wrote:
> > Hi,
> > After upgrade from old patched etch, my clients cannot browse internet
> > anymore (upload is ok but download not bigger than few kbps ) - problem
> > occurs randomly - other services that use small packets like voip work
> > perfectly.
>
> Between that system and the outside world, is there another router/firewall?

There's only my router with a BGP session acting as a gateway.

> My initial guess would be that you've hit the tcp window scale problem,
> you can (quickly) check this by doing:
>
>     sysctl net.ipv4.tcp_window_scaling=0

I did some tests with both settings:

1. Telnetting to a host behind my client's router:

listening on ppp296, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
18:58:39.988961 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [S], seq 3615516481, win 5808, options [mss 1452,nop,wscale 2], length 0
18:58:39.991914 IP 10.100.0.194.telnet > 1.mydomain.com.3718: Flags [S.], seq 1759632665, ack 3615516482, win 5840, options [mss 1452,nop,wscale 0], length 0
18:58:39.991975 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [.], ack 1, win 1452, length 0
18:58:39.992118 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 1:25, ack 1, win 1452, length 24
18:58:40.020668 IP 10.100.0.194.telnet > 1.mydomain.com.3718: Flags [P.], seq 1:13, ack 25, win 5840, length 12
18:58:40.020742 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [.], ack 13, win 1452, length 0
18:58:40.020847 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 25:28, ack 13, win 1452, length 3
18:58:40.023064 IP 10.100.0.194.telnet > 1.mydomain.com.3718: Flags [P.], seq 13:28, ack 25, win 5840, length 15
18:58:40.054093 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [.], ack 28, win 1452, length 0
18:58:40.056014 IP 10.100.0.194.telnet > 1.mydomain.com.3718: Flags [P.], seq 28:46, ack 28, win 5840, length 18
--- from now on my router tries to get a response from the box behind the firewall
18:58:40.056068 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9
18:58:40.284102 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9
18:58:40.744084 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9
18:58:41.664196 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9
18:58:43.504119 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9
18:58:47.184092 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9

The output of telnet is:

root@beta2:/home/wojtek# telnet 10.100.0.194
Trying 10.100.0.194...
Connected to 10.100.0.194.
Escape character is '^]'.

There should be a prompt for login and password here.

2. After making the TCP window scaling change, I repeated the telnet procedure. Here is another sniff from my pppoe-server:

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ppp296, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
19:03:12.045452 IP 1.mydomain.com.1048 > 10.100.0.194.telnet: Flags [S], seq 3534691937, win 5808, options [mss 1452], length 0
19:03:12.047495 IP 10.100.0.194.4119 > 1.mydomain.com.1048: Flags [R.], seq 0, ack 3534691938, win 0, length 0
19:03:18.092283 IP 1.mydomain.com.1048 > 10.100.0.194.telnet: Flags [S], seq 3534691937, win 5808, options [mss 1452], length 0
19:03:18.094212 IP 10.100.0.194.4119 > 1.mydomain.com.1048: Flags [R.], seq 0, ack 1, win 0, length 0

SYN answered with a reset every time: totally no connectivity.

So with TCP window scaling enabled on my server (test 1), packets get through the client's NAT, but big packets cannot get through. On the other hand, when I turn TCP window scaling off (test 2), I can't even connect (reset + SYN all the time). ICMP goes through in both case 1 and case 2.

Frankly, I have no clue why O_o

--
Wojciech Ziniewicz
http://www.rfc-editor.org/rfc/rfc2324.txt
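The repeated `seq 28:37` lines in the first trace above can be picked out mechanically. The sketch below is one possible way to do it, assuming tcpdump's default one-line text format as pasted in this thread; the regular expression and the function names are hypothetical and cover only TCP lines that carry a `seq start:end` range.

```python
import re
from collections import defaultdict

# Matches lines such as:
# 18:58:40.056068 IP host.port > host.port: Flags [P.], seq 28:37, ack ...
LINE_RE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) IP (\S+) > (\S+): "
    r"Flags \[[^\]]*\], seq (\d+):(\d+),"
)


def find_retransmissions(lines):
    """Map (src, dst, seq_start, seq_end) to the times it was seen, if > 1."""
    seen = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # non-TCP lines, pure ACKs, SYNs without a seq range
        hours, minutes, seconds, src, dst, start, end = match.groups()
        when = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        seen[(src, dst, int(start), int(end))].append(when)
    return {key: times for key, times in seen.items() if len(times) > 1}


def gaps(times):
    """Seconds between consecutive sightings, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# A few lines copied from the first trace in this post.
SAMPLE = [
    "18:58:39.992118 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 1:25, ack 1, win 1452, length 24",
    "18:58:40.056068 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9",
    "18:58:40.284102 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9",
    "18:58:40.744084 IP 1.mydomain.com.3718 > 10.100.0.194.telnet: Flags [P.], seq 28:37, ack 46, win 1452, length 9",
]

if __name__ == "__main__":
    for key, times in find_retransmissions(SAMPLE).items():
        print(key, gaps(times))
```

Run over the full first trace, the gaps between the repeats roughly double each time, which is the usual TCP retransmission backoff: the 9-byte segment is never acknowledged by the host behind the client's router.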