Problem :
Until a few months ago, I had my Linux desktop serving double duty as a router for my home network, and all was well.
Then I set up a small Linux machine to act as a stand-alone router, and since then I have been experiencing lots of dropped connections.
Errors like this are frequent (this one during an rsync session):
Write failed: Broken pipe
rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
rsync: connection unexpectedly closed (1576 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
My IM sessions disconnect and reconnect frequently. SSH sessions drop, and sometimes web pages fail to load.
It’s always an active connection that gets dropped (as opposed to problems establishing new connections). And it’s most frequent when my Internet connection is busy in terms of the number of active connections, not in terms of bandwidth. For instance, running bittorrent makes the problem much worse, but downloading or uploading a single large file that consumes 100% of my bandwidth does not seem to trigger the problem. I can always reconnect immediately (although the new connection often gets dropped soon, too).
I have an 8mbit (ha! yeah right!) cable modem connection from Telecable (one of the big cable companies in Mexico). I would have assumed it was a problem with their service, except that I don’t have the problem when not using my router.
So it seems pretty apparent to me that I’m reaching some sort of “max connections” limit in my Linux router.
I have experienced similar problems in the past, on very busy systems, and increasing the connection tracking limit (ip_conntrack_max, or the equivalent in other kernel versions) has always solved the problem. But this time that doesn’t seem to be the issue. These are the values immediately after a series of disconnections:
/proc/sys/net/ipv4/netfilter/ip_conntrack_max: 48324
/proc/sys/net/ipv4/netfilter/ip_conntrack_count: 75
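A single reading taken after the drops can miss a short-lived peak, so it may help to look at the ratio rather than eyeball the two numbers. Below is a minimal sketch, assuming the same /proc paths as above (newer kernels expose nf_conntrack_* names under a different directory); the function names are my own, not part of any tool.

```python
from pathlib import Path

# Directory taken from the output above. This is an assumption for other
# systems: newer kernels use /proc/sys/net/netfilter/nf_conntrack_* instead.
CONNTRACK_DIR = Path("/proc/sys/net/ipv4/netfilter")


def read_conntrack(base=CONNTRACK_DIR):
    """Return (count, maximum) read from the conntrack sysctl files."""
    base = Path(base)
    count = int((base / "ip_conntrack_count").read_text().strip())
    maximum = int((base / "ip_conntrack_max").read_text().strip())
    return count, maximum


def format_usage(count, maximum):
    """Format the tracked-connection count as a share of the limit."""
    percent = 100.0 * count / maximum
    return f"{count}/{maximum} ({percent:.2f}% used)"
```

With the values shown above, format_usage(75, 48324) gives "75/48324 (0.16% used)", which is nowhere near the limit. Calling read_conntrack() in a loop while bittorrent is running would show whether the count ever gets close.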
For what it’s worth, the output of `iptables -L -t nat`:
Chain PREROUTING (policy ACCEPT)
target      prot opt source     destination
DNAT        tcp  --  anywhere   anywhere      multiport dports 6881:6999 to:10.0.3.5
DNAT        udp  --  anywhere   anywhere      multiport dports 6881:6999 to:10.0.3.5
DNAT        tcp  --  anywhere   anywhere      tcp dpt:4380 to:10.0.3.12
DNAT        udp  --  anywhere   anywhere      udp dpt:4380 to:10.0.3.12
DNAT        tcp  --  anywhere   anywhere      tcp dpt:49181 to:10.0.3.12
DNAT        udp  --  anywhere   anywhere      udp dpt:49181 to:10.0.3.12

Chain POSTROUTING (policy ACCEPT)
target      prot opt source     destination
MASQUERADE  all  --  anywhere   anywhere

Chain OUTPUT (policy ACCEPT)
target      prot opt source     destination
What else do I need to check?
Edit:
Load average and memory usage, as requested:
                    total   used   free   shared   buffers   cached
Mem:                  755    747      8        0       154      504
-/+ buffers/cache:            88    667
Swap:                1903      0   1903
18:32:19 up 4 days, 19:53, 2 users, load average: 0.00, 0.00, 0.00
I also forgot to include the uname -a output originally:
Linux reep 2.6.32-5-686 #1 SMP Wed Jan 12 04:01:41 UTC 2011 i686 GNU/Linux
Solution :
Interestingly, one of the things your ‘router’ does is masquerade / NAT (if I’m reading that right).
If your cable modem normally does that task, its ability to handle outbound connections may be affected by the fact that everything on the other side of your ‘router’ (in quotes because it does more than just route) has the same IP address while the router is turned on. In other words, if it has a per-host bucket for connections, or if its internal handling of connections simply can’t cope with a large number of connections from a single host, then the cable modem router could be overloaded.
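To make the "per-host bucket" idea concrete, here is a toy model of how an upstream device would count connections per source address, with and without masquerading. The limit of 100, the flow counts, and the address 192.168.0.2 are made up for illustration; I don't know what your cable modem actually does internally.

```python
from collections import Counter


def connections_per_source(flows, masquerade_ip=None):
    """Count flows per source address as the upstream device sees them.

    flows is an iterable of (source_ip, destination) pairs. If masquerade_ip
    is given, every flow appears to come from that single address.
    """
    seen = Counter()
    for source_ip, _destination in flows:
        seen[masquerade_ip if masquerade_ip else source_ip] += 1
    return seen


def hosts_over_limit(per_source, limit):
    """Return the source addresses whose flow count exceeds the limit."""
    return sorted(ip for ip, count in per_source.items() if count > limit)


# Hypothetical load: two LAN hosts with 60 and 50 open connections each.
flows = [("10.0.3.5", f"peer{i}") for i in range(60)]
flows += [("10.0.3.12", f"site{i}") for i in range(50)]

direct = connections_per_source(flows)
natted = connections_per_source(flows, masquerade_ip="192.168.0.2")
```

In this model neither host exceeds a limit of 100 on its own, but behind the masquerading router all 110 connections are charged to one address, so that address goes over.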
One way to test this would be to configure your router to simply push packets to the cable modem router, instead of doing masquerade / NAT. This means configuring a default route and telling the kernel to forward packets (if I recall correctly). If I’m right, the problem should go away again.
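A rough sketch of that test, written as a helper that only builds the commands and does not run anything. The modem address 192.168.0.1 and the exact rule specification are assumptions: `iptables -L` without -v hides things like the output interface, so check `iptables -t nat -S POSTROUTING` for the real rule before deleting it. Note also that without NAT the cable modem has to know a route back to the 10.0.3.x network through your router, which not every modem lets you configure.

```python
def no_nat_commands(masquerade_rule, modem_ip):
    """Build (but do not execute) the commands for a no-NAT forwarding test.

    masquerade_rule: the rule specification exactly as listed by
                     `iptables -t nat -S POSTROUTING`, minus the leading
                     "-A POSTROUTING" (for example "-j MASQUERADE").
    modem_ip:        LAN address of the cable modem router (hypothetical here).
    """
    return [
        # Tell the kernel to forward packets between interfaces.
        "sysctl -w net.ipv4.ip_forward=1",
        # Remove the masquerade rule so source addresses are left untouched.
        "iptables -t nat -D POSTROUTING " + masquerade_rule,
        # Send everything that is not local to the cable modem router.
        "ip route replace default via " + modem_ip,
    ]


commands = no_nat_commands("-j MASQUERADE", "192.168.0.1")
```

Printing the list and running the commands by hand as root keeps the test easy to undo: re-adding the rule with -A instead of -D restores the current setup.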