Semi-Random Connection Loss

Problem :

I’m having a perplexing network issue.

For context, I work at a radio station cluster–multiple stations in one location–and we use the Internet a great deal in delivering our audio content. We stream 3 radio feeds to our online feeds, we push two different feeds out to two different tower sites where the audio is broadcast over the air, receive two audio feeds (sometimes 3), and send one feed back to its source. All of this streaming is 24/7, so we use our Internet a bit more than the average guy. We never stop broadcasting–unless we lose connection.

We have been suffering from connection loss for some time, which is very problematic for a professional radio station. We’ve called the Internet service provider for answers and have come back empty-handed from each attempt to have them look into the problem.

At first, I thought the issue was just packet loss. But then I noticed that the connection losses were only semi-random and that there was some sort of pattern. Each station is hooked up to a silence sensor, which sends out alerts if and when a station goes off air. These alerts can mean different things; but for us, the alerts have only signified an interruption in our Internet connection. To troubleshoot this issue, I am using information gathered from the two stations that receive audio from another location. The alerts are sent out when we stop receiving audio from the source.

First, the connection issues are not completely random because–for the most part–the connection interruption only occurs 2 minutes before the beginning of a new hour–12:58, 4:58, 1:58. I would say that the connection issues occur approximately 2 minutes before a new hour at least 90% of the time. But I would have to check to be certain. To me losing connection 2 minutes before an hour is strange enough, but there’s more.

The connection interruptions do not happen every hour or even during the same hour each day. The hours that the connections are interrupted vary each day. And even more strangely, one station may experience a network interruption 2 minutes before the end of an hour, while the other station does not experience an interruption. In fact, though each station loses connection 2 minutes before a new hour, I don’t think I’ve ever known a case where both stations went down at the same time. Therefore, the connection issues not only occur during random hours throughout the day, but also occur during different hours for each station. The only common denominator is that the connection loss is occurring approximately 2 minutes before the end of “an” hour.
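One way to check the "90% of the time" estimate, rather than guess at it, is to tally the alert timestamps. The sketch below is only an illustration in Python: the `sample` timestamps are invented, and the names `fraction_near_hour_end` and `alerts_by_hour` are hypothetical helpers, not part of any silence sensor software.

```python
from datetime import datetime


def fraction_near_hour_end(alert_times, minute=58, tolerance=1):
    """Share of alerts whose minute-of-hour is within `tolerance`
    minutes of `minute` (58 means two minutes before the hour)."""
    if not alert_times:
        return 0.0
    hits = sum(1 for t in alert_times if abs(t.minute - minute) <= tolerance)
    return hits / len(alert_times)


def alerts_by_hour(alert_times):
    """Count alerts per hour of day, to show which hours are affected."""
    counts = {}
    for t in alert_times:
        counts[t.hour] = counts.get(t.hour, 0) + 1
    return counts


# Hypothetical alert timestamps, invented for illustration only.
sample = [
    datetime(2024, 1, 8, 12, 58),
    datetime(2024, 1, 8, 16, 58),
    datetime(2024, 1, 9, 1, 58),
    datetime(2024, 1, 9, 9, 17),
]

print(fraction_near_hour_end(sample))  # 0.75 (3 of the 4 sample alerts)
print(alerts_by_hour(sample))
```

Running the same tally separately for each station would also show whether the two stations ever go down in the same hour.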

I’m not at the station right now, so I can’t provide the exact equipment that we are using, but the setup is fairly simple.

We have a modem that is connected to a Netgear ProSafe 24-port switch. The switch then feeds the individual rooms in the building. Generally, each room then has a small 4-8 port switch (various brands). The audio processing devices that receive the audio are then connected to these smaller switches.

I’m at a complete loss here. I’m even having trouble convincing Comcast that it’s not our fault. Right now, I’m thinking about disconnecting the 24-port switch for the weekend and using only the four ports on the back of the modem to feed vital/essential equipment (I think I would have to keep at least one of the smaller switches connected, though). I imagine then, Comcast would have to take the blame if the problem persisted because there wouldn’t be any intervening technology.

Any help would be a HUGE blessing! Why are the issues semi-random? Where do I start looking for the source of the problem? I’m a little suspicious of the modem; the issues started happening near the time a modem was swapped out–I think. But, ultimately, I’m lost… lost.. lost.

Solution :

Start with isolating the problem. I will logically break a network down into segments starting from the outside and working in for documentation/logic flow:

  • Internet (8.8.8.8 is Google’s public DNS server, which is almost never down)
  • One hop into your ISP network from your ISP connection device
  • Your Modem
  • Your Router/NAT device
  • Your internal network (192.168.x.x, 172.16.x.x through 172.31.x.x, 10.x.x.x)

Understanding that breakdown, we start figuring out what we have, in reverse: from the inside out. So…

Using ipconfig Command

From an internal device (PC), determine what your network looks like according to that device/PC:
Start | Run | cmd Enter
ipconfig Enter

This gives you your IP/Subnet/Gateway (let’s hope you’re not on wireless; if you are, disable it for first-layer troubleshooting).

It should look something like:

Windows IP Configuration

Ethernet adapter Ethernet:

   Connection-specific DNS Suffix  . :
   Link-local IPv6 Address . . . . . : removed
   IPv4 Address. . . . . . . . . . . : 192.168.0.100
   Subnet Mask . . . . . . . . . . . : 255.255.255.0
   Default Gateway . . . . . . . . . : 192.168.0.1

Make sure you’re looking at the Ethernet/Local Area Connection adapter, not anything else.
The device you’re at is the IPv4 Address: 192.168.0.100
Your NAT device/router is the Default Gateway: 192.168.0.1

Using ping Command

Now we start testing connectivity between the network device and the NAT/router device. In the command prompt, use the ping command (the example uses the PC’s own address from above; substitute whichever address you are testing). Type:

ping 192.168.0.100 -t 

or

ping -t 192.168.0.100

Basically, what you’re doing is saying “hello, are you there?” to a device, and that device should respond (until we get into the middle of the Internet, where things can go funky).

Good Responses:

Reply from 192.168.0.100: bytes=32 time<1ms TTL=64

Bad Responses:

Destination Host Unreachable

or

Request timed out

or anything else

The -t in this command means keep sending a packet of information every 1 second until you tell it to stop (Ctrl + C, or close the window with the X). Without the -t it’ll just send 4 packets and stop.
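Because the outages here cluster around particular times, it helps to keep a timestamped record of which replies were good and which were bad instead of watching the window. The following is a minimal Python sketch of that idea, not a finished tool: `classify_ping_line` and `stamp` are hypothetical helper names, and the matching assumes the English Windows ping wording shown above.

```python
from datetime import datetime


def classify_ping_line(line):
    """Return 'good', 'bad', or None for lines that are not probe results."""
    text = line.strip().lower()
    if not text:
        return None
    # A real echo reply carries a TTL; "Reply from ...: Destination host
    # unreachable." also starts with "Reply from" but is a failure.
    if text.startswith("reply from") and "ttl=" in text:
        return "good"
    if "unreachable" in text or "timed out" in text or "general failure" in text:
        return "bad"
    return None  # banner or statistics line


def stamp(line, now=None):
    """Prefix a ping output line with a timestamp for later correlation."""
    now = now or datetime.now()
    return f"{now:%Y-%m-%d %H:%M:%S} {line.strip()}"


print(classify_ping_line("Reply from 192.168.0.100: bytes=32 time<1ms TTL=64"))  # good
print(classify_ping_line("Request timed out."))  # bad
```

Feeding each line of a running `ping -t` through `stamp` and saving only the "bad" ones gives a log that can be lined up against the off-air alerts.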

Now that we know how to test a link we’re going to use that ping command on every link/connection in the network and see where we start having problems.

Using tracert Command

The last thing we need to do is make sure there is nothing else funky in the link between you and the Internet (what’s called double NAT, or two NAT devices), and determine what device is one step outside your ISP modem.

In the command prompt, type:

tracert google.com Enter

You’ll get something like:

tracert google.com

Tracing route to google.com [74.125.21.138]
over a maximum of 30 hops:

1    <1 ms    <1 ms    <1 ms  router [192.168.0.1]
2     2 ms     1 ms     1 ms  device [10.1.10.1]
3     1 ms     1 ms     1 ms  blah.somename.whatever [123.123.123.123]
4     1 ms     1 ms     1 ms  124.124.124.124
5     *        *        *     Request timed out.

…and there will be more; use Ctrl + C to stop.

What you care about is the IP address of the device, shown between the [] on each line. Note: If the line after your Default Gateway IP from the ipconfig test above matches one of the 192.168.x.x, 172.16.x.x through 172.31.x.x, or 10.x.x.x patterns (private, non-routable subnets), you have Double NAT, which can cause other weird problems. I won’t go into that here.

Last piece of info needed: the public IP of your network. Go to www.ipchicken.com. That number is your public IP.
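The double NAT check can also be done mechanically. The sketch below, in Python, pulls the hop addresses out of tracert output and tests whether the hop after the default gateway falls in a private range. It is an illustration under assumptions: the function names are hypothetical, the regular expression is only written against output shaped like the sample above, and `SAMPLE` simply repeats that sample.

```python
import ipaddress
import re

# The three private (RFC 1918) IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# A hop line starts with the hop number and ends with an IPv4 address,
# optionally wrapped in [] when a name was resolved.
HOP_RE = re.compile(r"^\s*(\d+)\s.*?(\d{1,3}(?:\.\d{1,3}){3})\]?\s*$")


def is_rfc1918(addr):
    """True if addr is in one of the private IPv4 ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in RFC1918)


def hop_addresses(tracert_output):
    """Return hop IPs in order, skipping timed-out and header lines."""
    hops = []
    for line in tracert_output.splitlines():
        m = HOP_RE.match(line)
        if m:
            hops.append(m.group(2))
    return hops


def looks_like_double_nat(hops, gateway):
    """True if the hop right after the default gateway is also private."""
    if gateway not in hops:
        return False
    i = hops.index(gateway)
    return i + 1 < len(hops) and is_rfc1918(hops[i + 1])


SAMPLE = """Tracing route to google.com [74.125.21.138]
over a maximum of 30 hops:

1    <1 ms    <1 ms    <1 ms  router [192.168.0.1]
2     2 ms     1 ms     1 ms  device [10.1.10.1]
3     1 ms     1 ms     1 ms  blah.somename.whatever [123.123.123.123]
4     1 ms     1 ms     1 ms  124.124.124.124
5     *        *        *     Request timed out.
"""

hops = hop_addresses(SAMPLE)
print(hops)
print(looks_like_double_nat(hops, "192.168.0.1"))  # True: hop 2 is 10.1.10.1
```

In the sample trace, hop 2 (10.1.10.1) is a private address, so that example network would count as double NAT by the rule above.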

Now with all this info, what do we test?

  1. Yourself (I’ll usually skip this unless the next one is giving problems): 192.168.0.100

  2. Your connection to your NAT Router: 192.168.0.1

  3. ipchicken number: 123.123.123.125

  4. The first hop outside the ISP Modem (your public gateway): 123.123.123.123

  5. Google’s DNS servers: 8.8.8.8

So, using the ping test described above, have up to 5 command prompt windows open, each testing one hop with ping. Here are those hops again, with what can be the problem at each one:

ping 192.168.0.100

– if this is not 100%, you have a NIC problem or a broken IP stack that needs rebuilding

ping 192.168.0.1

– if this is not 100%, you have internal wiring problems between your PC and switch/router. Start tracing and replacing network cables/switches/router.
– if you had double NAT here, that will start being a problem with subsequent hops

ping 123.123.123.125

– If this is not 100%, your ISP modem is having problems; have them test it.
– In network segmentation parlance, we’re crossing the demarc, or demarcation point, between your local corporate network (your IT person’s problem) and the ISP network.

ping 123.123.123.123

– If this is not 100%, your Internet connection is having problems, and the ISP needs to log in and check it. Your modem isn’t getting good connectivity to the next set of ISP equipment, so they need to troubleshoot.
– On a cable ISP, you need to check power (usually ±10) and SNR (signal-to-noise ratio), and they should tell you what they call an acceptable range. If it’s not in range, an ISP tech will need to be deployed.
– On DSL, you need to have them check the noise profile, which needs to be within their specs. Missing filters on devices plugged into the phone line are a possible issue here.

ping 8.8.8.8

This is out on the Internet somewhere. ISPs can plausibly deny that the problem is theirs at this point, so look further along the tracert chain to see where problems begin to occur. The names will help you identify where network boundaries change, if you were lucky enough to see them.
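The hop-by-hop logic above boils down to "find the innermost hop that is dropping packets and look there first". Here is a rough Python sketch of that rule. The addresses are the example ones from this answer, the suspect descriptions paraphrase the notes above, and `loss_percent` and `first_failing_hop` are hypothetical names; the packet counts in the usage lines are made up.

```python
# Hops ordered from the inside out, each with the usual suspect
# described in the notes above.
HOPS = [
    ("192.168.0.100", "this PC: NIC or IP stack"),
    ("192.168.0.1", "internal wiring, switches, or router"),
    ("123.123.123.125", "ISP modem"),
    ("123.123.123.123", "ISP connection beyond the modem"),
    ("8.8.8.8", "somewhere further out on the Internet"),
]


def loss_percent(sent, received):
    """Packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def first_failing_hop(counts, hops=HOPS):
    """counts maps address -> (sent, received).

    Returns (address, suspect) for the innermost hop showing any loss,
    or None if every tested hop answered 100% of the time.
    """
    for addr, suspect in hops:
        if addr in counts:
            sent, received = counts[addr]
            if loss_percent(sent, received) > 0:
                return addr, suspect
    return None


# Made-up counts: the LAN is clean, loss begins at the public IP.
counts = {
    "192.168.0.100": (3600, 3600),
    "192.168.0.1": (3600, 3600),
    "123.123.123.125": (3600, 3540),
    "123.123.123.123": (3600, 3538),
    "8.8.8.8": (3600, 3530),
}
print(first_failing_hop(counts))
```

Loss at an outer hop only means something if the hops inside it are clean, which is why the function walks the list from the PC outward.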

Welcome to the IT profession 🙂
