This week Steve and I ran into a very curious problem with our network. Before I jump into what went wrong, I should give you the layout of our current network. We have Frontier FiOS (used to be owned by Verizon) and we’re paying for symmetric 75Mbps upload and download. We’ve had fantastic service for years with only minor interruptions in service.
If you’ve been following along for years, you may remember back in 2013 when Bart taught how to essentially bypass ISP-supplied FiOS routers and instead use a router of our own choosing. I’ve put a link in the show notes to the full tutorial, but the main idea is to configure the FiOS router as a NAT router with a DMZ address on the LAN side. Then your real router is configured with its WAN side address set to that DMZ address from the FiOS router. From there, your real router gets to dish out DHCP addresses and do all management of the network. Essentially the FiOS router becomes simply a pass-thru device.
Back when he taught us, I had an Airport Extreme. In February of 2017 I added a Netgear R8500 tri-band router to the mix. Bart had explained the dangers of having janky IoT devices on our networks, so I created two internal networks, one with the Airport Extreme for the IoT devices of suspicious origin and any Windows machines that wandered into our house, and the Netgear for our Macs and any HomeKit-enabled IoT devices. And of course this separated network design was all documented in a blog post as well.
Because I’m a genius, a year and a half later it occurred to me that the Netgear router has a guest network, which I could use instead of this overly complex setup with two routers. Over the last few months, Steve and I have been hunting down every device that was on the Airport Extreme and moving them to the Netgear’s guest network.
You would think this would be an easy task. Both routers have nice tables that clearly show you what devices are on which network. Unfortunately, many devices don’t report exactly who they are. They have helpful information like “Texas Instruments” as their name. TI makes a lot of chips, which one are you?
One of my solutions was to just unplug the Airport Extreme every once in a while and see what failed. That turned into a big mess when Steve’s weather station got knocked offline but that’s a whole story in and of itself.
You might also remember that I received a pair of Netgear Orbi mesh routers, and I had set them up to extend the network, but that seemed to confuse some of our devices. I’m not entirely sure I did that whole bridge network thing correctly, so I shut them off for now.
I’m not sure we found all of the devices on the Airport Extreme yet but for the most part everything is running on the Netgear now. Since most of the stuff is IoT, I only made a 2.4GHz guest network for those devices. I’m happy to say that now when you look at the available WiFi networks when you’re at our house, only 3 of them are mine, instead of 7. That’s progress, right?
Ok, that’s all background, now to the interesting problem. I might have mentioned a few hundred times that we’re doing a massive amount of construction in our house. We have gutted three bathrooms, the entry way, the laundry room, and two fireplaces, and then decided since it was anarchy anyway, why not have the interior of the house repainted. It’s been 30 years, so it seemed like time.
We’ve had to unplug and move from room to room as the painters chase us around. I told them I feel like a MASH unit now. I’m not sure any of this contributed to the problem but suddenly we had really really slow Internet at our house. I’m talking sub 1Mbps on some tests. On a good test we might see 20-30Mbps but often it was in single digits when we went to speedtest.net. Again, we’re paying for 75/75 and normally get far in excess of those speeds.
I called Frontier and the tech support guy Daniel asked me the usual questions. After quite a while on the phone I was finally able to ask him if he could check to see if there was a problem in my area. He said, “Oh. I guess I could check for that.” Sigh. Anyway, he said that yes there was a problem in my area but that it was “only” affecting a few hundred homes.
That was good enough for me. As soon as I know it’s not just my house, I don’t worry so much because I know they’ll get to it faster than if it’s just me. It was still sad, we couldn’t watch streaming TV so we were forced to read books. Good thing our Kindles had them downloaded.
The next day Steve noticed something interesting. He ran a test and got 100+Mbps down and up, but he was on wired Ethernet. He unplugged and the numbers went back into the pooper. He plugged back in and they went right back up to faster than what we’re paying for. I replicated his tests using my Mac on wired Ethernet and sure enough, the speeds were great. That meant the problem was internal to our network. I didn’t call Daniel back to confess.
Now it was time to try to figure this out. My first thought was to restart the router, but for some reason the web interface to the Netgear, which clearly has a reboot button, wasn’t responding when I pushed it. I decided to just unplug the router, wait 10 seconds and plug it back in. Didn’t fix the problem.
After the hard shutdown, I was able to use the reboot button on the web interface, but again it didn’t fix anything. I checked for firmware updates, but there were none. By the way, I have a monthly reminder to do that. Netgear is pretty good about sending emails when critical updates are released, but it’s good to keep on top of it anyway.
I started running some ping times to www.apple.com and they were awful, and that’s when the packets got through at all. Even when I’m doing an audio call to Bart way over in Ireland, we get ping times under 400ms, but pinging Apple’s local servers, I was getting thousands of ms ping times and often many completely dropped connections.
For grins and giggles, I tried pinging my router while on WiFi, and one ping took 10,000ms. That’s inside my house! Well that just ain’t right!
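If you want to tally up a ping run instead of eyeballing it, here's a minimal Python sketch. It assumes you've already captured the text that ping prints on a Mac, where replies contain `time=... ms` and dropped pings show up as `Request timeout` lines. The function name `summarize_ping`, the router address 192.168.1.1, and the sample text are all made up for illustration; this is not my actual log.

```python
import re

# Matches the round-trip time in a ping reply line, e.g. "time=2.345 ms"
TIME_RE = re.compile(r"time[=<]([\d.]+) ms")

def summarize_ping(output):
    """Return (replies, timeouts, worst_ms) from captured ping output."""
    times = [float(m.group(1)) for m in TIME_RE.finditer(output)]
    # macOS ping prints one "Request timeout" line per dropped packet
    timeouts = sum(1 for line in output.splitlines()
                   if line.startswith("Request timeout"))
    worst = max(times) if times else None
    return len(times), timeouts, worst

# Made-up sample in the style of macOS ping output (hypothetical address and times)
sample = """\
64 bytes from 192.168.1.1: icmp_seq=0 ttl=64 time=3.2 ms
Request timeout for icmp_seq 1
64 bytes from 192.168.1.1: icmp_seq=2 ttl=64 time=10000.0 ms
"""
print(summarize_ping(sample))
```

A worst-case time in the thousands of milliseconds to a router inside your own house, or any timeouts at all, points at the local network rather than the ISP.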
It was time to ask for help. I posted in our Slack, in the Mac Geek Gab Facebook group and in the Mac Geek Gab Forums where lots of nerdlets hang out.
I posted my question as, “Does WiFi ever just go bad on a router?” I got many helpful answers. Interestingly, quite a few people answered “yes”, and some even added that Netgear seems to be more susceptible to this than others.
A lot of people suggested it was an interference problem because of congestion on the channel I was using. My neighborhood seems pretty low end in the WiFi competition category though. I got the best information from @datafornothinandbitsforfree in the Mac Geek Gab forums. His real name is Bob, but I’m going to call him datafornothin for short!
In any case, datafornothin also suggested the interference problem but gave me very specific diagnostic help. He suggested I hold down the option key and tap on the WiFi signal in the menubar on my Mac. When you do this, the network you’re using will have expanded information. Amongst other things, you’ll see RSSI and Noise. I’d seen this before but I didn’t have an explanation of how to interpret the numbers.
I looked up RSSI and according to Wikipedia, it stands for Received Signal Strength Indication. This is a measurement of the power present in a received radio signal.
datafornothin explained that RSSI is the signal strength and of course noise is noise. But he went on to explain:
RSSI in the -30’s is fantastic, and anything approaching the -90’s is lousy. Noise in the -90’s is great, and as it drops closer to the -30’s means it is really bad for you if this is your channel.
Oddly enough, my RSSI signal was -34dBm and my noise was -92dBm. I was well within the parameters that datafornothin had outlined.
Later in Facebook, John F Braun suggested I spend some quality time with iStumbler. This used to be a free app that I used from time to time, but now it’s $14. I bet if I’d never gotten it for free I would have gladly shelled out $14 but that seemed steep after free for so long.
Then datafornothin suggested I use the built in tool Wireless Diagnostics. I don’t think I even realized Macs came with this tool. Shows you how aware I am, I found a video from 2012 talking about it. This app is buried in the top level Library/CoreServices/Applications folder. Rather than digging for it that way, you can also find it by again holding down the option key and clicking on the WiFi symbol in the menu bar. Right near the top you’ll find “Open Wireless Diagnostics”. For reference on more about this tool, check out Apple Support article HT202663 at support.apple.com/…
Wireless Diagnostics is packed with different tools. When you first launch it, it wants to run Wireless Diagnostics. I know that sounds redundant but it’s only one tool amongst many. If you hit continue it will analyze your network. I didn’t learn anything interesting from it though.
I took a look at the Scan window and this was much more interesting. In the main window you can see all of the networks it found. It lists the name, BSSID, security level, protocol, RSSI & noise, channel being used, which band in GHz, and width in MHz, and oddly the country.
I was able to look at this data and see that there are three devices broadcasting on the 5GHz band, one of which is mine, one is a printer of all things, and one is named Roy. I could see that Roy’s signal was a paltry -87dBm while mine was a solid -33dBm. Again this supports the idea that my signal strength is just fine.
datafornothin suggested that I open Performance test from within Wireless Diagnostics. This is accessed by going to Window and choosing Performance. This is super cool. It’s a continuously updating set of graphs all in one window. The top graph is called Rate and if you hover with your mouse over the graph it explains that it’s showing you the transfer rate over time in Mbps.
There are more graphs to tell you about, but let’s pause and talk about what this first graph for Rate told me. I watched this data being collected and graphed in real time and it was all over the map. My rate was banging from 40 to 120Mbps over time. I wasn’t sure what it should look like though.
I pinged my friend Pat Dengler and asked her if she could run the test. She was at a business and she got a perfectly flat line on her graph showing 300Mbps. No variation in her graph while mine is moving by 200%. I plugged into wired Ethernet and suddenly my graph was a nice, non-varying line at over 525Mbps. That seems reasonable on a gigabit network.
Explaining how to make a ratio of signal to noise
I mentioned that the Performance window has 3 graphs. The top is the rate as we’ve discussed. The bottom graph has both the RSSI and Noise plotted on the same graph. These are graphed in units of dBm. The middle graph is the Quality, and if you hover over it, it tells you it’s showing you the ratio of the signal level to the noise level over time. But how do you take the ratio of two values measured in dBm? This required a lesson from the electrical engineer I keep on staff for just this kind of situation.
First of all dBm is short for decibel-milliwatts. Let’s break that down. Decibels, or dB, is a unit that was designed to help you compare numbers that are vastly different from each other. Let’s say you have two numbers you’re plotting and one is millions of dollars while the other is hundreds of dollars. You couldn’t plot them together and be able to see the second set of data at all. Decibels use a logarithmic scale, which will allow you to see both values on the same graph.
Now logarithmic values are a funny thing. Normally you divide two numbers to get their ratio. But to calculate the ratio of two logarithmic values, you actually subtract their values and their units (in our example, milliwatts) cancel. So the ratio of a 10dBm power level to a 2dBm power level is 10 minus 2, or 8dB.
In our problem with signal strength, we’re trying to measure the power levels of the signal and noise. That means we want to compare each one to a fixed reference, in this case one milliwatt or 1/1000 of a Watt. That gives us decibel-milliwatts, or dBm. And if we want the signal-to-noise ratio, we subtract the two values and discard the units. In our example, our signal was –30dBm, and noise was –90dBm, so we would have a signal-to-noise ratio of 60dB. Apple plots this on a graph called Quality.
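For anyone who wants to check that arithmetic, here's a minimal Python sketch. The function names are made up for illustration, and this is only my guess at the math behind the Quality graph, not Apple's actual code. It works out the signal-to-noise ratio two ways: by subtracting the dBm values, and the long way by converting each to milliwatts, dividing, and converting the ratio back to decibels.

```python
import math

def dbm_to_mw(dbm):
    """Convert a power level in dBm to milliwatts (0 dBm is exactly 1 mW)."""
    return 10 ** (dbm / 10)

def snr_db(signal_dbm, noise_dbm):
    """Signal-to-noise ratio in dB: just subtract the two dBm values."""
    return signal_dbm - noise_dbm

def snr_db_from_power(signal_dbm, noise_dbm):
    """Same ratio the long way: divide the milliwatt values, then take 10*log10."""
    ratio = dbm_to_mw(signal_dbm) / dbm_to_mw(noise_dbm)
    return 10 * math.log10(ratio)

# The example values from above: signal at -30 dBm, noise at -90 dBm
print(snr_db(-30, -90))
print(round(snr_db_from_power(-30, -90), 6))
```

Both approaches land on 60dB, which in plain numbers means the signal is carrying about a million times the power of the noise.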
Back to our experiments
Now after that little lesson, let’s get back to trying to figure out what’s making my rate be so wonky on WiFi. datafornothin suggested that it was time to bring in one of my spare routers. He was trying to eliminate the possibility that there was some new interference happening in our house. If the Airport Extreme also gave us poor performance on WiFi, then that would eliminate the possibility that something had gone wrong with the Netgear.
I should mention that by this time in our story, our speeds on WiFi were just fine. But that didn’t keep us from running the experiments to see if we could figure out whether something odd was going on with the Netgear router.
I dusted off the Airport Extreme, reset it to factory settings, and set it up in bridge mode attached to the Netgear. datafornothin had suggested I run these tests on two different Macs at the same time. I brought my MacBook Pro into Steve’s Den where his iMac lives so we could look at the graphs at the same time. At first my MacBook Pro got a beautiful flatline graph while Steve’s was bouncing around with those curious square waves. I moved my MacBook Pro from the bed to the desk right next to the iMac and then I got the bouncing up and down speeds too.
The theory was that if the wonky behavior went away when we used the Airport Extreme, then we could say definitively that the Netgear’s WiFi was failing and I could go shopping for a new router. But now that the Airport Extreme has exhibited the same behavior I don’t have an excuse to buy a new shiny. Instead, we have to take a look at interference.
What about interference?
I also ran some tests changing the channel of the Netgear but I managed to pretty much lose the signal entirely when I did that. I then started shutting off devices in the cabinet with the Netgear. One at a time I shut off the Mac mini, then the Drobo 5N2, then the Drobo 5N, the monitor and finally the printer. Nothing changed the Performance graph.
Maybe it’s something in my studio though, so I unplugged power to my dock which shut everything off, but still no joy. I even shut down Steve’s iMac in his den and pulled power on his dock, but nothing fixed the problem.
Bottom Line
I bet you were thinking this story had an end. That I would delightfully tie it up in a bow and be able to tell you the answer. Well, sorry to disappoint, this is a story that has no ending, at least not yet. I hope that you’ve learned a bit about fault isolation, and about the cool Performance testing you can do with the built-in Wireless Diagnostics app on your Mac.
Maybe there’s nothing wrong. Maybe it was some sort of neutrino attack and these square waves are normal for my house. Maybe my contractors used paint with tiny little wire mesh embedded in it. Maybe the teaching assistant Steve and I had in our college physics lab is hiding in my house. I say that because once we were doing an experiment with traveling wave tubes (aka TWTs), and we had an oscilloscope set up measuring the wave patterns coming through them, when we ran into a very weird problem. We’d just barely get a good sinusoid on the oscilloscope and suddenly the display would go completely wonky. We couldn’t figure it out. Finally we noticed that our teaching assistant (who was frequently stoned) was sticking the metal tip of a yardstick into the other end of the TWT. He thought it was hilarious.
So, yeah, I may never know the answer to this one but I’d love to hear your thoughts.
I wonder if your problems are not with WiFi per se, but rather at the IP layer. With all the changing of routers and guest networks, and plugging and unplugging, is it possible you have a rogue DHCP server somewhere? Maybe you are sometimes getting duplicate IP addresses for your WiFi network devices. In such a case, you may expect problems that come and go without explanation.
Windows used to warn immediately if there was a duplicate IP. I don’t know if the Mac does that. I don’t remember ever seeing it.
It’s also worth checking the default gateways and netmasks on both WiFi and Ethernet.
What happens if you do a traceroute while experiencing slowness? Anything weird?
Is it possible you have two WiFi routers trying to serve the same SSID/password simultaneously?
-Jamie @LinkLayer