Network troubles, part 2

This is a follow-up to part 1 of the story, where I found that a hanging FTP transfer was caused by one of my network components being unable to transmit certain bit patterns.

Once on-site, I had the chance to move cables around and test each component in isolation.

I was quite surprised to learn that the problem seems to be with my old Linksys WRT54GS router. This router has one WAN Ethernet port, four LAN Ethernet ports that work as a 4-port switch, and a wireless WLAN “port”/antenna, and the box performs routing/NAT among these three parts.

When I use my test server to send my magic bytes between two hosts that are both connected to the 4-port LAN switch on the Linksys, there is no problem; the same goes for two hosts both connected to the WLAN. When I send the data in through the WLAN and out through the LAN, there is also no problem.

But when I send the data in through the LAN and out through the WLAN, the transfer hangs due to repeated checksum errors on the data packet. Same if I send in through the LAN and out through the WAN. And same if I send in through the WAN and out through either LAN or WLAN.

So, in summary, the problem happens whenever the data is routed inside the Linksys from any Ethernet port to any other port. But only when it is routed, not when it is merely switched among the 4 LAN ports. And only when the data enters the router through an Ethernet port.
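Why a hang rather than an outright error? A minimal sketch in Python may help: it implements the 16-bit one's-complement Internet checksum (RFC 1071) that TCP uses, and simulates a hop that corrupts the same bit every time the same bytes pass through it. The receiver discards every copy, so the sender keeps retransmitting identical data that can never get through. Several things here are assumptions for illustration only: the checksum is computed over the payload alone (real TCP also covers the TCP header and an IP pseudo-header), `faulty_forward` is a hypothetical stand-in for whatever the Linksys actually does, and `b"hello, world"` is not the real magic bytes.

```python
# Sketch: RFC 1071 Internet checksum and a deterministically corrupting hop.
# Assumption: the checksum covers the payload only; real TCP also covers
# the TCP header and an IP pseudo-header.


def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of data (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a trailing zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add next big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


def verify(data: bytes, checksum: int) -> bool:
    """Receiver side: recompute the checksum and compare with the sender's."""
    return internet_checksum(data) == checksum


def faulty_forward(data: bytes) -> bytes:
    """Hypothetical faulty hop: flips the lowest bit of byte 3 on every pass."""
    corrupted = bytearray(data)
    corrupted[3] ^= 0x01
    return bytes(corrupted)


def transfer(payload: bytes, max_attempts: int = 5) -> bool:
    """Retransmit until a copy verifies; the bad hop corrupts every copy."""
    checksum = internet_checksum(payload)
    for _ in range(max_attempts):
        if verify(faulty_forward(payload), checksum):
            return True  # receiver accepted the segment
    return False  # every retransmission was discarded: the transfer stalls


payload = b"hello, world"  # hypothetical stand-in, not the real magic bytes
print(verify(payload, internet_checksum(payload)))  # True: clean path
print(transfer(payload))  # False: same corruption on every retry
```

In a real TCP stack the receiver silently drops a segment with a bad checksum, so the sender never sees an ACK and retransmits with exponential backoff until it eventually gives up; from the FTP client's side that looks like a hang.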

I of course tried power-cycling the Linksys, swapping cables, and moving between the LAN ports. No difference. So I am forced to put the blame on the little blue Linksys box, which is a pity, as it has served me well for more than 3 years and I had grown rather fond of it.

So, problem solved! Once I replace the Linksys, I should be up and running without problems again. Still, I am left wondering about two questions:

  1. This is not the kind of problem I would expect to turn up after 3 years of flawless service; on the other hand, I also find it hard to believe that I would not have discovered it in 3 years if it had been there from the start.
  2. I was very surprised to find the problem tied to routing. I would have expected a hardware problem, like a bad cable or marginal signal processing on an Ethernet port. But the Ethernet connection works in switch mode and fails in router mode, which sounds like a software problem with the driver or IP stack inside the Linksys. Yet a bug that depends on a specific bit pattern is not what I would expect from software (i.e., a memcpy() bug that depends on the bit pattern being copied sounds quite unlikely).

Oh well. It is probably some weird hardware problem in the interface between the main board and the Ethernet device; since switching happens entirely inside the device, it is not affected by the problem. Or something like that.
