Network troubles


As this story shows, the cause of a network problem is not always where you
first suspect…

I just set up an ftp server on my home network for easy file transfer with some family members. Everything was working fine, except … occasionally, file transfers would just hang, for no apparent reason. The logs did not say anything.

Of course I first thought that I had made some error in the setup of the ftp server. The server is behind two NAT routers, and ftp is tricky with NAT because it uses multiple associated connections. During the setup I did try to configure masquerading properly on the server and to forward the correct ports in both NAT boxes (ports 20 and 21 and a range of ports for passive ftp connections), but clearly there is potential for errors here.

I started by double-checking the port configuration. It was fine. So I went and read up on the details of the ftp protocol and its issues with NAT and firewalls, but I still had no ideas. And while my initial guess was a configuration problem on my part, I started to wonder… I would see a file transfer start, and then, after some part of the file had transferred correctly, the transfer would hang. It is hard to imagine how a port misconfiguration could cause this. Failing to initiate the transfer, yes; hanging in the middle, no.

Finally I managed to find a way to reproduce the problem myself, and I obtained two tcpdumps (each 40 MB…), one at each end of the connection. I saw something quite interesting. When the error occurs, the client receives a data packet with a TCP checksum error. The packet is retransmitted, but each retransmission also arrives with a TCP checksum error. This of course explains the hang: the client discards every copy, so the transfer can never progress. But why the checksum error?
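As a rough illustration of why a single corrupted byte is enough to get a packet rejected, here is a minimal sketch of the 16-bit ones' complement checksum (RFC 1071) that TCP uses. It is simplified: the real TCP checksum also covers a pseudo-header and the TCP header, whereas the hypothetical `inet_checksum` helper below only sums a payload. The sample bytes are the worked example from RFC 1071, not data from my capture.

```perl
use strict;
use warnings;

# RFC 1071 style checksum: sum the data as big-endian 16-bit words,
# fold the carries back into the low 16 bits, then take the complement.
sub inet_checksum {
    my ($bytes) = @_;
    $bytes .= "\0" if length($bytes) % 2;    # pad odd-length data with a zero byte
    my $sum = 0;
    $sum += $_ for unpack('n*', $bytes);
    $sum = ($sum & 0xffff) + ($sum >> 16) while $sum >> 16;
    return ~$sum & 0xffff;
}

# Sample payload and a copy with one byte changed in transit.
my $sent     = pack('H*', '0001f203f4f5f6f7');
my $received = $sent;
substr($received, 1, 1) = "\xff";

printf "checksum of sent data:     0x%04x\n", inet_checksum($sent);
printf "checksum of received data: 0x%04x\n", inet_checksum($received);
```

The two values differ, so a receiver that recomputes the checksum notices the mismatch and drops the segment. If the same corruption hits every retransmission, the connection stalls exactly as seen in the tcpdumps.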

It turns out the problem is with the particular data! Extracting the offending packet from the tcpdump, I now have a 1448-byte file which cannot pass through my network uncorrupted :-(:

00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 7d00 0000 cfbf 0c00 e1e3 8e00 403e  ..}...........@>
00000070: 7500 ad8b c500 ccaa 0000 8175 5600 d061  u..........uV..a
00000080: 3000 6ff0 2000 0565 ea00 707d 1b00 de3c  0.o. ..e..p}...<
00000090: d800 123a d800 4676 4000 8175 5600 d061  ...:..Fv@..uV..a
000000a0: 3000 6ff0 2000 0565 ea00 707d 1b00 de3c  0.o. ..e..p}...<
000000b0: d800 123a d800 4676 4000 a3db 0000 fe60  ...:..Fv@......`
000000c0: 0000 a3db 0000 fe60 0000 4000 0000 7eaf  .......`..@...~.
000000d0: 0000 4000 0000 7eaf 0000 7f1b 7f1b 7f1b  ..@...~.........
000000e0: 7f1b 0000 0000 0000 0000 2d70 2d70 0000  ..........-p-p..
000000f0: 0000 0000 0000 5b30 9370 52be 8a8a 0205  ......[0.pR.....
00000100: 3c30 16cd eebe 6f05 9e30 d9cd f0be f9b6  <0....o..0......
00000110: 7969 a769 f788 2fea 360c 0000 0070 6405  yi.i../.6....pd.
00000120: 8130 3505 6d30 7856 d1d7 ae8a 9120 a069  .05.m0xV..... .i
00000130: c688 03ea 4c81 0000 0000 9205 3930 3b05  ....L.......90;.
00000140: e230 3a04 4d81 be88 8e69 8a20 02b6 5856  .0:.M....i. ..XV
00000150: d80c 0000 0000 0000 0000 4d05 5b30 b256  ..........M.[0.V
00000160: c1d7 dd30 4170 0000 0000 2a04 2a04 0000  ...0Ap....*.*...
00000170: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000180: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000190: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001a0: 0000 ffff ffff 0000 6c00 0004 56cd 70cd  ........l...V.p.
000001b0: 05e7 5e40 7d05 055c 205e e700 e7ea 07be  ..^@}..\ ^......
000001c0: 56d8 0000 7d8a ea00 4000 0000 0000 0008  V...}...@.......
000001d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000001f0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
...

The byte 00 at offset 0000019f is transmitted as ff. Ouch! So I now have some particular bit pattern being reliably corrupted by my network connection.
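A corruption like this can be located by comparing the payload as sent with the payload as received, byte by byte. The sketch below is only an illustration: `diff_offsets` is a hypothetical helper, and the two buffers are synthetic stand-ins (1448 zero bytes, with one byte flipped) rather than the real packets extracted from the tcpdumps.

```perl
use strict;
use warnings;

# Return a list of [offset, sent_byte, received_byte] for every position
# where the two byte strings differ (compared up to the shorter length).
sub diff_offsets {
    my ($sent, $received) = @_;
    my $len = length($sent) < length($received) ? length($sent) : length($received);
    my @diffs;
    for my $i (0 .. $len - 1) {
        my $s = ord(substr($sent, $i, 1));
        my $r = ord(substr($received, $i, 1));
        push @diffs, [$i, $s, $r] if $s != $r;
    }
    return @diffs;
}

# Synthetic stand-in data: flip the byte at offset 0x19f from 00 to ff.
my $sent     = "\x00" x 1448;
my $received = $sent;
substr($received, 0x19f, 1) = "\xff";

printf "offset %08x: sent %02x, received %02x\n", @$_
    for diff_offsets($sent, $received);
# prints "offset 0000019f: sent 00, received ff"
```

With real captures, the two strings would be the TCP payloads of the same segment as seen at each end of the connection.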

So what I first assumed to be an ftp server configuration problem now turns out to be a nasty network issue :-(. There seem to be four possible sources of the problem:

  1. The server network card or driver.
  2. My own Linksys router.
  3. The Zyxel router provided by the ISP (Fullrate).
  4. The ISP network and switch infrastructure.

(The problem cannot be outside of these, as it occurs when connecting from multiple external Internet providers, and also when connecting from the ftp server to itself, looped around the ISP network.)

So the next step is to test each component in isolation to pinpoint the offender. Unfortunately I cannot isolate the Zyxel router from the ISP without obtaining an ADSL simulator, which I guess I could not easily afford… But the others can be tested in isolation once I get the chance to go on-site and move cables around.

I put together a small Perl snippet (listed at the end of this post) to test this without the complexity of a full-blown ftp server getting in the way and confusing the ISP tech support. With it running on the server, I can simply run this on the client:

    nc HOST 5376 > /dev/null

When run on a working network, this just downloads the magic 1448 bytes. But on the bad network, the command hangs waiting for a checksum-error-free retransmission that never comes.
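Since a hanging command is awkward to demonstrate to tech support, a client that gives up after a timeout can be handy. The following is a minimal sketch, not part of my original test setup: `fetch_with_timeout` is a hypothetical helper, and the 10 second timeout is an arbitrary assumption. It uses only the core modules IO::Socket::INET and IO::Select.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Connect, read until EOF, and return (ok, bytes_received).
# ok is 0 if no data arrived within $timeout seconds (a stalled transfer).
sub fetch_with_timeout {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    ) or die "connect to $host:$port failed: $@\n";

    my $sel   = IO::Select->new($sock);
    my $total = 0;
    while (1) {
        # can_read returns an empty list when nothing arrives in time.
        return (0, $total) unless $sel->can_read($timeout);
        my $n = sysread($sock, my $buf, 65536);
        die "read failed: $!\n" unless defined $n;
        return (1, $total) if $n == 0;    # EOF: server closed the connection
        $total += $n;
    }
}

# Only runs when a host and port are given on the command line.
if (@ARGV >= 2) {
    my ($ok, $bytes) = fetch_with_timeout($ARGV[0], $ARGV[1], 10);
    print $ok ? "OK: received $bytes bytes\n"
              : "STALLED after $bytes bytes\n";
}
```

Saved under a name of your choosing and run with the server's host name and port 5376 as arguments, it should report the full 1448 bytes on a healthy network and a stall on the bad one.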

To be continued!

#! /usr/bin/perl

use strict;
use warnings;

use Socket;

# Get the data with the problematic bitstream.
my $data;
open(IN, '<', 'trouble_data.raw')
    or die "Failed to open trouble data: $!";
# Read raw bytes, without any newline or encoding translation.
binmode(IN);
{ local $/= undef;
  $data= <IN>;
}
close(IN);
die "Failed to read trouble data: $!" unless defined $data;

# A client that disconnects early must not kill the server with SIGPIPE.
$SIG{PIPE}= 'IGNORE';

my $listen_port= 5376;
my $proto = getprotobyname('tcp');

socket(SERVER, PF_INET, SOCK_STREAM, $proto)
    or die "socket() failed: $!\n";
setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
    or die "setsockopt() failed: $!";
bind(SERVER, sockaddr_in($listen_port, INADDR_ANY))
    or die "bind() failed: $!";
listen(SERVER,SOMAXCONN)
    or die "listen() failed: $!";

print "Server started successfully.\n";

# Serve one client at a time: send the data, then close the connection.
my $paddr;
while ($paddr= accept(CLIENT,SERVER))
{
  my ($port, $iaddr)= sockaddr_in($paddr);
  my $name= gethostbyaddr($iaddr, AF_INET);
  # The reverse DNS lookup may fail; fall back to a placeholder.
  $name= 'unknown' unless defined $name;

  print "Got connection from '$name' [", inet_ntoa($iaddr), "], sending data...\n";
  print CLIENT $data;
  print "Data sent, closing connection.\n";
  close CLIENT;
}

2 comments

  1. Interesting…

    If it would be of any help I could try to reproduce it on my Fullrate connection. Anyway, I’m looking forward to the continuation.
