Inverse multiplexing to speed up file transfer

Problem :

I have to send a large amount of data from one machine to another. If I send it with rsync (or any other method), it will go at a steady 320KB/sec. If I initiate two or three transfers at once, each will go at 320KB/sec, and if I do four at once, they will max out the link.

I need to be able to send data as fast as possible, so I need a tool that can do inverse multiplexing with file transfers. I need a general solution, so running split on the source machine and catting them together at the other end is not practical. I need this to work in an automated fashion.

Is there a tool that does this, or do I need to make my own? The sender is CentOS, receiver is FreeBSD.

Solution :

Proof that it all adds up – I present the ‘holy grail’ of remote mirror commands. Thanks to davr for the lftp suggestion.

lftp -c "mirror --use-pget-n=10 --verbose sftp://username:password@server.com/directory" 

The above will recursively mirror a remote directory, splitting each file across 10 connections as it transfers!

There are a couple of tools that might work.

  • LFTP – supports FTP, HTTP, and SFTP. Supports using multiple connections to download a single file. Assuming you want to transfer a file from remoteServer to localServer, install LFTP on localServer, and run:

    lftp -e 'pget -n 4 sftp://userName@remoteServer.com/some/dir/file.ext'

    The ‘-n 4’ is how many connections to use in parallel.

  • Then there are the many ‘download accelerator’ tools, but they generally only support HTTP or FTP, which you might not want to have to set up on the remote server. Some examples are Axel, aria2, and ProZilla.
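As an illustration of the accelerator approach, aria2 can split a single HTTP download across several connections. The URL below is a placeholder; you would need an HTTP server running on the remote machine:

```shell
# Download one file over 4 parallel connections:
# -x limits connections per server, -s controls how many pieces
# the file is split into.
aria2c -x 4 -s 4 http://remoteServer.com/some/dir/file.ext
```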

If you have a few large files, use lftp -e 'mirror --parallel=2 --use-pget-n=10 <remote_dir> <local_dir>' <ftp_server>: you’ll download 2 files at a time, each split into 10 segments, for a total of 20 FTP connections to <ftp_server>.

If you have a large number of small files, use lftp -e 'mirror --parallel=100 <remote_dir> <local_dir>' <ftp_server>: you’ll download 100 files in parallel without segmentation, opening a total of 100 connections. This may exhaust the available client slots on the server, or get you banned from some servers.

You can use --continue to resume the job 🙂 and the -R option to upload instead of download (switching the argument order to <local_dir> <remote_dir>).
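Putting those options together, a resumable reverse mirror (upload) might look like the sketch below. Note that pget-style segmentation applies to downloads, so it is omitted here; the directory and server names are placeholders, as above:

```shell
# Upload a local directory to the server, 2 files at a time,
# resuming any partially transferred files.
lftp -e 'mirror -R --continue --parallel=2 <local_dir> <remote_dir>' <ftp_server>
```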

You may be able to tweak your TCP settings to avoid this problem, depending on what’s causing the 320KB/s per connection limit. My guess is that it is not explicit per-connection rate limiting by the ISP. There are two likely culprits for the throttling:

  1. Some link between the two machines is saturated and dropping packets.
  2. The TCP windows are saturated because the bandwidth delay product is too large.

In the first case, each TCP connection would effectively compete equally in standard TCP congestion control. You could improve this by changing congestion-control algorithms or by reducing the amount of backoff.

In the second case you aren’t limited by packet loss. Adding extra connections is a crude way of expanding the total window size. If you can manually increase the window sizes the problem will go away. (This might require TCP window scaling if the connection latency is sufficiently high.)
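As a sketch of the second fix, the buffer limits can be raised with sysctl on both ends. The values below are illustrative, not tuned recommendations:

```shell
# On the CentOS (Linux) sender: raise the maximum send buffer.
sysctl -w net.core.wmem_max=4194304
sysctl -w net.ipv4.tcp_wmem="4096 65536 4194304"

# On the FreeBSD receiver: raise the default receive buffer.
sysctl net.inet.tcp.recvspace=4194304
```

Linux will auto-tune within the tcp_wmem range; on FreeBSD you may also need to raise kern.ipc.maxsockbuf if you go much larger.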

You can tell approximately how large the window needs to be by multiplying the round-trip (“ping”) time by the total speed of the connection. 1280KB/s needs about 1311 bytes (1280 × 1024 / 1000) per millisecond of round trip. A 64K buffer will be maxed out at about 50 ms of latency, which is fairly typical. A 16K buffer would then saturate at around 320KB/s.
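The arithmetic above is just the bandwidth-delay product, and you can check it with a one-liner:

```shell
# Required TCP window (bytes) = bandwidth (bytes/s) x RTT (s).
# 1280 KB/s at 50 ms round trip:
awk 'BEGIN { bw = 1280 * 1024; rtt = 0.050; printf "%d\n", bw * rtt }'
# prints 65536, i.e. exactly a 64K window
```

Run it with your own measured ping time and target throughput to see what window size you actually need.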
