How FTP Data Connections Work Part 1 (OR: Don’t Open Port 20 in your Firewall!)
This will be the first of a couple of articles on FTP, as I’ve been asked to post this information in an easy-to-read format in a public place where it can be referred to. I think my expertise in developing and supporting WFTPD and WFTPD Pro allow me to be reliable on this topic. Oh, that and the fact that I’ve contributed to a number of RFCs on the subject.
Enough TCP to be dangerous
First, a quick refresher on TCP – every TCP connection can be thought of as being associated with a “socket” at each device along the way – from one computer, through routers, to the other computer. The socket is identified by five individual items – the local IP address, the local port, the remote IP address, the remote port, and the protocol (in this case, the protocol is TCP).
Firewalls are essentially a special kind of router, with rules not only for how to forward data, but also rules on connection requests to drop or allow. Once a connection request is allowed, the entire flow of traffic associated with that connection request is allowed, also – any traffic flow not associated with a previously allowed connection request is discarded.
When you set up a firewall to allow access to a server, you have to consider the first segment – the “SYN”, or connection request from the TCP client to the TCP server. The rule can refer to any data that would identify the socket to be created, such as “allow any connection request where the source IP address is 10.1.1.something, and the destination port is 54321”.
Typically, an external-facing firewall will allow all outbound connections, and have rules only for inbound connections. As a result, firewall administrators are used to saying things like “to enable access to the web server, simply open port 80”, whereas what they truly mean is to add a rule that applies to incoming TCP connection requests whose source address and source port could be anything, but whose destination port is 80, and whose destination address is that of the web server.” This is usually written in some short hand, such as “allow tcp 0.0.0.0:0 10.1.2.3:80”, where “0.0.0.0” stands for “any address” and “:0” stands for “any port”.
Firewall rules for FTP
For an FTP server, firewall rules are known to be a little trickier than for most other servers.
Sure, you can set up the rule “allow tcp 0.0.0.0:0 10.1.2.3:21”, because the default port for the control connection of FTP is 21. That only allows the control connection, though.
What other connections are there?
In the default transfer mode of “Stream”, every file transfer gets its own data connection. Of course, it’d be lovely if this data connection was made on port 21 as well, but that’s not the way the protocol was built. Instead, Stream mode data connections are opened either as “Active” or “Passive” connections.
Active and Passive Data Connections
The terms "Active" and "Passive" refer to how the FTP server connects. The choice of connection method is initiated by the client, although the server can choose to refuse whatever the client asked for, at which point the client should fail over to using the other method.
In the Active method, the FTP server connects to the client (the server is the “active” participant, the client just lies back and thinks of England), on a random port chosen by the client. Obviously, that will work if the client's firewall is configured to allow the connection to that port, and doesn't depend on the firewall at the server to do anything but allow connections outbound. The Active method is chosen by the client sending a “PORT” command, containing the IP address and port to which the server should connect.
In the Passive method, the FTP client connects to the server (the server is now the “passive” participant), on a random port chosen by the server. This requires the server's firewall to allow the incoming connection, and depends on the client's firewall only to allow outbound connections. The Passive method is chosen by the client sending a “PASV” command, to which the server responds with a message containing the IP address and port at the server that the client should connect to.
The ALG comes to the rescue!
So in theory, your firewall now needs to know what ports are going to be requested by the PORT and PASV commands. For some situations, this is true, and you need to consider this – we’ll talk about that in part 2. For now, let’s assume everything is “normal”, and talk about how the firewall helps the FTP user or administrator.
If you use port 21 for your FTP server, and the firewall is able to read the control connection, just about every firewall in existence will recognise the PORT and PASV commands, and open up the appropriate holes. This is because those firewalls have an Application Level Gateway, or ALG, which monitors port 21 traffic for FTP commands, and opens up the appropriate holes in the firewall. We’ve discussed the FTP ALG in the Windows Vista firewall before.
So why port 20?
Where does port 20 come in? A rather simplistic view is that administrators read the “Services” file, and see the line that tells them that port 20 is “ftp-data”. They assume that this means that opening port 20 as a destination port on the firewall will allow FTP data connections to flow. By the “elephant repellant” theory, this is proved “true” when their firewalls allow FTP data connections after they open ports 21 and 20. Nobody bothers to check that it also works if they only open port 21, because of the ALG.
OK, so if port 20 isn’t needed, why is it associated with “ftp-data”? For that, you’ll have to remember what I said early on in the article – that every socket has five values associated with it – two addresses, two ports, and a protocol. When the data connection is made from the server to the client (remember, that’s an Active data connection, in response to a PORT command), the source port at the server is port 20. It’s totally that simple, and since nobody makes firewall rules that look at source port values, it’s relatively unimportant. That “ftp-data” in the Services file is simply so that the output from “netstat” has a meaningful service name instead of “:20” as a source port.
Coming up in part 2…
Next time, we’ll expand on this topic, to go into the inability of the ALG to process encrypted FTP control traffic, and the resultant issues and solutions that face encrypted FTP.