DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

[Why is this under "Programmer Hubris"? Because it's about developers who find "an easy fix" and apply it, without trying to figure out why it made things appear to work better.]

I like to read Larry Osterman's and Raymond Chen's blogs, because they've seen most things, and learned most of the lessons right.  Today, Larry posted on network optimisation - reminding me once again:

Trying to fix network speed problems by enabling TCP_NODELAY is almost always wrong.  Setting TCP_NODELAY disables the Nagle algorithm.

When you do this, it's an indication that either your program is broken, or your protocol is broken, or quite possibly both.
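For reference, this is roughly what "enabling TCP_NODELAY" looks like in code. It is a minimal Python sketch using the standard socket module; the helper name nagle_disabled is made up for illustration, and the socket is never connected.

```python
import socket

def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set on the socket (Nagle is off)."""
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

# An unconnected TCP socket is enough to inspect and change the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
default_state = nagle_disabled(sock)   # Nagle is normally on by default

# The "easy fix" this post argues against: turn Nagle off for this socket.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
after_set = nagle_disabled(sock)

sock.close()
```

The option is per socket, so it has to be set on every connection that wants it.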

"But what about FTP?" people say.

Yes, the protocol is broken.  It requires a send / send / recv exchange, and you have to enable TCP_NODELAY (disabling the Nagle algorithm in the process) to make it work properly, so that you can have more than five files transfer per second.

So what does Nagle do, exactly? Well, John Nagle spends his time pushing dead bodies down staircases.  The Nagle algorithm, named after John despite his humble protestations, does a couple of simple things every time you try to send data:

If there is unacknowledged data in the queue, then don't send until:
  • all data is acknowledged, or
  • you have a segment's worth to send.
  • Uh... that's it.
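The rule above is small enough to write down as one function. This is a sketch of the decision as described here, not of any particular stack's implementation; the function and parameter names are invented for the example.

```python
def nagle_can_send_now(unacked_bytes: int, queued_bytes: int, mss: int) -> bool:
    """Decide whether queued data may go out immediately under Nagle.

    unacked_bytes: bytes already sent but not yet acknowledged
    queued_bytes:  bytes waiting to be sent
    mss:           maximum segment size for the connection
    """
    if unacked_bytes == 0:
        # Nothing outstanding: send right away, however small.
        return True
    # Data is outstanding: only send once a full segment is available.
    return queued_bytes >= mss
```

So a lone small send on an idle connection is never held back; only a small send that follows unacknowledged data waits.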

The idea is to take a program that, rather stupidly, sends one character at a time, resulting in a 1:40 data:framing ratio, and turn it into one that sends several characters at a time, using the network bandwidth more efficiently and not becoming a network hog.
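The 1:40 figure comes from the headers: assuming IPv4 and TCP with no options, each carries 20 bytes. A quick sketch of the arithmetic (constant and function names are just for illustration):

```python
IP_HEADER_BYTES = 20    # IPv4 header, assuming no options
TCP_HEADER_BYTES = 20   # TCP header, assuming no options

def payload_fraction(payload_bytes: int) -> float:
    """Fraction of each packet that is actual application data."""
    framing = IP_HEADER_BYTES + TCP_HEADER_BYTES
    return payload_bytes / (payload_bytes + framing)

one_char = payload_fraction(1)       # one character per packet
hundred_chars = payload_fraction(100)  # a hundred characters per packet
```

One character per packet means about 2.4% of the bytes on the wire are data; a hundred characters per packet brings that to about 71%.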

Way back when, this was a perfect idea, and could have remained perfect, if it weren't for the delayed ACK algorithm, whose author is not remembered so fondly as to name the algorithm after him.  Delayed Ack states that you should not send an unaccompanied ack until either:

  • you receive two segments, or
  • 200ms expires since the first piece of data was received.
[An ACK is included in every TCP segment, so it's not an overhead of any kind except when you're sending nothing but an ACK.]
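The delayed ACK rule can be sketched the same way. This follows the description above (two segments, or 200ms); real stacks vary in their timer values, and the names here are invented for the example.

```python
DELAYED_ACK_TIMEOUT_MS = 200  # the figure used in this post; stacks differ

def must_send_bare_ack(unacked_segments: int, ms_since_first: float) -> bool:
    """Decide whether an ACK-only segment must go out now.

    unacked_segments: segments received that we have not yet acknowledged
    ms_since_first:   time since the first of those segments arrived
    """
    if unacked_segments >= 2:
        return True
    if unacked_segments == 1 and ms_since_first >= DELAYED_ACK_TIMEOUT_MS:
        return True
    return False
```

If the application sends any data back before the timer fires, the ACK rides along on that segment and this function never needs to say yes.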

In most environments, this is still good, because most protocols are "client sends command, server sends response" over and over again, so each side is doing one send, then one recv. Model this behaviour in your head, and you'll see that the Nagle algorithm won't stop any outgoing traffic, and nor will delayed ACK hold up any acknowledgements.

If one side hiccups and calls send() twice (on short data) before receiving, however, things come to (in networking terms) a screeching halt. The second send queues up because the Nagle algorithm doesn't get an acknowledgement from the first send until the delayed ACK algorithm has exhausted its timer.
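To put numbers on that halt, here is a deliberately simplified timeline model. It assumes the sends are all small and issued back to back, that the receiver replies only once it has the whole request, and that the delayed ACK timer is 200ms; the function name and parameters are made up for the sketch.

```python
def request_arrival_ms(num_small_sends: int, one_way_ms: float,
                       delayed_ack_ms: float = 200.0) -> float:
    """Time until the receiver holds the complete request, with Nagle on."""
    if num_small_sends <= 1:
        # A single send goes out at once and arrives one trip later.
        return one_way_ms
    # First send arrives after one_way_ms. The receiver has only one
    # segment and nothing to say yet, so its ACK waits for the timer.
    ack_sent_at = one_way_ms + delayed_ack_ms
    ack_received_at = ack_sent_at + one_way_ms
    # Only now does Nagle release the remaining (coalesced) data.
    return ack_received_at + one_way_ms

single = request_arrival_ms(1, one_way_ms=5)   # 5 ms
split = request_arrival_ms(2, one_way_ms=5)    # 215 ms
```

On a link with 5ms each way, splitting the request into two small sends turns a 5ms delivery into a 215ms one, which is where the "about five exchanges per second" ceiling comes from.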

The answer is always "don't send / send / recv".  Always group logically-associated data together in one call to send, unless you're sending large amounts of data, in which case you can happily send / send / send until you've exhausted your data.
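In code, the fix is usually nothing more than building the whole message before calling send. A minimal sketch follows; the "LEN" framing is invented for the example, and socket.socketpair() stands in for a real TCP connection so the snippet can run on its own.

```python
import socket

def send_message(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Send header and body as ONE write, so Nagle sees one piece of data."""
    sock.sendall(header + body)

def recv_exactly(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes, looping because recv may return less."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before sending everything")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

# socketpair() is only a stand-in for a connected TCP socket here.
a, b = socket.socketpair()
send_message(a, b"LEN 5\r\n", b"hello")
received = recv_exactly(b, 12)
a.close()
b.close()
```

The bad version of send_message would be two sendall calls, one for the header and one for the body, which is exactly the send / send / recv pattern.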

Setting TCP_NODELAY will look like it makes your program perform at top speed, but it's now become the network hog that you programmed it to be, and that Nagle was helpfully preventing you from being.  Fixing the program will make your program perform faster than it would with TCP_NODELAY alone, and you will find that the TCP_NODELAY setting will have no further effect, on or off.  Your program is now working smoothly, a good network citizen, and the Nagle net-cop allows it to go about its business unimpeded.

So, my final piece of advice - if TCP_NODELAY looks like it makes your program perform better, fix your damn program! There's too much crappy networking software out there already, and you don't want to add to it.

Published Mon, May 8 2006 17:01 by Alun Jones

Comments

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

Finally. This makes a lot of sense. Your description of what happens really clears things up. And with this new understanding of what Nagle does (and why it should not be disabled), I will now be able to fix my damn program.

Tuesday, May 09, 2006 1:05 AM by Steven Don

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

Every so often, I'll explain the Nagle algorithm, and someone will post back and say "that doesn't explain this or that behaviour". Almost always, I'll simply post back a demonstration of how their behaviour arises immediately out of Nagle and the delayed Ack algorithm. Most people think that Nagle's algorithm "must" be more complicated than that.
I think I may even have made it more complicated than it needs to be, and will revisit it later.

Tuesday, May 09, 2006 1:48 PM by Alun Jones

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

Alun,

"In most environments, this is still good, because most protocols are "client sends command, server sends response" over and over again, so each side is doing one send, then one recv."

Consider a situation in which you have an app that requires very low response time (very low latency of response). For instance a multiplayer game over TCP (quite a few out there that don't use UDP, due to various constraints).

Such a game might well want to go Send-Send-Send-Receive.

So wouldn't you say there *are* exceptions to be made here? I do believe your argument makes sense in the main, but

"There's too much crappy networking software out there already, and you don't want to add to it."

sounds a little bit harsh in the rare case mentioned above :)

If you see things differently please enlighten me :)

Tuesday, May 09, 2006 5:13 PM by nick

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

The simple response: TCP is _not_ low latency.
Consider this - what happens if you hit an outage? TCP will keep trying and keep trying and keep trying. That's not going to fit with your low latency requirement.
An app that needs low latency needs to not be using TCP, and needs to be designed around the idea that occasionally the network will not be available for seconds at a time. TCP's behaviour, of trying until the acknowledgements come back, is not appropriate to such an app.
The answer is not to take TCP and subtract the inconvenient stuff, it is to take UDP and add a very limited reliability layer, that can appropriately account for the low latency requirement.
And you really need a good network protocol design for such a system - there are other issues you'll run into with TCP (my favourite - if packet 1 is unreceived, and packet 2 is queued, you will often hit a requirement to send a packet 3 with contrary information to packets 1 and/or 2 - TCP doesn't allow you to "unsend" these packets, so your protocol design based on UDP should allow you to remove queued packets that have not been sent or acknowledged).
TCP satisfies a very strict set of requirements - a stream where every byte from start to finish _must_ get through, or the stream is unfinished, for instance. If you're outside of that set of requirements, for instance, if you might want to remove unsent-but-queued data, TCP is not right for you.
You can add to IP, you can add to UDP, you can add to TCP. You should not design around subtracting from TCP.
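As an illustration of "removing queued packets", here is a sketch of an outgoing queue in which a newer update for the same thing replaces an older one that hasn't gone out yet. The class name and the idea of keying updates (for example by player or object id) are assumptions made for the example, not part of any real protocol.

```python
from collections import OrderedDict

class SupersedingQueue:
    """Outgoing queue where a newer update replaces a stale unsent one."""

    def __init__(self):
        self._pending = OrderedDict()  # key -> payload, oldest first

    def enqueue(self, key, payload: bytes) -> None:
        # Drop any stale, still-unsent update for the same key, then
        # queue the new one at the back.
        self._pending.pop(key, None)
        self._pending[key] = payload

    def next_to_send(self):
        """Return the oldest (key, payload) pair, or None if empty."""
        if not self._pending:
            return None
        return self._pending.popitem(last=False)

queue = SupersedingQueue()
queue.enqueue("player1", b"pos=1")
queue.enqueue("player2", b"pos=9")
queue.enqueue("player1", b"pos=3")   # supersedes pos=1 before it is sent
first = queue.next_to_send()
second = queue.next_to_send()
third = queue.next_to_send()
```

This is the kind of thing you can build on top of UDP datagrams and cannot do to bytes already handed to a TCP stream.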

Tuesday, May 09, 2006 5:35 PM by Alun Jones

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

Right, agreed. It makes very little sense to subtract from TCP in any generic situation.

The problem that I'm facing here is that I'm working with existing protocols (proprietary) which work over TCP, and demand minimum latency. My latency to the application servers in question is very low (10 msec or so), with close to zero packet loss, so typical TCP issues that cause latency (wait-for-ack) aren't really an issue here.

These app servers have low throughput ... low bandwidth requirements... but since I didn't design the protocol, nor do I really implement it (I handle the load balancing / relaying), it's an interesting challenge to minimize the latency without actually changing the protocol semantics in any fashion.

Fortunately for me bandwidth isn't an issue, so I *can* chew up some bandwidth if I can provide better latency as a result of that.

I believe I should take your advice on this one though and not *subtract* from TCP. I do have some ideas on what I can *add* to TCP, but any advice from your end is also welcome.

An interesting thing to think about too is the situations you'll find yourself in when doing TCP over UDP or UDP over TCP.

Thanks!

Wednesday, May 10, 2006 4:43 AM by nick

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

yeah, who in their right mind would disable Nagle?

http://support.microsoft.com/default.aspx?scid=kb;EN-US;270926

remember how NT SP6 broke lots of things? As I recall, it was SP6 that disabled nagling, and SP6a reinstated it - as far as I know it's still a feature of all Windows OS. With NT, swapping different versions of NetBT.sys would instantly improve throughput for applications such as MSAccess. Dig out your old disks and give it a whirl!

Thursday, May 11, 2006 5:04 PM by petal

# I wish Larry hadn't written that...

Oh, Larry, Larry, Larry...
Articles 1 and 2 were great - really necessary reading to a lot of would-be...

Monday, June 26, 2006 11:13 PM by Tales from the Crypto

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

It generally does not make sense to subtract from TCP, but say you decided to use UDP and add some reliability layer, how is this very different from disabling Nagle:

1. The number of IP packets on the network would be roughly the same. Agreed, the UDP packets will be slightly smaller (28 bytes of IP + UDP header instead of the 40 bytes of TCP + IP header).

2. What about the reliability and sequencing that you get from TCP out of the box? Relieves you from writing reliability layer for UDP IMHO.

So I would prefer to disable Nagle if that makes sense in your specific case and not indulge in any unnecessary research based projects like implementing your own reliability layer over UDP and busting your project's deliverable date. And in these days of gigabit switches, increasing a few packets on the network is no big deal.

I believe in approaching problems in a pragmatic fashion.

Monday, November 20, 2006 1:57 PM by BK

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

If you need reliability, what damage does Nagle cause you?

It delays your packets in extreme circumstances, but then so does reliability in general. If you add reliability, you are implicitly declaring that you don't mind seeing your packets delayed.

If you need your packets to get there as fast as possible, and can't afford any delay, you have to lose packets.

Pragmatism, if that is your watch-word, tells you that reliability and the Nagle algorithm are not mutually exclusive by any means.

So, reliability => TCP => Nagle, and speed => UDP => packet loss.

Tuesday, November 21, 2006 7:37 PM by Alun Jones

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

Actually, leaving Nagle enabled (that is, not setting TCP_NODELAY) causes sub-optimal performance even for more popular protocols such as HTTP. An HTTP request will typically be smaller than a complete segment, so if you don't disable Nagle, the response will be delayed. The nice thing is that you can enable and disable it at any time for each individual connection. Also, the telnet case is a rather extreme one. Most protocols don't send tiny pieces of data, so setting TCP_NODELAY will rarely have extremely bad consequences. Also, UDP simply isn't possible in each and every case. Consider that most applications will want to offer tunneling through a proxy, at least as a fallback. UDP has a lot of disadvantages anyway, so scrapping TCP altogether just because one property is sub-optimal is probably not a good idea: you might easily end up with something far worse, especially if you want to use it in the heterogeneous real world. That is, everybody can come up with protocols that perform great for him... but suck everywhere else.

I think the problem you describe is not so much disabling Nagle as, as you also wrote, the multiple sends/writes. That alone will typically perform quite badly due to the system call and processing overhead. So if you simply avoid writing out single bytes, by coalescing data into a buffer of a few kilobytes or by using gather/scatter I/O, you don't really need Nagle: it'll perform even better due to fewer system calls, and you have more control over what is sent ASAP and what may be queued up. Of course that needs some planning ahead and isn't as trivial as switching something on or off. That much I agree.
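A sketch of the gather I/O idea, using Python's socket.sendmsg (available on Unix-like systems, not on Windows). The helper name is made up, and socket.socketpair() stands in for a TCP connection so the snippet is self-contained.

```python
import socket

def send_gathered(sock: socket.socket, parts) -> int:
    """Hand several buffers to the kernel in one system call.

    Returns the number of bytes accepted. For large messages this may be
    fewer than the total, so a real caller would loop on the remainder.
    """
    return sock.sendmsg(parts)

# socketpair() is only a stand-in for a connected TCP socket here.
a, b = socket.socketpair()
sent = send_gathered(a, [b"HDR:", b"payload"])
data = b.recv(64)
a.close()
b.close()
```

The header and the payload stay in separate buffers in the application, but the network stack sees them as a single write.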

Tuesday, September 04, 2007 6:30 PM by Chris

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

HTTP need not be impacted by the Nagle interaction with delayed ACKs.

Granted, the request is often shorter than a complete segment - though it can be longer. As a result, the request will at least end with one incomplete segment - but then, there will be no other incomplete segments buffered, so the request will complete without delay.

The response will carry the ACK with it, so there's no delay on the ACK, and it too should be sent as a number of full segments followed by a small one, which will not be delayed by Nagle.

HTTP is a fairly good example of a request/response cycle that should _not_ be affected adversely by Nagle or Delayed ACK.

Now, if you want to write your HTTP server so that it sends the response header in a separate send() call from the response data, you might get poor performance on short responses, because that then becomes "read, write-small, write-small, read", which is the classic bad case for Nagle / Delayed ACK.

Write your HTTP server better :)
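One way to write it better without restructuring the code that produces the header and the body: put a buffered writer between the code and the socket, and flush once per response. This is a sketch using socket.makefile; the response text is a toy example, and socket.socketpair() stands in for the client connection.

```python
import socket

# socketpair() is only a stand-in for an accepted TCP connection here.
server_side, client_side = socket.socketpair()

# Writes smaller than the buffer are held until flush(), so the two
# write() calls below reach the socket as a single send.
writer = server_side.makefile("wb", buffering=8192)
writer.write(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\n")  # header
writer.write(b"ok")                                             # body
writer.flush()

response = client_side.recv(4096)

writer.close()
server_side.close()
client_side.close()
```

The pattern to avoid is the unbuffered equivalent, where each write() becomes its own small send().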

Tuesday, September 04, 2007 10:23 PM by Alun Jones

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

ok here is the thing guys

for most everyday situations you guys are right

but!!!!!

any MMORPG player out there knows that games run faster with Nagle and TCP delayed ACK disabled. When a game depends on literally half a second of loading time for skills, you cannot wait for more packets to accumulate before sending information, and that bandwidth-hogging constant sending and receiving of packets before checking them is needed to get that split-second reaction....

for anything other than split-second reaction times in games, then no, it's not something people should just use thinking it will speed up their net. For browsing and some other things it can make things slower and allow bad packets to be sent

MMORPG: major fix for reaction time with the server

other than that, leave it alone

re: Still not a good argument

Sorry to tell you this, but the only thing you've demonstrated is that many MMORPGs are written poorly as far as network performance is concerned.

Once again, remember that you have a choice - data is either time-sensitive, in which case it can be discarded, lost, ignored and updated later, or it is sequence-sensitive, in which case the data must get through even if it takes longer.

The data you are talking about is clearly time-sensitive, not sequence-sensitive. As such, it should be communicated using UDP, not TCP. And UDP has no Nagle or delayed-ACK interactions.

You have simply made an argument that MMORPG writers should employ experienced and skilled network developers, rather than the hacks they currently employ.
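A sketch of what "time-sensitive" handling might look like on the receiving end of a UDP-based design: each datagram carries a sequence number, and anything older than what has already been applied is simply dropped. The 4-byte header layout and the class name are assumptions for the example, and sequence-number wraparound is ignored.

```python
import struct

SEQ_HEADER = struct.Struct("!I")  # 4-byte sequence number, network order

def pack_update(seq: int, payload: bytes) -> bytes:
    """Prefix a state update with its sequence number."""
    return SEQ_HEADER.pack(seq) + payload

class LatestOnlyReceiver:
    """Keeps only the newest state; late or duplicate updates are dropped."""

    def __init__(self):
        self.last_seq = -1
        self.state = None

    def handle_datagram(self, datagram: bytes) -> bool:
        (seq,) = SEQ_HEADER.unpack_from(datagram)
        if seq <= self.last_seq:
            return False          # stale: a newer update already applied
        self.last_seq = seq
        self.state = datagram[SEQ_HEADER.size:]
        return True

receiver = LatestOnlyReceiver()
results = [
    receiver.handle_datagram(pack_update(1, b"hp=90")),
    receiver.handle_datagram(pack_update(3, b"hp=70")),
    receiver.handle_datagram(pack_update(2, b"hp=80")),  # arrives late
]
```

Nothing ever waits for update 2 to arrive before update 3 is used, which is the behaviour TCP cannot give you.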

Saturday, October 24, 2009 09:27 AM by Alun Jones

Saturday, October 24, 2009 10:54 AM by Brad

# re: DELAY or NODELAY - Riffing on Larry, who's riffing on Raymond...

How about when sending chunks of data larger than the maximum segment size?  If the number of segments the data needs to be broken into is even, Nagle causes decreased throughput.  This is a common case on embedded devices, where the MSS may be 256 bytes or less.

re: Sending chunks of data

Yes, that's why you don't do that. You are apparently engaging in one of two things:

  • Sending a bulk data stream and splitting it badly.
  • Expecting an acknowledgement from a protocol that isn't sending an acknowledgement.

In the first case, if you're sending a stream of data larger than a segment, only the last segment should be smaller-sized, and there should be an application acknowledgement from the receiver thereafter.

In the latter case, your protocol is badly designed and needs rethinking so that if you need to wait for an acknowledgement, the acknowledgement you wait for should come from the application and not the network stack - because you, the application, don't actually care if the network stack received the data.

So, no, this doesn't invalidate my argument.
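For the first case, this is what a well-split bulk stream looks like: every segment is full-sized except possibly the last. The function below is only an illustration of the shape; in practice you would normally hand the whole buffer to send() and let the stack do the splitting.

```python
def split_into_segments(data: bytes, mss: int) -> list:
    """Split data into full-sized segments, with only the last one short."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]

# 600 bytes over a 256-byte MSS, as on a small embedded device.
segment_sizes = [len(s) for s in split_into_segments(b"x" * 600, 256)]
```

Full segments are never held back by Nagle, so only the final short segment can wait, and only if something earlier is still unacknowledged.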

Saturday, November 14, 2009 06:27 PM by Alun Jones

Saturday, November 14, 2009 1:06 AM by Ben
