net: default TCP_NODELAY to false #57530

Closed
sxlijin opened this issue Dec 30, 2022 · 0 comments

sxlijin commented Dec 30, 2022

There was a Hacker News discussion yesterday about why Go defaults TCP_NODELAY to true.

Filing this issue not because I intend this to be a proposal, but because I want to document rsc's response in a forum officially associated with golang:

That code was in turn a loose port of the dial function from Plan 9 from User Space, where I added TCP_NODELAY to new connections by default in 2004 [1], with the unhelpful commit message "various tweaks". If I had known this code would eventually be of interest to so many people maybe I would have written a better commit message!
I do remember why, though. At the time, I was working on a variety of RPC-based systems that ran over TCP, and I couldn't understand why they were so incredibly slow. The answer turned out to be TCP_NODELAY not being set. As John Nagle points out [2], the issue is really a bad interaction between delayed acks and Nagle's algorithm, but the only option on the FreeBSD system I was using was TCP_NODELAY, so that was the answer. In another system I built around that time I ran an RPC protocol over ssh, and I had to patch ssh to set TCP_NODELAY, because at the time ssh only set it for sessions with ptys [3]. TCP_NODELAY being off is a terrible default for trying to do anything with more than one round trip.

When I wrote the Go implementation of net.Dial, which I expected to be used for RPC-based systems, it seemed like a no-brainer to set TCP_NODELAY by default. I have a vague memory of discussing it with Dave Presotto (our local networking expert, my officemate at the time, and the listed reviewer of that commit) which is why we ended up with SetNoDelay as an override from the very beginning. If it had been up to me, I probably would have left SetNoDelay out entirely.

As others have pointed out at length elsewhere in these comments, it's a completely reasonable default.

I will just add that it makes no sense at all that git-lfs (lf = large file!) should be sending large files 50 bytes at a time. That's a huge number of system calls that could be avoided by doing larger writes. And then the larger writes would work better for the TCP stack anyway.
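The point about write sizes can be illustrated with a small sketch. It is not git-lfs code: `countingWriter`, `writeChunks`, and `countWrites` are hypothetical names, and the counting writer only stands in for a socket, where each `Write` would be a system call. It compares 50-byte writes sent straight through against the same writes coalesced by a `bufio.Writer`.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

// countingWriter counts how many Write calls reach the underlying writer.
type countingWriter struct {
	w     io.Writer
	calls int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.calls++
	return c.w.Write(p)
}

// writeChunks writes total bytes to w in pieces of at most chunkSize bytes.
func writeChunks(w io.Writer, total, chunkSize int) error {
	chunk := make([]byte, chunkSize)
	for sent := 0; sent < total; sent += chunkSize {
		n := chunkSize
		if total-sent < n {
			n = total - sent
		}
		if _, err := w.Write(chunk[:n]); err != nil {
			return err
		}
	}
	return nil
}

// countWrites reports how many underlying writes are needed to send total
// bytes in chunkSize pieces. A bufSize of 0 means no buffering.
func countWrites(total, chunkSize, bufSize int) (int, error) {
	sink := &countingWriter{w: io.Discard}
	if bufSize == 0 {
		err := writeChunks(sink, total, chunkSize)
		return sink.calls, err
	}
	bw := bufio.NewWriterSize(sink, bufSize)
	if err := writeChunks(bw, total, chunkSize); err != nil {
		return sink.calls, err
	}
	// Flush pushes any bytes still held in the buffer to the sink.
	err := bw.Flush()
	return sink.calls, err
}

func main() {
	const total = 1 << 20 // 1 MiB payload

	direct, _ := countWrites(total, 50, 0)
	buffered, _ := countWrites(total, 50, 64*1024)

	fmt.Println("underlying writes without buffering:", direct)
	fmt.Println("underlying writes with 64 KiB buffer:", buffered)
}
```

Wrapping a connection in a buffered writer and flushing at message boundaries is one common way to get larger writes; the buffer size here is an arbitrary choice for illustration.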

And to answer the question in the article:

Much (all?) of Kubernetes is written in Go, and how has this default affected that?

I'm quite confident that this default has greatly improved the default server latency in all the various kinds of servers Kubernetes has. It was the right choice for Go, and it still is.

[1] https://github.com/9fans/plan9port/commit/d51419bf4397cf13d0...

[2] https://news.ycombinator.com/item?id=34180239

[3] http://publications.csail.mit.edu/lcs/pubs/pdf/MIT-LCS-TM-65...
