
crypto/tls: significant drop in throughput from Go 1.6 #15713

Closed
rsc opened this issue May 17, 2016 · 11 comments
rsc commented May 17, 2016

Attached is a stab at a TLS benchmark (crypto/tls has none!).
It runs b.N TLS connections in sequence, each transferring 16 MB in each direction in 64 kB writes.
tlsbench_test.go.txt

Through Go 1.5 it was hovering around 30 MB/s on my workstation.
Go 1.6 made it jump by almost 10X, to around 295 MB/s.
At tip, however, it is down a bit, around 235 MB/s.

It would be good to understand this. This might be something to do soon after the beta is released.

/cc @agl
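
In case the attachment goes missing, a self-contained benchmark along the lines described above might look roughly like the sketch below. It is not the attached file: the loopback-TCP setup, the throwaway self-signed certificate, the echo-server shape, and the use of io.Discard (Go 1.16+) are assumptions made so the sketch compiles on its own.

package tlsbench

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"io"
	"math/big"
	"net"
	"testing"
	"time"
)

func check(b *testing.B, err error) {
	if err != nil {
		b.Fatal(err)
	}
}

// benchConfigs builds server and client tls.Configs around a throwaway
// self-signed certificate so the sketch is self-contained.
func benchConfigs(b *testing.B) (server, client *tls.Config) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	check(b, err)
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		DNSNames:              []string{"bench.local"},
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	check(b, err)
	leaf, err := x509.ParseCertificate(der)
	check(b, err)
	pool := x509.NewCertPool()
	pool.AddCert(leaf)
	server = &tls.Config{Certificates: []tls.Certificate{{Certificate: [][]byte{der}, PrivateKey: key}}}
	client = &tls.Config{RootCAs: pool, ServerName: "bench.local"}
	return server, client
}

// BenchmarkIO runs one TLS connection per iteration, transferring 16 MB in
// each direction in 64 kB writes; b.SetBytes makes the tool report MB/s.
func BenchmarkIO(b *testing.B) {
	const total, chunk = 16 << 20, 64 << 10
	srvConf, cliConf := benchConfigs(b)
	buf := make([]byte, chunk)
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	check(b, err)
	defer ln.Close()
	b.SetBytes(2 * total)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		errc := make(chan error, 1)
		go func() {
			c, err := ln.Accept()
			if err != nil {
				errc <- err
				return
			}
			s := tls.Server(c, srvConf)
			defer s.Close()
			// Read everything the client sends, then echo the same amount back.
			if _, err := io.CopyN(io.Discard, s, total); err != nil {
				errc <- err
				return
			}
			for n := 0; n < total; n += chunk {
				if _, err := s.Write(buf); err != nil {
					errc <- err
					return
				}
			}
			errc <- nil
		}()
		c, err := tls.Dial("tcp", ln.Addr().String(), cliConf)
		check(b, err)
		for n := 0; n < total; n += chunk {
			_, err := c.Write(buf)
			check(b, err)
		}
		_, err = io.CopyN(io.Discard, c, total)
		check(b, err)
		check(b, <-errc)
		c.Close()
	}
}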

@rsc rsc added this to the Go1.7 milestone May 17, 2016
mikioh commented May 17, 2016

It's probably the dynamic record sizing introduced in https://go-review.googlesource.com/19591. What happens when you run the benchmark with &tls.Config{DynamicRecordSizingDisabled: true}?
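
With the sketch earlier in this thread, that amounts to flipping the new Config field on both sides before connecting (DynamicRecordSizingDisabled is the crypto/tls knob added for Go 1.7):

	// Disable dynamic record sizing on the server and client configs from
	// the sketch above, then run the benchmark as before.
	srvConf.DynamicRecordSizingDisabled = true
	cliConf.DynamicRecordSizingDisabled = true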

@josharian

Looks like dynamic record sizing, indeed. Going from 1.6.2 to tip, I can reproduce a slowdown:

name  old time/op   new time/op   delta
IO-8    115ms ± 7%    130ms ± 3%  +12.34%  (p=0.000 n=15+15)

name  old speed     new speed     delta
IO-8  146MB/s ± 7%  129MB/s ± 3%  -11.05%  (p=0.000 n=15+15)

Going from 1.6 to tip with DynamicRecordSizingDisabled: true, I see a slight speed-up:

name  old time/op   new time/op   delta
IO-8    115ms ± 7%    109ms ± 2%  -5.26%  (p=0.000 n=15+15)

name  old speed     new speed     delta
IO-8  146MB/s ± 7%  154MB/s ± 2%  +5.47%  (p=0.000 n=15+15)

So maybe the only things to do here are to call out dynamic record sizing in the 1.7 release notes and add this benchmark to the repo. Back to you, @rsc.

cespare commented May 17, 2016

/cc @tombergan

tombergan commented May 17, 2016

DynamicRecordSizing writes the first 1MB in ~1KB chunks instead of ~16KB chunks. Some loss of throughput is expected as this trades throughput for latency early in the connection.

The 1MB constant may benefit from tuning. I borrowed that constant from the Google frontend servers. Unfortunately, it doesn't seem like much thought went into that constant (or at least, the thought process was not documented anywhere) and I could not find benchmark data. I'm not even sure what the right benchmarks would be -- there are competing interests and the right value will likely depend on the underlying network speed and specifics of the application layer.
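
Schematically, the policy described here looks like the following. This is a simplified model for illustration, not the actual crypto/tls record-splitting code; the constants just mirror the description above.

	const (
		smallRecord    = 1 << 10  // ~1 kB early records, roughly one TCP packet each
		maxRecord      = 16 << 10 // 16 kB, the TLS maximum plaintext per record
		boostThreshold = 1 << 20  // switch to full-size records after ~1 MB sent
	)

	// nextRecordSize returns the payload size for the next record, given how
	// many plaintext bytes have already been sent on the connection.
	func nextRecordSize(bytesSent int64) int {
		if bytesSent < boostThreshold {
			return smallRecord
		}
		return maxRecord
	}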

@tombergan

Just to verify that the loss in throughput is limited to the start of the connection, I ran Russ's benchmark with various transfer sizes. There is noise, but the overhead generally decreases as the transfer size increases.

BenchmarkIO_64MB_NoDynamicSizing-40           30     375858719 ns/op     178.55 MB/s
BenchmarkIO_64MB_DynamicSizing-40             30     396826655 ns/op     169.11 MB/s
BenchmarkIO_32MB_NoDynamicSizing-40          100     192553850 ns/op     174.26 MB/s
BenchmarkIO_32MB_DynamicSizing-40            100     201446453 ns/op     166.57 MB/s
BenchmarkIO_24MB_NoDynamicSizing-40          100     140218146 ns/op     179.48 MB/s
BenchmarkIO_24MB_DynamicSizing-40            100     155411500 ns/op     161.93 MB/s
BenchmarkIO_16MB_NoDynamicSizing-40          100     100766193 ns/op     166.50 MB/s
BenchmarkIO_16MB_DynamicSizing-40            100     110000313 ns/op     152.52 MB/s
BenchmarkIO_1MB_NoDynamicSizing-40          2000      10185166 ns/op     102.95 MB/s
BenchmarkIO_1MB_DynamicSizing-40            1000      23113165 ns/op      45.37 MB/s

rsc commented May 18, 2016

@tombergan, given that we don't even know what the right values are and the ones we do use cause a significant performance drop, maybe dynamic record sizing should default to off for now or be reverted?

Maybe it should be "DynamicRecordSize int" and "DynamicRecordLimit int64", where 0 and 0 mean off, and the current behavior would be DynamicRecordSize=1024 and DynamicRecordLimit=1048576.
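
For concreteness, the proposed knobs would look something like the sketch below. These fields are hypothetical and were never adopted; crypto/tls shipped only the DynamicRecordSizingDisabled bool.

	// Hypothetical fields sketching the proposal above (not real crypto/tls API).
	type Config struct {
		// DynamicRecordSize is the record size to use early in a connection.
		// 0 means dynamic record sizing is off.
		DynamicRecordSize int
		// DynamicRecordLimit is how many bytes are written before switching
		// to full-size (16 kB) records. 0 means off.
		DynamicRecordLimit int64
		// ... existing crypto/tls Config fields ...
	}

	// The current behavior, expressed with these knobs:
	//   cfg := Config{DynamicRecordSize: 1024, DynamicRecordLimit: 1048576}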

@tombergan

It causes a significant performance drop when measured by throughput, but a performance increase when measured by latency, particularly for slow connections. For a simple web page served via HTTPS over a simulated 2G connection, I measured a ~500ms improvement in page load time with dynamic record sizing. There's more background in the first comment of issue #14376 and in the discussion with agl in the linked CL.

I can try to cook up a more reusable latency-centric benchmark if you like. I tend to believe that most clients will use TLS via HTTPS, and for those clients, latency is more important than throughput early in the connection. I am somewhat worried about over-tuning for a throughput benchmark when many uses are not actually throughput-limited.

That said, I don't feel too strongly about whether this should default to "on" or "off". If you feel strongly one way or the other, let's go with that. We tried to avoid knobs other than a single on/off bool, but I'm happy to change the bool to an int if you think that will help (I actually had an int in a prior version of the CL). I do feel strongly about reverting this -- given that it demonstrably improves latency, I think the feature should stay in some form.

@tombergan

How about:

  • Disable by default in tls.Config
  • Change the 1MB constant to 128KB
  • Enable by default in net/http.Server (via cloneTLSConfig)

Rationale:

  • The most important use case is serving HTML. Pageload latency improves when the browser's preload scanner parses the HTML as quickly as possible. Smaller TLS records improve time-to-first-decrypted-byte.
  • 128KB should be plenty to cover the HTML with response headers.
  • Disabling by default elsewhere means there are no unexpected throughput surprises.

@rsc, @agl, WDYT?

I added a simple latency benchmark to Russ's tlsbench_test.go. Results below from my workstation:

BenchmarkLatency_2500kbps_NoDynamicSizing-40             200      10392396 ns/op
BenchmarkLatency_1500kbps_NoDynamicSizing-40             100      14912234 ns/op
BenchmarkLatency_500kbps_NoDynamicSizing-40              100     281757497 ns/op
BenchmarkLatency_200kbps_NoDynamicSizing-40               30     677013274 ns/op
BenchmarkLatency_2500kbps_DynamicSizing-40               200       6145195 ns/op
BenchmarkLatency_1500kbps_DynamicSizing-40               200       6641656 ns/op
BenchmarkLatency_500kbps_DynamicSizing-40                100      78163048 ns/op
BenchmarkLatency_200kbps_DynamicSizing-40                 30     184665580 ns/op

tlsbench_test.go.txt

@rsc rsc modified the milestones: Go1.7Beta, Go1.7 May 23, 2016
@funny-falcon

Maybe it would be sensible to make the size of the "small records" grow?
With linear growth, the sum 1KB+2KB+3KB+...+15KB = 120KB, which is quite close to the 128KB bound.
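
A quick model of that suggestion (sizes in kB; the ramp-up sums to 1+2+...+15 = 120 kB):

	// recordSizeKB models the suggested linear growth: the i-th record
	// (counting from 1) is i kB, capped at the 16 kB maximum.
	func recordSizeKB(i int) int {
		if i < 16 {
			return i
		}
		return 16
	}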

@gopherbot

CL https://golang.org/cl/23487 mentions this issue.

@tombergan

To follow up on the above CL, I tried loading a very simple web page (a 16KB HTML file that imports one 16KB script) served over HTTP/2 on a simulated 2G connection (1200ms RTT, 200kbps throughput). Page load time results (median of 5 runs):

Max record sizing: 6.2s
Dynamic record sizing before cl/23487: 4.6s
Dynamic record sizing after cl/23487: 5.2s

Web benchmarking is hard. The benchmark I chose is a "best case" for dynamic record sizing in some ways, with a very slow client and a very simple page. The reason dynamic record sizing is slightly slower after cl/23487 is that the HTML actually starts on the third data packet -- the first packet has h2 settings and the second packet has h2 response headers.

If I were to propose any changes, I might propose changing the sequence to 1,1,1,2,3,4,5, etc. But I think this is a good enough compromise between throughput and latency. We can revisit this later if it becomes a problem.

/cc @rsc

@golang golang locked and limited conversation to collaborators May 27, 2017
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
The current code, introduced after Go 1.6 to improve latency on
low-bandwidth connections, sends 1 kB packets until 1 MB has been sent,
and then sends 16 kB packets (the maximum record size).

Unfortunately this decreases throughput for 1-16 MB responses by 20% or so.

Following discussion on golang#15713, change cutoff to 128 kB sent
and also grow the size allowed for successive packets:
1 kB, 2 kB, 3 kB, ..., 15 kB, 16 kB.
This fixes the throughput problems: the overhead is now closer to 2%.

I hope this still helps with latency but I don't have a great way to test it.
At the least, it's not worse than Go 1.6.

Comparing MaxPacket vs DynamicPacket benchmarks:

name              maxpkt time/op  dyn. time/op delta
Throughput/1MB-8    5.07ms ± 7%   5.21ms ± 7%  +2.73%  (p=0.023 n=16+16)
Throughput/2MB-8   15.7ms ±201%    8.4ms ± 5%    ~     (p=0.604 n=20+16)
Throughput/4MB-8    14.3ms ± 1%   14.5ms ± 1%  +1.53%  (p=0.000 n=16+16)
Throughput/8MB-8    26.6ms ± 1%   26.8ms ± 1%  +0.47%  (p=0.003 n=19+18)
Throughput/16MB-8   51.0ms ± 1%   51.3ms ± 1%  +0.47%  (p=0.000 n=20+20)
Throughput/32MB-8    100ms ± 1%    100ms ± 1%  +0.24%  (p=0.033 n=20+20)
Throughput/64MB-8    197ms ± 0%    198ms ± 0%  +0.56%   (p=0.000 n=18+7)

The small MB runs are bimodal in both cases, probably GC pauses.
But there's clearly no general slowdown anymore.

Fixes golang#15713.

Change-Id: I5fc44680ba71812d24baac142bceee0e23f2e382
Reviewed-on: https://go-review.googlesource.com/23487
Reviewed-by: Ian Lance Taylor <iant@golang.org>
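
For reference, the schedule this commit describes can be modeled roughly as follows. This is a sketch of the described behavior, not the actual crypto/tls code from cl/23487.

	// dynamicRecordSize models the post-cl/23487 schedule: record payloads ramp
	// up 1 kB, 2 kB, ..., 16 kB, and once 128 kB of plaintext has been sent the
	// connection sticks with full 16 kB records.
	func dynamicRecordSize(bytesSent int64, recordNum int) int {
		const maxRecord = 16 << 10
		if bytesSent >= 128<<10 {
			return maxRecord
		}
		if n := recordNum << 10; n < maxRecord { // recordNum counts from 1
			return n
		}
		return maxRecord
	}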