crypto/tls: 500% increase in allocations from `(*tls.Conn).Read` in go 1.17 #50657

dt · 2022-01-17T16:31:53Z

What version of Go are you using (`go version`)?

$ go version
1.17.3

Does this issue reproduce with the latest release?

I believe so, but have not yet confirmed.

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GOARCH="amd64"
GOOS="linux"

What did you do?

I'm running a service that has many http2 clients.

What did you expect to see?

Fewer allocations when reading data from established connections.

What did you see instead?

Around 20-25% of our service's total allocations at times are coming from context.WithCancel calls made by (*tls.Conn).handshakeContext.

It looks like every (*tls.Context).Read call made by http2's ReadFrame is calling c.Handshake which calls through to c.handshakeContext, which begins with a call to context.WithCancel, which allocates.

The text was updated successfully, but these errors were encountered:

johanbrandhorst · 2022-01-17T16:44:43Z

As mentioned in the Gophers Slack discussion, it seems to me we could simply move the early exit checks above the context manipulation functions to remove these allocations in the case where the handshake has already completed.

dt · 2022-01-17T17:08:00Z

it seems to me we could simply move the early exit checks above the context manipulation functions

One thing I was trying to figure out when glancing over the code: can we just check c.handshakeComplete(), only reading the atomic status field, right away and return if that's true, or do we want to move the whole block that acquires handshakeMu also checks the err up as well? Right now we're only checking status after acquiring the mutex but I'm wondering if we could check it first to avoid that?

johanbrandhorst · 2022-01-17T17:16:16Z

I would expect that we'd need to do both, so that if there was an error during handshaking that is returned first.

dt · 2022-01-17T17:19:41Z

if there was an error during handshaking

Huh, but in that case, isn't handshakeStatus still 0? I'm just reading this for the first time, so I might be missing something, but it looked like the intent of the atomic status was to avoid needing to acquire the mutex to check if handshake was completed or not?

johanbrandhorst · 2022-01-17T19:28:21Z

I'm not too intimately familiar with that specific logic, but judging by the existing logic flow we have to check the error before returning?

gopherbot · 2022-01-17T22:14:56Z

Change https://golang.org/cl/379034 mentions this issue: crypto/tls: avoid extra allocations in noop handshake calls

johanbrandhorst · 2022-01-18T02:10:43Z

CC @golang/security (sorry if this is not the right way to get the attention from someone more experienced with the code).

dt · 2022-01-21T16:41:49Z

I did a quick run comparing release-branch.go1.16 and release-branch.go1.17 using crypto/tls.BenchmarkThroughput. The raw timing numbers across those two are not directly comparable, as an unrelated change in 9d0819b switched the cipher config to avoid hardware aes, but just looking at the allocations / op:

name                                     old allocs/op  new allocs/op  delta
Throughput/MaxPacket/1MB/TLSv12-32            773 ± 0%      1030 ± 0%   +33.25%  (p=0.016 n=5+4)
Throughput/MaxPacket/1MB/TLSv13-32          1.44k ± 0%     1.67k ± 0%   +16.47%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv12-32            843 ± 0%      1546 ± 0%   +83.35%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32          1.51k ± 0%     2.19k ± 0%   +45.50%  (p=0.008 n=5+5)
Throughput/MaxPacket/4MB/TLSv12-32            973 ± 1%      2576 ± 0%  +164.79%  (p=0.008 n=5+5)
Throughput/MaxPacket/4MB/TLSv13-32          1.64k ± 0%     3.22k ± 0%   +96.12%  (p=0.008 n=5+5)
Throughput/MaxPacket/8MB/TLSv12-32          1.24k ± 1%     4.65k ± 0%  +274.66%  (p=0.008 n=5+5)
Throughput/MaxPacket/8MB/TLSv13-32          1.91k ± 1%     5.30k ± 0%  +177.47%  (p=0.008 n=5+5)
Throughput/MaxPacket/16MB/TLSv12-32         1.77k ± 1%     8.78k ± 0%  +394.98%  (p=0.008 n=5+5)
Throughput/MaxPacket/16MB/TLSv13-32         2.43k ± 0%     9.41k ± 0%  +287.08%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32         2.83k ± 1%    17.01k ± 0%  +501.69%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32         3.50k ± 1%    17.65k ± 0%  +404.74%  (p=0.008 n=5+5)
Throughput/MaxPacket/64MB/TLSv12-32         4.93k ± 1%    33.60k ± 0%  +581.96%  (p=0.008 n=5+5)
Throughput/MaxPacket/64MB/TLSv13-32         5.63k ± 1%    34.14k ± 0%  +506.90%  (p=0.008 n=5+5)
Throughput/DynamicPacket/1MB/TLSv12-32        786 ± 0%      1093 ± 0%   +39.00%  (p=0.008 n=5+5)
Throughput/DynamicPacket/1MB/TLSv13-32      1.45k ± 0%     1.73k ± 0%   +19.32%  (p=0.016 n=4+5)
Throughput/DynamicPacket/2MB/TLSv12-32        856 ± 0%      1607 ± 0%   +87.69%  (p=0.008 n=5+5)
Throughput/DynamicPacket/2MB/TLSv13-32      1.52k ± 0%     2.25k ± 0%   +47.69%  (p=0.008 n=5+5)
Throughput/DynamicPacket/4MB/TLSv12-32        987 ± 1%      2640 ± 0%  +167.46%  (p=0.008 n=5+5)
Throughput/DynamicPacket/4MB/TLSv13-32      1.66k ± 0%     3.28k ± 0%   +97.82%  (p=0.008 n=5+5)
Throughput/DynamicPacket/8MB/TLSv12-32      1.26k ± 0%     4.71k ± 0%  +272.60%  (p=0.008 n=5+5)
Throughput/DynamicPacket/8MB/TLSv13-32      1.92k ± 1%     5.35k ± 0%  +178.67%  (p=0.008 n=5+5)
Throughput/DynamicPacket/16MB/TLSv12-32     1.79k ± 1%     8.84k ± 0%  +394.33%  (p=0.008 n=5+5)
Throughput/DynamicPacket/16MB/TLSv13-32     2.46k ± 1%     9.46k ± 0%  +285.13%  (p=0.008 n=5+5)
Throughput/DynamicPacket/32MB/TLSv12-32     2.86k ± 1%    17.10k ± 0%  +498.46%  (p=0.008 n=5+5)

By looking at 1.16 vs 90d6bbb -- the last commit to touch crypto/tls commit prior to the cipher change above -- I think we can see what effect these extra allocations have on the actual performance:

name                                 old time/op    new time/op    delta
Throughput/MaxPacket/2MB/TLSv12-32     6.20ms ± 1%    6.37ms ± 1%    +2.63%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32     6.46ms ± 1%    6.61ms ± 1%    +2.38%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32    77.9ms ± 0%    80.9ms ± 0%    +3.88%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32    80.3ms ± 1%    82.7ms ± 0%    +2.93%  (p=0.008 n=5+5)

name                                 old speed      new speed      delta
Throughput/MaxPacket/2MB/TLSv12-32    338MB/s ± 1%   329MB/s ± 1%    -2.56%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32    325MB/s ± 1%   317MB/s ± 1%    -2.32%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32   431MB/s ± 0%   415MB/s ± 0%    -3.74%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32   418MB/s ± 1%   406MB/s ± 0%    -2.85%  (p=0.008 n=5+5)

name                                 old alloc/op   new alloc/op   delta
Throughput/MaxPacket/2MB/TLSv12-32      216kB ± 0%     249kB ± 0%   +15.22%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32      257kB ± 0%     290kB ± 0%   +12.85%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32     267kB ± 2%     848kB ± 1%  +217.68%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32     309kB ± 1%     889kB ± 0%  +187.97%  (p=0.008 n=5+5)

name                                 old allocs/op  new allocs/op  delta
Throughput/MaxPacket/2MB/TLSv12-32        843 ± 0%      1558 ± 0%   +84.79%  (p=0.008 n=5+5)
Throughput/MaxPacket/2MB/TLSv13-32      1.51k ± 0%     2.22k ± 0%   +47.52%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv12-32     2.83k ± 1%    17.02k ± 0%  +501.81%  (p=0.008 n=5+5)
Throughput/MaxPacket/32MB/TLSv13-32     3.50k ± 1%    17.69k ± 0%  +405.72%  (p=0.008 n=5+5)

Is this enough of a regression to justify a backport?

alkmc · 2022-02-24T19:09:08Z

Is there any update on this issue?

moredure · 2022-04-13T08:56:50Z

#52274

johanbrandhorst · 2022-05-06T20:24:05Z

Thanks @dt and @FiloSottile for getting this in ❤️

alkmc · 2022-05-09T08:27:22Z

@dt @FiloSottile what about 1.17 and 1.18 backport?

dt · 2022-05-09T15:21:47Z

I'm unclear as to whether or not the performance regression seen here qualifies for a backport under the backport policy's definition of a serious issue with no workaround:

A “serious” problem is one that prevents a program from working at all.

I'll go ahead and open backport issues to continue the discussion there I guess?

@gopherbot please consider this for backport to 1.17, as it is a regression compared to 1.16.

@gopherbot please consider this for backport to 1.18, as it is a regression compared to 1.16.

gopherbot · 2022-05-09T15:22:04Z

Backport issue(s) opened: #52790 (for 1.17), #52791 (for 1.18).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

gopherbot · 2022-05-11T17:03:23Z

Change https://go.dev/cl/405544 mentions this issue: crypto/tls: avoid extra allocations in steady-state Handshake calls

dt changed the title ~~crypto/tls: large increase in allocations from (*tls.Context).Read in go 1.17~~ crypto/tls: large increase in allocations from (*tls.Conn).Read in go 1.17 Jan 17, 2022

ALTree added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Jan 17, 2022

komuw mentioned this issue Feb 3, 2022

net/http: Transport wastes TLS handshake progress #50984

Open

toothrot added this to the Backlog milestone Feb 4, 2022

dt mentioned this issue Apr 13, 2022

crypto/tls: avoid allocating context.WithCancel on each read/write op #52298

Closed

dt changed the title ~~crypto/tls: large increase in allocations from (*tls.Conn).Read in go 1.17~~ crypto/tls: 500% increase in allocations from (*tls.Conn).Read in go 1.17 May 4, 2022

gopherbot closed this as completed in a4af356 May 6, 2022

This was referenced May 9, 2022

crypto/tls: 500% increase in allocations from (*tls.Conn).Read in go 1.17 [1.17 backport] #52790

Closed

crypto/tls: 500% increase in allocations from (*tls.Conn).Read in go 1.17 [1.18 backport] #52791

Closed

TriAnMan mentioned this issue May 30, 2022

crypto/tls: consider less allocations for Conn.Read #53142

Open

ZekeLu mentioned this issue Jun 30, 2022

net.http: http.Client sometimes does not use a connection produced by a canceled dialer #53627

Closed

rhysh mentioned this issue Feb 16, 2023

x/net/quic: add QUIC implementation #58547

Open

golang locked and limited conversation to collaborators May 11, 2023

gopherbot added the FrozenDueToAge label May 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

crypto/tls: 500% increase in allocations from `(*tls.Conn).Read` in go 1.17 #50657

crypto/tls: 500% increase in allocations from `(*tls.Conn).Read` in go 1.17 #50657

dt commented Jan 17, 2022

johanbrandhorst commented Jan 17, 2022 •

edited

dt commented Jan 17, 2022 •

edited

johanbrandhorst commented Jan 17, 2022 •

edited

dt commented Jan 17, 2022 •

edited

johanbrandhorst commented Jan 17, 2022

gopherbot commented Jan 17, 2022

johanbrandhorst commented Jan 18, 2022

dt commented Jan 21, 2022 •

edited

alkmc commented Feb 24, 2022

moredure commented Apr 13, 2022 •

edited

johanbrandhorst commented May 6, 2022

alkmc commented May 9, 2022

dt commented May 9, 2022

gopherbot commented May 9, 2022

gopherbot commented May 11, 2022

crypto/tls: 500% increase in allocations from (*tls.Conn).Read in go 1.17 #50657

crypto/tls: 500% increase in allocations from (*tls.Conn).Read in go 1.17 #50657

Comments

dt commented Jan 17, 2022

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

johanbrandhorst commented Jan 17, 2022 • edited

dt commented Jan 17, 2022 • edited

johanbrandhorst commented Jan 17, 2022 • edited

dt commented Jan 17, 2022 • edited

johanbrandhorst commented Jan 17, 2022

gopherbot commented Jan 17, 2022

johanbrandhorst commented Jan 18, 2022

dt commented Jan 21, 2022 • edited

alkmc commented Feb 24, 2022

moredure commented Apr 13, 2022 • edited

johanbrandhorst commented May 6, 2022

alkmc commented May 9, 2022

dt commented May 9, 2022

gopherbot commented May 9, 2022

gopherbot commented May 11, 2022

crypto/tls: 500% increase in allocations from `(*tls.Conn).Read` in go 1.17 #50657

crypto/tls: 500% increase in allocations from `(*tls.Conn).Read` in go 1.17 #50657

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

johanbrandhorst commented Jan 17, 2022 •

edited

dt commented Jan 17, 2022 •

edited

johanbrandhorst commented Jan 17, 2022 •

edited

dt commented Jan 17, 2022 •

edited

dt commented Jan 21, 2022 •

edited

moredure commented Apr 13, 2022 •

edited