New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
crypto/tls: handshake race between read and write goroutines in go1.7 #17101
Comments
Tagging as Go 1.7.2 since I believe this is a regression from Go 1.6. But this might be Go 1.8 material instead. I'll let @agl decide. |
It's definitely a regression since go1.5.3, but unfortunately, I haven't tested on go1.6. |
And I though that I might have gotten away with caving on support renegotiation without any nasty bugs :( Will look at this today. |
CL https://golang.org/cl/29164 mentions this issue. |
CL https://golang.org/cl/31268 mentions this issue. |
…te handshake. After renegotiation support was added (af125a5) it's possible for a Write to block on a Read when racing to complete the handshake: 1. The Write determines that a handshake is needed and tries to take the neccesary locks in the correct order. 2. The Read also determines that a handshake is needed and wins the race to take the locks. 3. The Read goroutine completes the handshake and wins a race to unlock and relock c.in, which it'll hold when waiting for more network data. If the application-level protocol requires the Write to complete before data can be read then the system as a whole will deadlock. Unfortunately it doesn't appear possible to reverse the locking order of c.in and handshakeMutex because we might read a renegotiation request at any point and need to be able to do a handshake without unlocking. So this change adds a sync.Cond that indicates that a goroutine has committed to doing a handshake. Other interested goroutines can wait on that Cond when needed. The test for this isn't great. I was able to reproduce the deadlock with it only when building with -race. (Because -race happened to alter the timing just enough.) Fixes #17101. Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495 Reviewed-on: https://go-review.googlesource.com/29164 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-on: https://go-review.googlesource.com/31268
…te handshake. After renegotiation support was added (af125a5) it's possible for a Write to block on a Read when racing to complete the handshake: 1. The Write determines that a handshake is needed and tries to take the neccesary locks in the correct order. 2. The Read also determines that a handshake is needed and wins the race to take the locks. 3. The Read goroutine completes the handshake and wins a race to unlock and relock c.in, which it'll hold when waiting for more network data. If the application-level protocol requires the Write to complete before data can be read then the system as a whole will deadlock. Unfortunately it doesn't appear possible to reverse the locking order of c.in and handshakeMutex because we might read a renegotiation request at any point and need to be able to do a handshake without unlocking. So this change adds a sync.Cond that indicates that a goroutine has committed to doing a handshake. Other interested goroutines can wait on that Cond when needed. The test for this isn't great. I was able to reproduce the deadlock with it only when building with -race. (Because -race happened to alter the timing just enough.) Fixes golang#17101. Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495 Reviewed-on: https://go-review.googlesource.com/29164 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-on: https://go-review.googlesource.com/31268
After renegotiation support was added (af125a5) it's possible for a Write to block on a Read when racing to complete the handshake: 1. The Write determines that a handshake is needed and tries to take the neccesary locks in the correct order. 2. The Read also determines that a handshake is needed and wins the race to take the locks. 3. The Read goroutine completes the handshake and wins a race to unlock and relock c.in, which it'll hold when waiting for more network data. If the application-level protocol requires the Write to complete before data can be read then the system as a whole will deadlock. Unfortunately it doesn't appear possible to reverse the locking order of c.in and handshakeMutex because we might read a renegotiation request at any point and need to be able to do a handshake without unlocking. So this change adds a sync.Cond that indicates that a goroutine has committed to doing a handshake. Other interested goroutines can wait on that Cond when needed. The test for this isn't great. I was able to reproduce the deadlock with it only when building with -race. (Because -race happened to alter the timing just enough.) Fixes golang#17101. Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495 Reviewed-on: https://go-review.googlesource.com/29164 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
After renegotiation support was added (af125a5) it's possible for a Write to block on a Read when racing to complete the handshake: 1. The Write determines that a handshake is needed and tries to take the neccesary locks in the correct order. 2. The Read also determines that a handshake is needed and wins the race to take the locks. 3. The Read goroutine completes the handshake and wins a race to unlock and relock c.in, which it'll hold when waiting for more network data. If the application-level protocol requires the Write to complete before data can be read then the system as a whole will deadlock. Unfortunately it doesn't appear possible to reverse the locking order of c.in and handshakeMutex because we might read a renegotiation request at any point and need to be able to do a handshake without unlocking. So this change adds a sync.Cond that indicates that a goroutine has committed to doing a handshake. Other interested goroutines can wait on that Cond when needed. The test for this isn't great. I was able to reproduce the deadlock with it only when building with -race. (Because -race happened to alter the timing just enough.) Fixes golang#17101. Change-Id: I4e8757f7b82a84e46c9963a977d089f0fb675495 Reviewed-on: https://go-review.googlesource.com/29164 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org>
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?go1.7
What operating system and processor architecture are you using (
go env
)?linux
What did you do?
We have a simple Stubby-alike RPC protocol ("Sake"), which spins up input and output goroutines. We don't currently force a handshake, so input and output race to handshake. (I'll be fixing that after I file this bug.)
in
at the point where the handshake routine tries to give uphandshakeMutex
and grabin
.in
and completes handshake.in
, and blocks forever, waiting for input. Because this is a simple client-server protocol, with client-initiated communication, after the connection is established and the handshake is finished, nothing will ever be read except in response to a client-initiated request.in
inside ofHandshake()
.The comment here https://github.com/golang/go/blob/5a589904a3/src/crypto/tls/conn.go#L1204 exactly describes the problem.
It was introduced in this commit: af125a5#diff-ef0187d9cfe69a02cab179f844c8e712R1167
What did you expect to see?
Successful connections and communications.
What did you see instead?
The output thread stuck indefinitely until a redeploy of the other end tears down all the connections.
Notes
In our staging environment, I have a test client running on two hosts, each making 300 connections to a test server. It took me six or seven restarts to see this problem happen on one connection, so let's guess one attempt in two thousand. Ish.
An apology: it should be possible to reproduce in a test case, but I haven't the time to write it right now: set up a simple server, using TLS, and repeatedly try to connect pairs of input/output goroutines. A watchdog goroutine checks for hung connections.
@agl
The text was updated successfully, but these errors were encountered: