Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: regression in Handshake behavior for bad certificates #15709

Closed
dsnet opened this issue May 16, 2016 · 15 comments
Closed

crypto/tls: regression in Handshake behavior for bad certificates #15709

dsnet opened this issue May 16, 2016 · 15 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented May 16, 2016

Commit 37c2875 caused a regression in the tls package that causes some tests to fail.

This is my hypothesis (I'm not an expert in the TLS protocol):
Previously, in the event of a bad client certificate (using the ClientAuth feature), the server would inform the client that the client's certificate was rejected before returning from the Handshake method. After the change mentioned, it seems that Handshake would return before the client is informed of the failure.

Consider the following code (where the client is using a bad certificate and should fail the Handshake):

// Snippet of server-side logic.
if err := tc.(*tls.Conn).Handshake(); err != nil {
    // If the client presented a bad cert, it seems reasonable to
    // Close the connection immediately.
    tc.Close()
}


// Snippet of client-side logic.
if err := c.Handshake(); err != nil {
    // The client is intentionally using a bad certificate, so we expect that
    // the error be "bad certificate". However, it is more often a "pipe error" instead.
    panic(err)
}

If the server's Handshake returns before it informs the client of a bad certificate, then there is a race condition going on here. As the client's Handshake is asking the server for the status of the TLS handshake, the client's packet may be received by the server before the server calls tc.Close or it may be received after calling tc.Close. If received before, then the client's Handshake will properly return an "bad certificate" error. However, if received after, then the client's Handshake will return a "broken pipe" error, which is more indicative of a network failure, than an authentication error.

Putting a time.Sleep(time.Second) before calling tc.Close on the server makes the client's Handshake almost always return 'bad certificate', further indicating that this is a race.

See this example: https://play.golang.org/p/7aY0K9wcEW

@agl, @bradfitz

@dsnet dsnet added this to the Go1.7 milestone May 16, 2016
@dsnet
Copy link
Member Author

dsnet commented May 16, 2016

/cc @tamird also

@dsnet
Copy link
Member Author

dsnet commented May 17, 2016

Here are TCP stream dumps of the communication between client and server. stream_good is from running on go1.6, while stream_bad is from running 81a8960. In the files peer0 is the client and peer1 is the server.

steam_good.txt
stream_bad.txt

@quentinmit quentinmit added the NeedsFix The path to resolution is known, but the work has not been done. label May 26, 2016
@adg
Copy link
Contributor

adg commented May 31, 2016

cc @agl

@ianlancetaylor
Copy link
Contributor

CC @tamird

@tamird
Copy link
Contributor

tamird commented Jun 1, 2016

Not sure what help I can provide here without a failing test. The change in 37c2875 only does better error handling, so whatever is happening here is the result of an unmasking of some other bug.

@ianlancetaylor
Copy link
Contributor

@tamird There is a failing test in the initial report. The program wants to consistently see a "bad certificate" error. Instead it sometimes sees a write error.

I don't think there is a hidden bug here. What seems to be happening is that the server reads some data, detects a bad certificate, sends an alert, and closes the connection. The client seems to write some data including a bad certificate, then write some more data. The second write fails and the client reports a write error, rather than checking for the alert sent from the server. I think.

@tamird
Copy link
Contributor

tamird commented Jun 1, 2016

OK, looking.

@ianlancetaylor
Copy link
Contributor

I have a fix for this but I'm not sure how to write a test that fits into the package.

@ianlancetaylor
Copy link
Contributor

@gopherbot
Copy link

CL https://golang.org/cl/23604 mentions this issue.

@agl
Copy link
Contributor

agl commented Jun 1, 2016

https://go-review.googlesource.com/23609 is my take on this. It appears to solve the problem and also saves some packets on the wire.

I am not sure whether it's suitable for landing during the freeze, however.

@gopherbot
Copy link

CL https://golang.org/cl/23609 mentions this issue.

@adg
Copy link
Contributor

adg commented Jun 1, 2016

@agl if this is a fix for a regression then it seems suitable for the release to me.

Getting it in before the beta means that it should be sufficiently tested before the final release.

How would you rate the risk of this change?

@agl
Copy link
Contributor

agl commented Jun 1, 2016

I think it has a low risk, but there's a fair amount of uncertainty in that. It's possible that some implementations may be sensitive to how the handshake is cut up into packets and that's very hard to predict.

If you're happy using the beta process to test it then I've no problem with it landing. (I'm not sure how to write a good test for it, however, because it's inherently about a race. I think counting the number of writes in a handshake might be the best solution.)

@adg
Copy link
Contributor

adg commented Jun 1, 2016

I'd be happy to include it in the beta and encourage testing by people doing interesting things with TLS.

If it were any later then I wouldn't accept it.

slackpad added a commit to hashicorp/consul that referenced this issue Nov 8, 2016
…on errors.

We traced through and realized that golang/go#15709
causes the output from the client to get buffered, which cuts off the alert
feedback due to the flush() call getting bypassed by the error return.
slackpad added a commit to hashicorp/consul that referenced this issue Nov 8, 2016
)

* Upgrades to Go 1.7 and fixes vet finding and TLS behavior change.

* Fixes unit tests in a better manner by closing the client connection on errors.

We traced through and realized that golang/go#15709
causes the output from the client to get buffered, which cuts off the alert
feedback due to the flush() call getting bypassed by the error return.
@golang golang locked and limited conversation to collaborators Jun 2, 2017
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
This change causes TLS handshake messages to be buffered and written in
a single Write to the underlying net.Conn.

There are two reasons to want to do this:

Firstly, it's slightly preferable to do this in order to save sending
several, small packets over the network where a single one will do.

Secondly, since 37c2875 errors from
Write have been returned from a handshake. This means that, if a peer
closes the connection during a handshake, a “broken pipe” error may
result from tls.Conn.Handshake(). This can mask any, more detailed,
fatal alerts that the peer may have sent because a read will never
happen.

Buffering handshake messages means that the peer will not receive, and
possibly reject, any of a flow while it's still being written.

Fixes golang#15709

Change-Id: I38dcff1abecc06e52b2de647ea98713ce0fb9a21
Reviewed-on: https://go-review.googlesource.com/23609
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Oct 12, 2018
This change causes TLS handshake messages to be buffered and written in
a single Write to the underlying net.Conn.

There are two reasons to want to do this:

Firstly, it's slightly preferable to do this in order to save sending
several, small packets over the network where a single one will do.

Secondly, since 37c2875 errors from
Write have been returned from a handshake. This means that, if a peer
closes the connection during a handshake, a “broken pipe” error may
result from tls.Conn.Handshake(). This can mask any, more detailed,
fatal alerts that the peer may have sent because a read will never
happen.

Buffering handshake messages means that the peer will not receive, and
possibly reject, any of a flow while it's still being written.

Fixes golang#15709

Change-Id: I38dcff1abecc06e52b2de647ea98713ce0fb9a21
Reviewed-on: https://go-review.googlesource.com/23609
Reviewed-by: Andrew Gerrand <adg@golang.org>
Run-TryBot: Andrew Gerrand <adg@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants