Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/crypto/ssh: ssh: handshake failed: ssh: unexpected message type 21 (expected 6) #15066

Closed
jfmyers9 opened this issue Apr 1, 2016 · 14 comments

Comments

@jfmyers9
Copy link

jfmyers9 commented Apr 1, 2016

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
go version go1.6 linux/amd64
  1. What operating system and processor architecture are you using (go env)?
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jfmyers9/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GO15VENDOREXPERIMENT="1"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
CXX="g++"
CGO_ENABLED="1"
  1. What did you do?

Run go test in crypto/ssh. You will eventually see this failure on random tests performing server/client handshakes.

=== RUN  TestSessionShell
--- FAIL: TestSessionShell (0.01s)
session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
  1. What did you expect to see?

A passing test.

  1. What did you see instead?

We see a failure for an unexpected message type. After digging into the issue, it appears that the handshakeTransport.readPacket method is used to perform the key exchange, but also in order to read packets in the client/server authentication flow. When a key exchange is initiated during authentication, we can experience failures as the packet returned by the above method is actually the msgNewKeys message which is the result of successfully performing the key exchange.

We went about solving this by making the following commit

Could you provide some feedback on the proposed changes and help us progress with getting the above fix into the additional crypto libraries?

@bradfitz bradfitz added this to the Unreleased milestone Apr 4, 2016
@bradfitz
Copy link
Contributor

bradfitz commented Apr 4, 2016

/cc @hanwen

@hanwen
Copy link
Contributor

hanwen commented Apr 5, 2016

thanks for the careful analysis. I wonder you triggered a kex during auth, as it usually only happens after lots of data transfer.

I'll have a look at handling msgNewKeys in a more principled way.

@hanwen
Copy link
Contributor

hanwen commented Apr 5, 2016

wait, this is in our unittests?! This should never happen, as both sides wait for the first msgNewKeys explicitly and then don't exchange enough data for a new kex to be trggered.

@hanwen
Copy link
Contributor

hanwen commented Apr 5, 2016

can you tell us a little more about the circumstances under which this happens, or how to reproduce?
I tried "go test -v -count 1000 -run SessionShell" and it passes.

@jfmyers9
Copy link
Author

jfmyers9 commented Apr 5, 2016

Just keep running the tests. This generally fails faster in resource constrained environments such such as a container or a VM. On my personal laptop it can take anywhere from 50-10000 runs in order to see this failure occur. Also you can see this issue reported in public builds of golang, for example here.

RFC 4253 Section 9 states that either the client or server may re-initiate a key exchange at any point, and the other side must be able to handle this key exchange. This would imply that a key exchange during authentication is theoretically possible, and should not result in a handshake failed: ssh: unexpected message type 21 (expected 6).

I believe that this is failing intermittently because of timing changes introduced in go 1.6. If you look at the test setup for the above test here, you will see that both the NewClientConn and NewServerConn happen in separate goroutines. Also, both the client transport and the server transport begin reading packets asynchronously in a goroutine using the readOnePacket method. readOnePacket handles key exchanges internally, so there is no guarantee that both the client and the server have requested key exchanges with the expected timing. Resulting in a failure on the subsequent readPacket in client authentication.

Granted this could fail on any calls to readPacket that are not used for key exchange as both sides may initiate a key exchange at any point in time, as noted above. Hence why in the this commit we ignore key exchange packets in previous locations where readPacket was used, as these key exchange packets are handled internally by readOnePacket. Please note, that we cannot have readPacket always ignore key exchange packets, as we use this method to validate that we have in fact performed a key exchange here.

Best,

Jim

@hanwen
Copy link
Contributor

hanwen commented Apr 6, 2016

I agree that the library doesn't perform up to spec, since a kex may be requested at any time and should not cause problems during auth.

However, in the unittests, the client and server are both golang.crypto, which should issue only one key exchange, the initial one right before the auth step. Client and server synchronize on both sides, see here

https://github.com/golang/crypto/blob/master/ssh/client.go#L106

and here

https://github.com/golang/crypto/blob/master/ssh/server.go#L197

I would guess that there is still some sort of race condition lingering that somehow causes msgNewKeys to be read once, but emitted twice from the packetTransport.

If you have a platform where this problem reproduces easily, it would be great if you could try to run with the race detector to see if there are any race conditions I missed.

@jfmyers9
Copy link
Author

jfmyers9 commented Apr 6, 2016

@hanwen msgNewKeys is not a message that is actually ever read over the wire. It is a hard-coded message in the client/server here:

https://github.com/golang/crypto/blob/master/ssh/handshake.go#L211

This has nothing to do with msgNewKeys being emitted twice incorrectly. It has to do with the internal handling of all Kex Packets in readOnePacket here:

https://github.com/golang/crypto/blob/master/ssh/handshake.go#L173-L211

The two lines you linked above do not show that the client/server synchronize on requesting their key exchanges at the same time. All they show is that the respective client/server requested a key exchange and received the correct response. There is nothing stopping them from both requesting a key exchange.

Understanding the race, you could artificially introduce timing into the production code to get this to fail easier.

Adding this time.Sleep to server.go in serverHandshake between these lines:

    s.clientVersion, err = exchangeVersions(s.sshConn.conn, s.serverVersion)
    if err != nil {
        return nil, err
    }

    tr := newTransport(s.sshConn.conn, config.Rand, false /* not client */)
    s.transport = newServerTransport(tr, s.clientVersion, s.serverVersion, config)

    time.Sleep(1 * time.Second)

    if err := s.transport.requestKeyChange(); err != nil {
        return nil, err
    }

    if packet, err := s.transport.readPacket(); err != nil {
        return nil, err
    } else if packet[0] != msgNewKeys {
        return nil, unexpectedMessageError(msgNewKeys, packet[0])
    }

or a similar sleep in clientHandshake will cause a failure 100% of the time. That should be sufficient enough for you to reproduce the problem.

Once again the issue here is that readOnePacket correctly handles all KexInit messages internally, including responding with a KexInit message if one has not already been sent, seen here:

https://github.com/golang/crypto/blob/master/ssh/handshake.go#L308

Thus the server can receive the clients KexInit, respond with a KexInit successfully and then send another KexInit because of disconnect between reading and buffering packets which begins here:

https://github.com/golang/crypto/blob/master/ssh/server.go#L189

and forcibly requesting a key exchange here:

https://github.com/golang/crypto/blob/master/ssh/server.go#L191

Also please note that this timing issue exists in both the client and the server, just in the example above I artificially introduced the timing in the server. I could have just have easily added the time.Sleep to clientHandshake and received a entirely different error message from the server handshake failing.

Please let me know if you have any other questions.

@jfmyers9
Copy link
Author

jfmyers9 commented Apr 6, 2016

For example, here is some output from go test:

--- FAIL: TestHostKeyCert (0.10s)
    certs_test.go:208: NewClientConn("hostname"): ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestClientAuthPublicKey (0.11s)
    client_auth_test.go:97: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestAuthMethodPassword (0.10s)
    client_auth_test.go:110: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestAuthMethodFallback (0.10s)
    client_auth_test.go:129: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestAuthMethodWrongPassword (0.10s)
    client_auth_test.go:147: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestAuthMethodKeyboardInteractive (0.10s)
    client_auth_test.go:164: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestAuthMethodRSAandDSA (0.10s)
    client_auth_test.go:208: client could not authenticate with rsa key: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestClientHMAC (0.10s)
    client_auth_test.go:224: client could not authenticate with mac algo hmac-sha2-256: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestClientLoginCert (1.04s)
    client_auth_test.go:277: should succeed
    client_auth_test.go:279: cert login failed: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
    client_auth_test.go:282: corrupted signature
    client_auth_test.go:288: revoked
    client_auth_test.go:296: sign with wrong key
    client_auth_test.go:302: host cert
    client_auth_test.go:310: principal specified
    client_auth_test.go:314: cert login failed: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
    client_auth_test.go:317: wrong principal specified
    client_auth_test.go:325: added critical option
    client_auth_test.go:332: allowed source address
    client_auth_test.go:336: cert login with source-address failed: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
    client_auth_test.go:339: disallowed source address
--- FAIL: TestPermissionsPassing (0.10s)
    client_auth_test.go:380: ssh: unexpected message type 21 (expected 50)
--- FAIL: TestNoPermissionsPassing (0.10s)
    client_auth_test.go:380: ssh: unexpected message type 21 (expected 50)
--- FAIL: TestSessionShell (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestSessionStdoutPipe (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestSessionOutput (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestSessionCombinedOutput (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestExitStatusNonZero (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestExitStatusZero (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestExitSignalAndStatus (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestKnownExitSignalOnly (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestUnknownExitSignal (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestExitWithoutStatusOrSignal (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestServerWindow (0.38s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestClientHandlesKeepalives (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
--- FAIL: TestClientWriteEOF (0.10s)
    session_test.go:67: unable to dial remote side: ssh: handshake failed: ssh: unexpected message type 21 (expected 6)
    session_test.go:40: Unable to handshake: ssh: unexpected message type 21 (expected 50)

With this diff:

diff --git a/ssh/server.go b/ssh/server.go
index 4781eb7..605cfe8 100644
--- a/ssh/server.go
+++ b/ssh/server.go
@@ -10,6 +10,7 @@ import (
        "fmt"
        "io"
        "net"
+       "time"
 )

 // The Permissions type holds fine-grained permissions that are
@@ -188,6 +189,8 @@ func (s *connection) serverHandshake(config *ServerConfig) (*Permissions, error)
        tr := newTransport(s.sshConn.conn, config.Rand, false /* not client */)
        s.transport = newServerTransport(tr, s.clientVersion, s.serverVersion, config)

+       time.Sleep(100 * time.Millisecond)
+
        if err := s.transport.requestKeyChange(); err != nil {
                return nil, err
        }

@hanwen
Copy link
Contributor

hanwen commented Apr 6, 2016

I appreciate that you are trying to explain me how this thing works, but I actually wrote all of that code, and know more or less how it is supposed to work.

thanks for the last diff - I'll have a closer look.

@hanwen
Copy link
Contributor

hanwen commented Apr 6, 2016

OK, I get it now.

The problem is that on setting up the connection both sides should send a kexInit to do the first (mandatory) key exchange, and they should both only proceed if they are sure the kex succeeded.

The current code does this by doing requestKeyExchange in both the server and the client.

However, if one of both sides is slow, the kex initiated by the fast side may have already run and completed. Then the requestKeyExchange effectively executes a 2nd kex, the confirmation of which (msgNewKeys) causes havoc.

I think the solution would be to have a separate requestFirstKeyChange(), which does nothing if sessionID != nil.

@jfmyers9
Copy link
Author

jfmyers9 commented Apr 6, 2016

That seems reasonable. I believe the requestFirstKeyChange() would also need to lock the mutex around sending KexInit packets before checking the sessionID as well in order to prevent the race.

While highly unlikely, the problem around receiving a KexInit while reading other packets is still a possibility. Due to the manner in which key exchanges are handled internally, I still think it would make sense to explore ideas around ignoring these packets in all cases in which we are not performing a key exchange.

Thanks for taking the time to look at this.

@hanwen
Copy link
Contributor

hanwen commented Apr 6, 2016

Yes, msgNewKey should probably be filtered from the rest of the code.

However, that would mask the current problem, which is that we might do 2 kexes if one side is slow; I prefer to fix the latter problem first.

@hanwen
Copy link
Contributor

hanwen commented Apr 6, 2016

https://go-review.googlesource.com/#/c/21606/

@dmitshur
Copy link
Contributor

It looks like golang/crypto@d68c3ec has been merged, which fixes this issue, so it can be closed now.

@hanwen hanwen closed this as completed Apr 28, 2016
@golang golang locked and limited conversation to collaborators Apr 28, 2017
FiloSottile pushed a commit to FiloSottile/go that referenced this issue Nov 24, 2019
This is done by running the key exchange and setting the session ID
under mutex. If the first exchange encounters an already set session
ID, then do nothing.

This fixes a race condition:

On setting up the connection, both sides sent a kexInit to initiate
the first (mandatory) key exchange.  If one side was faster, the
faster side might have completed the key exchange, before the slow
side had a chance to send a kexInit.  The slow side would send a
kexInit which would trigger a second key exchange. The resulting
confirmation message (msgNewKeys) would confuse the authentication
loop.

This fix removes sessionID from the transport struct.

This fix also deletes the unused interface rekeyingTransport.

Fixes golang#15066

Change-Id: I7f303bce5d3214c9bdd58f52d21178a185871d90
Reviewed-on: https://go-review.googlesource.com/21606
Reviewed-by: Adam Langley <agl@golang.org>
Reviewed-by: Han-Wen Nienhuys <hanwen@google.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants