Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/tls: TLS1.3 occasionally fails with "tls: error decoding message" when fetching from https://tls13.1d.pw #41983

Closed
firefart opened this issue Oct 14, 2020 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@firefart
Copy link

firefart commented Oct 14, 2020

What version of Go are you using (go version)?

$ go version
go version go1.15.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/firefart/.cache/go-build"
GOENV="/home/firefart/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/firefart/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/firefart/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build097380316=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Hi,
we've noticed something strange when testing a tool against a TLS1.3 only test server over at https://tls13.1d.pw/ . Requests to this server sometimes succeed and sometimes return a remote error: tls: error decoding message error.
From here it looks like the webserver used on this testing site is a custom implementation (the Server header indicates Server: Z/pZ v.1.11.beta) so I think this might be a server side issue on their side, but by using various browsers this issue can not be reproduced (maybe they are less strict on handling the TLS1.3 protocol).
So I'm opening an issue here so someone familiar with TLS1.3 might have a look at this to rule out a possible golang error on handling or sending TLS1.3 packets.

Thanks!

Test script:

package main

import (
        "net/http"
        "log"
        "io/ioutil"
)

func main() {
        resp, err := http.Get("https://tls13.1d.pw/")
        if err != nil {
                log.Fatalln(err)
        }

        body, err := ioutil.ReadAll(resp.Body)
        if err != nil {
                log.Fatalln(err)
        }

  log.Println("Success")
  _ = body
  // log.Println(string(body))
}

Example run:

for x in $(seq 5); do echo $x; go run test.go; done
1
2020/10/14 23:34:52 Get "https://tls13.1d.pw/": remote error: tls: error decoding message
exit status 1
2
2020/10/14 23:34:52 Success
3
2020/10/14 23:34:53 Get "https://tls13.1d.pw/": remote error: tls: error decoding message
exit status 1
4
2020/10/14 23:34:53 Success
5
2020/10/14 23:34:54 Get "https://tls13.1d.pw/": remote error: tls: error decoding message
exit status 1

Here are 2 pcaps of good and failed requests:
pcap.zip

What did you expect to see?

A successful http request

What did you see instead?

go run test.go
2020/10/14 23:30:25 Get "https://tls13.1d.pw/": remote error: tls: error decoding message
exit status 1
@dmitshur dmitshur changed the title Strange TLS1.3 behaviour on test site: tls: error decoding message net/http: strange TLS1.3 behavior on test site, "tls: error decoding message" Oct 15, 2020
@dmitshur dmitshur added this to the Backlog milestone Oct 15, 2020
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 15, 2020
@dmitshur
Copy link
Contributor

CC @FiloSottile.

@odeke-em
Copy link
Member

Here is a minimal code reproducer, using the crypto/tls dialer and writing HTTP/1.1. over the wire without using the net/http package, so that we can isolate the problem:

package main

import (
	"bufio"
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"io"
)

func main() {
	certPool, err := x509.SystemCertPool()
	if err != nil {
		panic(err)
	}
	host := "tls13.1d.pw"
	conn, err := tls.Dial("tcp", host+":443", &tls.Config{RootCAs: certPool})
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	fmt.Fprintf(conn, "GET / HTTP/1.1\r\nPath: /\r\nConnection: close\r\nHost: %s\r\n\r\n", host)
	br := bufio.NewReader(conn)
	for {
		line, _, err := br.ReadLine()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			panic(err)
		}
		println(string(line))
	}
}

which will produce sometimes either of:

Bad response

$ go run main.go 
panic: remote error: tls: error decoding message

goroutine 1 [running]:
main.main()
	/Users/emmanuelodeke/Desktop/openSrc/bugs/golang/41983/main.go:15 +0x413
exit status 2

Good response

HTTP/1.1 200 OK
Server: Z/pZ v.1.11.beta
Public-Key-Pins: pin-sha256="LOip4wo3g8bntfGValXzDEmzTG23JtAZZ5zECX2rOlw="; pin-sha256="As6pWcwmAbUdO56S2HsZWl+kuLXR+ffow9nrkMHhrAQ="; pin-sha256="fec87drmn0s0sf95Tf2ltYSCesjFl43xCX++/laTvTs="; max-age=259201;
Strict-Transport-Security: max-age=15552000
X-Recent-Code-Change: desc="Updated URL handling"
X-TLS-ClientRandom-Challenge: try="0xDEADDEADDEADC0DE0[0...]-in-Random"
Content-Type: text/html; charset=UTF-8
Connection: close

<!DOCTYPE html><html><head><link rel="stylesheet" href="/assets/styles.css" /><title>TLS 1.3 connection test server at tls13.1d.pw</title></head><body>

<h1>TLS 1.3 Connection Test Server at tls13.1d.pw</h1>
<div id="info_panel"><p>Successfully connected</p><p>TLS 1.3 OK;</p>
</div>
<div id="wrapper">
<p>Detailed description:
<pre>v.0.21.7-k
(This TLS-1.3-only server supports ESNI, HelloRetryRequest, KeyUpdate, ECDHE, FFDHE, etc.)


CRYPTO_CONTEXT

	Protocol version: 0x0304
	Cipher suite: 0x1301 (TLS_AES_128_GCM_SHA256)
	Key exchange: 0x001D (X25519)

	(Handshake level)
	Handshake secrets:
		Client: 0x1FC704DEA339962A31BE50F6DF06D59BB4E4FDA9C574BD7A4348DEABC1EE83C8
		Server: 0x95D4275CEB6D7DF898D58060B4BF9864775BA6298C800796E475F2497A716743
	Handshake keys:
		Client write: 0x9835E654D4DDB4FA01F13D9AD1965F5B
		Server write: 0x02431611068A7EC71AEC32B4A7A3EC40
	Handshake IVs:
		Client write (IV): 0xC154F4811769FDA1B24EE35A
		Server write (IV): 0x0AD1CC35010650D4F5F48C01

	(Traffic level)
	Traffic secrets:
		Client: 0x65D80D9FA15BB68FBABC959577017441BE7FFA8B88BE11C8C311F2FCA1A52BFF
		Server: 0x24A7D5D11E19AE2919DD17AA36F3D791840643F098B271042C46C0B962241297
	Traffic keys:
		Client write: 0xEF196A6D6EFA3F4BAF2AA3524EC72EC0
		Server write: 0x5E41861935D6F36495A0880A1469D1D6
	Traffic base IVs:
		Client write (base IV): 0x0859B5C82C0F389E9288EE2B
		Server write (base IV): 0xBB07F9D63333D7AF527CDB67

--- Messages ---

[Legacy ChangeCipherSpec message sent (server)]

CLIENT_HELLO (recieved)

	Version: 0x0303
	Cipher suites:
		0xC02F,0xC030,0xC02B,0xC02C,0xCCA8,0xCCA9,0xC013,0xC009
		0xC014,0xC00A,0x009C,0x009D,0x002F,0x0035,0xC012,0x000A
		0x1301,0x1303,0x1302
	Client Random: 27:75:32:F3:15:62:3E:A0:8C:09:50:EA:F9:24:EF:37:42:B7:14:58:DA:39:B2:16:73:35:5E:23:8D:8B:61:2F
	Client SessionID: 5D:89:25:6C:9A:87:EF:7C:F5:69:01:E3:F7:B3:D9:0D:74:A8:DF:51:DD:9C:F0:36:A4:3B:D8:72:7C:26:1D:6B
	Extensions:
		Type: 00 (0x0000); "server_name"
			00:00:0E:00:00:0B:74:6C:73:31:33:2E:31:64:2E:70:77
			hostname: tls13.1d.pw
		Type: 05 (0x0005); "status_request"
			01:00:00:00:00
		Type: 10 (0x000A); "supported_groups"
			0x001D (X25519)
			0x0017 (Secp256r1)
			0x0018 (Secp384r1)
			0x0019 (Secp521r1)
		Type: 11 (0x000B); /non-TLS-1.3/
			01:00
		Type: 13 (0x000D); "signature_algorithms"
			00:18:08:04:04:03:08:07:08:05:08:06:04:01:05:01
			06:01:05:03:06:03:02:01:02:03
		Type: 65281 (0xFF01); /non-TLS-1.3/
			00
		Type: 18 (0x0012); "signed_certificate_timestamp"
		Type: 43 (0x002B); "supported_versions"
			0x0304
			0x0303
			0x0302
			0x0301
		Type: 51 (0x0033); "key_share"
			X25519 (0x001D)
				x = 0x1828B5EAFEF9C23A4EE5373FFC26D7D66380FCA6776AC09BB36D43E99B69930B

SERVER_HELLO (sent)

	Legacy Version: 0x0303
	Cipher suite: 0x1301 (TLS_AES_128_GCM_SHA256)
	Server Random: DE:AD:DE:AD:DE:AD:C0:DE:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00
	SessionID: 5D:89:25:6C:9A:87:EF:7C:F5:69:01:E3:F7:B3:D9:0D:74:A8:DF:51:DD:9C:F0:36:A4:3B:D8:72:7C:26:1D:6B
	Extensions:
		Type: 51 (0x0033); "key_share"
			X25519 (0x001D)
				x = 0x145891E6697DAA111BD9A43EB39FA860847D09303D603128F82CE7451DAE456B
		Type: 43 (0x002B); "supported_versions"
			0x0304

[Legacy ChangeCipherSpec message sent (server)]

SERVER_HANDSHAKE_MESSAGES (sent)

	Type: 8 (0x08) - "encrypted_extensions"
		Length (octets, dec.): 2
		00:00
	Type: 11 (0x0B) - "certificate"
		Length (octets, dec.): 3165
		[...] /skipped 3165 data octets/
	Type: 15 (0x0F) - "certificate_verify"
		Length (octets, dec.): 107
		05:03:00:67:30:65:02:31:00:84:20:8C:6D:A6:44:2F:9D:63:B2:27:50:3E:2F:02:48:B6:BE:35:C3:A7:38:21
		28:8C:9F:9E:8C:38:B8:F5:11:D8:12:F5:E0:76:EF:F4:EB:AA:3F:70:9D:82:57:04:5A:02:30:11:5D:13:31:9D
		B4:02:DE:C6:8E:FA:08:35:27:D9:78:AC:AB:FC:9B:A0:43:E7:BC:15:25:24:9D:43:94:FD:28:B0:35:DC:59:B2
		88:EA:0B:B5:0A:9D:06:11:D2:E8:85
	Type: 20 (0x14) - "finished"
		Length (octets, dec.): 32
		5A:8E:23:33:47:54:6D:66:6F:37:49:36:C1:3B:69:BC:84:6A:6A:4A:E0:DB:AC:1B:44:52:E0:CA:77:87:BF:57

[Legacy ChangeCipherSpec message present (client)]

CLIENT_FINISHED (received)

	Finished value: 0x6C16E254640F9DD42558938C1CACC3BF97D50D7F4DA8C5ABC9661C8944F5D6FD
	Client Finished status: OK

CLIENT_APPLICATION_DATA (recieved)

	ASCII dump (filtered):
		GET / HTTP/1.1..Path: /..Connection: close..Host: tls13.1d.pw...
		.

TCP: server: 178.128.250.95:443; client: 207.140.28.70:59847; timestamp: 1602835460


Contacts: dxdt.ru, email := alex//\/io7.nl.

 /\_/\
( 0.0 )
 = ^ =
 /|_|\
(") (")=~


Provide ESNI to get second ASCII cat

Elapsed server side: 198ms; ServerHello sent: 8ms; TLS connection established: 198ms.
</pre></p></div>

</body></html>

@odeke-em odeke-em changed the title net/http: strange TLS1.3 behavior on test site, "tls: error decoding message" crypto/tls: TLS1.3 occasionally fails with "tls: error decoding message" when fetching from https://tls13.1d.pw Oct 16, 2020
@odeke-em
Copy link
Member

/cc @rolandshoemaker

@apoq
Copy link

apoq commented Oct 16, 2020

You also get an error in firefox occasionally with that specific domain: "An error occurred during a connection to tls13.1d.pw. SSL received a record with an incorrect Message Authentication Code. Error code: SSL_ERROR_BAD_MAC_READ".

See: https://serverfault.com/questions/6653/ssl-error-ssl-error-bad-mac-read

@apoq
Copy link

apoq commented Oct 16, 2020

conn.go:712: return c.in.setErrorLocked(c.sendAlert(alertDecodeError))
handshake_client_tls13.go:189: c.sendAlert(alertDecodeError)
handshake_client_tls13.go:290: c.sendAlert(alertDecodeError)
handshake_client_tls13.go:427: c.sendAlert(alertDecodeError)

@davecheney
Copy link
Contributor

@apoq given that you have confirmed this problem happens using firefox, which is not written in go, is it possible that the problem exists at the remote end? Are you able to contact the owners of the site and gather some details about the software they are running?

@rolandshoemaker
Copy link
Member

This appears to be caused by server implementation issue rather than a client one. When the server sends a message with the records hello_retry_request, change_cipher_spec the crypto/tls client responds with change_cipher_spec, client_hello.

It seems like the client sending change_cipher_spec before client_hello trips up this particular server and causes a decode_error alert. If you remove the change_cipher_spec message, or move it after the client_hello, the server continues normally. This issue looks to be an implementation issue on the server side as the server should just be dropping all of the change_cipher_spec messages on the ground without doing anything with them (and in fact RFC 8446 call out that servers should be capable of handling a change_cipher_spec message between the two client_hello messages).

/cc @FiloSottile who understands TLS 1.3 far better than I do though and may have different thoughts.

@FiloSottile
Copy link
Contributor

Thanks @rolandshoemaker for looking into this! Indeed, we are doing what the spec says

   -  If not offering early data, the client sends a dummy
      change_cipher_spec record (see the third paragraph of Section 5)
      immediately before its second flight.  This may either be before
      its second ClientHello or before its encrypted handshake flight.
      If offering early data, the record is placed immediately after the
      first ClientHello.

and the server is expected to tolerate it

   An implementation may receive an unencrypted record of type
   change_cipher_spec consisting of the single byte value 0x01 at any
   time after the first ClientHello message has been sent or received
   and before the peer's Finished message has been received and MUST
   simply drop it without further processing.

This is a server issue. It looks like this is an experimental server implementation, so I recommend reaching out.

@firefart
Copy link
Author

Thanks for looking into this!

@avened
Copy link

avened commented Oct 18, 2020

Hello,

I'm the one behind the tls13.1d.pw TLS implementation. Hopefully, I have now fixed this bug in server. The cause was exactly the leading change_cipher_spec handling - in some cases server was erroneously skipping not only CCS, but also next message. Thanks.

@OJ
Copy link

OJ commented Oct 19, 2020

Thanks everyone for taking the time to look at this! And thanks @avened for jumping on it. Local test looks good:

image

You're all wonderful humans.

@golang golang locked and limited conversation to collaborators Oct 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

10 participants