Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

net/http: TestClientTimeout* failures with "missing timeout substring" on Windows #43120

Open
bcmills opened this issue Dec 10, 2020 · 18 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Dec 10, 2020

#!watchflakes
post <- goos == "windows" && pkg == "net/http" && test ~ `^TestClientTimeout.*` && `missing timeout substring`

2020-12-09T01:34:53-ae9b442/windows-arm-zx2c4
2020-12-07T17:19:41-7f9a2bc/windows-arm-zx2c4
2020-12-03T02:35:36-da54dfb/windows-arm-zx2c4
2020-12-02T20:17:57-2d0258d/windows-arm-zx2c4

--- FAIL: TestClientTimeout_h1 (0.20s)
    client_test.go:1277: error string = "context deadline exceeded"; missing timeout substring
--- FAIL: TestClientTimeout_h2 (0.20s)
    client_test.go:1277: error string = "context deadline exceeded"; missing timeout substring
FAIL
FAIL	net/http	43.328s

CC @zx2c4 @bradfitz @neild @empijei

@bcmills bcmills added OS-Windows NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 10, 2020
@bcmills bcmills added this to the Backlog milestone Dec 10, 2020
@zx2c4
Copy link
Contributor

zx2c4 commented Dec 10, 2020

We increased the timeout here: golang/build@b6220de

But I don't know if that change has been deployed yet. @dmitshur - do you know?

@zx2c4
Copy link
Contributor

zx2c4 commented Dec 10, 2020

Oh, the latest log shows "GO_TEST_TIMEOUT_SCALE=3", so looks like it has been deployed.

So I'm not quite sure...

@dmitshur
Copy link
Contributor

Yes, it has. You can find out the deployed version by visiting https://farmer.golang.org, it tells the version at the top:

[...] Uptime 10h21m44s. Version 90d5e7c418c099f6d88cfc0e6269a94fa4da7f0e.

@bcmills
Copy link
Contributor Author

bcmills commented Aug 31, 2021

Looks like it has started occurring on other builders too.

greplogs --dashboard -md -l -E 'missing timeout substring' --since=2020-12-10

2021-08-30T22:07:43-437362c/openbsd-amd64-68
2021-06-30T16:44:19-0fa3265/windows-arm64-10
2021-06-25T21:04:07-5160896/windows-arm64-aws
2021-06-18T16:56:48-57aaa19/windows-arm-zx2c4
2021-06-16T20:38:07-0e67ce3/windows-arm64-aws
2021-06-15T18:42:11-219fe9d/windows-arm64-aws
2021-06-11T16:09:15-e2dc6dd/windows-arm-zx2c4
2021-05-10T15:47:52-a9edda3/openbsd-amd64-68
2021-05-04T23:35:34-137be77/windows-arm-zx2c4
2021-04-30T19:59:10-faff49a/windows-arm-zx2c4
2021-04-13T21:13:12-c19759a/windows-arm-zx2c4
2021-04-12T18:08:47-849dba0/openbsd-amd64-68
2021-04-09T14:33:44-a690a5d/windows-arm-zx2c4
2021-04-07T13:24:10-4520da4/windows-arm-zx2c4
2021-04-07T02:05:55-8d77e45/windows-arm-zx2c4
2021-04-06T22:55:58-b084073/windows-arm-zx2c4
2021-04-01T00:51:23-3304b22/windows-arm-zx2c4
2021-03-30T16:13:36-89b141c/windows-arm-zx2c4
2021-03-29T08:11:13-565e70f/windows-arm-zx2c4
2021-03-29T06:21:31-d10241f/windows-arm-zx2c4
2021-03-22T20:20:58-d9691ff/windows-arm-zx2c4
2021-03-10T19:33:23-ccf9ace/windows-arm-zx2c4
2021-03-08T20:03:01-bd37284/windows-arm-zx2c4
2021-03-05T23:32:34-7205a4f/windows-arm-zx2c4
2021-03-04T23:37:01-96a96a9/windows-arm-zx2c4
2021-03-04T21:47:26-9d3718e/windows-arm-zx2c4
2021-02-24T18:49:08-b7f62da/windows-arm-zx2c4
2021-02-24T12:37:13-26001d1/windows-arm-zx2c4
2021-02-23T18:39:53-c584f42/windows-arm-zx2c4
2021-02-05T22:35:11-b54cd94/windows-arm-zx2c4
2021-01-06T15:02:50-d213170/windows-arm-zx2c4
2020-12-30T22:39:48-0ae2e03/windows-arm-zx2c4
2020-12-16T09:01:04-75e16f5/windows-arm-zx2c4

CC @neild

@bcmills bcmills changed the title net/http: TestClientTimeout* failures with "missing timeout substring" on windows-arm-zx2c4 net/http: TestClientTimeout* failures with "missing timeout substring" Aug 31, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Nov 4, 2021

Curiously, this has still only ever occurred on windows-arm variants and openbsd-amd64-68. I don't know whether we can really discount the two OpenBSD failures, but the apparent specificity to windows-arm is puzzling. (Maybe this is a memory-model synchronization issue, like #48072?)

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2021-10-01

2021-11-03T18:37:22-cfd016d/windows-arm64-10
2021-10-22T21:26:30-5d414d1/windows-arm64-10
2021-10-14T18:43:59-765c911/windows-arm64-10
2021-10-14T04:18:44-9e4dc6f/windows-arm64-10
2021-10-05T15:47:57-017ffcd/windows-arm64-10

@gopherbot
Copy link

Change https://golang.org/cl/361275 mentions this issue: net/http: reduce TestClientTimeout_h{1,2} latency

gopherbot pushed a commit that referenced this issue Nov 4, 2021
The test had been setting an arbitrary 200ms timeout to allow the
server's handler to set up before timing out. That is not only
potentially flaky on slow machines, but also typically much longer
than necessary. Replace the hard-coded timeout with a much shorter
initial timeout, and use exponential backoff to lengthen it if needed.

This allows the test to be run about 20x faster in the typical case,
which may make it easier to reproduce rare failure modes by running
with a higher -count flag.

For #43120

Change-Id: I1e0d0ec99d5a107fff56e3bcc7174d686ec582d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/361275
Trust: Bryan C. Mills <bcmills@google.com>
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
@bcmills
Copy link
Contributor Author

bcmills commented Jan 5, 2022

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2021-11-05

2021-12-29T19:23:11-8a306e2/windows-arm64-10
2021-11-12T23:34:09-fdee1b2/windows-arm64-10

@bcmills
Copy link
Contributor Author

bcmills commented Jan 5, 2022

windows/arm64 is not currently listed as a first class port, so — given the consistency of the failing platform — I think a skip may be in order to reduce windows/arm64 builder flakiness.

I honestly don't know what to make of the couple of openbsd-amd64-68 failures, though.

(CC @bufflig)

@gopherbot
Copy link

Change https://golang.org/cl/375635 mentions this issue: net/http: skip TestClientTimeout_h{1,2} on windows/arm and windows/arm64

@bcmills
Copy link
Contributor Author

bcmills commented Jan 5, 2022

Marking as release-blocker until the skip is added, at which point this can be moved back to the backlog.

@bcmills bcmills modified the milestones: Backlog, Go1.18 Jan 5, 2022
@bcmills bcmills modified the milestones: Go1.18, Backlog Jan 5, 2022
gopherbot pushed a commit that referenced this issue Jan 5, 2022
These tests are empirically flaky on the windows/arm and windows/arm64
builders, with a consistent (but rare) failure mode.

This change skips the test if that particular failure mode is
encountered on those platforms; the skip can be removed if and when
someone has the time to pin down the root cause.

For #43120

Change-Id: Ie3a9a06bf47e3a907c7b07441acc1494a4631135
Reviewed-on: https://go-review.googlesource.com/c/go/+/375635
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
@bcmills
Copy link
Contributor Author

bcmills commented Jan 5, 2022

The skip is merged. Back to the Backlog. 😩

@bcmills
Copy link
Contributor Author

bcmills commented Jan 18, 2022

Apparently TestClientTimeout_Headers_h[12] are also affected by the same underlying bug:

--- FAIL: TestClientTimeout_Headers_h2 (0.01s)
    client_test.go:1342: error string = "Get \"https://127.0.0.1:53626\": context deadline exceeded"; missing timeout substring
--- FAIL: TestClientTimeout_Headers_h1 (0.01s)
    client_test.go:1342: error string = "Get \"http://127.0.0.1:53627\": context deadline exceeded"; missing timeout substring
2022/01/17 01:29:00 http: TLS handshake error from 127.0.0.1:53639: read tcp 127.0.0.1:53638->127.0.0.1:53639: wsarecv: An existing connection was forcibly closed by the remote host.
FAIL
FAIL	net/http	6.571s

greplogs --dashboard -md -l -e '(?ms)FAIL: TestClientTimeout.*missing timeout substring' --since=2022-01-06

2022-01-17T09:23:25-897b3da/windows-arm64-10

@bcmills bcmills modified the milestones: Backlog, Go1.18 Jan 18, 2022
@gopherbot
Copy link

Change https://golang.org/cl/379156 mentions this issue: net/http: skip TestClientTimeout_Headers_h{1,2} on windows/arm and windows/arm64

@bcmills bcmills changed the title net/http: TestClientTimeout* failures with "missing timeout substring" net/http: TestClientTimeout* failures with "missing timeout substring" on windows/arm* Jan 18, 2022
@bcmills bcmills added arch-arm Issues solely affecting the 32-bit arm architecture. arch-arm64 OS-Windows and removed release-blocker labels Jan 18, 2022
@bcmills bcmills modified the milestones: Go1.18, Backlog Jan 18, 2022
gopherbot pushed a commit that referenced this issue Jan 18, 2022
…ndows/arm64

This extends the skip added in CL 375635 to the "_Headers" variant of
the test, since we have observed similar failures in that variant on
the builders.

For #43120

Change-Id: Ib1c97fbb776b576271629272f3194da77913a941
Reviewed-on: https://go-review.googlesource.com/c/go/+/379156
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
These tests are empirically flaky on the windows/arm and windows/arm64
builders, with a consistent (but rare) failure mode.

This change skips the test if that particular failure mode is
encountered on those platforms; the skip can be removed if and when
someone has the time to pin down the root cause.

For golang#43120

Change-Id: Ie3a9a06bf47e3a907c7b07441acc1494a4631135
Reviewed-on: https://go-review.googlesource.com/c/go/+/375635
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
jproberts pushed a commit to jproberts/go that referenced this issue Jun 21, 2022
…ndows/arm64

This extends the skip added in CL 375635 to the "_Headers" variant of
the test, since we have observed similar failures in that variant on
the builders.

For golang#43120

Change-Id: Ib1c97fbb776b576271629272f3194da77913a941
Reviewed-on: https://go-review.googlesource.com/c/go/+/379156
Trust: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
@bcmills
Copy link
Contributor Author

bcmills commented Aug 22, 2022

Looks like the skip isn't quite wide enough — this can also manifest during the initial Get call if the deadline happens to fire before the headers are processed.

2022-08-08T16:40:58-e730703/windows-arm64-10:

--- FAIL: TestClientTimeout_h2 (0.01s)
    client_test.go:1260: attempting test with timeout 10ms
    client_test.go:1275: Get "https://127.0.0.1:51129/?nonce=0": context deadline exceeded
FAIL
FAIL	net/http	6.996s

@gopherbot
Copy link

Change https://go.dev/cl/425096 mentions this issue: net/http: skip Get flakes in TestClientTimeout tests on windows/arm4

gopherbot pushed a commit that referenced this issue Aug 22, 2022
There is already a skip in case of a later failure in the same test on
these platforms. Skip the failure if it occurs earlier too.

For #43120.

Change-Id: Id530370caa6a7df8cae593f6fdcb66871b86b125
Reviewed-on: https://go-review.googlesource.com/c/go/+/425096
Run-TryBot: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Damien Neil <dneil@google.com>
@bcmills
Copy link
Contributor Author

bcmills commented Aug 30, 2023

This test is still skipped at HEAD, but there have been a fair number of net/http fixes since it was added.

@neild, do you think it's worth unskipping at this point?

@bcmills bcmills changed the title net/http: TestClientTimeout* failures with "missing timeout substring" on windows/arm* net/http: TestClientTimeout* failures with "missing timeout substring" on Windows Oct 13, 2023
@bcmills bcmills removed arch-arm64 arch-arm Issues solely affecting the 32-bit arm architecture. labels Oct 13, 2023
@bcmills
Copy link
Contributor Author

bcmills commented Oct 13, 2023

Apparently not fixed: it reproduced on windows/amd64 in #63365 (comment).

(CC @golang/windows)

@gopherbot
Copy link

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "windows" && pkg == "net/http" && test ~ `^TestClientTimeout.*` && `missing timeout substring`
2023-10-04 02:55 windows-amd64-2016 go@9018601e net/http.TestClientTimeout_Headers (log)
--- FAIL: TestClientTimeout_Headers (0.00s)
    --- FAIL: TestClientTimeout_Headers/h1 (0.01s)
        client_test.go:1313: error string = "Get \"http://127.0.0.1:51755\": context deadline exceeded"; missing timeout substring

watchflakes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Windows
Projects
Status: No status
Development

No branches or pull requests

4 participants