Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto/x509: TestHybridPool failures on Windows with x509: certificate signed by unknown authority #51599

Closed
bcmills opened this issue Mar 10, 2022 · 13 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 OS-Windows release-blocker
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Mar 10, 2022

--- FAIL: TestHybridPool (15.02s)
    hybrid_pool_test.go:66: verification failed for google.com chain (empty pool): x509: certificate signed by unknown authority
FAIL
FAIL	crypto/x509	31.546s

greplogs --dashboard -md -l -e 'FAIL: TestHybridPool'

2022-03-10T16:06:29-1cf6770/windows-amd64-longtest

(attn @golang/security for triage)

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Mar 10, 2022
@bcmills bcmills added this to the Go1.19 milestone Mar 10, 2022
@rolandshoemaker rolandshoemaker self-assigned this Mar 18, 2022
@bcmills
Copy link
Contributor Author

bcmills commented Mar 30, 2022

greplogs --dashboard -md -l -e 'FAIL: TestHybridPool' --since=2022-03-11

2022-03-24T17:50:47-b95d332/windows-amd64-longtest

@gopherbot
Copy link

Change https://go.dev/cl/397694 mentions this issue: crypto/x509: local platform verifier tests on trybots

@bcmills bcmills changed the title crypto/x509: TestHybridPool failure with x509: certificate signed by unknown authority crypto/x509: TestHybridPool failures on Windows with x509: certificate signed by unknown authority May 12, 2022
@gopherbot
Copy link

Change https://go.dev/cl/405914 mentions this issue: crypto/x509: attempt to prime windows root pool before hybrid test

gopherbot pushed a commit that referenced this issue May 12, 2022
In TestHybridPool attempt to prime to the windows root pool before
the real test actually happens. This is a bit of a band-aid, with
a better long term solution discussed in #52108.

Updates #51599

Change-Id: I406add8d9cd9e3fae37bfc20b97f5479c10a52c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/405914
Reviewed-by: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Roland Shoemaker <roland@golang.org>
@bcmills
Copy link
Contributor Author

bcmills commented May 13, 2022

Due to #52591 the windows-amd64-longtest builders have only completed one run since the above CL went in, but so far they haven't failed with this mode again.

Marking WaitingForInfo to remind us to recheck the fix after more test runs have completed.

@bcmills bcmills added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 13, 2022
@bcmills
Copy link
Contributor Author

bcmills commented May 18, 2022

There was a rash of failures yesterday — again all on Windows — but the failure mode is slightly different. 🤔

--- FAIL: TestPlatformVerifier (0.08s)
    --- FAIL: TestPlatformVerifier/wrong_host_for_leaf (0.02s)
        root_windows_test.go:109: unexpected verification error: got "x509: certificate has expired or is not yet valid: ", want "x509: certificate is valid for *.badssl.com, badssl.com, not wrong.host.badssl.com"
FAIL
FAIL	crypto/x509	1.942s

greplogs -l -e 'FAIL: TestPlatformVerifier .*(?:\n[ ]{4}.*)*badssl\.com' --since=2022-05-13
2022-05-17T18:55:46-668041e/windows-amd64-longtest
2022-05-17T18:19:34-076039a/windows-amd64-longtest
2022-05-17T18:13:13-afd181c/windows-amd64-longtest
2022-05-17T17:54:33-cc4957a/windows-amd64-longtest
2022-05-17T17:29:34-cf5fa2b/windows-amd64-longtest
2022-05-17T16:09:37-770e0e5/windows-amd64-longtest
2022-05-17T14:39:18-cf7ec0f/windows-amd64-longtest

@bcmills bcmills removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label May 18, 2022
@bcmills
Copy link
Contributor Author

bcmills commented May 18, 2022

I think the above failures may have been due to a misconfiguration of the badssl.com service itself, although it does seem strange to me that it only failed on Windows.

I wonder if the Windows root store ended up pulling in one of the badssl.com certificates but not the others, causing the error to be reported from the wrong certificate? Or maybe the badssl.com service itself was misconfigured for a while...

@dmitshur
Copy link
Contributor

dmitshur commented Jun 1, 2022

@rolandshoemaker This is currently blocking beta 1. Any updates and do you expect it's safe to resolve after beta 1?

@rolandshoemaker
Copy link
Member

Sorry for the lag. I think this is reasonable to resolve after beta 1. This failure isn't due to new behavior, it's simply surfacing behavior that always existed but was never tested. Any resolution (likely) requires changes to builders, which seems unlikely to happen before the beta. If we just want to remove this flake until we re-open the tree for 1.20, it would probably be reasonable to just skip this test until the tree re-opens.

@heschi heschi added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Jun 6, 2022
@heschi
Copy link
Contributor

heschi commented Jun 8, 2022

@rolandshoemaker It seems like there's some confusion about whether this is a bug in the test, or a bug in badssl.com. If the former, is it feasible to fix before the final release? If the latter, is this test valuable enough to risk having a release get stuck due to it being misconfigured again?

@rolandshoemaker
Copy link
Member

There are two different failure modes described here, the first is the failures in TestHybridPool, which are an inherent problem in how the test and the Windows root store interact (sometimes priming the pool fails/times out before our call to CertGetCertificateChain returns), and the second, the failures in TestPlatformVerifier , was that badssl.com was (for some short period of time) serving an expired certificate.

For the first issue, after landing CL 405914 I don't think we've seen these failures again, but they could still plausibly happen (if the builders are, for whatever reason, being incredibly slow).

For the second, you are right that we are entirely reliant on badssl.com being properly configured, and they could realistically break us by doing something bad.

I opened #52108 to solve both of these issues, but the solution is... complicated, and seems unlikely to happen before the final release. If we want to make sure these tests don't block the release, I think it's reasonable to just add skips until the release and then fix #52108 in the 1.20 cycle (this behavior has been included since 1.18, so we're relatively confident it's not horribly broken.)

@heschi
Copy link
Contributor

heschi commented Jun 9, 2022

IMO it's up to you if you want to add the skips; you're the one who has perspective on how valuable the tests are. Unless we have reason to expect badssl.com to break again it doesn't seem urgent to me and in that case I'd vote to close this issue.

@rolandshoemaker
Copy link
Member

I think it's fine to close this out, if it pops back up we can revisit.

@golang golang locked and limited conversation to collaborators Jun 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 OS-Windows release-blocker
Projects
None yet
Development

No branches or pull requests

5 participants