New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net: TestDialTimeout flake on linux-amd64-sid #22896
Comments
I've looked through all the code and I've convinced myself that this can not happen. I also have not been able to reproduce it. That said, here is another instance: https://build.golang.org/log/95a745a6f8fed78a9b854b3cbad5986275ffc0cd |
It looks to mee that there is an error in the log-message, it should have logged |
The |
@nilsmagnus Logging |
Binding on port 0 is a request to the kernel to look for a free port and bind to it. It actually does not bind to 0. Therefore, it should not be possible to connect to port 0. But that should not be an issue here since the socket filter will prevent the connection from happening in the first place. However, in both the buildlet failures, the failure happens on item
@bradfitz - Anything suspicious about these values that can cause a flake ? I tried very hard to reproduce but could not. |
ping @mikioh for above question. |
I guess that one possibility is that the buildbot, cmd/go or testing package did run the net.test process incorrectly, without the harness of net/internal/socktest package. |
@mikioh I don't see any way that that could happen. |
I'm also not sure the mechanism, but the evidence, the test failed within 0.00s and attempted to connect to the literal port number zero (as a boobytrap set by me), leads me to the possibility. |
Hm, the harness is naively designed to be unstrapped before the execution of benchmarking, so it depends on the execution order of testing and benchmarking. It's likely that unstrapping happens if the testing goroutine is overtaken by one of benchmarking goroutines, like #21090. |
Here are the cases I can find today. It's perhaps interesting that it always seems to be case 5. 2017-04-17T18:33:08-73f283f/linux-ppc64-buildlet:
2017-04-21T23:43:59-49f54e8/linux-ppc64-buildlet:
2017-04-22T00:53:51-c202d4d/linux-amd64-clang:
2017-04-28T15:32:18-6511931/linux-ppc64-buildlet:
2017-08-28T05:26:59-eb07028/linux-amd64-alpine:
2017-09-23T09:24:02-6171d05/linux-amd64-alpine:
2017-10-05T08:28:41-be272ec/linux-ppc64le-buildlet:
2017-10-08T20:14:33-e285f39/linux-amd64-alpine:
2017-11-02T03:33:53-b66a294/linux-amd64:
2017-11-03T17:45:11-14f2bfd/linux-amd64-clang:
2017-11-10T04:40:13-17f35c6/linux-amd64-nocgo:
2017-11-22T22:37:15-ecc8650/linux-amd64-sid:
2017-11-28T01:18:54-b5c7183/linux-amd64-sid:
|
These failures are occurring on the builders, which as far as I know don't run the benchmarks, so I don't think the problem could be the benchmarks uninstalling the test hooks. |
Then, there might be a race on the callpath switch between test cases; enableSocketConnect? sounds plausible. Adding per-test case filter feature into socktest.Switch might be a solution. |
Change https://golang.org/cl/105095 mentions this issue: |
This has not happened since 2017-11-28. I don't know what fixed it, but I think it must be fixed. |
Is it supposed to be dialing 127.0.0.1:0 (the 0 port is weird):
https://build.golang.org/log/450b8cb212512d306aed03190d275f546c332715
/cc @mikioh
The text was updated successfully, but these errors were encountered: