-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/go/internal/test: test concurrency limited by sequential printing #61233
Comments
I opened https://go-review.git.corp.google.com/c/go/+/508515 with approach 1 |
Change https://go.dev/cl/508515 mentions this issue: |
I know this isn't the most productive comment but its a bit frustrating to have found a regression causing tests to take 2x as long, and submit a fix, and not have any comment at all within 2 months. Is there anyway to move this forward? |
Sorry for the delay — I think I had mentally lumped this in with #60203, although it's clearly a separate issue. 😅 I've replied on the CL. |
In case it is helpful: I think we have run into this with the tests for https://github.com/DataDog/datadog-agent (thanks to @hush-hush for debugging this). Going from Go 1.19 to Go 1.20 makes the tests about 20-30% slower (an extra ~17 seconds for a 60 second test run). We can help verify a fix once one is ready, although it looks like the conversation on https://go.dev/cl/508515 has stalled (last comment Sept 6th). |
I'd love to pick this up again if there is some guidance on what would be an acceptable solution. |
I don't have a lot of context on this but I think bcmills's suggestion to log an event that contains the information about the ordering of the test packages sounds reasonable? @aclements , what do you think? |
We've also run into this issue. Running a large set of test packages slowed down about 6x due to this, from 15 minutes to 90 minutes. |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes
What did you do?
Ran
go test -debug-trace=trace -debug-actiongraph=actiongraph ./...
What did you expect to see?
Tests are run with maximum concurrency
What did you see instead?
test run
tasks are started but not doing any work, blocking other work from progressing.Example simple reproducer:
Basically this just makes a few packages, one with a slow test, and a few empty packages.
When we run this, I would expect to have one worker blocked on the
b
test (sleep 1
), while the rest execute in the other worker and quickly complete.Instead, we see this:

This looks odd - the
e
package doesn't have any tests, so why istest run e
taking 1.3s?The reason seems to be related to https://go-review.googlesource.com/c/go/+/448357 (although I am not certain it worked different before that CL). One the
runTestActor
starts, it is immediately blocked on<-r.prev
. This, in turn, means it is blocked until the previous test (d
) finishes.Ultimately, this blocks an entire worker doing nothing.
A real world example of this can be found here (warning: 170mb). You can see it occurring in a few places, most prominently around 300s we have almost all 16 workers blocked, most/all on "no test files" packages.
I think based on the way actions are pushed into the
ready
queue, any action that has zero remaining dependencies may be pushed onto the queue. In some of my testing, I saw the occasionally some of the very last packages in the test ordering would get scheduled first - blocking an entire worker for almost the entire process.I believe this new logic is only need for
-json
mode. One option is to just enable it only if-json
is enabled.I tested this locally against the same repo as the "real world example" above.
With Go 1.20, the tests took 70s to execute. With the patch, they took 30s.
With the same patch, the simple reproducer looks as expected:

Before trace After trace (warning: 150mb)
Another approach is to add each
test run
action as a dependency for the previous, to serialize the ordering. However, this makes things far worse, as this is serializing things much stricter -- the current serialization just ensures the starts are serialized, while this approach would make sure the entire test execution is as well - not what we want.In my testing this made things worse.
A similar approach may be to make another action type
test start
, but I didn't test this outThe text was updated successfully, but these errors were encountered: