Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: completed parallel subtests leak goroutines until earlier subtests finish #45127

Closed
bcmills opened this issue Mar 19, 2021 · 2 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@bcmills
Copy link
Contributor

bcmills commented Mar 19, 2021

Looking at the goroutine dump in https://build.golang.org/log/a70cdaaff0171cc2515eac00f3e909829c7309c2, there is a huge amount of useless noise from blocked goroutines.

Many of the goroutines are blocked on t.Parallel(), for which I may file a separate issue. However, many of them are blocked — for many minutes at a time! — waiting to notify the parent that the subtest is complete.

~279 copies in the linked test output:

goroutine 312 [chan send, 8 minutes]:
testing.tRunner.func1.1(0xc0008edd7f, 0x0, 0x0, 0xc000655980, 0x1)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1166 +0x90
testing.tRunner.func1(0xc000655980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1227 +0x62b
testing.tRunner(0xc000655980, 0xc0006249c0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1240 +0x256
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

~123 copies:

goroutine 412 [chan send, 8 minutes]:
testing.tRunner.func1.1(0xc0009b779f, 0x0, 0x0, 0xc00073f980, 0x1)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1166 +0x90
testing.tRunner.func1(0xc00073f980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1227 +0x62b
runtime.Goexit()
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/runtime/panic.go:633 +0x1da
testing.(*common).SkipNow(0xc00073f980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:843 +0x65
testing.(*common).Skip(0xc00073f980, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:823 +0x8a
cmd/go_test.(*testScript).cmdSkip(0xc000317cc0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:849 +0x21e
cmd/go_test.(*testScript).run(0xc000317cc0)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:393 +0xb22
cmd/go_test.TestScript.func1(0xc00073f980)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/cmd/go/script_test.go:93 +0x3fc
testing.tRunner(0xc00073f980, 0xc00018cb40)
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1236 +0x22e
created by testing.(*T).Run
	/var/folders/kh/5zzynz152r94t18yzstnrwx80000gn/T/workdir-host-darwin-10_15/go/src/testing/testing.go:1281 +0x78c

The blocked channel send for both of those traces is here:

go/src/testing/testing.go

Lines 1163 to 1166 in 9289c12

// Only report that the test is complete if it doesn't panic,
// as otherwise the test binary can exit before the panic is
// reported to the user. See issue 41479.
t.signal <- signal

The purpose of the channel seems to be to keep the parent test blocked until the subtest either finishes or calls t.Parallel:

t.signal <- true // Release calling test.

If the subtest calls t.Parallel, the subtest is added to the sub field for the parent test:

go/src/testing/testing.go

Lines 1045 to 1046 in 3979fb9

// Add to the list of tests to be released by the parent.
t.parent.sub = append(t.parent.sub, t)

The parent later waits for the parallel subtests to complete:

go/src/testing/testing.go

Lines 1158 to 1161 in 3979fb9

// Wait for subtests to complete.
for _, sub := range t.sub {
<-sub.signal
}


It is not at all obvious to me why the t.signal channel is unbuffered. The sender does not appear to care whether the parent has actually received the signal, and does not appear to do any additional work after the send that would require synchronizing on it.

For tests with a lot of parallel subtests, such as cmd/go.TestScript, the amount of noise introduced by this (temporary) leak can be quite significant.

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 19, 2021
@bcmills bcmills added this to the Backlog milestone Mar 19, 2021
@bcmills
Copy link
Contributor Author

bcmills commented Mar 19, 2021

https://play.golang.org/p/9GYO0SD0JfB illustrates the pollution from these goroutines. The leaky subtests are reported as passing pretty much immediately, but nonetheless appear in the goroutine dump when the slow subtest finally panics.

@gopherbot
Copy link

Change https://golang.org/cl/303330 mentions this issue: testing: allow parallel-subtest goroutines to exit when the subtest is complete

@bcmills bcmills self-assigned this Mar 19, 2021
@bcmills bcmills modified the milestones: Backlog, Go1.17 Mar 19, 2021
@golang golang locked and limited conversation to collaborators Mar 19, 2022
@rsc rsc unassigned bcmills Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

2 participants