x/build/cmd/coordinator: runSubrepoTests (golang.org/x repo tests) should also check maxTestExecErrors constant #36226

dmitshur · 2019-12-20T01:17:34Z

In my investigation at #35581 (comment), I wrote:

It is intentional to keep retrying "communications failures" forever, because the expectation is that they should eventually succeed.

I now see that this isn't quite true. The coordinator defines a constant:

// maxTestExecErrors is the number of test execution failures at which
// we give up and stop trying and instead permanently fail the test.
// Note that this is not related to whether the test failed remotely,
// but whether we were unable to start or complete watching it run.
// (A communication error)
const maxTestExecErrors = 3

The runTestsOnBuildlet method, which is called by the runTests method, has a block that checks whether ti.numFail has reached maxTestExecErrors:

if err != nil {
	bc.MarkBroken() // prevents reuse
	for _, ti := range tis {
		ti.numFail++
		st.logf("Execution error running %s on %s: %v (numFails = %d)", ti.name, bc, err, ti.numFail)
		if err == buildlet.ErrTimeout {
			ti.failf("Test %q ran over %v limit (%v); saw output:\n%s", ti.name, timeout, execDuration, buf.Bytes())
		} else if ti.numFail >= maxTestExecErrors {
			ti.failf("Failed to schedule %q test after %d tries.\n", ti.name, maxTestExecErrors)
		} else {
			ti.retry()
		}
	}
	return
}

However, the runTests method is only used for the main Go repository, not for golang.org/x repos:

if st.IsSubrepo() {
	remoteErr, err = st.runSubrepoTests()
} else {
	remoteErr, err = st.runTests(st.getHelpers())
}

So this bug is about making the golang.org/x repo path (runSubrepoTests) also check the maxTestExecErrors constant and give up after that many failed attempts, instead of retrying communication failures forever.

This is a low-value fix, because we rarely run into communication errors happening 3 or more times; when that does happen, it is most often due to other bugs that we need to fix anyway.

/cc @bradfitz @cagedmantis @toothrot


dmitshur added the Builders, NeedsFix, and FeatureRequest labels Dec 20, 2019

dmitshur added this to the Backlog milestone Dec 20, 2019

dmitshur mentioned this issue Dec 20, 2019

x/build/cmd/coordinator: trybots don't work on golang.org/dl repo #35581 (Closed)

dmitshur mentioned this issue Sep 20, 2022

x/build: no error for missing file in windows workspace #55145 (Open)
