Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing: race detected in goroutine can report an error in an unnamed test #60083

Closed
ianlancetaylor opened this issue May 9, 2023 · 16 comments
Closed
Assignees
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@ianlancetaylor
Copy link
Contributor

Test case:

package foo_test

import (
	"fmt"
	"testing"
)

func makeRace() {
	var x int
	c := make(chan bool)
	go func() {
		x = 1
		c <- true
	}()
	fmt.Println(x)
	<-c
}

func TestFoo(t *testing.T) {
	makeRace()
}

Running this test with go test -race -test.v prints

=== RUN   TestFoo
0
==================
WARNING: DATA RACE
Write at 0x00c000014288 by goroutine 8:
  command-line-arguments_test.makeRace.func1()
      /home/iant/foo_test.go:12 +0x33

Previous read at 0x00c000014288 by goroutine 7:
  command-line-arguments_test.makeRace()
      /home/iant/foo_test.go:15 +0x10c
  command-line-arguments_test.TestFoo()
      /home/iant/foo_test.go:20 +0x1c
  testing.tRunner()
      /home/iant/go/src/testing/testing.go:1595 +0x238
  ??()
      -:0 +0x5243c4

Goroutine 8 (running) created at:
  command-line-arguments_test.makeRace()
      /home/iant/foo_test.go:11 +0xfc
  command-line-arguments_test.TestFoo()
      /home/iant/foo_test.go:20 +0x1c
  testing.tRunner()
      /home/iant/go/src/testing/testing.go:1595 +0x238
  ??()
      -:0 +0x5243c4

Goroutine 7 (running) created at:
  testing.(*T).Run()
      /home/iant/go/src/testing/testing.go:1648 +0x835
  testing.runTests.func1()
      /home/iant/go/src/testing/testing.go:2054 +0x84
  testing.tRunner()
      /home/iant/go/src/testing/testing.go:1595 +0x238
  testing.runTests()
      /home/iant/go/src/testing/testing.go:2052 +0x896
  testing.(*M).Run()
      /home/iant/go/src/testing/testing.go:1925 +0xb57
  main.main()
      _testmain.go:47 +0x2bd
==================
    testing.go:1465: race detected during execution of test
--- FAIL: TestFoo (0.00s)
=== NAME  
    testing.go:1465: race detected during execution of test
FAIL
FAIL	command-line-arguments	0.007s
FAIL

Note the === NAME near the end. A === NAME line should be followed by the name of the test. Omitting the name is confusing and can break code that is parsing the test output.

@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label May 9, 2023
@cherrymui cherrymui added this to the Backlog milestone May 9, 2023
@cherrymui
Copy link
Member

cc @bcmills

@bcmills
Copy link
Contributor

bcmills commented May 9, 2023

I think this is plausibly correct? The empty NAME line could mean “this line is independent of the individual tests”, which is true for race reports in general (albeit perhaps not this specific case).

@ianlancetaylor
Copy link
Contributor Author

I tend to think that an empty NAME line is confusing and unhelpful. If we want to say that the report is independent of individual tests, let's say that.

@bcmills
Copy link
Contributor

bcmills commented May 9, 2023

I believe it is currently necessary in order for cmd/test2json to correctly interpret the output, but perhaps it could be omitted (or changed) when -test.v is not set to test2json.

@ianlancetaylor
Copy link
Contributor Author

Can you expand on that? test2json currently doesn't seem to count race detector reports at all.

@cip999
Copy link

cip999 commented May 9, 2023

Hi! While we're at it, let me point out a related "weird" pattern in race error messages for tests. Unlike other errors, these get propagated to all the ancestors of the subtest that generated the race, and the same hardcoder error message (testing.go:1465: race detected during execution of test) is printed for all of them. [Since the ancestors include the "root" test that encapsulates the whole suite, and which has an empty name, this causes the === NAME line to be printed.]

Just wanted to mention this before I forgot. It's a different problem (if we want to call it that) from the subject of this issue, in the sense that fixing any of them won't solve the other. Maybe it's worth filing another bug for this?

@ianlancetaylor
Copy link
Contributor Author

@cip999 Thanks, I was just looking at that.

@cip999
Copy link

cip999 commented May 9, 2023

The reason for that is how race errors are counted:

  • Total number of race errors (race.Errors()) is logged right before running the (sub)test.
  • (Sub)test is run.
  • Total number of race errors is logged again, and if greater than the previous one the race detection error is thrown.

(By the way, for some reason this is done by storing the negative of the first number, adding the second number, and checking whether the result is greater than 0. I'm confused.)

It shouldn't be difficult to fix it.

@ianlancetaylor
Copy link
Contributor Author

Using the negative count and then adding to it is because the number of race detector errors is monotonically increasing. We want to do the right thing when we run multiple tests, and some have race errors.

@cip999
Copy link

cip999 commented May 9, 2023

Yes, my confusion was about why not storing the first error count (not its negative) and comparing it to the second one. But I think I figured it out, it's to handle the counting for parallel tests. Anyway, I'll stop diverging from the main subject. :)

@gopherbot
Copy link

Change https://go.dev/cl/494057 mentions this issue: testing: only report subtest races once

@cip999
Copy link

cip999 commented May 10, 2023

Following up on this comment thread, I investigated the way race errors are handled and reported in parallel tests.

There are a couple notable points.

  • Parallel tests are run after the parent test has finished. Most importantly, the order of the events is the following: parent test function returns -> race errors for parent test are counted (and possibly reported) -> parallel subtests are released.
  • This means that direct parents of parallel tests aren't affected by this issue: if a race happens inside a parallel test, by that time its parent will have already logged the number of races. However, the race will be reported for the parent's parent (unless the parent is also run in parallel) and all previous ancestors.
  • As a concrete example, if we call t.Parallel() inside TestFoo in the proof of concept above, we no longer get the === NAME line and the race is reported only once.

[Incidentally, the semantics of T.Parallel is a bit counterintuitive; but let's not go there.]

Now, what happens with the proposed change? What I said in the comments isn't exactly true. Parallel tests will still be "deferred" to when race errors for the parent have been already reported, so the parent won't be affected at all. However, the parent's parent (and all previous ancestors) will be: if a race happens during parallel execution of two or more sub-sub-tests, the race count will be decremented by 2 or more (instead of 1). Then, if another race happens outside the subtests, it won't be detected. (Disclaimer: I haven't verified this, I'm too lazy to patch the CL...)

Admittedly, sub-sub-tests are seldom a thing (if ever). Still, something that works "by chance" is not super solid.

Edit: Just to clarify, a minimal working example of the scenario I describe should be the following:

func makeRace() {
  var x int
  c := make(chan bool)
  go func() {
    x = 1
    c <- true
  }()
  fmt.Println(x)
  <-c
}

func TestFoo(t *testing.T) {
  t.Run("subtest", func(t *testing.T) {
    c := make(chan bool)

    t.Run("first subsubtest", func(t *testing.T) {
      t.Parallel()
      <-c
      makeRace() // All the signaling is to make sure that this happens inside both subsubtests.
      c <- true
    })

    t.Run("second subsubtest", func(t *testing.T) {
      t.Parallel()
      c <- true
      <-c
    })
  })
}

@ianlancetaylor
Copy link
Contributor Author

CL is being reverted.

@gopherbot
Copy link

Change https://go.dev/cl/501895 mentions this issue: testing: correctly handle race error detection in subtests

@gopherbot
Copy link

Change https://go.dev/cl/506300 mentions this issue: testing: use monotonic counts to attribute races in subtests

@gopherbot
Copy link

Change https://go.dev/cl/506755 mentions this issue: testing: simplify concurrency and cleanup

@bcmills bcmills modified the milestones: Backlog, Go1.22 Jun 28, 2023
@bcmills bcmills self-assigned this Jun 28, 2023
gopherbot pushed a commit that referenced this issue Nov 21, 2023
While investigating #60083, I found a couple of bugs (notably #61034)
that had slipped through code review in part because the concurrency
patterns used in the testing package were too complex for me to fully
reason about. This change adjusts those patterns to be more in line
with current idioms, and to reduce the number of special cases that
depend on details that should be orthogonal. (For example: the details
of how we invoke the Cleanup functions should not depend on whether
the test happened to run any parallel subtests.)

In the process, this change fixes a handful of bugs:

- Concurrent calls to Run (explicitly allowed by TestParallelSub)
  could previously drive the testcontext.running count negative,
  causing the number of running parallel tests to exceed the -parallel
  flag.

- The -failfast flag now takes effect immediately on failure. It no
  longer delays until the test finishes, and no longer misses failures
  during cleanup (fixing #61034).

- If a Cleanup function calls runtime.Goexit (typically via t.FailNow)
  during a panic, Cleanup functions from its parent tests are no
  longer skipped and buffered logs from its parent tests are now
  flushed.

- The time reported for a test with subtests now includes the time spent
  running those subtests, regardless of whether they are parallel.
  (Previously, non-parallel subtests were included but parallel subtests
  were not.)

- Calls to (*B).Run in iterations after the first are now diagnosed
  with a panic. (This diagnoses badly-behaved benchmarks: if Run is
  called during the first iteration, no subsequent iterations are
  supposed to occur.)

Fixes #61034.

Change-Id: I3797f6ef5210a3d2d5d6c2710d3f35c0219b02ea
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race,gotip-windows-amd64-longtest
Reviewed-on: https://go-review.googlesource.com/c/go/+/506755
Auto-Submit: Bryan Mills <bcmills@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants