Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/metrics: data race detected in Read #53542

Closed
cristiangreco opened this issue Jun 24, 2022 · 17 comments
Closed

runtime/metrics: data race detected in Read #53542

cristiangreco opened this issue Jun 24, 2022 · 17 comments
Assignees
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Milestone

Comments

@cristiangreco
Copy link

What version of Go are you using (go version)?

$ go version
go version go1.18.3 darwin/arm64

Does this issue reproduce with the latest release?

Yes, still reproducible with 1.19beta1.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/cristian/Library/Caches/go-build"
GOENV="/Users/cristian/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/cristian/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/cristian/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/homebrew/Cellar/go/1.18.3/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/homebrew/Cellar/go/1.18.3/libexec/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.18.3"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/98/fgzhqvmx7x1d61fqsvw9cg9m0000gn/T/go-build1308397819=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version go1.18.3 darwin/arm64
GOROOT/bin/go tool compile -V: compile version go1.18.3
uname -v: Darwin Kernel Version 21.5.0: Tue Apr 26 21:08:29 PDT 2022; root:xnu-8020.121.3~4/RELEASE_ARM64_T8101
ProductName:	macOS
ProductVersion:	12.4
BuildVersion:	21F79
lldb --version: lldb-1316.0.9.46
Apple Swift version 5.6.1 (swiftlang-5.6.0.323.66 clang-1316.0.20.12)

What did you do?

runtime/metrics says that "It is safe to execute multiple Read calls concurrently". However, running the following code with race detector enabled suggests a data race. Unsure whether this is a false positive (metrics.Read uses a semaphore lock).

package main

import (
        "fmt"
        "runtime/metrics"
        "sync"
)

func main() {
        var wg sync.WaitGroup

        for i := 0; i < 2; i++ {
                wg.Add(1)
                go func() {
                        sample := make([]metrics.Sample, 1)
                        sample[0].Name = "/gc/heap/allocs:bytes"
                        metrics.Read(sample)
                        fmt.Printf("bytes allocated: %d\n", sample[0].Value.Uint64())
                        wg.Done()
                }()
        }

        wg.Wait()
}

What did you expect to see?

No race detected.

What did you see instead?

Data race detected.

$ go run -race main.go
bytes allocated: 511392
==================
WARNING: DATA RACE
Read at 0x00c000100030 by goroutine 8:
  runtime.evacuate_fast64()
      /opt/homebrew/Cellar/go/1.18.3/libexec/src/runtime/map_fast64.go:376 +0x3fc
  runtime/metrics.runtime_readMetrics()
      /opt/homebrew/Cellar/go/1.18.3/libexec/src/runtime/metrics.go:563 +0xc0

Previous write at 0x00c000100030 by goroutine 7:
  runtime.mapaccess2_faststr()
      /opt/homebrew/Cellar/go/1.18.3/libexec/src/runtime/map_faststr.go:108 +0x43c
  runtime.initMetrics()
      /opt/homebrew/Cellar/go/1.18.3/libexec/src/runtime/metrics.go:67 +0x254

Goroutine 8 (running) created at:
  main.main()
      /Users/cristian/Desktop/main.go:14 +0x50

Goroutine 7 (finished) created at:
  main.main()
      /Users/cristian/Desktop/main.go:14 +0x50
==================
bytes allocated: 552264
Found 1 data race(s)
exit status 66
@ianlancetaylor ianlancetaylor changed the title Data race detected in metrics.Read runtime/metrics: data race detected in Read Jun 24, 2022
@ianlancetaylor
Copy link
Contributor

CC @golang/runtime

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 24, 2022
@ianlancetaylor ianlancetaylor added this to the Go1.19 milestone Jun 24, 2022
@prattmic
Copy link
Member

I believe this is a false positive, as semacquire/semrelease aren't race instrumented.

cc @mknyszek

@aclements
Copy link
Member

Given that readMetrics and initMetrics are both in package runtime (readMetrics is linknamed to runtime/metrics, but race instrumentation just looks at the package we're compiling), why did this get any race instrumentation at all?

@aclements
Copy link
Member

Oh, readMetrics didn't get any race instrumentation. The race instrumentation in this case is hand-coded in the map implementation (and hence insensitive to what package it's being called from). Probably we just need a little manual race instrumentation around the map usage in the metrics code.

@aclements
Copy link
Member

This isn't new in 1.19, so it doesn't have to be a release-blocker. It's also probably easy to fix, so for now I'll leave it marked just so we don't lose track.

@gopherbot
Copy link

Change https://go.dev/cl/414517 mentions this issue: runtime: add race annotations to metricsSema

@prattmic
Copy link
Member

@gopherbot please backport to 1.17 and 1.18.

This triggers a false positive in the race detector for programs calling runtime/metrics.Read from multiple goroutines. There is a workaround: use a single global lock around all calls to metrics.Read in the program. However, that is quite invasive and has serious performance implications for a high-performance API.

@gopherbot
Copy link

Backport issue(s) opened: #53589 (for 1.17), #53590 (for 1.18).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://go.dev/wiki/MinorReleases.

@cherrymui
Copy link
Member

Would it make sense to change the map functions to not call the race functions if the caller PC is in the runtime package? It would add the overhead of a PC lookup, but it can be limited to race mode, so maybe it could be acceptable?

@prattmic
Copy link
Member

I don't think it is necessary. Maps aren't too widely used in the runtime, and I'd prefer us to be able to expand the number of races in the runtime rather than turning it off. It would be nice to catch these quicker though.

@cherrymui
Copy link
Member

I recall Windows callback is another place that uses a map and causes false race, #50249 .

@prattmic
Copy link
Member

Argh. I sent a fix for that too, but that one is nastier because it is called prior to scheduler initialization, which is fine for the raceread/racewrite in the map, but not raceacquire/racerelease.

@prattmic
Copy link
Member

That one does me a bit more inclined to turn it off in the map.

@prattmic
Copy link
Member

prattmic commented Jun 28, 2022

FWIW, I did a quick benchmark. A non-race map access is ~3ns. A race map access is ~17ns. With an additional runtime pc check (for runtime. prefix), a race map access is ~32ns [1].

This isn't horrible given we're in race mode, but it is still pretty large.

[1] My map was in runtime_test, so the string comparison had to go through most of the characters before failing.

@cherrymui
Copy link
Member

Thanks for the benchmark.

There are a few places where we check whether a PC is in the runtime. Maybe we could provide a 1-bit metadata for whether a function is a runtime function, to avoid the string comparison. But I guess in this case most of the overhead is probably the PC lookup, instead of the string comparison. Maybe we could make it simply an address range comparison, as the linker already lay out runtime functions together (mostly, I think).

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Jun 29, 2022
@gopherbot
Copy link

Change https://go.dev/cl/415195 mentions this issue: [release-branch.go1.18] runtime: add race annotations to metricsSema

@gopherbot
Copy link

Change https://go.dev/cl/415196 mentions this issue: [release-branch.go1.17] runtime: add race annotations to metricsSema

gopherbot pushed a commit that referenced this issue Jul 6, 2022
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package is it called from.

semacquire/semrelease are not automatically race instrumented, so we can
trigger race false positives without manually annotating our lock
acquire and release.

See similar instrumentation on trace.shutdownSema and reflectOffs.lock.

Fixes #53589.
For #53542.

Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
(cherry picked from commit d6481d5)
Reviewed-on: https://go-review.googlesource.com/c/go/+/415196
gopherbot pushed a commit that referenced this issue Jul 6, 2022
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package is it called from.

semacquire/semrelease are not automatically race instrumented, so we can
trigger race false positives without manually annotating our lock
acquire and release.

See similar instrumentation on trace.shutdownSema and reflectOffs.lock.

Fixes #53590.
For #53542.

Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
(cherry picked from commit d6481d5)
Reviewed-on: https://go-review.googlesource.com/c/go/+/415195
bradfitz pushed a commit to tailscale/go that referenced this issue Jul 14, 2022
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package is it called from.

semacquire/semrelease are not automatically race instrumented, so we can
trigger race false positives without manually annotating our lock
acquire and release.

See similar instrumentation on trace.shutdownSema and reflectOffs.lock.

Fixes golang#53590.
For golang#53542.

Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
(cherry picked from commit d6481d5)
Reviewed-on: https://go-review.googlesource.com/c/go/+/415195
jproberts pushed a commit to jproberts/go that referenced this issue Aug 10, 2022
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package is it called from.

semacquire/semrelease are not automatically race instrumented, so we can
trigger race false positives without manually annotating our lock
acquire and release.

See similar instrumentation on trace.shutdownSema and reflectOffs.lock.

Fixes golang#53542.

Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@golang golang locked and limited conversation to collaborators Jun 29, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done. release-blocker
Projects
None yet
Development

No branches or pull requests

8 participants