Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: race detector prints mis-leading stack trace #33309

Closed
dsnet opened this issue Jul 26, 2019 · 11 comments
Closed

cmd/compile: race detector prints mis-leading stack trace #33309

dsnet opened this issue Jul 26, 2019 · 11 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Jul 26, 2019

(discovered while debugging golang/protobuf#838)

Consider the following main.go:

package main

import (
	"sync"

	// github.com/golang/protobuf@v1.3.2
	"github.com/golang/protobuf/jsonpb"
	"github.com/golang/protobuf/proto"

	descriptorpb "github.com/golang/protobuf/protoc-gen-go/descriptor"
)

func main() {
	obj := &descriptorpb.FileDescriptorProto{
		Options: &descriptorpb.FileOptions{
			GoPackage: proto.String("path/to/my/package"),
		},
	}

	wg := &sync.WaitGroup{}
	defer wg.Wait()

	wg.Add(1)
	go func() {
		defer wg.Done()
		proto.Marshal(obj)
	}()

	wg.Add(1)
	go func() {
		defer wg.Done()
		(&jsonpb.Marshaler{}).MarshalToString(obj)
	}()
}

When running go run -race main.go, I see (-go1.12 +go1.11):

 ==================
 WARNING: DATA RACE
 Read at 0x000000000000 by goroutine 7:
   reflect.typedmemmove()
       /usr/local/go/src/runtime/mbarrier.go:177
   reflect.packEface()
       /usr/local/go/src/reflect/value.go:119
   reflect.valueInterface()
-      /usr/local/go/src/reflect/value.go:1009
+      /usr/local/go/src/reflect/value.go:978
+  reflect.Value.Interface()
+      /usr/local/go/src/reflect/value.go:948
   github.com/golang/protobuf/jsonpb.(*Marshaler).marshalValue()
-      /usr/local/go/src/reflect/value.go:979
+      /usr/local/protobuf-golang/jsonpb/jsonpb.go:534
   github.com/golang/protobuf/jsonpb.(*Marshaler).marshalField()
       /usr/local/protobuf-golang/jsonpb/jsonpb.go:488
   github.com/golang/protobuf/jsonpb.(*Marshaler).marshalObject()
       /usr/local/protobuf-golang/jsonpb/jsonpb.go:346
   github.com/golang/protobuf/jsonpb.(*Marshaler).Marshal()
       /usr/local/protobuf-golang/jsonpb/jsonpb.go:139
   github.com/golang/protobuf/jsonpb.(*Marshaler).MarshalToString()
       /usr/local/protobuf-golang/jsonpb/jsonpb.go:145
   main.main.func2()
       /Users/rawr/blah/test.go:31

 Previous write at 0x000000000000 by goroutine 6:
   sync/atomic.StoreInt32()
       /usr/local/go/src/runtime/race_amd64.s:229
   github.com/golang/protobuf/proto.(*marshalInfo).size()
       /usr/local/protobuf-golang/proto/table_marshal.go:203
   github.com/golang/protobuf/proto.makeMessageMarshaler.func1()
       /usr/local/protobuf-golang/proto/table_marshal.go:2223
   github.com/golang/protobuf/proto.(*marshalInfo).size()
       /usr/local/protobuf-golang/proto/table_marshal.go:183
   github.com/golang/protobuf/proto.(*InternalMessageInfo).Size()
       /usr/local/protobuf-golang/proto/table_marshal.go:125
   github.com/golang/protobuf/protoc-gen-go/descriptor.(*FileDescriptorProto).XXX_Size()
       /usr/local/protobuf-golang/protoc-gen-go/descriptor/descriptor.pb.go:435
   github.com/golang/protobuf/proto.Marshal()
       /usr/local/protobuf-golang/proto/table_marshal.go:2713
   main.main.func1()
       /Users/rawr/blah/test.go:25

 Goroutine 7 (running) created at:
   main.main()
       /Users/rawr/blah/test.go:29

 Goroutine 6 (finished) created at:
   main.main()
       /Users/rawr/blah/test.go:23
 ==================

The interesting line is /usr/local/protobuf-golang/jsonpb/jsonpb.go:534, which is missing in the stack trace on Go1.12, presumably because of mid-stack inlining.

@agnivade
Copy link
Contributor

@randall77 @dvyukov

@agnivade agnivade added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 27, 2019
@agnivade agnivade added this to the Go1.14 milestone Jul 27, 2019
@randall77
Copy link
Contributor

I think the race detector traceback code isn't handling mid-stack inlining correctly.

As far as I can tell, the race detector does its own stack traceback, then just asks the runtime to symbolize a bunch of pcs. It requests one pc at a time, and we can only give a single response to each pc. So there's no way to report multiple frames per pc. Kind of strangely, we report the function name of the outermost function, but the file/line of the innermost function (this kind of makes sense for profiling, but is weird in this situation). So if f calls g and g is inlined, we get only one stack frame printed, declaring that the function is f but the file/line is in g.

Dmitry, can you think of any way to fix this?

Simple repro:

package main

func main() {
	c := make(chan bool)
	go f(c)
	go f(c)
	<-c
	<-c
}

func f(c chan bool) {
	g(c)
}

func g(c chan bool) {
	x = 7
	c <- true
}

var x int

Generates

==================
WARNING: DATA RACE
Write at 0x0000005265b0 by goroutine 7:
  main.f()
      /usr/local/google/home/khr/gowork/tmp1.go:16 +0x3b

Previous write at 0x0000005265b0 by goroutine 6:
  main.f()
      /usr/local/google/home/khr/gowork/tmp1.go:16 +0x3b

Goroutine 7 (running) created at:
  main.main()
      /usr/local/google/home/khr/gowork/tmp1.go:6 +0x8b

Goroutine 6 (running) created at:
  main.main()
      /usr/local/google/home/khr/gowork/tmp1.go:5 +0x69
==================

If I add a go:noinline to g, then I get the right answer:

==================
WARNING: DATA RACE
Write at 0x0000005265b0 by goroutine 7:
  main.g()
      /usr/local/google/home/khr/gowork/tmp1.go:17 +0x3a
  main.f()
      /usr/local/google/home/khr/gowork/tmp1.go:12 +0x38

Previous write at 0x0000005265b0 by goroutine 6:
  main.g()
      /usr/local/google/home/khr/gowork/tmp1.go:17 +0x3a
  main.f()
      /usr/local/google/home/khr/gowork/tmp1.go:12 +0x38

Goroutine 7 (running) created at:
  main.main()
      /usr/local/google/home/khr/gowork/tmp1.go:6 +0x8b

Goroutine 6 (running) created at:
  main.main()
      /usr/local/google/home/khr/gowork/tmp1.go:5 +0x69
==================

@dvyukov
Copy link
Member

dvyukov commented Sep 15, 2019

To fix this one would need to extend the interface between tsan and Go runtime to be able to pass multiple frames. The tsan runtime supported multiple frames per PC from day one, so it's just a matter of extending the interface here:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/tsan/go/tsan_go.cpp#L56-L71

Updating the syso files may be a bit challenging. We don't do this regularly, so some things may get rotten. Not sure if racebuild is working today.

randall77 added a commit to randall77/compiler-rt that referenced this issue Sep 16, 2019
…rames for a single PC

This fix allows tsan to report stack traces correctly even in the
presence of mid-stack inlining by the Go compiler.

See golang/go#33309
@gopherbot
Copy link

Change https://golang.org/cl/195781 mentions this issue: runtime: allow the Go runtime to return multiple stack frames for a single PC

@randall77
Copy link
Contributor

CLs out for the fixes in both tsan and the runtime, so they can correctly exchange info about mid-stack inlining.

I'm having trouble with racebuild. I seem to be able to run the steps manually, but just using racebuild itself fails in a strange way:

racebuild -goroot ~/sandbox/readonly -platforms linux/amd64 -rev 8a97ee95f2203295e34b7f8ade1092080809fddb

does a bunch of setup work (apt-get, ...), then fails with:

Cloning into 'go'...
fatal: unable to access 'https://go.googlesource.com/go/': Failed to connect to go.googlesource.com port 443: No route to host
Error running run: exit status 128

Why can't the trybots clone the go source code? It works fine on my machine, and if I gomote ssh to the trybot. It is just running git clone https://go.googlesource.com/go. Does GO_DISABLE_OUTBOUND_NETWORK=1 have anything to do with it? apt-get seems to work, so how does git clone not? Neither are Go binaries, I think.

@bradfitz @dmitshur

@dmitshur
Copy link
Contributor

dmitshur commented Sep 16, 2019

Hmm. I have not used racebuild, so I'm not sure how it works.

GO_DISABLE_OUTBOUND_NETWORK may be relevant. It was added in March (see #30612 (comment)). Is it possible no one has used racebuild since then and so we haven't noticed?

Why can't the trybots clone the go source code?

For normal trybot runs, the Go repository and other golang.org/x repositories are not git cloned on the builder itself, but rather they're fetched by coordinator (which doesn't have outbound network restrictions) and then uploaded to the builder via its API (e.g., see here).

@dmitshur
Copy link
Contributor

dmitshur commented Sep 16, 2019

Ok, I think I see the problem in the code. Update: Read the last paragraph of this comment first.

  1. Gomote "run" is called here.

  2. gomote run uses buildlet.Exec here.

  • It uses conf.Env(), which includes GO_DISABLE_OUTBOUND_NETWORK=1 for most builders, here.
  1. buildlet.Exec makes a POST request to /exec, which is handled by handleExec here.

  2. If GO_DISABLE_OUTBOUND_NETWORK=1 is set in the environment, handleExec disables the outbound network (permanently for the duration of the builder) here.

I can send a CL to fix the problem. In the meantime, you can probably just patch racebuild locally to work around the problem with something like:

-if _, err := p.Gomote(ctx, "run", "-e=REV="+*flagRev, "-e=GOREV="+goRev, p.Inst, targetName); err != nil {
+if _, err := p.Gomote(ctx, "run", "-e=REV="+*flagRev, "-e=GOREV="+goRev, "-e=GO_DISABLE_OUTBOUND_NETWORK=0", p.Inst, targetName); err != nil {

The -e env vars are added after and thus take higher precedence.


However, I'm noticing now that gomote run should already be doing that by default when ‑firewall flag isn't set to true:

https://github.com/golang/build/blob/1c6297c7b50cad1f0fad91b9aae7241076d0a8b8/cmd/gomote/run.go#L72

@randall77 Maybe the problem is your gomote binary is old? Can you check if it has the ‑firewall flag? That flag was added in CL 170397 in late March. Try installing the latest gomote, it seems like that should fix the problem.

@randall77
Copy link
Contributor

Reopening, still need to update the syso files.

@randall77 randall77 reopened this Sep 23, 2019
@gopherbot
Copy link

Change https://golang.org/cl/197337 mentions this issue: runtime: fix ppc64le race code

gopherbot pushed a commit that referenced this issue Sep 25, 2019
This code is not currently compiling, the asm vet checks fail. When running race.bash on ppc64le, I get:

runtime/race_ppc64le.s:104:1: [ppc64le] RaceReadRange: wrong argument size 24; expected $...-16
runtime/race_ppc64le.s:514:1: [ppc64le] racecallbackthunk: unknown variable cmd; offset 0 is arg+0(FP)
runtime/race_ppc64le.s:515:1: [ppc64le] racecallbackthunk: unknown variable ctx

I'm also not sure why it ever worked; it looks like it is writing
the arguments to racecallback in the wrong place (the race detector
itself probably still works, it would just have trouble symbolizing
any resulting race report).

At a meta-level, we should really add a ppc64le/race builder.
Otherwise this code will rot, as evidenced by the rot this CL fixes :)

Update #33309

Change-Id: I3b49c2442aa78538fbb631a143a757389a1368fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/197337
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@gopherbot
Copy link

Change https://golang.org/cl/201898 mentions this issue: runtime/race: update race detector shared libraries

@gopherbot
Copy link

Change https://golang.org/cl/195984 mentions this issue: runtime/race: add test for midstack inlining

gopherbot pushed a commit that referenced this issue Oct 18, 2019
Add test to make sure we get the right traceback when mid-stack inlining.

Update #33309

Change-Id: I23979cbe6b12fad105dbd26698243648aa86a354
Reviewed-on: https://go-review.googlesource.com/c/go/+/195984
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
@golang golang locked and limited conversation to collaborators Oct 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants