Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/race: false positive with go + defer #42599

Closed
randall77 opened this issue Nov 13, 2020 · 7 comments
Closed

runtime/race: false positive with go + defer #42599

randall77 opened this issue Nov 13, 2020 · 7 comments

Comments

@randall77
Copy link
Contributor

package main

import (
	"runtime"
	"sync/atomic"
)

type foo struct {
	bar int64
}

func (f *foo) doFork(depth int) {
	atomic.StoreInt64(&f.bar, 1)
	defer atomic.StoreInt64(&f.bar, 0)
	if depth > 0 {
		for i := 0; i < 2; i++ {
			f2 := &foo{}
			go f2.doFork(depth - 1)
		}
	}
	runtime.GC()
}

func main() {
	f := &foo{}
	f.doFork(11)
}

This reports a race when run with the race detector. I don't think it should.

==================
WARNING: DATA RACE
Write at 0x00c00011a008 by goroutine 6:
  sync/atomic.StoreInt64()
      /home/khr/go1.14/src/runtime/race_amd64.s:234 +0xb
  main.(*foo).doFork()
      /home/khr/gowork/tmp1.go:22 +0x135

Previous write at 0x00c00011a008 by goroutine 2576:
  main.(*foo).doFork()
      /home/khr/gowork/tmp1.go:17 +0xdb

Goroutine 6 (running) created at:
  main.(*foo).doFork()
      /home/khr/gowork/tmp1.go:18 +0x118
  main.main()
      /home/khr/gowork/tmp1.go:26 +0x6b

Goroutine 2576 (running) created at:
  main.(*foo).doFork()
      /home/khr/gowork/tmp1.go:18 +0x118
==================

This fails at tip, go1.14 and go1.15. Not a release blocker.

@randall77 randall77 added this to the Go1.16 milestone Nov 13, 2020
@prattmic
Copy link
Member

FWIW, a few observations I've made:

  1. Only one GC is required (call runtime.GC only at depth==11 still triggers the race).
  2. A different STW (runtime.ReadMemStats) doesn't trigger the race.
  3. A small sleep in place of runtime.GC doesn't trigger the race.

cc @nixprime

@prattmic
Copy link
Member

Replacing defer atomic.StoreInt64(&f.bar, 0) with

defer func() {
  atomic.StoreInt64(&f.bar, 0)
}()

Also makes the race disappear.

@prattmic
Copy link
Member

What I see in most cases is that the race address is initially the first foo:

tracealloc(0xc00011a000, 0x10, main.foo)
goroutine 1 [running]:
runtime.mallocgc(0x10, 0x49d0a0, 0x1, 0xc000061f78)
        /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1132 +0x56e fp=0xc000061f30 sp=0xc000061e80 pc=0x43c60e
runtime.newobject(0x49d0a0, 0xffffffff)
        /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1195 +0x38 fp=0xc000061f60 sp=0xc000061f30 pc=0x43cdd8
main.main()
        /usr/local/google/home/mpratt/Downloads/gotsan.go:37 +0x3b fp=0xc000061f88 sp=0xc000061f60 pc=0x49145b
runtime.main()
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:204 +0x209 fp=0xc000061fe0 sp=0xc000061f88 pc=0x461809
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc000061fe8 sp=0xc000061fe0 pc=0x48d8c1

Later, GC frees it:

tracefree(0xc00011a000, 0x10)
goroutine 0 [idle]:
runtime.(*mspan).sweep(0x7f6a72f42d30, 0x4a7e00, 0xc0003a2370)
        /usr/local/google/home/mpratt/src/go/src/runtime/mgcsweep.go:425 +0x430 fp=0x7f6a4b1ebcf0 sp=0x7f6a4b1ebc00 pc=0x44f430
runtime.(*mcentral).uncacheSpan(0x54b458, 0x7f6a72f42d30)
        /usr/local/google/home/mpratt/src/go/src/runtime/mcentral.go:383 +0xec fp=0x7f6a4b1ebd18 sp=0x7f6a4b1ebcf0 pc=0x443a0c
runtime.(*mcache).releaseAll(0x7f6a73571e98)
        /usr/local/google/home/mpratt/src/go/src/runtime/mcache.go:162 +0x6a fp=0x7f6a4b1ebd40 sp=0x7f6a4b1ebd18 pc=0x44334a
runtime.(*mcache).prepareForSweep(0x7f6a73571e98)
        /usr/local/google/home/mpratt/src/go/src/runtime/mcache.go:189 +0x46 fp=0x7f6a4b1ebd68 sp=0x7f6a4b1ebd40 pc=0x4433c6
runtime.procresize(0xc, 0xc000080000)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:4382 +0x473 fp=0x7f6a4b1ebdf8 sp=0x7f6a4b1ebd68 pc=0x46ba73
runtime.startTheWorldWithSema(0x51ad01, 0xc0003a2390)
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:1065 +0x85 fp=0x7f6a4b1ebe58 sp=0x7f6a4b1ebdf8 pc=0x463de5
runtime.gcMarkTermination.func3()
        /usr/local/google/home/mpratt/src/go/src/runtime/mgc.go:1752 +0x26 fp=0x7f6a4b1ebe78 sp=0x7f6a4b1ebe58 pc=0x4875e6
runtime.systemstack(0x7f6a40000020)
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:370 +0x66 fp=0x7f6a4b1ebe80 sp=0x7f6a4b1ebe78 pc=0x48bba6
runtime.mstart()
        /usr/local/google/home/mpratt/src/go/src/runtime/proc.go:1116 fp=0x7f6a4b1ebe88 sp=0x7f6a4b1ebe80 pc=0x463fa0

Then it is allocated as another foo:

tracealloc(0xc00011a000, 0x10, main.foo)
goroutine 606 [running]:
runtime.mallocgc(0x10, 0x49d0a0, 0x1, 0x0)
        /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1132 +0x56e fp=0xc000324708 sp=0xc000324658 pc=0x43c60e
runtime.newobject(0x49d0a0, 0x1)
        /usr/local/google/home/mpratt/src/go/src/runtime/malloc.go:1195 +0x38 fp=0xc000324738 sp=0xc000324708 pc=0x43cdd8
main.(*foo).doFork(0xc0002801e0, 0x4)
        /usr/local/google/home/mpratt/Downloads/gotsan.go:27 +0xd2 fp=0xc0003247d0 sp=0xc000324738 pc=0x491352
runtime.goexit()
        /usr/local/google/home/mpratt/src/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0003247d8 sp=0xc0003247d0 pc=0x48d8c1
created by main.(*foo).doFork
        /usr/local/google/home/mpratt/Downloads/gotsan.go:28 +0x125

One possibility that I haven't been able to verify yet is that the defer is not keeping f alive, and thus GC is freeing the object before we're done using it (adding logging tends to make the problem go away...). I'm also unsure if TSAN would report this as a data race or more precisely as a "use after free".

cc @mknyszek

@nixprime
Copy link
Contributor

One possibility that I haven't been able to verify yet is that the defer is not keeping f alive, and thus GC is freeing the object before we're done using it (adding logging tends to make the problem go away...).

In support of this hypothesis: I tried replacing:

	atomic.StoreInt64(&f.bar, 1)
	defer atomic.StoreInt64(&f.bar, 0)

with:

	if val := atomic.LoadUint64(&f.bar); val != 0 {
		panic(fmt.Sprintf("foo.bar: got %#x, wanted 0", val))
	}
	defer atomic.StoreUint64(&f.bar, 0x5858585858585858)

And indeed:

==================
WARNING: DATA RACE
Write at 0x00c000180000 by goroutine 11:
  sync/atomic.StoreInt64()
      /home/jamieliu/go/src/runtime/race_amd64.s:234 +0xb
  main.(*foo).doFork()
      /home/jamieliu/src/gob42599/main.go:25 +0x158

Previous write at 0x00c000180000 by goroutine 2297:
  main.(*foo).doFork()
      /home/jamieliu/src/gob42599/main.go:20 +0xfa

Goroutine 11 (running) created at:
  main.(*foo).doFork()
      /home/jamieliu/src/gob42599/main.go:21 +0x137

Goroutine 2297 (running) created at:
  main.(*foo).doFork()
      /home/jamieliu/src/gob42599/main.go:21 +0x137
==================
panic: foo.bar: got 0x5858585858585858, wanted 0

goroutine 3248 [running]:
main.(*foo).doFork(0xc00013a050, 0x0)
	/home/jamieliu/src/gob42599/main.go:15 +0x22f
created by main.(*foo).doFork
	/home/jamieliu/src/gob42599/main.go:21 +0x138
exit status 2

I haven't reproduced this panic without -race yet; I'm not sure if this is because the bug is specific to the race detector, or if turning it off simply changes timings by enough to cause it to fail to reproduce.

@cherrymui
Copy link
Member

cherrymui commented Nov 13, 2020

I think you're right that it seems a liveness bug.

I haven't reproduced this panic without -race yet; I'm not sure if this is because the bug is specific to the race detector, or if turning it off simply changes timings by enough to cause it to fail to reproduce.

I don't think this is specific to race detector. It's just hard to crash even with the use-after-free.
Edit: this is indeed race-specific.

@gopherbot
Copy link

Change https://golang.org/cl/270079 mentions this issue: runtime: declare arg size/map for race version of sync/atomic functions

@gopherbot
Copy link

Change https://golang.org/cl/270397 mentions this issue: cmd/compile, runtime: check defer arg size

@golang golang locked and limited conversation to collaborators Nov 16, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants