
fmt: apparent race on fmt.pp buf #44672

Open
myitcv opened this issue Feb 27, 2021 · 21 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@myitcv
Member

myitcv commented Feb 27, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16 linux/amd64

Does this issue reproduce with the latest release?

Yes, although I can't reproduce this.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/myitcv/.cache/go-build"
GOENV="/home/myitcv/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/myitcv/gostuff/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/myitcv/gostuff"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/myitcv/gos"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/myitcv/gos/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1396033664=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This problem surfaced in a -race build for govim. The raw log is now stored here:

https://gist.github.com/myitcv/483887344dd4e48596ed6bb364d77651

The relevant part of the test failure is:

WARNING: DATA RACE
Write at 0x00c001f64300 by goroutine 208:
  runtime.slicecopy()
      /go/src/runtime/slice.go:247 +0x0
  fmt.(*buffer).write()
      /go/src/fmt/print.go:78 +0xb1
  fmt.(*pp).Write()
      /go/src/fmt/print.go:187 +0x2f
  text/tabwriter.(*Writer).write0()
      /go/src/text/tabwriter/tabwriter.go:254 +0x72
  text/tabwriter.(*Writer).writeLines()
      /go/src/text/tabwriter/tabwriter.go:321 +0x4a4
  text/tabwriter.(*Writer).format()
      /go/src/text/tabwriter/tabwriter.go:409 +0x570
  text/tabwriter.(*Writer).flushNoDefers()
      /go/src/text/tabwriter/tabwriter.go:520 +0x9c
  text/tabwriter.(*Writer).Write()
      /go/src/text/tabwriter/tabwriter.go:554 +0xb44
  github.com/kr/pretty.writeByte()
      /home/runner/gopath/pkg/mod/github.com/kr/pretty@v0.1.0/formatter.go:319 +0x2684
  github.com/kr/pretty.(*printer).printValue()
      /home/runner/gopath/pkg/mod/github.com/kr/pretty@v0.1.0/formatter.go:169 +0x260a
  github.com/kr/pretty.(*printer).printValue()
      /home/runner/gopath/pkg/mod/github.com/kr/pretty@v0.1.0/formatter.go:249 +0x1ae6
  github.com/kr/pretty.formatter.Format()
      /home/runner/gopath/pkg/mod/github.com/kr/pretty@v0.1.0/formatter.go:57 +0x224
  github.com/kr/pretty.(*formatter).Format()
      <autogenerated>:1 +0xac
  fmt.(*pp).handleMethods()
      /go/src/fmt/print.go:593 +0x1c7
  fmt.(*pp).printArg()
      /go/src/fmt/print.go:709 +0x1b8
  fmt.(*pp).doPrint()
      /go/src/fmt/print.go:1157 +0xe6
  fmt.Sprint()
      /go/src/fmt/print.go:249 +0x64
  github.com/kr/pretty.Sprint()
      /home/runner/gopath/pkg/mod/github.com/kr/pretty@v0.1.0/pretty.go:91 +0x77
  github.com/govim/govim/cmd/govim.loggingGoplsServer.Initialize()
      /home/runner/govim/cmd/govim/gopls_server.go:27 +0x239
  github.com/govim/govim/cmd/govim.(*loggingGoplsServer).Initialize()
      <autogenerated>:1 +0xae
  github.com/govim/govim/cmd/govim.(*govimplugin).startGopls()
      /home/runner/govim/cmd/govim/gopls.go:169 +0x18bc
  github.com/govim/govim/cmd/govim.(*govimplugin).Init()
      /home/runner/govim/cmd/govim/main.go:337 +0x486d
  github.com/govim/govim.(*govimImpl).load.func2()
      /home/runner/govim/govim.go:340 +0x33b
  github.com/govim/govim.(*govimImpl).DoProto()
      /home/runner/govim/channel_cmds.go:131 +0x83
  github.com/govim/govim.(*govimImpl).load()
      /home/runner/govim/govim.go:320 +0x97b
  github.com/govim/govim.(*govimImpl).load-fm()
      /home/runner/govim/govim.go:291 +0x4a
  github.com/govim/govim.(*govimImpl).goHandleShutdown.func1()
      /home/runner/govim/govim.go:283 +0xa9
  gopkg.in/tomb%2ev2.(*Tomb).run()
      /home/runner/gopath/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x51

Previous read at 0x00c001f64300 by goroutine 58:
  internal/race.ReadRange()
      /go/src/internal/race/race.go:45 +0xb0
  syscall.Write()
      /go/src/syscall/syscall_unix.go:215 +0x94
  internal/poll.ignoringEINTRIO()
      /go/src/internal/poll/fd_unix.go:581 +0x16e
  internal/poll.(*FD).Write()
      /go/src/internal/poll/fd_unix.go:274 +0x294
  os.(*File).write()
      /go/src/os/file_posix.go:48 +0xb1
  os.(*File).Write()
      /go/src/os/file.go:174 +0x6b
  os/exec.(*closeOnce).Write()
      <autogenerated>:1 +0x84
  io.WriteCloser.Write-fm()
      /go/src/io/io.go:139 +0x79
  github.com/govim/govim/cmd/govim/internal/golang_org_x_tools/fakenet.(*connFeeder).run()
      /home/runner/govim/cmd/govim/internal/golang_org_x_tools/fakenet/conn.go:121 +0x1d4

At the time of this data race, govim was running tests against the then tip of x/tools and gopls. The module state at the time of the test is captured in this branch:

https://github.com/govim/govim/tree/race_fail

What did you expect to see?

Passing tests.

What did you see instead?

Above.

From the quick analysis I've done thus far, I don't think the problem can be in our code, because the apparent race is happening on the buf field of a fmt.pp.

@myitcv added the NeedsInvestigation label Feb 27, 2021
@robpike
Contributor

robpike commented Feb 28, 2021

The fmt package has been tested and race-tested innumerable times. It could indeed be a bug in the library but I doubt it. It's much likelier to be a problem in your code or environment that reveals itself as this race.

That's not helpful, I know, but we would need a reproducible case to do any serious debugging of such a well-known and well-tested package.

@ianlancetaylor
Contributor

CC @dvyukov

I wonder if this could be due to the sync.Pool. The sync.Pool code has special handling to avoid triggering the race detector on values put into the pool and then retrieved from the pool, but it's not clear to me that that handling covers a slice referenced by the object in the pool. I wonder if it is possible that the race detector is not detecting the synchronization implied by using the pool.
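
For context, the pattern in question looks roughly like this (a paraphrased sketch of fmt's printer pooling, not the exact go1.16 source): a pp carries a byte-slice buffer and is recycled through a package-level sync.Pool, so the same backing array can surface on different goroutines at different times.

package fmtsketch

import "sync"

type buffer []byte

type pp struct {
	buf buffer
	// ... other formatting state ...
}

// ppFree recycles printers; the buf backing array travels with them.
var ppFree = sync.Pool{
	New: func() interface{} { return new(pp) },
}

func newPrinter() *pp { return ppFree.Get().(*pp) }

func (p *pp) free() {
	p.buf = p.buf[:0] // keep the backing array for the next Get
	ppFree.Put(p)
}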

@myitcv
Member Author

myitcv commented Feb 28, 2021

The fmt package has been tested and race-tested innumerable times. It could indeed be a bug in the library but I doubt it. It's much likelier to be a problem in your code or environment that reveals itself as this race.

Thanks, @robpike. I used the word "apparent" in the title to try to convey that we're not definitely looking at a race, but rather a situation the race detector thinks is a race (I didn't do that in the body of the description - now fixed). My analysis and assertion that the problem is not in our code might of course be wrong.

@dvyukov
Member

dvyukov commented Feb 28, 2021

The first stack seems to be on the fmt internal buffer (which is either newly allocated memory or came from a sync.Pool, right?).
But what's strange is that the second stack seems to be on some user slice here:
https://github.com/govim/govim/blob/main/cmd/govim/internal/golang_org_x_tools/fakenet/conn.go#L121
which was received from f.input channel:
https://github.com/govim/govim/blob/main/cmd/govim/internal/golang_org_x_tools/fakenet/conn.go#L116
Do internal fmt buffers ever leave the fmt package? I don't see how this can happen. Say, fmt.Sprintf would return a copy of the internal buffer, right?
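
(For what it's worth, fmt.Sprint does copy the buffer into the returned string before the printer goes back to the pool; roughly, paraphrasing the go1.16 source:)

// fmt/print.go (paraphrased, not an exact quote)
func Sprint(a ...interface{}) string {
	p := newPrinter()  // *pp taken from the ppFree sync.Pool
	p.doPrint(a)
	s := string(p.buf) // the string conversion copies the buffer contents
	p.free()           // the pp, and its buf backing array, go back to the pool
	return s
}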

Is this race reliably reproducible? Does it always look the same way with the same stacks?

@myitcv
Member Author

myitcv commented Feb 28, 2021

Is this race reliably reproducible? Does it always look the same way with the same stacks?

It's the first time we've seen it, unfortunately. We'll obviously add more details here if we find any.

@myitcv
Member Author

myitcv commented Feb 28, 2021

But what's strange is that the second stack seems to be on some user slice here

I'm not sure I can definitively say it is a user slice.

Do internal fmt buffers ever leave the fmt package? I don't see how this can happen. Say, fmt.Sprintf would return a copy of the internal buffer, right?

Slightly stabbing in the dark (very much out of my depth), but is it possible that a compiler optimisation determined it was safe for an internal fmt buffer to leave the package at some point?

On a related topic, how does the race detector handle/work with compiler optimisations?

@monkey92t

I also encountered a similar error; I don't know if it's the same problem.

WARNING: DATA RACE
Read at 0x00c0000814c8 by goroutine 27:
  internal/race.ReadRange()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/internal/race/race.go:45 +0xb0
  syscall.Write()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/syscall/syscall_unix.go:215 +0x94
  internal/poll.(*FD).Write.func1()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/internal/poll/fd_unix.go:267 +0xf9
  internal/poll.ignoringEINTR()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/internal/poll/fd_unix.go:567 +0x38
  internal/poll.(*FD).Write()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/internal/poll/fd_unix.go:267 +0x20b
  net.(*netFD).Write()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/net/fd_posix.go:73 +0x65
  net.(*conn).Write()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/net/net.go:194 +0xeb
  net.(*TCPConn).Write()
      <autogenerated>:1 +0x69
.......... (remaining frames omitted)

Previous write at 0x00c0000814c8 by goroutine 79:
  time.NewTicker()
      /home/travis/.gimme/versions/go1.15.8.linux.amd64/src/time/tick.go:35 +0x12a
  github.com/go-redis/redis/v8/internal/pool.(*ConnPool).reaper()
      /home/travis/gopath/src/github.com/go-redis/redis/internal/pool/pool.go:449 +0x55
----- (no further frames)


// pool.go
// 448 func (p *ConnPool) reaper(frequency time.Duration) {
// 449 	ticker := time.NewTicker(frequency)
// 450 	defer ticker.Stop()
//      ...
// }

Why do conn.Write([]byte) and time.NewTicker produce a data race?
I don't know the conditions under which this appears, but I often encounter such errors.

@dvyukov
Member

dvyukov commented Mar 7, 2021

@monkey92t if you provide full report and git revisions used, it will be more useful. Currently it's unclear what calls TCPConn.Write.

@monkey92t

@monkey92t if you provide full report and git revisions used, it will be more useful. Currently it's unclear what calls TCPConn.Write.

@dvyukov travis job log: https://travis-ci.org/github/go-redis/redis/jobs/761788172

@dvyukov
Member

dvyukov commented Mar 7, 2021

@monkey92t if you provide full report and git revisions used, it will be more useful. Currently it's unclear what calls TCPConn.Write.

@dvyukov travis job log: https://travis-ci.org/github/go-redis/redis/jobs/761788172

At first glance it does not look related. So my first guess would be it's just a bug in that code.

@monkey92t

@monkey92t if you provide full report and git revisions used, it will be more useful. Currently it's unclear what calls TCPConn.Write.

@dvyukov travis job log: https://travis-ci.org/github/go-redis/redis/jobs/761788172

At first glance it does not look related. So my first guess would be it's just a bug in that code.

I have investigated many times and found that there is no relationship between the two pieces of data. I don't know much about how the race detector works. Why does it report two unrelated accesses as a race?

@myitcv
Member Author

myitcv commented Mar 7, 2021

@monkey92t - I'd agree with @dvyukov, your situation does not look related to this apparent race condition.

If you think there is an underlying Go issue that's causing your race condition, I'd suggest opening another issue.

@dvyukov
Member

dvyukov commented Mar 7, 2021

Subsequent races look even more insane, involving some timer internals.
There are also lots of mentions of github.com/go-redis/redis/v8/internal/pool, which uses sync.Pool. Maybe a coincidence, or maybe not...
If it's not reproducible (is it?), it will be very hard to debug. If suspicion falls on sync.Pool, I would probably start with a stress test that tries to expose potential sync.Pool problems as aggressively as possible.
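
Something along these lines, perhaps (a minimal sketch under my own assumptions, not taken from either project; all names are made up), run repeatedly with -race:

package pooltest

import (
	"sync"
	"testing"
)

type buf struct{ b []byte }

var bufPool = sync.Pool{New: func() interface{} { return new(buf) }}

// TestPoolSliceStress hammers a sync.Pool whose objects carry a byte slice,
// mimicking fmt's pp/buf pattern. A -race report here would point at pool
// synchronization rather than at user code.
func TestPoolSliceStress(t *testing.T) {
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(id byte) {
			defer wg.Done()
			for i := 0; i < 100000; i++ {
				p := bufPool.Get().(*buf)
				p.b = append(p.b[:0], id, byte(i))
				_ = string(p.b) // copy out, as fmt.Sprint does
				bufPool.Put(p)
			}
		}(byte(g))
	}
	wg.Wait()
}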

@monkey92t

monkey92t commented Mar 8, 2021

I was suspicious of lenBuff at first, but then I discovered that they are different new(Struct) allocations.
I can't make it appear 100% of the time, but I often encounter it. I can try removing the sync.Pool{} in internal/pool.go to see if that alleviates the problem.

@dvyukov
Member

dvyukov commented Mar 8, 2021

I can try removing the sync.Pool{} in internal/pool.go to see if that alleviates the problem.

Yes, it would be useful. If we could rule out sync.Pool, that would already be something.

@myitcv
Member Author

myitcv commented Mar 13, 2021

We've just seen another instance of this (govim commit 2c8a2ba49ed0170bfad80633bd87414b6ec9cffd and gopls/x/tools 8e4f4c8 for reference):

Write at 0x00c0003e2780 by goroutine 98:
  runtime.slicestringcopy()
      /go/src/runtime/slice.go:281 +0x0
  fmt.(*buffer).writeString()
      /go/src/fmt/print.go:82 +0x155
  fmt.(*pp).doPrintf()
      /go/src/fmt/print.go:991 +0xac
  fmt.Sprintf()
      /go/src/fmt/print.go:219 +0x73
  github.com/govim/govim.(*govimImpl).Logf()
      /home/runner/govim/govim.go:944 +0x89
  github.com/govim/govim.(*govimImpl).logVimEventf()
      /home/runner/govim/govim.go:940 +0x3da
  github.com/govim/govim.(*govimImpl).run()
      /home/runner/govim/govim.go:431 +0x2b9
  github.com/govim/govim.(*govimImpl).Run.func1()
      /home/runner/govim/govim.go:403 +0x44
  github.com/govim/govim.(*govimImpl).DoProto()
      /home/runner/govim/channel_cmds.go:131 +0x83
  github.com/govim/govim.(*govimImpl).Run()
      /home/runner/govim/govim.go:402 +0x64
  github.com/govim/govim/testdriver.(*TestDriver).runGovim()
      /home/runner/govim/testdriver/testdriver.go:468 +0xe1
  github.com/govim/govim/testdriver.(*TestDriver).runGovim-fm()
      /home/runner/govim/testdriver/testdriver.go:466 +0x44
  github.com/govim/govim/testdriver.(*TestDriver).tombgo.func1()
      /home/runner/govim/testdriver/testdriver.go:410 +0x61
  gopkg.in/tomb%2ev2.(*Tomb).run()
      /home/runner/gopath/pkg/mod/gopkg.in/tomb.v2@v2.0.0-20161208151619-d5d1b5820637/tomb.go:163 +0x51

Previous read at 0x00c0003e2780 by goroutine 242:
  internal/race.ReadRange()
      /go/src/internal/race/race.go:45 +0xb0
  syscall.Write()
      /go/src/syscall/syscall_unix.go:215 +0x94
  internal/poll.(*FD).Write.func1()
      /go/src/internal/poll/fd_unix.go:267 +0xf9
  internal/poll.ignoringEINTR()
      /go/src/internal/poll/fd_unix.go:567 +0x38
  internal/poll.(*FD).Write()
      /go/src/internal/poll/fd_unix.go:267 +0x20b
  os.(*File).write()
      /go/src/os/file_posix.go:48 +0xb1
  os.(*File).Write()
      /go/src/os/file.go:173 +0x6b
  os/exec.(*closeOnce).Write()
      <autogenerated>:1 +0x84
  io.WriteCloser.Write-fm()
      /go/src/io/io.go:133 +0x68
  github.com/govim/govim/cmd/govim/internal/golang_org_x_tools/fakenet.(*connFeeder).run()
      /home/runner/govim/cmd/govim/internal/golang_org_x_tools/fakenet/conn.go:121 +0x1d7

It seems unlikely, but is the race detector getting confused by the fact that the []byte was passed over a channel? The call to Write, the ultimate source of the []byte in this case, is still blocked at this point, but the file write (to os.Stdout) is happening on another goroutine.
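
To make the shape of that handoff concrete, here is an illustrative simplification (not the actual fakenet code; the names and details are mine):

package handoff

import "io"

// feeder hands each Write's slice to a separate goroutine over a channel and
// blocks until that goroutine has written it out. The channel operations
// should give the race detector the happens-before edges it needs.
type feeder struct {
	input chan []byte
	done  chan struct{}
}

func (f *feeder) Write(p []byte) (int, error) {
	f.input <- p // hand the caller's slice to the run goroutine
	<-f.done     // block until it has been flushed
	return len(p), nil
}

func (f *feeder) run(w io.Writer) {
	for b := range f.input {
		w.Write(b) // e.g. the os.Stdout write in the second stack above
		f.done <- struct{}{}
	}
}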

@dvyukov
Member

dvyukov commented Mar 14, 2021

It seems unlikely, but is the race detector getting confused about the fact the []byte was passed over a channel?

Yes, I agree it seems unlikely. We have unit tests for channels, and I think we would see far more races, and far more reproducibly, if channel synchronization were broken.

@monkey92t

@dvyukov I once again encountered a very puzzling data race. The report seems to say the two operations touch the same memory address (I don't understand it very well), but they should never produce a data race:

atomic.StoreUint32() data race variable := &struct{}

I don’t know if it helps you: redis/go-redis#1657 (comment)

@AlexanderYastrebov
Contributor

Running

func Test44672(t *testing.T) {
	x := []byte("test")

	// One goroutine repeatedly formats x (a read of the slice contents)...
	go func() {
		for {
			Sprintf("%s", x)
		}
	}()

	// ...while the test goroutine repeatedly writes x[0]: a genuine data race
	// on x, independent of fmt's internals.
	for {
		x[0] = 0
	}
}

I get a very similar stack trace, so it looks to me like the formatted value is being modified in another goroutine.

@ianlancetaylor
Contributor

@AlexanderYastrebov That program has a race condition. One goroutine reads x while another writes to x[0]. The race detector should report an error for that program regardless of this issue.
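
For contrast, a guarded variant along these lines (a sketch in the same style as the test above; it assumes the same in-package Sprintf plus a sync import) should be silent under -race:

func Test44672Guarded(t *testing.T) {
	var mu sync.Mutex
	x := []byte("test")
	done := make(chan struct{})

	go func() {
		defer close(done)
		for i := 0; i < 1000; i++ {
			mu.Lock()
			Sprintf("%s", x) // the read of x is now mutually exclusive with the write below
			mu.Unlock()
		}
	}()

	for i := 0; i < 1000; i++ {
		mu.Lock()
		x[0] = 0
		mu.Unlock()
	}
	<-done
}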

@seankhliao added this to the Unplanned milestone Aug 20, 2022