Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: extra allocations in race mode #36516

Closed
lni opened this issue Jan 12, 2020 · 7 comments
Closed

cmd/compile: extra allocations in race mode #36516

lni opened this issue Jan 12, 2020 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@lni
Copy link

lni commented Jan 12, 2020

What version of Go are you using (go version)?

go 1.14 beta1

Does this issue reproduce with the latest release?

no

What operating system and processor architecture are you using (go env)?

GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/lni/gobin"
GOCACHE="/Users/lni/Library/Caches/go-build"
GOENV="/Users/lni/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/lni/go"
GOPRIVATE=""
GOPROXY="https://mirrors.aliyun.com/goproxy"
GOROOT="/usr/local/go"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/b5/ndb6rntd4ys0451zchd64p_m0000gn/T/go-build700408697=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Run the following test with and without -race

package alloctest

import (
  "sync"
  "testing"
)

type rs struct {
  complete chan struct{}
  pool     *sync.Pool
}

func (rs *rs) release() {
  rs.pool.Put(rs)
}

func getrs(p *sync.Pool) *rs {
  return p.Get().(*rs)
}

func TestAlloc(t *testing.T) {
  p := &sync.Pool{}
  p.New = func() interface{} {
    v := &rs{
      complete: make(chan struct{}, 1),
      pool:     p,
    }
    return v
  }
  ac := testing.AllocsPerRun(1000, func() {
    v := getrs(p)
    v.release()
  })
  if ac > 1 {
    t.Fatalf("ac: %f, want <= 1", ac)
  }
}

What did you expect to see?

The test should pass with or without -race on 1.14 beta1

What did you see instead?

On 1.14 beta 1, it fails when -race is enabled, it passes when -race is not enabled. On 1.13.x, it passes with or without -race.

@josharian
Copy link
Contributor

cc @randall77 as I believe he is the only one who meaningfully touched the race detector this cycle

@josharian josharian added this to the Go1.14 milestone Jan 12, 2020
@randall77
Copy link
Contributor

Here's a somewhat simpler reproducer:

package alloctest

import (
  "sync"
  "testing"
)

func TestAlloc(t *testing.T) {
  p := &sync.Pool{}
  hit := 0
  miss := 0
  ac := testing.AllocsPerRun(1000000, func() {
    var v *int
    x := p.Get()
    if x != nil {
       hit++
       v = x.(*int)
    } else {
       miss++
       v = new(int)
    }
    p.Put(v)
  })
  t.Fatalf("allocs:%f hitRate:%f", ac, float64(hit)/float64(hit+miss))
}

Additionally, apply this hack to package testing to prevent AllocsPerRun from rounding to integer:

--- a/src/testing/allocs.go
+++ b/src/testing/allocs.go
@@ -41,5 +41,5 @@ func AllocsPerRun(runs int, f func()) (avg float64) {
        // We are forced to return a float64 because the API is silly, but do
        // the division as integers so we can ask if AllocsPerRun()==1
        // instead of AllocsPerRun()<2.
-       return float64(mallocs / uint64(runs))
+       return float64(mallocs) / float64(runs)
 }

WIthout -race, both go 1.13 and tip produce:

    issue36516_test.go:24: allocs:0.000000 hitRate:0.999999

With -race, go 1.13 produces:

issue36516_test.go:24: allocs:0.250180 hitRate:0.749819

So the race detector makes sync.Pool less effective. Not really sure what's going on there, but it isn't this issue. This happens for both 1.13 and tip.

With -race, tip produces:

    issue36516_test.go:24: allocs:2.249870 hitRate:0.750143

So we're getting an extra 2 allocations per iteration.
The heap profiler says they are coming from here:

(pprof) top10
Showing nodes accounting for 162MB, 100% of 162MB total
      flat  flat%   sum%        cum   cum%
  147.50MB 91.05% 91.05%   147.50MB 91.05%  sync.indexLocal (inline)
   14.50MB  8.95%   100%      162MB   100%  _/Users/khr/gowork/issue36516.TestAlloc.func1
         .          .    276:func indexLocal(l unsafe.Pointer, i int) *poolLocal {
  147.50MB   147.50MB    277:	lp := unsafe.Pointer(uintptr(l) + uintptr(i)*unsafe.Sizeof(poolLocal{}))
         .          .    278:	return (*poolLocal)(lp)
         .          .    279:}

For some reason, with -race on the poolLocal{} is allocating. Weird.

@randall77
Copy link
Contributor

So it isn't the poolLocal{}. It's the checking of the pointer arithmetic.
If I mark indexLocal as //go:noninline, then look at its assembly when built with -race, I see an allocation of something being passed to runtime.checkptrArithmetic.

TEXT _sync.indexLocal(SB) /Users/khr/sandbox/readonly/src/sync/pool.go
func indexLocal(l unsafe.Pointer, i int) *poolLocal {
  0x10b9e10             65488b0c2530000000      MOVQ GS:0x30, CX                
  0x10b9e19             483b6110                CMPQ 0x10(CX), SP               
  0x10b9e1d             0f86bf000000            JBE 0x10b9ee2                   
  0x10b9e23             4883ec30                SUBQ $0x30, SP                  
  0x10b9e27             48896c2428              MOVQ BP, 0x28(SP)               
  0x10b9e2c             488d6c2428              LEAQ 0x28(SP), BP               
  0x10b9e31             488b442430              MOVQ 0x30(SP), AX               
  0x10b9e36             48890424                MOVQ AX, 0(SP)                  
  0x10b9e3a             e85191feff              CALL _runtime.racefuncenter(SB) 
        lp := unsafe.Pointer(uintptr(l) + uintptr(i)*128)
  0x10b9e3f             488d05daef1100          LEAQ _runtime.rodata+91616(SB), AX      
  0x10b9e46             48890424                MOVQ AX, 0(SP)                          
  0x10b9e4a             e81108f9ff              CALL _runtime.newobject(SB)             
  0x10b9e4f             488b442440              MOVQ 0x40(SP), AX                       
  0x10b9e54             48c1e007                SHLQ $0x7, AX                           
  0x10b9e58             488b4c2438              MOVQ 0x38(SP), CX                       
  0x10b9e5d             4801c8                  ADDQ CX, AX                             
  0x10b9e60             488b7c2408              MOVQ 0x8(SP), DI                        
  0x10b9e65             833d14fc2c0000          CMPL $0x0, _runtime.writeBarrier(SB)    
  0x10b9e6c             7564                    JNE 0x10b9ed2                           
  0x10b9e6e             48890f                  MOVQ CX, 0(DI)                          
  0x10b9e71             4889442420              MOVQ AX, 0x20(SP)                       
  0x10b9e76             48890424                MOVQ AX, 0(SP)                          
  0x10b9e7a             48897c2408              MOVQ DI, 0x8(SP)                        
  0x10b9e7f             48c744241001000000      MOVQ $0x1, 0x10(SP)                     
  0x10b9e88             48c744241801000000      MOVQ $0x1, 0x18(SP)                     
  0x10b9e91             e8caa2f8ff              CALL _runtime.checkptrArithmetic(SB)    
        return (*poolLocal)(lp)
  0x10b9e96             488b442420              MOVQ 0x20(SP), AX                       
  0x10b9e9b             48890424                MOVQ AX, 0(SP)                          
  0x10b9e9f             488d0d7a291300          LEAQ _runtime.rodata+172000(SB), CX     
  0x10b9ea6             48894c2408              MOVQ CX, 0x8(SP)                        
  0x10b9eab             48c744241001000000      MOVQ $0x1, 0x10(SP)                     
  0x10b9eb4             e8c7a1f8ff              CALL _runtime.checkptrAlignment(SB)     
  0x10b9eb9             e81291feff              CALL _runtime.racefuncexit(SB)          
  0x10b9ebe             488b442420              MOVQ 0x20(SP), AX                       
  0x10b9ec3             4889442448              MOVQ AX, 0x48(SP)                       
  0x10b9ec8             488b6c2428              MOVQ 0x28(SP), BP                       
  0x10b9ecd             4883c430                ADDQ $0x30, SP                          
  0x10b9ed1             c3                      RET                                     
        lp := unsafe.Pointer(uintptr(l) + uintptr(i)*128)
  0x10b9ed2             4889c2                  MOVQ AX, DX                             
  0x10b9ed5             4889c8                  MOVQ CX, AX                             
  0x10b9ed8             e85375feff              CALL _runtime.gcWriteBarrier(SB)        
  0x10b9edd             4889d0                  MOVQ DX, AX                             
  0x10b9ee0             eb8f                    JMP 0x10b9e71                           
func indexLocal(l unsafe.Pointer, i int) *poolLocal {
  0x10b9ee2             e8a955feff              CALL _runtime.morestack_noctxt(SB)      
  0x10b9ee7             e924ffffff              JMP _sync.indexLocal(SB)                

Somehow the compiler is deciding that the originals argument to checkptrArithmetic is escaping?

@mdempsky

@randall77
Copy link
Contributor

In cmd/compile/internal/gc/walk.go:walkCheckPtrArithmetic, I think we're marking the wrong node as not escaping. We need to mark the backing store of the slice, not the slice itself.
This hack fixes the problem:

diff --git a/src/cmd/compile/internal/gc/walk.go b/src/cmd/compile/internal/gc/walk.go
index ef88db4742..7cc94c7ff1 100644
--- a/src/cmd/compile/internal/gc/walk.go
+++ b/src/cmd/compile/internal/gc/walk.go
@@ -4067,7 +4067,9 @@ func walkCheckPtrArithmetic(n *Node, init *Nodes) *Node {
 
        n = cheapexpr(n, init)
 
-       slice := mkdotargslice(types.NewSlice(types.Types[TUNSAFEPTR]), originals, init, nil)
+       ddd := nod(OXXX, nil, nil)
+       ddd.Esc = EscNone
+       slice := mkdotargslice(types.NewSlice(types.Types[TUNSAFEPTR]), originals, init, ddd)
        slice.Esc = EscNone
        slice.SetTransient(true) 

Probably want something more explicit than that, though.

@josharian
Copy link
Contributor

Re: sync.Pool being less effective with race mode enabled, that is intentional, to try to prevent the pool from hiding work that the race detector could catch bugs in.

// Put adds x to the pool.
func (p *Pool) Put(x interface{}) {
	if x == nil {
		return
	}
	if race.Enabled {
		if fastrand()%4 == 0 {
			// Randomly drop x on floor.
			return
		}

@randall77 randall77 changed the title testing.AllocsPerRun reports unexpected allocs count when -race is enabled cmd/compile: extra allocations in race mode Jan 13, 2020
@josharian josharian added the NeedsFix The path to resolution is known, but the work has not been done. label Jan 13, 2020
@gopherbot
Copy link

Change https://golang.org/cl/214679 mentions this issue: cmd/compile: mark ... argument to checkptrAlignment as not escaping

@gopherbot
Copy link

Change https://golang.org/cl/215477 mentions this issue: test: disable test for #36516 when cgo is not enabled

gopherbot pushed a commit that referenced this issue Jan 19, 2020
CL 214679 added a -race test which shouldn't be run when cgo is not
enabled.

Fixes the nocgo builder.

Change-Id: Iceddf802c4ef6c0de2c3a968e86342303d2d27d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/215477
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Jan 18, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants