Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: generic function argument causes escape to heap #48849

Open
bboreham opened this issue Oct 7, 2021 · 20 comments
Open

cmd/compile: generic function argument causes escape to heap #48849

bboreham opened this issue Oct 7, 2021 · 20 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. generics Issue is related to generics NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@bboreham
Copy link
Contributor

bboreham commented Oct 7, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.18-7c572a29eb Sat Oct 2 22:06:39 2021 +0100 darwin/arm64

Does this issue reproduce with the latest release?

Yes, needs generics.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/bryan/Library/Caches/go-build"
GOENV="/Users/bryan/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/bryan/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/bryan/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/bryan/src/github.com/golang/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/bryan/src/github.com/golang/go/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="devel go1.18-7c572a29eb Sat Oct 2 22:06:39 2021 +0100"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/bryan/src/github.com/golang/go/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/bp/y4j6bp157f52wv55knd_j1fc0000gn/T/go-build3341590964=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

This program: https://play.golang.org/p/speaQxyFO4a

package main

type fooer interface {
	foo()
}

type bar struct{}

func (b *bar) foo() {
}

func f[T fooer](t T) {
	t.foo()
}

func main() {
	var b bar
	f(&b)
}

Compiled with -gcflags "-m -m -l" to show escape analysis.

What did you expect to see?

What I get if t is declared as the concrete type *bar:

/tmp/main.go:12:8: t does not escape

What did you see instead?

/tmp/main.go:12:17: parameter t leaks to {heap} with derefs=0:
/tmp/main.go:12:17:   flow: {heap} = t:
/tmp/main.go:12:17:     from unsafe.Pointer(t) (interface-converted) at /tmp/main.go:13:3
/tmp/main.go:12:17:     from (<node EFACE>).foo() (call parameter) at /tmp/main.go:13:7
@randall77
Copy link
Contributor

randall77 commented Oct 7, 2021

This is expected. When compiling f, we don't know whether the call to foo escapes its receiver or not. Consider adding:

type baz struct{ x int }
var g *baz
func (b *baz) foo() {
   g = b
}

The foo function of baz does escape its receiver. But f doesn't know if it was instantiated with *bar or *baz.

Note that we could resolve this if we were fully stenciling. But in our current implementation of generics (with gcshape stenciling and dictionaries) both f[*bar] and f[*baz] use the same blob of assembly.

@mknyszek mknyszek changed the title Generic function argument should not cause heap escape cmd/compile: generic function argument causes escape to heap Oct 7, 2021
@mknyszek mknyszek added generics Issue is related to generics NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Oct 7, 2021
@mknyszek mknyszek added this to the Go1.18 milestone Oct 7, 2021
@rogpeppe
Copy link
Contributor

rogpeppe commented Oct 8, 2021

Note that we could resolve this if we were fully stenciling. But in our current implementation of generics (with gcshape stenciling and dictionaries) both f[*bar] and f[*baz] use the same blob of assembly.

This is getting beyond the scope of this issue, and probably well travelled ground (I haven't been following any CLs), but this makes me wonder about ubiquitous GC stenciling.

I would agree that we don't want a special comment to enable full templating (a.k.a. monomorphisation) because then everyone will use it and we could end up just like Rust with bloated compile times and binaries.

However it seems fairly clear to me that there are some situations where the templating overhead is small and the potential gains large. We could make a similar decision as we do when deciding whether to inline functions currently. Specifically, we can consider how to instantiate a function F with some set of type parameters [T₁, T₂, ...] with respect to some cost metric C(T₁, T₂, ...). A possible cost metric might be as the (estimated) code size of the function's code multiplied by the total number of instantiations of F (one instantiation for each unique set of type parameters to F). If the cost metric is below some threshold, we fully template the function; otherwise we use GC stenciling, or even full fully generic code (cheaper than reflect but not much) in extreme cases. One canonical "extreme case" is the case of unbounded types (e.g. https://go2goplay.golang.org/p/3kUZ6L8amfd) which would then run OK but be predictably costly)

@bboreham
Copy link
Contributor Author

bboreham commented Oct 8, 2021

Thank you for the explanation, and the follow-up.

FWIW the reason I tried this was the regexp package has three input variants: byte slice, string and io.Reader, so the expected number of instantiations is small and the expected benefit from inlining, not-escaping, etc., is relatively high.

@uluyol
Copy link
Contributor

uluyol commented Oct 8, 2021

Note that we could resolve this if we were fully stenciling. But in our current implementation of generics (with gcshape stenciling and dictionaries) both f[*bar] and f[*baz] use the same blob of assembly.

This is getting beyond the scope of this issue, and probably well travelled ground (I haven't been following any CLs), but this makes me wonder about ubiquitous GC stenciling.

This example at least seems solvable with improved escape analysis. Right now f[T](x) causes x to escape because the decision is made in a call-oblivious manner (I.e. we don't know what T is). But it seems like that decision (whether the arg needs to escape) could be deferred to the call site, where we know what type T is.

@rogpeppe
Copy link
Contributor

rogpeppe commented Oct 8, 2021

But it seems like that decision (whether the arg needs to escape) could be deferred to the call site, where we know what type T is.

That's not necessarily true. The caller might have T as a type parameter itself. You'd have to go all the way back up the instantiation stacks until you find the actual set of types that T was instantiated as, and then check that it's the case that all of those do not escape x. Another possibility might be to put that information into the dictionary for the relevant functions so the stack vs heap decision is made dynamically, but that's definitely something for a future version not now, I suspect.

@uluyol
Copy link
Contributor

uluyol commented Oct 8, 2021

But it seems like that decision (whether the arg needs to escape) could be deferred to the call site, where we know what type T is.

That's not necessarily true. The caller might have T as a type parameter itself. You'd have to go all the way back up the instantiation stacks until you find the actual set of types that T was instantiated as, and then check that it's the case that all of those do not escape x. Another possibility might be to put that information into the dictionary for the relevant functions so the stack vs heap decision is made dynamically, but that's definitely something for a future version not now, I suspect.

True, but call-site sensitive escape analysis would help with some generic cases, and actually should help some non-generic cases as well. For example

func ReadAtLeast(r Reader, buf []byte, min int) (n int, err error) {
        if len(buf) < min {
                return 0, ErrShortBuffer
        }
        for n < min && err == nil {
                var nn int
                nn, err = r.Read(buf[n:])
                n += nn
        }
        if n >= min {
                err = nil
        } else if n > 0 && err == EOF {
                err = ErrUnexpectedEOF
        }
        return
}

This function cannot be inlined, so buf is always put on the heap (by the caller, or callers caller or whatever). But actually, it's safe to put buf on the stack if we know that the Reader doesn't hold onto it. But I digress.

@gopherbot
Copy link

Change https://golang.org/cl/360015 mentions this issue: crypto/elliptic: use generics for nistec-based curves

@randall77 randall77 modified the milestones: Go1.18, Go1.19 Nov 23, 2021
gopherbot pushed a commit that referenced this issue Apr 27, 2022
There was no way to use an interface because the methods on the Point
types return concrete Point values, as they should.

A couple somewhat minor annoyances:

    - Allocations went up due to #48849. This is fine here, where
      math/big causes allocations anyway, but would probably not be fine
      in nistec itself.

    - Carrying the newPoint/newGenerator functions around as a field is
      a little weird, even if type-safe. It also means we have to make
      what were functions methods so they can access newPoint to return
      the zero value. This is #35966.

For #52182

Change-Id: I050f3a27f15d3f189818da80da9de0cba0548931
Reviewed-on: https://go-review.googlesource.com/c/go/+/360015
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Filippo Valsorda <filippo@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Roland Shoemaker <roland@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
@ianlancetaylor
Copy link
Contributor

Moving to 1.20 milestone.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.19, Go1.20 Jun 24, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@mknyszek
Copy link
Contributor

In triage, @randall77 notes that PGO information might make this much easier. No plans on doing this in the near future, for now.

@mknyszek mknyszek modified the milestones: Go1.20, Backlog Nov 30, 2022
@CannibalVox
Copy link

I encountered this and I was surprised by the behavior, mainly because I didn't have a similar problem with interfaces. I'm probably missing something:

package main

import (
	"fmt"
	"testing"
)

type S struct {
	Val int
}

type I interface {
	SetVal(v int)
}

func (s *S) SetVal(v int) {
	s.Val = v
}

func f[T I](s T) {
	s.SetVal(5)
}

func TestAlloc(t *testing.T) {
	fmt.Println(testing.AllocsPerRun(1, func() {
		var s S
		f(&s)
	}))
}

This allocates:

=== RUN   TestAlloc
1
--- PASS: TestAlloc (0.00s)
PASS

Program exited.
package main

import (
	"fmt"
	"testing"
)

type S struct {
	Val int
}

type I interface {
	SetVal(v int)
}

func (s *S) SetVal(v int) {
	s.Val = v
}

func f(s I) {
	s.SetVal(5)
}

func TestAlloc(t *testing.T) {
	fmt.Println(testing.AllocsPerRun(1, func() {
		var s S
		f(&s)
	}))
}

This does not alloc:

=== RUN   TestAlloc
0
--- PASS: TestAlloc (0.00s)
PASS

Program exited.

I would expect the two cases to work more or less the same.

@randall77
Copy link
Contributor

@CannibalVox
Your interface-based code does not allocate because f gets inlined, and then the s.SetVal call gets devirtualized, so the callee of that call can be determined statically. This then lets escape analysis know it can allocate s on the stack.
(You can demonstrate this by adding //go:noinline just before f.)

I'm not entirely sure why the inline/devirtualize optimization doesn't apply to the generic code. I suspect there's a phase-ordering issue, or the shapifying is somehow blocking the devirtualization step. Maybe someone else knows?
@mdempsky @cuonglm

@cuonglm
Copy link
Member

cuonglm commented Apr 9, 2023

@randall77 method expression on type parameter (s.SetVal(5) in this case) requires indirect call through runtime dictionary, That's why the devirtualization pass is not applied for the generic version of above code.

If type parameter is instantiated with an interface type, maybe we should return the method expression directly, instead of the indirect call.

@randall77
Copy link
Contributor

@randall77 method expression on type parameter (s.SetVal(5) in this case) requires indirect call through runtime dictionary, That's why the devirtualization pass is not applied for the generic version of above code.

So if f is inlined, then the s.SetVal is a call using a constant dictionary. Maybe devirtualization can unpack the dictionary entry and make a direct call?

If type parameter is instantiated with an interface type, maybe we should return the method expression directly, instead of the indirect call.

This isn't instantiating with an interface type, it is instantiating with *S, a pointer-to-struct.

@cuonglm
Copy link
Member

cuonglm commented Apr 9, 2023

This isn't instantiating with an interface type, it is instantiating with *S, a pointer-to-struct.

Ah right.

So if f is inlined, then the s.SetVal is a call using a constant dictionary. Maybe devirtualization can unpack the dictionary entry and make a direct call?

What "constant" do you mean in "constant dictionary"? Does it mean the .dict := .dict assignment generated when doing inlining?

I'm not sure we have enough information to unpack the dictionary when devirtualization happens.

@randall77
Copy link
Contributor

I mean that the func in TestAlloc loads the address of a static dictionary .dict.f[*S] and passes that to f[.shape.*uint8]. When the latter is inlined, we can see the callsite is calling a function loaded from a static dictionary entry. We can figure out what target that is.

At least, we could figure that out if we can follow the flow of the dictionary after inlining. You're right that we might have to understand dictionary assignments generated as part of inlining.

@aktau
Copy link
Contributor

aktau commented Aug 22, 2023

Would the approaches discussed in this issue also solve the escape of parameters? Example (I tried to escape the escape by parameterizing on a struct type FloatHidden, but gc sees through my poor trick):

play link

package main

func main() {
	{
		ts := TimeSeries[*FloatOf]{pending: new(FloatOf)}
		f := FloatOf(32) // Escape (due to Add).
		ts.Add(&f)
	}
	{
		ts := TimeSeries[FloatHidden]{pending: NewFloatHidden(0)}
		f := NewFloatHidden(94) // Escape (due to Add).
		ts.Add(f)
	}
	{
		ts := &TimeSeriesFloat{pending: new(FloatOf)}
		f := FloatOf(64) // No escape.
		ts.Add(&f)
	}
}

type Observable[T any] interface{ Add(other T) }

// ==== TimeSeries[T] and the manually monomorphized variant: TimeSeriesFloat ====

type TimeSeries[T Observable[T]] struct{ pending T }

func (ts *TimeSeries[T]) Add(observation T) { ts.pending.Add(observation) }

type TimeSeriesFloat struct{ pending *FloatOf }

func (ts *TimeSeriesFloat) Add(observation *FloatOf) { ts.pending.Add(observation) }

// ==== Types that implement Observable[T] ====

type FloatOf float64

func (f *FloatOf) Add(other *FloatOf) { *f += *other }

// FloatHidden is an attempt to work around
// https://planetscale.com/blog/generics-can-make-your-go-code-slower by not
// parameterizing on a pointer type. Sadly, it doesn't work.
type FloatHidden struct{ Ptr *float64 }

func NewFloatHidden(f float64) FloatHidden  { return FloatHidden{Ptr: &f} }
func (f FloatHidden) Add(other FloatHidden) { *f.Ptr += *other.Ptr }

This is an allocation in a hot path that would be nice to avoid, ideally without adding hacks like manual monomorphization (e.g. using code generation). We don't really need all of the optimizations that inlining could bring (although that would be nice), it would already be enough if the escape property would be propagated upwards.

@thepudds
Copy link
Contributor

Hi @aktau, your example seems to crash with a nil pointer dereference: https://go.dev/play/p/JQj8ERexpjb.

FWIW, I have a large-ish WIP stack of CLs that attempt some escape analysis improvements, and at first glance, I thought it might help at least part of your example, but at second glance, I'm less sure, including I might have confused myself as to which pieces in your example are your desired code vs. which pieces are attempted workarounds.

I also have a general question-- are there some places in your example you could just use floats instead of pointers to floats? (The more pointers you have, the harder it is for escape analysis of course). Also, escape analysis usually doesn't like a pointer dereference on the LHS of an assignment, which you might be doing.

Sorry, probably not a very helpful comment based just on a quick look.

@aktau
Copy link
Contributor

aktau commented Aug 22, 2023

@thepudds sorry about that. I copy/pasted a simplified version of my code without checking that it actually runs. I had only checked the -m=2 output to confirm that it still escaped after simplifying. I've updated the example and it should run without crashing now, as well as make a bit more sense.

I also have a general question-- are there some places in your example you could just use floats instead of pointers to floats? (The more pointers you have, the harder it is for escape analysis of course).

In this case, floats are just an example of an observable. In real code, this could be some aggregate data structure. E.g.:

type RpcStat struct {
  responseBytes int64
  requestBytes int64
  countPerStatus map[StatusCode]int64
  // ...
}

func (t *RpcStat) Add(o *RpcStat) {
  t.responseBytes += o.responseBytes
  // ...
}

Making the parameter a non-pointer type would complicate the implementation of the data structure, and also pass a potentially very large struct through registers and on the stack. The receiver must in any case be a pointer.

Also, escape analysis usually doesn't like a pointer dereference on the LHS of an assignment, which you might be doing.

This is the reason why I included the monomorphized non-generic variant of the timeseries data type: TimeSeriesfloat. It's the same as TimeSeries, but with T textually replaced by *FloatOf. In this case, the escapes are gone. It seems to me that gc is clever enough to deal with derefs on both lhs and rhs.

@mdempsky mdempsky self-assigned this Aug 23, 2023
@mdempsky
Copy link
Member

I'm reworking the top-level optimization phases. I think it's feasible to devirtualize through known dictionaries.

@dallbee
Copy link

dallbee commented Jan 8, 2024

I ran into what I think is a form of this today, so I'll just submit it as a use-case and a +1 for any potential improvement here.

There appears to be no way to return a type T where (*T) implements a required constraint method without allocating.

https://godbolt.org/z/3f9bWrcrs

type X struct{}

func (x *X) Unmarshal([]byte) {}

type Unmarshaler[T any] interface {
	Unmarshal([]byte)
	*T
}

// out escapes
func UnmarshalNew[T any, U Unmarshaler[T]](p []byte) T {
	var out U = new(T)
	out.Unmarshal(p)
	return *out
}

// out escapes
func UnmarshalConversion[T any, U Unmarshaler[T]](p []byte) T {
	var out T
	U(&out).Unmarshal(p)
	return out
}

// fully inlines
func DirectUnmarshalNew(p []byte) X {
        var out *X = new(X)
        out.Unmarshal(p)
        return *out
}

// fully inlines
func DirectUnmarshalConversion(p []byte) X {
        var out X
	out.Unmarshal(p)
	return out
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. generics Issue is related to generics NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Status: No status
Development

No branches or pull requests