Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: iterators cannot be composed without incurring extra heap allocations (val + func literals) #69539

Closed
coxley opened this issue Sep 19, 2024 · 12 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@coxley
Copy link

coxley commented Sep 19, 2024

Go version

go 1.23 / go1.24-fc968d4 / go1.24-165bf24

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='arm64'
GOBIN=''
GOCACHE='/Users/coxley/Library/Caches/go-build'
GOENV='/Users/coxley/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/coxley/.go/pkg/mod'
GOOS='darwin'
GOPATH='/Users/coxley/.go'
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/homebrew/Cellar/go/1.23.1/libexec'
GOSUMDB='off'
GOTMPDIR=''
GOTOOLCHAIN='local'
GOTOOLDIR='/opt/homebrew/Cellar/go/1.23.1/libexec/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.1'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/coxley/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='cc'
CXX='c++'
CGO_ENABLED='1'
GOMOD='/var/folders/3s/w_9thc5x5zj6d5vyf_9j9xw00000gn/T/tmp.vEBTr6BuI7/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/3s/w_9thc5x5zj6d5vyf_9j9xw00000gn/T/go-build3137110682=/tmp/go-build -gno-record-gcc-switches -fno-common'

What did you do?

Runnable benchmark: https://go.dev/play/p/1J9H8AL2aGM

Created two types:

type Slice[T any] []T

type Collection[T any] struct {
  underlying Slice[T]
}

This simulates an actual use-case I have: an AVL tree being embedded in a graph data structure. Iterating from a higher-order type should be as performant as iterating from the foundational one.

When moving away from callback-style iteration (avl.Each(func(key int64, val *Edge) { /* ... */ })) to idiomatic iterators, I noticed a non-trivial performance regression. Granted, some of this iteration is in a tight-loop so maybe it isn't affecting every program, but after digging I noticed that higher-order iterators' func literals escape to the heap while the lowest-order one's do not.

The benchmark link above demonstrates. I'm not sure if this is #66469, #69015, or simply related. But the first issue didn't get any traction which was before the 1.23 release.

What did you see happen?

  • A basic iter.Seq style iterator performs 0 allocations.
  • Another iter.Seq style iterator on that dispatches to the other incurs allocations — 6 in the go.dev/play example.

What did you expect to see?

Equivalent performance of both.

The only current option is to duplicate iteration logic instead of composing iterators.

@gabyhelp
Copy link

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 19, 2024
@coxley coxley changed the title cmd/compile: Nested iterators escape to heap when base iterator does not cmd/compile: Nested iterators cannot be composed without incurring extra heap allocations (val + func literals) Sep 19, 2024
@coxley coxley changed the title cmd/compile: Nested iterators cannot be composed without incurring extra heap allocations (val + func literals) cmd/compile: Iterators cannot be composed without incurring extra heap allocations (val + func literals) Sep 19, 2024
@coxley
Copy link
Author

coxley commented Sep 19, 2024

I had hoped "cmd/compile: fix wrong esacpe analysis for rangefunc" would fix it, but it appears the composed version has the same cost in both go 1.23.1 and master.

> go version
go version go1.23.1 darwin/arm64

> go build -gcflags='-m=3' main.go 2>&1 | grep 'inline Collection\[go.shape.int\].Values'
./main.go:59:6: can inline Collection[go.shape.int].Values with cost 17 as: method(Collection[go.shape.int]) func(*[8]uintptr) iter.Seq[go.shape.int] { return func literal }
./main.go:60:9: cannot inline Collection[go.shape.int].Values.func1: function too complex: cost 189 exceeds budget 80
./main.go:61:3: cannot inline Collection[go.shape.int].Values.func1-range1: function too complex: cost 142 exceeds budget 80
./main.go:46:9: can inline Collection[go.shape.int].Values.func1.Slice[go.shape.int].All.1 with cost 72 as: func(func(int, go.shape.int) bool) { for loop }

> /Users/coxley/projects/go/bin/go version
go version devel go1.24-165bf24 2024-09-19 00:40:50 +0000 darwin/arm64

> /Users/coxley/projects/go/bin/go build -gcflags='-m=3' main.go 2>&1 | grep 'inline Collection\[go.shape.int\].Values'
./main.go:59:6: can inline Collection[go.shape.int].Values with cost 17 as: method(Collection[go.shape.int]) func(*[8]uintptr) iter.Seq[go.shape.int] { return func literal }
./main.go:60:9: cannot inline Collection[go.shape.int].Values.func1: function too complex: cost 189 exceeds budget 80
./main.go:61:3: cannot inline Collection[go.shape.int].Values.func1-range1: function too complex: cost 142 exceeds budget 80
./main.go:46:9: can inline Collection[go.shape.int].Values.func1.Slice[go.shape.int].All.1 with cost 72 as: func(func(int, go.shape.int) bool) { for loop }

@nemith
Copy link
Contributor

nemith commented Sep 19, 2024

Moving the underlying iterator out of the closure seems to remove the allocations but it is still a lot slower.

func (s Collection[T]) Values() iter.Seq[T] {
	underlyingIter := s.underlying.All()
	return func(yield func(T) bool) {
		for _, v := range underlyingIter {
			if !yield(v) {
				return
			}
		}
	}
}
$ go run main.go
Direct:        5	       200.0 ns/op        0 B/op	       0 allocs/op
Nested:        5	      5033 ns/op        0 B/op	       0 allocs/op

@cagedmantis cagedmantis changed the title cmd/compile: Iterators cannot be composed without incurring extra heap allocations (val + func literals) cmd/compile: iterators cannot be composed without incurring extra heap allocations (val + func literals) Sep 20, 2024
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 20, 2024
@cagedmantis cagedmantis added this to the Go1.24 milestone Sep 20, 2024
@cuonglm
Copy link
Member

cuonglm commented Sep 23, 2024

I had hoped "cmd/compile: fix wrong esacpe analysis for rangefunc" would fix it, but it appears the composed version has the same cost in both go 1.23.1 and master.

The fix is about wrong analysis of escape analysis, so it won't change the inlining cost.

@dr2chase
Copy link
Contributor

dr2chase commented Oct 23, 2024

Not entirely sure what's gone wrong here -- when compiled "-m", it helpfully reports that it can inline all the things with recent closure-inline-enhancing CLs applied, but it is not faster.

Correction -- it is NOT inlining all the things.

@dr2chase
Copy link
Contributor

Here's another benchmark -- if nothing else, assigning to "_" in the loop body gets optimized away. I'd also rather write these as go-test benchmarks, which reduces the need to check to see if something non-standard is going on.

package foo_test

import (
	"iter"
	"testing"
)

var u int

func BenchmarkDirect(b *testing.B) {
	s := Slice[int](make([]int, 64))
	for range b.N {
		for _, v := range s.All() {
			u = v
		}
	}
}

func BenchmarkNestedA(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	for range b.N {
		for v := range collection.ValuesA() {
			u = v
		}
	}
}

func BenchmarkNestedB(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	for range b.N {
		for v := range collection.ValuesB() {
			u = v
		}
	}
}

func BenchmarkNestedACall(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	yield := func(i int) bool {
		u = i
		return true
	}
	for range b.N {
		collection.ValuesA()(yield)
	}
}

func BenchmarkNestedBCall(b *testing.B) {
	collection := Collection[int]{Slice[int](make([]int, 64))}
	yield := func(i int) bool {
		u = i
		return true
	}
	for range b.N {
		collection.ValuesB()(yield)
	}
}

type Slice[T any] []T

func (s Slice[T]) All() iter.Seq2[int, T] {
	return func(yield func(int, T) bool) {
		for i, v := range s {
			if !yield(i, v) {
				return
			}
		}
	}
}

type Collection[T any] struct {
	underlying Slice[T]
}

func (s Collection[T]) ValuesA() iter.Seq[T] {
	return func(yield func(T) bool) {
		for _, v := range s.underlying.All() {
			if !yield(v) {
				return
			}
		}
	}
}

func (s Collection[T]) ValuesB() iter.Seq[T] {
	underlyingIter := s.underlying.All()
	return func(yield func(T) bool) {
		for _, v := range underlyingIter {
			if !yield(v) {
				return
			}
		}
	}
}

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/622415 mentions this issue: cmd/compile: spell "go.runtime" correctly for inline "cheap" test

gopherbot pushed a commit that referenced this issue Oct 24, 2024
Updates #69539.

Change-Id: I40885e9c23f35772f8ace645044afee0d55b70b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/622415
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Griesemer <gri@google.com>
@coxley
Copy link
Author

coxley commented Oct 24, 2024

@dr2chase Sorry for the non-standard benchmark — only did that so it was viewable in go.dev/play.

@dr2chase
Copy link
Contributor

dr2chase commented Nov 5, 2024

This is a dupe of #69411 -- same bug, same root cause. The inliner needs to get better at closures.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/629195 mentions this issue: cmd/compile: strongly favor closure inlining

gopherbot pushed a commit that referenced this issue Nov 19, 2024
This tweaks the inlining cost knob for closures
specifically, they receive a doubled budget.  The
rationale for this is that closures have a lot of
"crud" in their IR that will disappear after inlining,
so the standard budget penalizes them unnecessarily.

This is also the cause of these bugs -- looking at the
code involved, these closures "should" be inlineable,
therefore tweak the parameters until behavior matches
expectations.  It's not costly in binary size, because
the only-called-from-one-site case is common (especially
for rangefunc iterators).

I can imagine better fixes and I am going to try to
get that done, but this one is small and makes things
better.

Fixes #69411, #69539.

Change-Id: I8a892c40323173a723799e0ddad69dcc2724a8f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/629195
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@eliasnaur
Copy link
Contributor

CL 629195 says "Fixes #69411, #69539." but the bot didn't close this issue.

@mvdan
Copy link
Member

mvdan commented Nov 20, 2024

I don't think the GitHub regex matching for fixes/closes deals with lists like that - I normally do one fixes per line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Development

No branches or pull requests

10 participants