
cmd/compile: named returns are faster than inline returns #40638

Closed

prashantv opened this issue Aug 7, 2020 · 3 comments
Labels: FrozenDueToAge, NeedsInvestigation
Milestone: Backlog

prashantv commented Aug 7, 2020

What version of Go are you using (go version)?

```
$ go version
go version go1.14.7 linux/amd64
```

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
```
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/prashant/.cache/go-build"
GOENV="/home/prashant/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/prashant/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/prashant/.gimme/versions/go1.14.7.linux.amd64"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/prashant/.gimme/versions/go1.14.7.linux.amd64/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build010078908=/tmp/go-build -gno-record-gcc-switches"
```

What did you do?

Returned a medium-sized struct (two byte slices and two ints, as in the repro below) in similar but slightly different ways (e.g., using named returns vs. returning the value as part of the return statement), and saw unexpectedly slow performance for the variants that return the value immediately (ReturnDirectly and ReturnIntermediate below).

E.g., this approach was slower:

```
//go:noinline
func (s largeStruct) ReturnDirectly() (largeStruct, error) {
	return largeStruct{
		bs1: s.bs2,
		bs2: s.bs1,
		bs3: s.bs4,
		bs4: s.bs3,
	}, nil
}
```

as compared to using named returns:

```
//go:noinline
func (s largeStruct) NamedReturnExplicit() (ls largeStruct, _ error) {
	ls = largeStruct{
		bs1: s.bs2,
		bs2: s.bs1,
		bs3: s.bs4,
		bs4: s.bs3,
	}
	return ls, nil
}
```

Full repro:

```
package main

import "testing"

type largeStruct struct {
	bs1      []byte
	bs2      []byte
	bs3, bs4 int
}

func BenchmarkFoo(b *testing.B) {
	tests := []struct {
		msg string
		f   func(s largeStruct) (largeStruct, error)
	}{
		{"ReturnDirectly", largeStruct.ReturnDirectly},
		{"ReturnIntermediate", largeStruct.ReturnIntermediate},
		{"NamedReturnExplicit", largeStruct.NamedReturnExplicit},
		{"NamedReturnImplicit", largeStruct.NamedReturnImplicit},
	}

	for _, tt := range tests {
		b.Run(tt.msg, func(b *testing.B) {
			var s largeStruct
			var err error
			for i := 0; i < b.N; i++ {
				s, err = tt.f(s)
			}
			_ = err
		})
	}
}

//go:noinline
func (s largeStruct) ReturnDirectly() (largeStruct, error) {
	return largeStruct{
		bs1: s.bs2,
		bs2: s.bs1,
		bs3: s.bs4,
		bs4: s.bs3,
	}, nil
}

//go:noinline
func (s largeStruct) ReturnIntermediate() (largeStruct, error) {
	ls := largeStruct{
		bs1: s.bs2,
		bs2: s.bs1,
		bs3: s.bs4,
		bs4: s.bs3,
	}
	return ls, nil
}

//go:noinline
func (s largeStruct) NamedReturnExplicit() (ls largeStruct, _ error) {
	ls = largeStruct{
		bs1: s.bs2,
		bs2: s.bs1,
		bs3: s.bs4,
		bs4: s.bs3,
	}
	return ls, nil
}

//go:noinline
func (s largeStruct) NamedReturnImplicit() (ls largeStruct, err error) {
	ls = largeStruct{
		bs1: s.bs2,
		bs2: s.bs1,
		bs3: s.bs4,
		bs4: s.bs3,
	}
	return
}
```

What did you expect to see?

Expected the performance to be the same for the different approaches.

What did you see instead?

Named returns have a performance advantage over returning a value as part of the return statement.

prashantv added a commit to uber/tchannel-go that referenced this issue Aug 7, 2020
As part of #792, we moved to typed.ReadBuffer, which is much more
readable than direct binary translations, but we lost a little
bit of performance on the KV iterator.

Original:
```
BenchmarkKeyValIterator-12      52558426                66.7 ns/op
```

With typed.ReadBuffer:
```
BenchmarkKeyValIterator-12      34918740               101 ns/op
```

With these optimizations, performance is close to the original:
```
BenchmarkKeyValIterator-12      46994628                76.1 ns/op
```

This change has the following optimizations:

 * Make ReadBuffer smaller by 2 words (replace a []byte with an int; a slice
   header is three words, an int is one). This gets rid of the Fill
   functionality, which was never used in any production code anyway.
 * Don't create a read buffer for a single Uint16 read when creating KeyValueIterator.
 * Instead of using io.EOF to check for the end of all headers, add a Remaining method
   which returns a bool (vs the Next method, which returns > 10 words); see the sketch
   after this commit message. We were spending time just copying the large struct
   values on the stack after the last header read.
 * Simplify how the KeyValueIterator.Next method creates the remaining buffer by adding
   a method directly to ReadBuffer.
 * Use named returns in KeyValueIterator.Next, which gave an unexpected ~6% performance
   improvement. Filed golang/go#40638 since this was unexpected.
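
The Remaining/Next shape referenced in the bullets can be sketched as follows. This is a minimal, hypothetical illustration: the ReadBuffer and KeyValueIterator definitions, field names, and length-prefixed wire format below are assumptions for the sketch, not the actual tchannel-go code.

```
package main

import (
	"encoding/binary"
	"fmt"
)

// Hypothetical stand-in for typed.ReadBuffer: a position index (one word)
// instead of a second []byte (three words), per the first bullet above.
type ReadBuffer struct {
	buf []byte
	pos int
}

func (r *ReadBuffer) ReadUint16() uint16 {
	v := binary.BigEndian.Uint16(r.buf[r.pos:])
	r.pos += 2
	return v
}

func (r *ReadBuffer) ReadBytes(n int) []byte {
	b := r.buf[r.pos : r.pos+n]
	r.pos += n
	return b
}

// Hypothetical stand-in for the KV iterator.
type KeyValueIterator struct {
	remaining int
	rbuf      ReadBuffer
}

// Remaining returns a single bool, so the loop condition never copies the
// multi-word iterator the way an io.EOF-returning Next call would.
func (i KeyValueIterator) Remaining() bool { return i.remaining > 0 }

// Next decodes one length-prefixed key/value pair and returns the advanced
// iterator; it uses named returns, matching the last bullet.
func (i KeyValueIterator) Next() (next KeyValueIterator, key, val []byte) {
	key = i.rbuf.ReadBytes(int(i.rbuf.ReadUint16()))
	val = i.rbuf.ReadBytes(int(i.rbuf.ReadUint16()))
	i.remaining--
	next = i
	return
}

func main() {
	// Two pairs, each field prefixed by a big-endian uint16 length:
	// ("k1", "v1") and ("k2", "v22").
	payload := []byte{
		0, 2, 'k', '1', 0, 2, 'v', '1',
		0, 2, 'k', '2', 0, 3, 'v', '2', '2',
	}
	it := KeyValueIterator{remaining: 2, rbuf: ReadBuffer{buf: payload}}
	for it.Remaining() {
		var k, v []byte
		it, k, v = it.Next()
		fmt.Printf("%s=%s\n", k, v)
	}
}
```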
ianlancetaylor changed the title from "cmd/go: Named returns are faster than inline returns" to "cmd/compile: named returns are faster than inline returns" on Aug 7, 2020
ianlancetaylor (Contributor) commented:
It's a pretty small effect, but it does seem that using a local variable causes the compiler to assemble the value on the stack and then copy it to the result parameter stack location. Using a named result parameter causes the compiler to assemble the value directly in the result parameter stack location.
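
The two shapes reduce to a minimal pair like the one below (hypothetical names; the comments restate the observed Go 1.14 gc behavior, not a language guarantee). Comparing the assembly, e.g., via go build -gcflags=-S, should show the extra copy in the first form.

```
package demo

// big is a stand-in for any multi-word struct, such as the issue's largeStruct.
type big struct{ a, b, c, d int }

// Unnamed result: per the explanation above, gc (as of Go 1.14) assembles
// the struct in a local stack slot and then copies it to the result slot.
//go:noinline
func viaLocal(x big) (big, error) {
	v := big{a: x.d, b: x.c, c: x.b, d: x.a}
	return v, nil
}

// Named result: the value is assembled directly in the result slot,
// avoiding the extra copy.
//go:noinline
func viaNamed(x big) (v big, _ error) {
	v = big{a: x.d, b: x.c, c: x.b, d: x.a}
	return v, nil
}
```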

When using gccgo I see that ReturnIntermediate is slightly slower, and the others are all about the same speed.

CC @randall77

ianlancetaylor added the NeedsInvestigation label on Aug 7, 2020
ianlancetaylor added this to the Backlog milestone on Aug 7, 2020
dgryski (Contributor) commented Aug 7, 2020

Duplicate of #20859?

ianlancetaylor (Contributor) commented:
Thanks. Closing as dup.

golang locked and limited conversation to collaborators on Aug 7, 2021