Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: passing function argument via interface is 16x slower than via typed value #17118

Closed
valyala opened this issue Sep 15, 2016 · 13 comments

Comments

@valyala
Copy link
Contributor

valyala commented Sep 15, 2016

What version of Go are you using (go version)?

go version devel +67e22a1 Thu Sep 15 17:35:49 2016 +0300 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/aliaksandr/gopath"
GORACE=""
GOROOT="/home/aliaksandr/work/go"
GOTOOLDIR="/home/aliaksandr/work/go/pkg/tool/linux_amd64"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build571408045=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"

What did you do?

I benchmarked the following code:

package main

import "testing"

func BenchmarkEfaceArg(b *testing.B) {
        n := 0
        for i := 0; i < b.N; i++ {
                n += fEface(i)
        }
        if b.N > 0 && n == 0 {
                b.Fatalf("n shouldn't be zero. b.N=%d", b.N)
        }
}

func BenchmarkIntArg(b *testing.B) {
        n := 0
        for i := 0; i < b.N; i++ {
                n += fInt(i)
        }
        if b.N > 0 && n == 0 {
                b.Fatalf("n shouldn't be zero. b.N=%d", b.N)
        }
}

func fEface(v interface{}) int {
        return v.(int)+1
}

func fInt(n int) int {
        return n+1
}

Below are benchmark results:

go test -run=11111 -bench=Arg -benchmem -benchtime=1s
testing: warning: no tests to run
BenchmarkEfaceArg-4     100000000           16.3 ns/op         0 B/op          0 allocs/op
BenchmarkIntArg-4       2000000000           0.98 ns/op        0 B/op          0 allocs/op
PASS

What did you expect to see?

Both benchmarks should run with comparable speed.

What did you see instead?

The benchmark passing the argument via interface to fEface is 16x slower than the benchmark passing the argument to fInt.

@bradfitz
Copy link
Contributor

This is expected since Go 1.4 or Go 1.5, when interfaces were changed to only hold 0 or 1 pointer directly in their data word. In your example it's having to store a pointer to your int, and then dereference it.

There's no bug here, so closing. If you had a specific implementation proposal that I would've kept this open.

@valyala
Copy link
Contributor Author

valyala commented Sep 15, 2016

It looks like that all the types that fit a machine word (including int) are stored by value inside interface data field.

CPU profiling shows that the majority of time is spent inside convT2E -> typedememmove -> memmove when calling fEface.

I thought that the compiler could be smarter in this case:

  1. it could use faster approach when copying int to interface{} data field, since 16x slowdown looks too scary :)
  2. it could pass interface{} argument via CPU registers instead of stack
  3. it could inline fEface and completely get rid of int -> interface{} conversion.

@josharian , could you look into this?

@josharian
Copy link
Contributor

  1. It's possible we could specialize convT2E calls for particular common sizes to avoid going through typedmemmove. There are trade-offs here, but it might be worth investigating.
  2. @dr2chase wants to change the calling convention in general. That's probably worth doing. Changing it just for one call ain't gonna happen.
  3. We've inlined a bunch of these. Those that are left aren't worth inlining because they still make other function calls (notably allocation) anyway.

I can't look into this myself, since I'm leaving on vacation in a few hours. But I'm going to reopen this in case anyone (@martisch?) wants to investigate creating a few specialized convT2E variants.

@josharian josharian reopened this Sep 15, 2016
@randall77
Copy link
Contributor

We don't store all types that fit into a machine word by value any more. We used to, but we had to change that as part of moving to a precise garbage collector (1.3? 1.4? I forget now). Things that aren't pointers are now stored by reference. So to make an int into an interface we have to allocate an int on the heap and store a *int in the interface data word.

It would be nice if we could allocate that int on the stack instead of the heap. I think that is #13807 or #8618.

@valyala
Copy link
Contributor Author

valyala commented Sep 16, 2016

So to make an int into an interface we have to allocate an int on the heap and store a *int in the interface data word

But benchmark results say there are no memory allocations when passing int via interface{}:

BenchmarkEfaceArg-4     100000000           16.3 ns/op         0 B/op          0 allocs/op

edit:
I noticed that the i may be allocated on heap only once per benchmark, so I changed the benchmark to:

func BenchmarkEfaceArg(b *testing.B) {
        n := 0
        for i := 0; i < b.N; i++ {
                x := i
                n += fEface(x)
        }
        if b.N > 0 && n == 0 {
                b.Fatalf("n shouldn't be zero. b.N=%d", b.N)
        }
}

No x must be allocated during each iteration, but benchmark results still say there are no allocations.

@bradfitz
Copy link
Contributor

That's probably escape analysis.

What does go tool compile -l -l say?

@randall77
Copy link
Contributor

Looks like we are allocating that int on the stack, not the heap, sorry.
We're still calling runtime.convT2E for this, but that call probably isn't necessary. That's #15375.

@gopherbot
Copy link

CL https://golang.org/cl/29373 mentions this issue.

gopherbot pushed a commit that referenced this issue Sep 19, 2016
No point in calling a function when we can build the interface
using a known type (or itab) and the address of a local.

Get rid of third arg (preallocated stack space) to convT2{I,E}.

Makes go binary smaller by 0.2%

benchmark                   old ns/op     new ns/op     delta
BenchmarkEfaceInteger-8     16.7          10.1          -39.52%

Update #17118
Update #15375

Change-Id: I9724a1f802bfa1e3957bf1856b55558278e198a2
Reviewed-on: https://go-review.googlesource.com/29373
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
@valyala
Copy link
Contributor Author

valyala commented Sep 19, 2016

@randall77 , results are way better now:

BenchmarkEfaceArg-4     200000000            8.41 ns/op        0 B/op          0 allocs/op
BenchmarkIntArg-4       2000000000           0.98 ns/op        0 B/op          0 allocs/op

There is no convT2E in cpu profile now, but there is a assertE2T, which, in turn, calls typedmemmove -> memmove:

(pprof) top
9.22s of 9.22s total (  100%)
      flat  flat%   sum%        cum   cum%
     3.23s 35.03% 35.03%      9.22s   100%  _/home/aliaksandr/work/rev-share-backend.BenchmarkEfaceArg
     2.12s 22.99% 58.03%      5.99s 64.97%  runtime.assertE2T
        2s 21.69% 79.72%         2s 21.69%  runtime.memmove
     1.87s 20.28%   100%      3.87s 41.97%  runtime.typedmemmove
         0     0%   100%      9.22s   100%  runtime.goexit
         0     0%   100%      9.22s   100%  testing.(*B).launch
         0     0%   100%      9.22s   100%  testing.(*B).runN

(pprof) list assertE2T
Total: 9.22s
ROUTINE ======================== runtime.assertE2T in /home/aliaksandr/work/go1.5/src/runtime/iface.go
     2.12s      5.99s (flat, cum) 64.97% of Total
         .          .    227:       typedmemmove(t, r, i.data)
         .          .    228:   }
         .          .    229:   return true
         .          .    230:}
         .          .    231:
     670ms      670ms    232:func assertE2T(t *_type, e eface, r unsafe.Pointer) {
      70ms       70ms    233:   if e._type == nil {
         .          .    234:       panic(&TypeAssertionError{"", "", t.string(), ""})
         .          .    235:   }
     370ms      370ms    236:   if e._type != t {
         .          .    237:       panic(&TypeAssertionError{"", e._type.string(), t.string(), ""})
         .          .    238:   }
      50ms       50ms    239:   if r != nil {
     240ms      240ms    240:       if isDirectIface(t) {
         .          .    241:           writebarrierptr((*uintptr)(r), uintptr(e.data))
         .          .    242:       } else {
     360ms      4.23s    243:           typedmemmove(t, r, e.data)
         .          .    244:       }
         .          .    245:   }
     360ms      360ms    246:}
         .          .    247:

@randall77
Copy link
Contributor

All of those calls are from v.(int).
We might be able to inline the common case where the type matches.
We still need to have a call somewhere, though, as there is always the possibility of a type mismatch and a resulting panic.

@valyala
Copy link
Contributor Author

valyala commented Sep 19, 2016

Yeah, the following fEface code works much faster:

func fEface(v interface{}) int {
        if x, ok := v.(int); ok {
                return x+1
        }
        return 0
}
BenchmarkEfaceArg-4     3000000000           2.13 ns/op        0 B/op          0 allocs/op

I think the issue may be closed now.

@valyala
Copy link
Contributor Author

valyala commented Sep 19, 2016

All of those calls are from v.(int).
We might be able to inline the common case where the type matches.
We still need to have a call somewhere, though, as there is always the possibility of a type mismatch and a resulting panic.

@randall77 , how about automatic rewriting of the following code:

x := v.(int)

into

x, ok := v.(int) // fast path
if !ok {
    // slow path
    panic(fmt.Sprintf("cannot convert %T into int", v))
}

during compilation? Then the assertE2T cost should disappear on the fast path

@randall77
Copy link
Contributor

Yes, that's what I meant by "inline the common case where the type matches".

@golang golang locked and limited conversation to collaborators Sep 19, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants