
reflect: Call is slow #7818

Open
bradfitz opened this issue Apr 18, 2014 · 14 comments
Comments

@bradfitz
Contributor

reflect.Value.Call is pretty slow.

In addition to always allocating a slice for its []reflect.Value result parameters, it also does a lot of prep work on each call.

It would be nice to both avoid that allocation and do the setup checks just once.

Maybe a new method could return some sort of 'Caller' type that makes repeated Calls faster, with less paranoia and no allocation.

Or just speed up the checks and add a new Call method that also accepts a slice for the result values.
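The result-slice allocation is easy to observe directly. Here is a minimal sketch using testing.AllocsPerRun; the add function and the run count are just for illustration, not anything from the issue:

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

func add(a, b int) int { return a + b }

// AllocsPerCall measures how many heap allocations a single
// reflect.Value.Call performs. The args slice is built once outside
// the measured loop, so what remains is per-call work: at minimum,
// the []reflect.Value result slice allocated fresh on every call.
func AllocsPerCall() float64 {
	fv := reflect.ValueOf(add)
	args := []reflect.Value{reflect.ValueOf(1), reflect.ValueOf(2)}
	return testing.AllocsPerRun(1000, func() {
		fv.Call(args)
	})
}

func main() {
	fmt.Printf("allocs per reflect Call: %.0f\n", AllocsPerCall())
}
```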

dvyukov added a commit that referenced this issue Jan 28, 2015
Call frame allocations can account for a significant portion
of all allocations in a program if Call is executed
in an inner loop (e.g. to process every line in a log).
On the other hand, the allocation is easy to remove
using sync.Pool, since the allocation is strictly scoped.

benchmark           old ns/op     new ns/op     delta
BenchmarkCall       634           338           -46.69%
BenchmarkCall-4     496           167           -66.33%

benchmark           old allocs     new allocs     delta
BenchmarkCall       1              0              -100.00%
BenchmarkCall-4     1              0              -100.00%

Update #7818

Change-Id: Icf60cce0a9be82e6171f0c0bd80dee2393db54a7
Reviewed-on: https://go-review.googlesource.com/1954
Reviewed-by: Keith Randall <khr@golang.org>
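The sync.Pool technique from this commit lives inside reflect itself, but the same strictly scoped reuse pattern can be sketched at the user level. bytes.Buffer and lineLen below are illustrative stand-ins, not the reflect internals:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses scratch buffers across calls. As with the call frames
// in the commit above, each buffer's use is strictly scoped to one
// call, so it can be returned to the pool before the function exits.
var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// lineLen is an illustrative per-line worker: it borrows a buffer,
// uses it, and gives it back, so an inner loop over many lines does
// not allocate a fresh buffer for every line.
func lineLen(line string) int {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset()
	b.WriteString(line)
	n := b.Len()
	bufPool.Put(b) // strictly scoped: returned before we leave
	return n
}

func main() {
	fmt.Println(lineLen("hello, world"))
}
```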
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Unplanned Dec 12, 2016
@bradfitz
Contributor Author

Looking at some internal fleet-wide CPU usage, I see reflect.Value.Call and reflect.Value.call show up pretty high in the list.

It might be time to optimize this.

In Go 1.8, there's now precedent in the reflect package for returning a worker func (reflect.Swapper) after doing the validation only once.

Investigate the top 2014 comment's Caller idea, and see how much CPU it can save.

/cc @dsnet
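For reference, the reflect.Swapper precedent works like this in use: the slice is validated once, and the returned closure does each swap with no further per-call checks. SortedDesc is just an illustrative caller:

```go
package main

import (
	"fmt"
	"reflect"
)

// SortedDesc sorts a copy of xs in descending order using the swap
// function returned by reflect.Swapper. All validation happens in the
// one Swapper call; each swap afterwards is check-free.
func SortedDesc(xs []int) []int {
	out := append([]int(nil), xs...)
	swap := reflect.Swapper(out) // validation happens once, here
	for i := 0; i < len(out); i++ {
		for j := i + 1; j < len(out); j++ {
			if out[j] > out[i] {
				swap(i, j) // no per-call checks or allocation
			}
		}
	}
	return out
}

func main() {
	fmt.Println(SortedDesc([]int{3, 1, 2}))
}
```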

@bradfitz bradfitz self-assigned this Dec 12, 2016
@aclements
Member

Maybe a new method to return some sort of 'Caller' type that makes repeated Calls faster, with less paranoia and no allocation.

Value.Interface() sort of does this, but you have to call the result like a real function, not using Value.Call(). If you can fit code into using Value.Interface(), then there's no allocation overhead (unless the frame pool is empty) and no validation. The compiler constructs the arguments frame and interprets the result frame and reflect just has to do some memory copying. (That said, it still does two copies of both the arguments and the results. I feel like it should be possible to get that down to one.)
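A minimal sketch of that approach, assuming a toy double function: the type assertion does the validation once, and every subsequent call is an ordinary Go call with no per-call reflect checks and no []reflect.Value result slice:

```go
package main

import (
	"fmt"
	"reflect"
)

func double(x int) int { return 2 * x }

// SumDoubles calls double n times through a func value recovered from
// reflect. The one-time type assertion checks the signature; after
// that the compiler constructs the argument and result frames, as
// described in the comment above.
func SumDoubles(n int) int {
	v := reflect.ValueOf(double)
	f := v.Interface().(func(int) int) // validated and extracted once
	sum := 0
	for i := 0; i < n; i++ {
		sum += f(i) // called like a real function, not via Value.Call
	}
	return sum
}

func main() {
	fmt.Println(SumDoubles(5)) // 2*(0+1+2+3+4) = 20
}
```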


@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jun 7, 2017
@bruth

bruth commented Oct 17, 2017

For some concrete numbers, as of 1.9 on darwin/amd64, reflect.Value.Call is ~65-80x slower than other invocation types. Gist of benchmark code.

BenchmarkReflectMethodCall-4         	10000000	       144 ns/op
BenchmarkReflectOnceMethodCall-4     	10000000	       138 ns/op
BenchmarkStructMethodCall-4          	1000000000	         2.20 ns/op
BenchmarkInterfaceMethodCall-4       	1000000000	         2.14 ns/op
BenchmarkTypeSwitchMethodCall-4      	2000000000	         1.88 ns/op
BenchmarkTypeAssertionMethodCall-4   	2000000000	         1.83 ns/op
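The full benchmark is in the linked gist; a minimal reconstruction of two of the cases might look like the following. The counter type and Inc method are assumptions standing in for the gist's receiver, not its exact code:

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// Direct call through a struct pointer.
func BenchmarkStructMethodCall(b *testing.B) {
	c := &counter{}
	for i := 0; i < b.N; i++ {
		c.Inc()
	}
}

// reflect.Value.Call, with the method looked up once outside the loop.
func BenchmarkReflectMethodCall(b *testing.B) {
	m := reflect.ValueOf(&counter{}).MethodByName("Inc")
	for i := 0; i < b.N; i++ {
		m.Call(nil)
	}
}

func main() {
	direct := testing.Benchmark(BenchmarkStructMethodCall)
	viaReflect := testing.Benchmark(BenchmarkReflectMethodCall)
	fmt.Printf("direct:  %d ns/op\nreflect: %d ns/op\n",
		direct.NsPerOp(), viaReflect.NsPerOp())
}
```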

@bradfitz bradfitz modified the milestones: Go1.10, Unplanned Nov 14, 2017
@bradfitz bradfitz removed their assignment Nov 14, 2017

@patricksuo

patricksuo commented Jun 12, 2018

Caching the function layout in a Caller reduces reflect call overhead by about 50%. https://gist.github.com/sillyousu/606e4874839456cc02335bd1c5045f27

Updating @bruth's benchmark:

BenchmarkReflectCaller-8                20000000                64.6 ns/op
BenchmarkReflectMethodCall-8            10000000               135 ns/op
BenchmarkReflectOnceMethodCall-8        10000000               126 ns/op
BenchmarkStructMethodCall-8             2000000000               1.67 ns/op
BenchmarkInterfaceMethodCall-8          1000000000               2.41 ns/op
BenchmarkTypeSwitchMethodCall-8         2000000000               1.23 ns/op
BenchmarkTypeAssertionMethodCall-8      2000000000               1.42 ns/op
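The gist caches the frame layout inside a patched reflect package. The same spirit, hoisting all per-call setup out of the loop, can be sketched at the user level, though this only avoids rebuilding the args slice, not reflect's internal per-call work. addTo is illustrative:

```go
package main

import (
	"fmt"
	"reflect"
)

func addTo(dst *int, x int) { *dst += x }

// SumViaReflect hoists what it can out of the loop: the function
// value and the args slice are built once, and only the varying
// argument is overwritten per iteration.
func SumViaReflect(n int) int {
	var total int
	fv := reflect.ValueOf(addTo)
	args := []reflect.Value{reflect.ValueOf(&total), {}}
	for i := 1; i <= n; i++ {
		args[1] = reflect.ValueOf(i)
		fv.Call(args)
	}
	return total
}

func main() {
	fmt.Println(SumViaReflect(4)) // 1+2+3+4 = 10
}
```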


@dongweigogo

It seems that Java uses a buffer slice to avoid this problem.

@gopherbot

Change https://golang.org/cl/166462 mentions this issue: reflect: make all flag.mustBe* methods inlinable

gopherbot pushed a commit that referenced this issue Mar 9, 2019
mustBe was barely over budget, so manually inlining the first flag.kind
call is enough. Add a TODO to reverse that in the future, once the
compiler gets better.

mustBeExported and mustBeAssignable were over budget by a larger amount,
so add slow path functions instead. This is the same strategy used in
the sync package for common methods like Once.Do, for example.

Lots of exported reflect.Value methods call these assert-like unexported
methods, so avoiding the function call overhead in the common case does
shave off a percent from most exported APIs.

Finally, add the methods to TestIntendedInlining.

While at it, replace a couple of uses of the 0 Kind with its descriptive
name, Invalid.

name     old time/op    new time/op    delta
Call-8     68.0ns ± 1%    66.8ns ± 1%  -1.81%  (p=0.000 n=10+9)
PtrTo-8    8.00ns ± 2%    7.83ns ± 0%  -2.19%  (p=0.000 n=10+9)

Updates #7818.

Change-Id: Ic1603b640519393f6b50dd91ec3767753eb9e761
Reviewed-on: https://go-review.googlesource.com/c/go/+/166462
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
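The fast-path/slow-path split described in the commit can be sketched generically. The names below are illustrative, not the actual reflect internals:

```go
package main

import "fmt"

// mustBePositive mirrors the pattern from the commit above: the cheap
// common-case check stays small enough for the compiler to inline,
// and the rare failure path is pushed into a separate function so it
// does not count against the caller's inlining budget.
func mustBePositive(x int) {
	if x <= 0 {
		panicNotPositive(x) // slow path: only reached on failure
	}
}

func panicNotPositive(x int) {
	panic(fmt.Sprintf("value %d is not positive", x))
}

// Checked stands in for an exported API method: in the common case
// the assertion inlines away to a single branch.
func Checked(x int) int {
	mustBePositive(x)
	return x * 2
}

func main() {
	fmt.Println(Checked(21)) // 42
}
```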
jsign added a commit to jsign/go that referenced this issue Jan 3, 2021
Taking a look at a CPU profile, IsVariadic was calculated multiple times
unnecessarily.

I also added a new BenchmarkCallMethod to measure this use case.

name                       old time/op    new time/op    delta
Call-16                      20.3ns ± 8%    20.3ns ±16%    ~     (p=0.443 n=18+20)
CallMethod-16                99.2ns ± 3%    90.1ns ± 2%  -9.22%  (p=0.000 n=20+17)
CallArgCopy/size=128-16      15.0ns ± 2%    14.5ns ± 3%  -2.76%  (p=0.000 n=20+19)
CallArgCopy/size=256-16      15.9ns ± 7%    15.3ns ± 5%  -4.26%  (p=0.000 n=20+20)
CallArgCopy/size=1024-16     17.6ns ± 7%    17.2ns ± 6%  -1.73%  (p=0.044 n=19+20)
CallArgCopy/size=4096-16     25.3ns ± 4%    24.9ns ± 4%  -1.66%  (p=0.016 n=18+20)
CallArgCopy/size=65536-16     375ns ± 4%     376ns ± 4%    ~     (p=0.644 n=20+20)

name                       old alloc/op   new alloc/op   delta
Call-16                       0.00B          0.00B         ~     (all equal)
CallMethod-16                 0.00B          0.00B         ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-16                        0.00           0.00         ~     (all equal)
CallMethod-16                  0.00           0.00         ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-16    8.56GB/s ± 2%  8.80GB/s ± 3%  +2.84%  (p=0.000 n=20+19)
CallArgCopy/size=256-16    16.1GB/s ± 6%  16.8GB/s ± 5%  +4.45%  (p=0.000 n=20+20)
CallArgCopy/size=1024-16   58.2GB/s ± 7%  59.4GB/s ± 6%  +2.16%  (p=0.026 n=20+20)
CallArgCopy/size=4096-16    161GB/s ± 4%   165GB/s ± 4%  +1.95%  (p=0.007 n=17+20)
CallArgCopy/size=65536-16   175GB/s ± 4%   174GB/s ± 4%    ~     (p=0.640 n=20+20)

Updates golang#7818

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign added a commit to jsign/go that referenced this issue Jan 3, 2021
These calls are cacheable, so cache them to avoid doing extra work.

This opportunity was discovered while taking a look at a CPU profile while
investigating golang#7818.

I added a BenchmarkCallMethod, which is similar to BenchmarkCall but for a
method receiver.

Benchmark results, including the new BenchmarkCallMethod:

name                       old time/op    new time/op    delta
Call-16                      22.0ns ±19%    20.2ns ±17%  -8.08%  (p=0.000 n=40+40)
CallMethod-16                 100ns ± 3%      91ns ± 2%  -9.13%  (p=0.000 n=40+39)
CallArgCopy/size=128-16      15.7ns ± 1%    14.3ns ± 4%  -8.98%  (p=0.000 n=38+37)
CallArgCopy/size=256-16      15.9ns ± 3%    15.0ns ± 5%  -6.12%  (p=0.000 n=39+39)
CallArgCopy/size=1024-16     18.8ns ± 6%    17.1ns ± 6%  -9.03%  (p=0.000 n=38+38)
CallArgCopy/size=4096-16     26.6ns ± 3%    25.2ns ± 4%  -5.19%  (p=0.000 n=39+40)
CallArgCopy/size=65536-16     379ns ± 3%     371ns ± 5%  -2.11%  (p=0.000 n=39+40)

name                       old alloc/op   new alloc/op   delta
Call-16                       0.00B          0.00B         ~     (all equal)
CallMethod-16                 0.00B          0.00B         ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-16                        0.00           0.00         ~     (all equal)
CallMethod-16                  0.00           0.00         ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-16    8.13GB/s ± 1%  8.92GB/s ± 4%  +9.77%  (p=0.000 n=38+38)
CallArgCopy/size=256-16    16.1GB/s ± 3%  17.1GB/s ± 5%  +6.56%  (p=0.000 n=39+39)
CallArgCopy/size=1024-16   54.6GB/s ± 6%  60.1GB/s ± 5%  +9.93%  (p=0.000 n=38+38)
CallArgCopy/size=4096-16    154GB/s ± 5%   163GB/s ± 4%  +5.63%  (p=0.000 n=40+40)
CallArgCopy/size=65536-16   173GB/s ± 3%   177GB/s ± 5%  +2.18%  (p=0.000 n=39+40)

Updates golang#7818.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@gopherbot

Change https://golang.org/cl/281252 mentions this issue: reflect: cache IsVariadic calls in Call

gopherbot pushed a commit that referenced this issue Mar 30, 2021
These calls are cacheable, so do that to avoid doing extra work.

This opportunity was discovered while taking a look at a CPU profile
while investigating #7818.

I added a BenchmarkCallMethod, which is similar to BenchmarkCall but
for a method receiver.

Benchmark results, including the new BenchmarkCallMethod:

	name                       old time/op    new time/op    delta
	Call-16                      22.0ns ±19%    20.2ns ±17%  -8.08%  (p=0.000 n=40+40)
	CallMethod-16                 100ns ± 3%      91ns ± 2%  -9.13%  (p=0.000 n=40+39)
	CallArgCopy/size=128-16      15.7ns ± 1%    14.3ns ± 4%  -8.98%  (p=0.000 n=38+37)
	CallArgCopy/size=256-16      15.9ns ± 3%    15.0ns ± 5%  -6.12%  (p=0.000 n=39+39)
	CallArgCopy/size=1024-16     18.8ns ± 6%    17.1ns ± 6%  -9.03%  (p=0.000 n=38+38)
	CallArgCopy/size=4096-16     26.6ns ± 3%    25.2ns ± 4%  -5.19%  (p=0.000 n=39+40)
	CallArgCopy/size=65536-16     379ns ± 3%     371ns ± 5%  -2.11%  (p=0.000 n=39+40)

	name                       old alloc/op   new alloc/op   delta
	Call-16                       0.00B          0.00B         ~     (all equal)
	CallMethod-16                 0.00B          0.00B         ~     (all equal)

	name                       old allocs/op  new allocs/op  delta
	Call-16                        0.00           0.00         ~     (all equal)
	CallMethod-16                  0.00           0.00         ~     (all equal)

	name                       old speed      new speed      delta
	CallArgCopy/size=128-16    8.13GB/s ± 1%  8.92GB/s ± 4%  +9.77%  (p=0.000 n=38+38)
	CallArgCopy/size=256-16    16.1GB/s ± 3%  17.1GB/s ± 5%  +6.56%  (p=0.000 n=39+39)
	CallArgCopy/size=1024-16   54.6GB/s ± 6%  60.1GB/s ± 5%  +9.93%  (p=0.000 n=38+38)
	CallArgCopy/size=4096-16    154GB/s ± 5%   163GB/s ± 4%  +5.63%  (p=0.000 n=40+40)
	CallArgCopy/size=65536-16   173GB/s ± 3%   177GB/s ± 5%  +2.18%  (p=0.000 n=39+40)

Updates #7818.

Change-Id: I94f88811ea9faf3dc2543984a13b360b5db66a4b
GitHub-Last-Rev: 9bbaa18
GitHub-Pull-Request: #43475
Reviewed-on: https://go-review.googlesource.com/c/go/+/281252
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
@mvdan
Member

mvdan commented Aug 24, 2022

Now that we have #49340, a reasonable proposal that is just waiting for a prototype, should we close this issue as a duplicate? Is there any other part of Call that is slower than it needs to be, besides the allocation for the results slice?

@mitar
Contributor

mitar commented Mar 6, 2024

I updated the benchmark by @bruth here, and these are the results on my machine:

BenchmarkReflectMethodByNameInterface-8   	 3859410	       317.2 ns/op
BenchmarkReflectMethodByName-8            	 5119252	       237.1 ns/op
BenchmarkReflectMethodCall-8              	 8402196	       129.5 ns/op
BenchmarkReflectOnceMethodCall-8          	 9686301	       131.1 ns/op
BenchmarkReflectCallInterface-8           	516733941	         2.164 ns/op
BenchmarkStructMethodCall-8               	472757470	         2.524 ns/op
BenchmarkInterfaceMethodCall-8            	544497948	         2.027 ns/op
BenchmarkTypeSwitchMethodCall-8           	437183308	         2.694 ns/op
BenchmarkTypeAssertionMethodCall-8        	428367258	         2.714 ns/op

I added method invocation using MethodByName, then either calling .Call directly or first type-asserting through .Interface().(func()). I also added .Interface().(func()) on the method value itself. What is interesting to me:

  • BenchmarkInterfaceMethodCall (calling a method on the interface value) is faster than BenchmarkStructMethodCall (calling a method on a struct pointer). That is a surprise.
  • Type-asserting through .Interface().(func()) removes the overhead for a method value, but adds overhead when the method is obtained using MethodByName. It is even faster to call .Call on the method from MethodByName than to type-assert it. That is the biggest surprise to me. Doing:
	s := reflect.ValueOf(v)
	f := s.MethodByName("Inc").Interface().(func())
	for k := 0; k < n; k++ {
		f()
	}

I would expect f to be close to the performance of a regular function call, especially as it looks like a regular function value.

So I think there are definitely some improvements needed here.
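A self-contained version of the two call paths being compared, with a counter type assumed as a stand-in for the benchmarked receiver. The surprising result is consistent with how reflect builds method values: a func extracted via Interface() from a method Value still runs through a reflect-generated wrapper on each invocation, so it does not behave like a plain function value:

```go
package main

import (
	"fmt"
	"reflect"
)

type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// IncTwice exercises both call paths from the comment above and
// returns the final count so the two paths can be checked for effect.
func IncTwice() int {
	c := &counter{}
	m := reflect.ValueOf(c).MethodByName("Inc")

	// Path 1: reflect.Value.Call on the looked-up method.
	m.Call(nil)

	// Path 2: type-assert to a plain func. It looks like a regular
	// function value, but each call still goes through a wrapper
	// that re-enters reflect to invoke the method.
	f := m.Interface().(func())
	f()

	return c.n
}

func main() {
	fmt.Println(IncTwice()) // 2
}
```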
