
reflect: Call is slow #7818

Open
bradfitz opened this issue Apr 18, 2014 · 14 comments
Comments

@bradfitz
Contributor

reflect.Value.Call is pretty slow.

In addition to always allocating a slice for its []reflect.Value result parameters, it also does a lot of prep work on each call.

It would be nice to both avoid that allocation and do the setup checks just once.

Maybe a new method could return some sort of 'Caller' type that makes repeated Calls faster, with less paranoia and no allocation.

Or just speed up the checks and add a new Call method that also accepts a slice for the result values.
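The result-slice allocation is easy to observe directly. Here is a minimal sketch using testing.AllocsPerRun; the add function and the run count are just for illustration, not anything from the issue:

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

func add(a, b int) int { return a + b }

// AllocsPerCall measures how many heap allocations a single
// reflect.Value.Call performs. The args slice is built once outside
// the measured loop, so what remains is per-call work: at minimum,
// the []reflect.Value result slice allocated fresh on every call.
func AllocsPerCall() float64 {
	fv := reflect.ValueOf(add)
	args := []reflect.Value{reflect.ValueOf(1), reflect.ValueOf(2)}
	return testing.AllocsPerRun(1000, func() {
		fv.Call(args)
	})
}

func main() {
	fmt.Printf("allocs per reflect Call: %.0f\n", AllocsPerCall())
}
```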

dvyukov added a commit that referenced this issue Jan 28, 2015
Call frame allocations can account for a significant portion
of all allocations in a program if Call is executed
in an inner loop (e.g. to process every line in a log).
On the other hand, the allocation is easy to remove
using sync.Pool, since the allocation is strictly scoped.

benchmark           old ns/op     new ns/op     delta
BenchmarkCall       634           338           -46.69%
BenchmarkCall-4     496           167           -66.33%

benchmark           old allocs     new allocs     delta
BenchmarkCall       1              0              -100.00%
BenchmarkCall-4     1              0              -100.00%

Update #7818

Change-Id: Icf60cce0a9be82e6171f0c0bd80dee2393db54a7
Reviewed-on: https://go-review.googlesource.com/1954
Reviewed-by: Keith Randall <khr@golang.org>
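The sync.Pool technique from this commit lives inside reflect itself, but the same strictly scoped reuse pattern can be sketched at the user level. bytes.Buffer and lineLen below are illustrative stand-ins, not the reflect internals:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses scratch buffers across calls. As with the call frames
// in the commit above, each buffer's use is strictly scoped to one
// call, so it can be returned to the pool before the function exits.
var bufPool = sync.Pool{New: func() interface{} { return new(bytes.Buffer) }}

// lineLen is an illustrative per-line worker: it borrows a buffer,
// uses it, and gives it back, so an inner loop over many lines does
// not allocate a fresh buffer for every line.
func lineLen(line string) int {
	b := bufPool.Get().(*bytes.Buffer)
	b.Reset()
	b.WriteString(line)
	n := b.Len()
	bufPool.Put(b) // strictly scoped: returned before we leave
	return n
}

func main() {
	fmt.Println(lineLen("hello, world"))
}
```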
@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@bradfitz bradfitz modified the milestones: Go1.9Maybe, Unplanned Dec 12, 2016
@bradfitz
Contributor Author

Looking at some internal fleet-wide CPU usage, I see reflect.Value.Call and reflect.Value.call show up pretty high in the list.

It might be time to optimize this.

In Go 1.8, there's now precedent in the reflect package for returning a worker func (reflect.Swapper) after doing the validation only once.

Investigate the top 2014 comment's Caller idea, and see how much CPU it can save.

/cc @dsnet
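For reference, the reflect.Swapper precedent works like this in use: the slice is validated once, and the returned closure does each swap with no further per-call checks. SortedDesc is just an illustrative caller:

```go
package main

import (
	"fmt"
	"reflect"
)

// SortedDesc sorts a copy of xs in descending order using the swap
// function returned by reflect.Swapper. All validation happens in the
// one Swapper call; each swap afterwards is check-free.
func SortedDesc(xs []int) []int {
	out := append([]int(nil), xs...)
	swap := reflect.Swapper(out) // validation happens once, here
	for i := 0; i < len(out); i++ {
		for j := i + 1; j < len(out); j++ {
			if out[j] > out[i] {
				swap(i, j) // no per-call checks or allocation
			}
		}
	}
	return out
}

func main() {
	fmt.Println(SortedDesc([]int{3, 1, 2}))
}
```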

@bradfitz bradfitz self-assigned this Dec 12, 2016
@aclements
Member

Maybe a new method to return some sort of 'Caller' type that makes repeated Calls faster, with less paranoia and no allocation.

Value.Interface() sort of does this, but you have to call the result like a real function, not using Value.Call(). If you can fit code into using Value.Interface(), then there's no allocation overhead (unless the frame pool is empty) and no validation. The compiler constructs the arguments frame and interprets the result frame and reflect just has to do some memory copying. (That said, it still does two copies of both the arguments and the results. I feel like it should be possible to get that down to one.)
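A minimal sketch of that approach, assuming a toy double function: the type assertion does the validation once, and every subsequent call is an ordinary Go call with no per-call reflect checks and no []reflect.Value result slice:

```go
package main

import (
	"fmt"
	"reflect"
)

func double(x int) int { return 2 * x }

// SumDoubles calls double n times through a func value recovered from
// reflect. The one-time type assertion checks the signature; after
// that the compiler constructs the argument and result frames, as
// described in the comment above.
func SumDoubles(n int) int {
	v := reflect.ValueOf(double)
	f := v.Interface().(func(int) int) // validated and extracted once
	sum := 0
	for i := 0; i < n; i++ {
		sum += f(i) // called like a real function, not via Value.Call
	}
	return sum
}

func main() {
	fmt.Println(SumDoubles(5)) // 2*(0+1+2+3+4) = 20
}
```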


@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe Jun 7, 2017
@bruth

bruth commented Oct 17, 2017

For some concrete numbers, as of 1.9 on darwin/amd64, reflect.Value.Call is ~65-80x slower than other invocation types. Gist of benchmark code.

BenchmarkReflectMethodCall-4         	10000000	       144 ns/op
BenchmarkReflectOnceMethodCall-4     	10000000	       138 ns/op
BenchmarkStructMethodCall-4          	1000000000	         2.20 ns/op
BenchmarkInterfaceMethodCall-4       	1000000000	         2.14 ns/op
BenchmarkTypeSwitchMethodCall-4      	2000000000	         1.88 ns/op
BenchmarkTypeAssertionMethodCall-4   	2000000000	         1.83 ns/op
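The full benchmark is in the linked gist; a minimal reconstruction of two of the cases might look like the following. The counter type and Inc method are assumptions standing in for the gist's receiver, not its exact code:

```go
package main

import (
	"fmt"
	"reflect"
	"testing"
)

type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// Direct call through a struct pointer.
func BenchmarkStructMethodCall(b *testing.B) {
	c := &counter{}
	for i := 0; i < b.N; i++ {
		c.Inc()
	}
}

// reflect.Value.Call, with the method looked up once outside the loop.
func BenchmarkReflectMethodCall(b *testing.B) {
	m := reflect.ValueOf(&counter{}).MethodByName("Inc")
	for i := 0; i < b.N; i++ {
		m.Call(nil)
	}
}

func main() {
	direct := testing.Benchmark(BenchmarkStructMethodCall)
	viaReflect := testing.Benchmark(BenchmarkReflectMethodCall)
	fmt.Printf("direct:  %d ns/op\nreflect: %d ns/op\n",
		direct.NsPerOp(), viaReflect.NsPerOp())
}
```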

@bradfitz bradfitz modified the milestones: Go1.10, Unplanned Nov 14, 2017
@bradfitz bradfitz removed their assignment Nov 14, 2017

@patricksuo

patricksuo commented Jun 12, 2018

Caching the function layout in a Caller reduces reflect call overhead by about 50%. https://gist.github.com/sillyousu/606e4874839456cc02335bd1c5045f27

Updating @bruth's benchmark:

BenchmarkReflectCaller-8                20000000                64.6 ns/op
BenchmarkReflectMethodCall-8            10000000               135 ns/op
BenchmarkReflectOnceMethodCall-8        10000000               126 ns/op
BenchmarkStructMethodCall-8             2000000000               1.67 ns/op
BenchmarkInterfaceMethodCall-8          1000000000               2.41 ns/op
BenchmarkTypeSwitchMethodCall-8         2000000000               1.23 ns/op
BenchmarkTypeAssertionMethodCall-8      2000000000               1.42 ns/op
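The gist caches the frame layout inside a patched reflect package. The same spirit, hoisting all per-call setup out of the loop, can be sketched at the user level, though this only avoids rebuilding the args slice, not reflect's internal per-call work. addTo is illustrative:

```go
package main

import (
	"fmt"
	"reflect"
)

func addTo(dst *int, x int) { *dst += x }

// SumViaReflect hoists what it can out of the loop: the function
// value and the args slice are built once, and only the varying
// argument is overwritten per iteration.
func SumViaReflect(n int) int {
	var total int
	fv := reflect.ValueOf(addTo)
	args := []reflect.Value{reflect.ValueOf(&total), {}}
	for i := 1; i <= n; i++ {
		args[1] = reflect.ValueOf(i)
		fv.Call(args)
	}
	return total
}

func main() {
	fmt.Println(SumViaReflect(4)) // 1+2+3+4 = 10
}
```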


@dongweigogo

It seems that Java uses a buffer slice to avoid this problem.

@gopherbot

Change https://golang.org/cl/166462 mentions this issue: reflect: make all flag.mustBe* methods inlinable

gopherbot pushed a commit that referenced this issue Mar 9, 2019
mustBe was barely over budget, so manually inlining the first flag.kind
call is enough. Add a TODO to reverse that in the future, once the
compiler gets better.

mustBeExported and mustBeAssignable were over budget by a larger amount,
so add slow path functions instead. This is the same strategy used in
the sync package for common methods like Once.Do, for example.

Lots of exported reflect.Value methods call these assert-like unexported
methods, so avoiding the function call overhead in the common case does
shave off a percent from most exported APIs.

Finally, add the methods to TestIntendedInlining.

While at it, replace a couple of uses of the 0 Kind with its descriptive
name, Invalid.

name     old time/op    new time/op    delta
Call-8     68.0ns ± 1%    66.8ns ± 1%  -1.81%  (p=0.000 n=10+9)
PtrTo-8    8.00ns ± 2%    7.83ns ± 0%  -2.19%  (p=0.000 n=10+9)

Updates #7818.

Change-Id: Ic1603b640519393f6b50dd91ec3767753eb9e761
Reviewed-on: https://go-review.googlesource.com/c/go/+/166462
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
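The fast-path/slow-path split described in the commit can be sketched generically. The names below are illustrative, not the actual reflect internals:

```go
package main

import "fmt"

// mustBePositive mirrors the pattern from the commit above: the cheap
// common-case check stays small enough for the compiler to inline,
// and the rare failure path is pushed into a separate function so it
// does not count against the caller's inlining budget.
func mustBePositive(x int) {
	if x <= 0 {
		panicNotPositive(x) // slow path: only reached on failure
	}
}

func panicNotPositive(x int) {
	panic(fmt.Sprintf("value %d is not positive", x))
}

// Checked stands in for an exported API method: in the common case
// the assertion inlines away to a single branch.
func Checked(x int) int {
	mustBePositive(x)
	return x * 2
}

func main() {
	fmt.Println(Checked(21)) // 42
}
```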
jsign added a commit to jsign/go that referenced this issue Jan 3, 2021
Taking a look at a CPU profile, IsVariadic was calculated multiple times
unnecessarily.

I also added a new BenchmarkCallMethod to measure this use case.

name                       old time/op    new time/op    delta
Call-16                      20.3ns ± 8%    20.3ns ±16%    ~     (p=0.443 n=18+20)
CallMethod-16                99.2ns ± 3%    90.1ns ± 2%  -9.22%  (p=0.000 n=20+17)
CallArgCopy/size=128-16      15.0ns ± 2%    14.5ns ± 3%  -2.76%  (p=0.000 n=20+19)
CallArgCopy/size=256-16      15.9ns ± 7%    15.3ns ± 5%  -4.26%  (p=0.000 n=20+20)
CallArgCopy/size=1024-16     17.6ns ± 7%    17.2ns ± 6%  -1.73%  (p=0.044 n=19+20)
CallArgCopy/size=4096-16     25.3ns ± 4%    24.9ns ± 4%  -1.66%  (p=0.016 n=18+20)
CallArgCopy/size=65536-16     375ns ± 4%     376ns ± 4%    ~     (p=0.644 n=20+20)

name                       old alloc/op   new alloc/op   delta
Call-16                       0.00B          0.00B         ~     (all equal)
CallMethod-16                 0.00B          0.00B         ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-16                        0.00           0.00         ~     (all equal)
CallMethod-16                  0.00           0.00         ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-16    8.56GB/s ± 2%  8.80GB/s ± 3%  +2.84%  (p=0.000 n=20+19)
CallArgCopy/size=256-16    16.1GB/s ± 6%  16.8GB/s ± 5%  +4.45%  (p=0.000 n=20+20)
CallArgCopy/size=1024-16   58.2GB/s ± 7%  59.4GB/s ± 6%  +2.16%  (p=0.026 n=20+20)
CallArgCopy/size=4096-16    161GB/s ± 4%   165GB/s ± 4%  +1.95%  (p=0.007 n=17+20)
CallArgCopy/size=65536-16   175GB/s ± 4%   174GB/s ± 4%    ~     (p=0.640 n=20+20)

Updates golang#7818

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
jsign added a commit to jsign/go that referenced this issue Jan 3, 2021
These calls are cacheable, so cache them to avoid doing extra work.

This opportunity was discovered while taking a look at a CPU profile while
investigating golang#7818.

I added a BenchmarkCallMethod, which is similar to BenchmarkCall but for a
method receiver.

Benchmark results, including the new BenchmarkCallMethod:

name                       old time/op    new time/op    delta
Call-16                      22.0ns ±19%    20.2ns ±17%  -8.08%  (p=0.000 n=40+40)
CallMethod-16                 100ns ± 3%      91ns ± 2%  -9.13%  (p=0.000 n=40+39)
CallArgCopy/size=128-16      15.7ns ± 1%    14.3ns ± 4%  -8.98%  (p=0.000 n=38+37)
CallArgCopy/size=256-16      15.9ns ± 3%    15.0ns ± 5%  -6.12%  (p=0.000 n=39+39)
CallArgCopy/size=1024-16     18.8ns ± 6%    17.1ns ± 6%  -9.03%  (p=0.000 n=38+38)
CallArgCopy/size=4096-16     26.6ns ± 3%    25.2ns ± 4%  -5.19%  (p=0.000 n=39+40)
CallArgCopy/size=65536-16     379ns ± 3%     371ns ± 5%  -2.11%  (p=0.000 n=39+40)

name                       old alloc/op   new alloc/op   delta
Call-16                       0.00B          0.00B         ~     (all equal)
CallMethod-16                 0.00B          0.00B         ~     (all equal)

name                       old allocs/op  new allocs/op  delta
Call-16                        0.00           0.00         ~     (all equal)
CallMethod-16                  0.00           0.00         ~     (all equal)

name                       old speed      new speed      delta
CallArgCopy/size=128-16    8.13GB/s ± 1%  8.92GB/s ± 4%  +9.77%  (p=0.000 n=38+38)
CallArgCopy/size=256-16    16.1GB/s ± 3%  17.1GB/s ± 5%  +6.56%  (p=0.000 n=39+39)
CallArgCopy/size=1024-16   54.6GB/s ± 6%  60.1GB/s ± 5%  +9.93%  (p=0.000 n=38+38)
CallArgCopy/size=4096-16    154GB/s ± 5%   163GB/s ± 4%  +5.63%  (p=0.000 n=40+40)
CallArgCopy/size=65536-16   173GB/s ± 3%   177GB/s ± 5%  +2.18%  (p=0.000 n=39+40)

Updates golang#7818.

Signed-off-by: Ignacio Hagopian <jsign.uy@gmail.com>
@gopherbot

Change https://golang.org/cl/281252 mentions this issue: reflect: cache IsVariadic calls in Call

gopherbot pushed a commit that referenced this issue Mar 30, 2021
These calls are cacheable, so do that to avoid doing extra work.

This opportunity was discovered while taking a look at a CPU profile
while investigating #7818.

I added a BenchmarkCallMethod, which is similar to BenchmarkCall but
for a method receiver.

Benchmark results, including the new BenchmarkCallMethod:

	name                       old time/op    new time/op    delta
	Call-16                      22.0ns ±19%    20.2ns ±17%  -8.08%  (p=0.000 n=40+40)
	CallMethod-16                 100ns ± 3%      91ns ± 2%  -9.13%  (p=0.000 n=40+39)
	CallArgCopy/size=128-16      15.7ns ± 1%    14.3ns ± 4%  -8.98%  (p=0.000 n=38+37)
	CallArgCopy/size=256-16      15.9ns ± 3%    15.0ns ± 5%  -6.12%  (p=0.000 n=39+39)
	CallArgCopy/size=1024-16     18.8ns ± 6%    17.1ns ± 6%  -9.03%  (p=0.000 n=38+38)
	CallArgCopy/size=4096-16     26.6ns ± 3%    25.2ns ± 4%  -5.19%  (p=0.000 n=39+40)
	CallArgCopy/size=65536-16     379ns ± 3%     371ns ± 5%  -2.11%  (p=0.000 n=39+40)

	name                       old alloc/op   new alloc/op   delta
	Call-16                       0.00B          0.00B         ~     (all equal)
	CallMethod-16                 0.00B          0.00B         ~     (all equal)

	name                       old allocs/op  new allocs/op  delta
	Call-16                        0.00           0.00         ~     (all equal)
	CallMethod-16                  0.00           0.00         ~     (all equal)

	name                       old speed      new speed      delta
	CallArgCopy/size=128-16    8.13GB/s ± 1%  8.92GB/s ± 4%  +9.77%  (p=0.000 n=38+38)
	CallArgCopy/size=256-16    16.1GB/s ± 3%  17.1GB/s ± 5%  +6.56%  (p=0.000 n=39+39)
	CallArgCopy/size=1024-16   54.6GB/s ± 6%  60.1GB/s ± 5%  +9.93%  (p=0.000 n=38+38)
	CallArgCopy/size=4096-16    154GB/s ± 5%   163GB/s ± 4%  +5.63%  (p=0.000 n=40+40)
	CallArgCopy/size=65536-16   173GB/s ± 3%   177GB/s ± 5%  +2.18%  (p=0.000 n=39+40)

Updates #7818.

Change-Id: I94f88811ea9faf3dc2543984a13b360b5db66a4b
GitHub-Last-Rev: 9bbaa18
GitHub-Pull-Request: #43475
Reviewed-on: https://go-review.googlesource.com/c/go/+/281252
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Daniel Martí <mvdan@mvdan.cc>
Trust: Brad Fitzpatrick <bradfitz@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Go Bot <gobot@golang.org>
@mvdan
Member

mvdan commented Aug 24, 2022

Now that we have #49340, a reasonable proposal that is just waiting for a prototype, should we close this issue as a duplicate? Is there any other part of Call that is slower than it needs to be, besides the allocation for the results slice?

@mitar
Contributor

mitar commented Mar 6, 2024

I updated the benchmark by @bruth here, and these are the results on my machine:

BenchmarkReflectMethodByNameInterface-8   	 3859410	       317.2 ns/op
BenchmarkReflectMethodByName-8            	 5119252	       237.1 ns/op
BenchmarkReflectMethodCall-8              	 8402196	       129.5 ns/op
BenchmarkReflectOnceMethodCall-8          	 9686301	       131.1 ns/op
BenchmarkReflectCallInterface-8           	516733941	         2.164 ns/op
BenchmarkStructMethodCall-8               	472757470	         2.524 ns/op
BenchmarkInterfaceMethodCall-8            	544497948	         2.027 ns/op
BenchmarkTypeSwitchMethodCall-8           	437183308	         2.694 ns/op
BenchmarkTypeAssertionMethodCall-8        	428367258	         2.714 ns/op

I added method invocation using MethodByName, then either calling .Call directly or first type-asserting through .Interface().(func()). I also added .Interface().(func()) on the method value itself. What is interesting to me:

  • BenchmarkInterfaceMethodCall (calling a method on the interface value) is faster than BenchmarkStructMethodCall (calling a method on a struct pointer). That is a surprise.
  • Type-asserting through .Interface().(func()) removes the overhead for a method value, but adds overhead when the method is obtained using MethodByName. It is even faster to call .Call on the method from MethodByName than to type-assert it. That is the biggest surprise to me. Doing:
	s := reflect.ValueOf(v)
	f := s.MethodByName("Inc").Interface().(func())
	for k := 0; k < n; k++ {
		f()
	}

I would expect f to be close to the performance of a regular function call, especially as it looks like a regular function value.

So I think there are definitely some improvements needed here.
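A self-contained version of the two call paths being compared, with a counter type assumed as a stand-in for the benchmarked receiver. The surprising result is consistent with how reflect builds method values: a func extracted via Interface() from a method Value still runs through a reflect-generated wrapper on each invocation, so it does not behave like a plain function value:

```go
package main

import (
	"fmt"
	"reflect"
)

type counter struct{ n int }

func (c *counter) Inc() { c.n++ }

// IncTwice exercises both call paths from the comment above and
// returns the final count so the two paths can be checked for effect.
func IncTwice() int {
	c := &counter{}
	m := reflect.ValueOf(c).MethodByName("Inc")

	// Path 1: reflect.Value.Call on the looked-up method.
	m.Call(nil)

	// Path 2: type-assert to a plain func. It looks like a regular
	// function value, but each call still goes through a wrapper
	// that re-enters reflect to invoke the method.
	f := m.Interface().(func())
	f()

	return c.n
}

func main() {
	fmt.Println(IncTwice()) // 2
}
```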
