reflect: use binary search in MethodByName #50617

quasilyte · 2022-01-14T21:28:46Z

What version of Go are you using (`go version`)?

go1.17.2

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (`go env`)?

Linux/amd64

But it should be irrelevant.

What did you do?

Executed hugo on digitalgov.gov site, collected the CPU profiles. Found that reflect.Type.MethodByName is top1.

      flat  flat%   sum%        cum   cum%
     8.84s  6.28%  6.28%     57.85s 41.10%  reflect.(*rtype).MethodByName
     7.93s  5.63% 11.92%      8.50s  6.04%  reflect.name.readVarint
     7.56s  5.37% 17.29%    111.79s 79.43%  reflect.Value.call
     7.53s  5.35% 22.64%     23.33s 16.58%  runtime.mallocgc
     7.29s  5.18% 27.82%     16.10s 11.44%  reflect.name.name

If we take a look at the MethodByName sources, we'll notice that it uses a linear search. There is even a comment to fix that:

go/src/reflect/type.go

Lines 888 to 893 in a99c38d

    
           // TODO(mdempsky): Binary search. 
        
           for i, p := range ut.exportedMethods() { 
        
           	if t.nameOff(p.name).name() == name { 
        
           		return t.Method(i), true 
        
           	} 
        
           }

What did you expect to see?

reflect.Type.MethodByName doesn't use linear search for bigger slices.

What did you see instead?

reflect.Type.MethodByName is quite slow for some cases.

More info, numbers, benchmarks

I'll send a CL that adds a binary search to the reflect.Type.MethodByName, but since there are a few caveats, I think it's worthwhile to share some info I gathered by investigating this.

After patching the reflect package, I compared the results.

Old site build time: 20.4s
New site build time: 16.5s

If we take a look at the "new" CPU profile, MethodByName doesn't appear in top20 at all, even though it was a top1 entry before:

(pprof) top 20
Showing nodes accounting for 50.85s, 44.61% of 114s total
Dropped 1174 nodes (cum <= 0.57s)
Showing top 20 nodes out of 194
      flat  flat%   sum%        cum   cum%
     8.64s  7.58%  7.58%     25.01s 21.94%  runtime.mallocgc
     7.69s  6.75% 14.32%     87.02s 76.33%  reflect.Value.call
     4.17s  3.66% 17.98%      5.39s  4.73%  runtime.heapBitsSetType
     2.88s  2.53% 20.51%         4s  3.51%  runtime.findObject
     2.83s  2.48% 22.99%      2.83s  2.48%  runtime.nextFreeFast (inline)
     2.79s  2.45% 25.44%      9.46s  8.30%  runtime.scanobject
     2.74s  2.40% 27.84%      2.74s  2.40%  runtime.duffcopy
     1.82s  1.60% 29.44%      1.88s  1.65%  runtime.pageIndexOf (partial-inline)
     1.76s  1.54% 30.98%      1.76s  1.54%  runtime.memclrNoHeapPointers
     1.69s  1.48% 32.46%      3.24s  2.84%  runtime.mapaccess2
     1.66s  1.46% 33.92%      1.83s  1.61%  syscall.Syscall
     1.49s  1.31% 35.23%      1.61s  1.41%  reflect.name.readVarint (inline)
     1.43s  1.25% 36.48%      3.07s  2.69%  reflect.name.name
     1.42s  1.25% 37.73%     10.54s  9.25%  reflect.FuncOf
     1.37s  1.20% 38.93%      1.75s  1.54%  github.com/gohugoio/hugo/related.ranks.Less
     1.34s  1.18% 40.11%      4.99s  4.38%  sync.(*Map).Load
     1.33s  1.17% 41.27%      1.33s  1.17%  reflect.(*rtype).Kind
     1.29s  1.13% 42.40%      1.29s  1.13%  cmpbody
     1.28s  1.12% 43.53%      1.28s  1.12%  runtime.resolveNameOff
     1.23s  1.08% 44.61%      1.30s  1.14%  runtime.heapBitsForAddr (partial-inline)

But then we have a question: why we get an impact that big? We need to collect some stats to be sure that it's not a coincidence.

First, let's see a distribution of number of methods.

15920637 times | 120 methods
493669 times | 29 methods
227736 times | 16 methods
180098 times | 8 methods
143846 times | 39 methods
31371 times | 6 methods
14636 times | 1 methods
10933 times | 31 methods
10094 times | 9 methods
9026 times | 4 methods
102 times | 113 methods
85 times | 7 methods
4 times | 114 methods
1 times | 0 methods

It looks like vast majority of calls goes through 120-entry slice with linear search. Not very common I guess, but that's what we get.

But if the searched method would be located at 2nd or 3rd pos, it wouldn't be a problem. Therefore, we need to see how "found at" indexes are distributed.

12998952 times | 98 index pos
1203864 times | 70 index pos
515978 times | 102 index pos
354191 times | 110 index pos
262965 times | 6 index pos
200819 times | 11 index pos
178457 times | 19 index pos
170806 times | 97 index pos
147510 times | 74 index pos
136927 times | 3 index pos
123071 times | 20 index pos
106250 times | 9 index pos
87190 times | 7 index pos
62767 times | 31 index pos
62192 times | 5 index pos
50170 times | 51 index pos
49120 times | 69 index pos
45325 times | 72 index pos
42236 times | 115 index pos
38646 times | 0 index pos
37106 times | 28 index pos
22744 times | 108 index pos
21521 times | 14 index pos
18134 times | 92 index pos
17341 times | 4 index pos
11535 times | 103 index pos
10791 times | 63 index pos
10303 times | 2 index pos
9170 times | 10 index pos
7278 times | 68 index pos
5936 times | 15 index pos
5466 times | 65 index pos
5385 times | 16 index pos
5172 times | 47 index pos
4478 times | 43 index pos
3032 times | 40 index pos
2340 times | 17 index pos
2265 times | 25 index pos
1516 times | 44 index pos
1068 times | 88 index pos
843 times | 1 index pos
709 times | 71 index pos
358 times | 109 index pos
100 times | 67 index pos
50 times | 64 index pos
40 times | 42 index pos
33 times | 34 index pos
17 times | 86 index pos
15 times | 83 index pos
15 times | 41 index pos
15 times | 33 index pos
12 times | 118 index pos
5 times | 8 index pos
5 times | 27 index pos
4 times | 66 index pos

Most cases fall into "found at i=98" category, which is pretty bad for linear search.

In total, building that site makes hugo call MethodByName 17042238 times.

Benchmarking the new method shows mixed, but expected, results:

name                           old time/op  new time/op  delta
MethodByName/4_first-8          426ns ± 2%   423ns ± 1%     ~     (p=0.056 n=10+9)
MethodByName/4_last-8           482ns ± 1%   476ns ± 1%   -1.33%  (p=0.000 n=10+10)
MethodByName/4_nonexisting-8   57.8ns ± 0%  57.3ns ± 1%   -0.78%  (p=0.000 n=9+9)
MethodByName/16_first-8         427ns ± 1%   538ns ± 0%  +25.97%  (p=0.000 n=9+9)
MethodByName/16_last-8          643ns ± 1%   519ns ± 1%  -19.24%  (p=0.000 n=9+9)
MethodByName/16_nonexisting-8   194ns ± 0%   105ns ± 0%  -46.03%  (p=0.000 n=8+9)
MethodByName/32_first-8         429ns ± 1%   552ns ± 1%  +28.78%  (p=0.000 n=10+8)
MethodByName/32_last-8          831ns ± 0%   540ns ± 2%  -34.97%  (p=0.000 n=10+9)
MethodByName/32_nonexisting-8   396ns ± 1%   125ns ± 1%  -68.56%  (p=0.000 n=10+10)
MethodByName/64_first-8         431ns ± 1%   580ns ± 2%  +34.64%  (p=0.000 n=10+10)
MethodByName/64_last-8         1.36µs ± 1%  0.56µs ± 1%  -58.95%  (p=0.000 n=10+8)
MethodByName/64_nonexisting-8   759ns ± 0%   146ns ± 1%  -80.82%  (p=0.000 n=8+10)
[Geo mean]                      430ns        303ns       -29.58%

In general, the new approach is only slower if searched method happens to be located somewhere in the beginning of the slice. If we assume that they're somewhere in the middle on average, then binary search performs better. I would still keep a linear search path for small slices, like len<=8 to avoid a minor regression for small slices (that are not so uncommon).

Caveats

reflect can't import sort package, so we can't use sort.Search
Linear search works better if a search term is somewhere at the beginning of a slice

We can decrease the effect of (2) by using a linear search path for small slices, but in general, this side-effect is hard to avoid.

The text was updated successfully, but these errors were encountered:

gopherbot · 2022-01-14T21:32:43Z

Change https://golang.org/cl/378634 mentions this issue: reflect: use binary search in MethodByName

robpike · 2022-01-14T21:42:44Z

Thanks for the thorough analysis. It's nice to see someone really dig into a performance problem like this.

120 methods is just crazy. I wonder how common this problem actually is, and if a different fix, if any, is the right way forward. MethodByName works very hard, even ignoring the a linear scan. Also, using the reflect package to call a method is sometimes necessary but often avoidable with a bit of work elsewhere. I am nervous about opening the door to optimizations of the reflect package.

YuriyNasretdinov · 2022-01-14T22:07:16Z

As a regular Go user, I think it's pretty reasonable to assume that MethodByName() won't be an O(N) operation. I think the author did list one such example, the templating engine, where it matters, but one would argue that e.g. if you make a reflection-based HTTP router, or any kind of dynamic dispatch, e.g. in an interpreter like yaegi, you would probably just use MethodByName(). You would not think of caching it unless you know that it's implemented as a linear search, because it's a standard library function, and most of the time the user's assumption would be that it's reasonably (as much as it can be reasonable given that it's reflection) optimised.

quasilyte · 2022-01-14T22:13:05Z

@YuriyNasretdinov in fact, yaegi does use MethodByName in a few places. I haven't done any research to find out whether these usages lie on a hot path or anything. I can probably run the benchmarks that involve calling MethodByName and give some old/new timings.

MBkkt · 2022-01-15T08:43:07Z

Linear search works better if a search term is somewhere at the beginning of a slice

I think you can try to fix it with something like this:

r = linearSearch(0, 4, x); 0-3 if found, 4 if not
if (r != 4) {
  return a[r];
}
l = r;
for (; r < len(a) / 2;) {
  r *= 2;
  if (x < a[r]) {
    return binarySearch(l, r, x);
  }
  l = r;
}
return binarySearch(l, len(a), x);

quasilyte · 2022-01-15T09:01:21Z

@MBkkt probably. It may be worthwhile to see which strategy gives the mean average boost on different real codebases as opposed to the benchmarks.

That being said, I would like to wait for a decision on whether any optimization to MethodByName should be done at all.

qiulaidongfeng · 2024-01-07T09:08:24Z

This seems to fix in https://go-review.googlesource.com/c/go/+/425481
Also, I discover reflect can import sort package for now.

cc @golang/runtime
Is there anything more to be done about this issue? Or whether it should be closed.

quasilyte · 2024-01-07T09:48:05Z

Well, I'm more surprised that it was accepted without an open issue while mine patch was on hold due to discussions.
I did the research and found the real-world case where this change would matter (with numbers).
The referenced CL doesn't even have the benchstat numbers. :[
This is all understandable, yet quite discouraging.

Looking and the CL you've mentioned, this issue indeed can be closed.

seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Jan 14, 2022

quasilyte mentioned this issue Jan 14, 2022

Maybe cache the results of reflect.Type.MethodByName? gohugoio/hugo#9386

Open

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022

seankhliao added this to the Backlog milestone Aug 20, 2022

quasilyte closed this as completed Jan 7, 2024

qiulaidongfeng mentioned this issue Jan 11, 2024

access: go-assignees and go-editors #65058

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reflect: use binary search in MethodByName #50617

reflect: use binary search in MethodByName #50617

quasilyte commented Jan 14, 2022 •

edited

gopherbot commented Jan 14, 2022

robpike commented Jan 14, 2022

YuriyNasretdinov commented Jan 14, 2022

quasilyte commented Jan 14, 2022

MBkkt commented Jan 15, 2022

quasilyte commented Jan 15, 2022 •

edited

qiulaidongfeng commented Jan 7, 2024

quasilyte commented Jan 7, 2024

reflect: use binary search in MethodByName #50617

reflect: use binary search in MethodByName #50617

Comments

quasilyte commented Jan 14, 2022 • edited

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

More info, numbers, benchmarks

Caveats

gopherbot commented Jan 14, 2022

robpike commented Jan 14, 2022

YuriyNasretdinov commented Jan 14, 2022

quasilyte commented Jan 14, 2022

MBkkt commented Jan 15, 2022

quasilyte commented Jan 15, 2022 • edited

qiulaidongfeng commented Jan 7, 2024

quasilyte commented Jan 7, 2024

quasilyte commented Jan 14, 2022 •

edited

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

quasilyte commented Jan 15, 2022 •

edited