Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: specialize memhash32, memhash64 #21539

Closed
josharian opened this issue Aug 19, 2017 · 6 comments
Closed

runtime: specialize memhash32, memhash64 #21539

josharian opened this issue Aug 19, 2017 · 6 comments
Labels
FrozenDueToAge Performance Suggested Issues that may be good for new contributors looking for work to do.
Milestone

Comments

@josharian
Copy link
Contributor

memhash32 and memhash64 are defined as:

func memhash32(p unsafe.Pointer, h uintptr) uintptr {
	return memhash(p, h, 4)
}
func memhash64(p unsafe.Pointer, h uintptr) uintptr {
	return memhash(p, h, 8)
}

The generic memhash implementation contains a lot of mechanism that can be skipped when the size is known. A quick hack up of a specialized memhash64 on amd64, with aeshash manually disabled, shows nice gains:

name                  old time/op    new time/op    delta
MapPopulate/1-8         76.7ns ± 4%    75.1ns ± 4%     ~     (p=0.055 n=10+10)
MapPopulate/10-8         613ns ± 3%     570ns ± 3%   -6.94%  (p=0.000 n=10+9)
MapPopulate/100-8       7.93µs ± 2%    7.31µs ± 3%   -7.85%  (p=0.000 n=9+10)
MapPopulate/1000-8      97.0µs ± 3%    89.1µs ± 2%   -8.20%  (p=0.000 n=10+9)
MapPopulate/10000-8      843µs ± 3%     759µs ± 2%  -10.02%  (p=0.000 n=10+9)
MapPopulate/100000-8    9.19ms ± 4%    8.69ms ± 2%   -5.38%  (p=0.000 n=10+9)

The specialized code is small, both in terms of lines of code and machine code. For non-aeshash architectures, this seems like an easy, significant win.

Leaving for someone else to implement all the way through and benchmark on a non-aeshash architecture, due to my limited cycles.

cc @martisch @philhofer @randall77

@josharian josharian added this to the Go1.10 milestone Aug 19, 2017
@josharian josharian added the Suggested Issues that may be good for new contributors looking for work to do. label Aug 19, 2017
@mvdan
Copy link
Member

mvdan commented Aug 19, 2017

I assume that the MapPopulate benchmark only calls one of these function variants over and over. Is there a benchmark that calls all versions in a way that they are mixed?

Perhaps my understanding of caching is wrong, but I seem to recall that splitting a function into many could have costs if a program needs all variants and thus has a harder time keeping all the code cached.

@josharian
Copy link
Contributor Author

I'd bet this would be an improvement regardless, but it doesn't hurt to try a mixed benchmark.

@martisch
Copy link
Contributor

martisch commented Aug 19, 2017

@mvdan it is correct that more binary code can generate more icache pressure however for hot code it can be a trade off that is worth it. e.g. see all the specialized map functions (which are high in the top hot code paths after garbage collection functions).

In some cases e.g. stringconcatX/memequalX it can also save stack space because the len argument is implicit in the function call and does not need to be provided and thereby safes instructions at every call site. Also the function itself is more constrained and can be simpler and needing less icache than the general version. I do not expect the memequal32/64 functions to be very large. But they are also called indirectly so wont save much at the call site. However the call to memhash itself generates some instructions that wont be needed anymore.

Note that we already inline many memcopy and other calls with direct instructions to save function call overhead and not needed size checks, but "bloat" the binary a bit as a trade off. Same with inlining.

Getting back to joshs requests:
I am happy to have a look on 386/amd64. Have to work through some of the other queued up go1.10 work before :)

I have a G3220 Haswell Pentium that does not support AES (apart from a zoo of older go supported machines down to dual pentium mmx) where i tested the internal/cpu package.
The Skylake Pentiums (e.g. G4400) support AES so i think finding modern amd64 CPUs without AES even when low end will get more difficult in the future.

@josharian
Copy link
Contributor Author

@martisch I didn't mean to "assign" anything to you, just thought you might be interested. :) This is actually a good new contributor issue (thus labeled Suggested).

@martisch
Copy link
Contributor

martisch commented Aug 19, 2017

@josharian I did not see it as assignment ( my use of "request" was likely a bad choice ) Just wanted to express high interest but also that i wont be able to work on it right away :)
Your are correct in that it makes for a good runtime contributor starter project.

Anybody who has time and wants to start contributing to the runtime: feel free to work on it after stating interest here.

@gopherbot
Copy link

Change https://golang.org/cl/59352 mentions this issue: runtime: specialize memhash32 and memhash64

@golang golang locked and limited conversation to collaborators Aug 28, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Performance Suggested Issues that may be good for new contributors looking for work to do.
Projects
None yet
Development

No branches or pull requests

4 participants