runtime: specialize memhash32, memhash64 #21539
Perhaps my understanding of caching is wrong, but I seem to recall that splitting a function into many variants could have costs: if a program needs all the variants, it has a harder time keeping all of that code cached.
I'd bet this would be an improvement regardless, but it doesn't hurt to try a mixed benchmark.
@mvdan It is correct that more binary code can generate more icache pressure, but for hot code it can be a trade-off that is worth it. See, for example, all the specialized map functions, which rank high among the hottest code paths after the garbage collection functions. In some cases, e.g. stringconcatX/memequalX, specialization also saves stack space, because the len argument is implicit in the function called and does not need to be provided, which saves instructions at every call site. The specialized function itself is also more constrained, so it can be simpler and need less icache than the general version.

I do not expect the memequal32/64 functions to be very large. They are also called indirectly, so they won't save much at the call site. However, the call to memhash itself generates some instructions that won't be needed anymore. Note that we already inline many memcopy and other calls with direct instructions to save function call overhead and unneeded size checks, "bloating" the binary a bit as a trade-off. The same goes for inlining in general.

Getting back to josh's requests: I have a G3220 Haswell Pentium that does not support AES (apart from a zoo of older Go-supported machines, down to a dual Pentium MMX) on which I tested the internal/cpu package.
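The stack-space point above can be illustrated with a sketch. The names `memequal8` and `memequalGeneric` here are hypothetical stand-ins, not the runtime's actual functions: when the size is baked into the function, there is no length parameter to pass and the compiler can emit a single fixed-width comparison instead of a loop.

```go
package main

import "fmt"

// Size-specialized: the length (8) is implicit in the type, so the
// comparison compiles down to a fixed-width compare with no loop.
func memequal8(a, b *[8]byte) bool {
	return *a == *b
}

// Generic: the length must be carried at every call site and checked
// and looped over at runtime.
func memequalGeneric(a, b []byte) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	a := [8]byte{1, 2, 3, 4, 5, 6, 7, 8}
	b := a
	fmt.Println(memequal8(&a, &b), memequalGeneric(a[:], b[:]))
}
```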
@martisch I didn't mean to "assign" anything to you, just thought you might be interested. :) This is actually a good new contributor issue (thus labeled Suggested). |
@josharian I did not see it as an assignment (my use of "request" was likely a bad choice). Just wanted to express high interest, but also that I won't be able to work on it right away. :) Anybody who has time and wants to start contributing to the runtime: feel free to work on it after stating interest here.
Change https://golang.org/cl/59352 mentions this issue: |
memhash32 and memhash64 are defined as:
The generic memhash implementation contains a lot of mechanism that can be skipped when the size is known. A quick hack up of a specialized memhash64 on amd64, with aeshash manually disabled, shows nice gains:
The specialized code is small, both in terms of lines of code and machine code. For non-aeshash architectures, this seems like an easy, significant win.
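To show why the specialized code can be so small: with the size fixed at 8 bytes, there is no length loop, no tail handling, and no size argument, just one load and a few mix steps. This is a hypothetical sketch in that spirit, not the code from the CL; the mixing constants are arbitrary.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// mix64 hashes exactly 8 bytes with a seed. Because the length is
// known, the whole function is straight-line code: one load, two
// multiplies, and a couple of xor-shifts (constants arbitrary here).
func mix64(b [8]byte, seed uint64) uint64 {
	x := binary.LittleEndian.Uint64(b[:]) ^ seed
	x *= 0x9e3779b97f4a7c15
	x ^= x >> 29
	x *= 0xbf58476d1ce4e5b9
	x ^= x >> 32
	return x
}

func main() {
	fmt.Printf("%#x\n", mix64([8]byte{1, 2, 3, 4, 5, 6, 7, 8}, 42))
}
```

Each step (odd-constant multiply, xor-shift) is invertible, so distinct 8-byte inputs always produce distinct outputs for a fixed seed.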
Leaving for someone else to implement all the way through and benchmark on a non-aeshash architecture, due to my limited cycles.
cc @martisch @philhofer @randall77