cmd/compile: intrinsify some math functions #17069
Comments
cc @randall77. See also #13095.
#13095 sounds like we should not lower them but instead improve the inlining and loads+stores. We could still consider cases where we have hardware opcodes that can replace these functions, such as Sqrt, as these cannot be directly simplified from Go code to the opcode.
When intrinsifying more things in the compiler, please do not base the decision on the (package name, symbol name) alone. If I copy/vendor/fork the stdlib elsewhere, the identical package "math" code elsewhere should be just as fast. ... unless the package is some "guts" package like
@bradfitz agreed in principle, although it is worth noting that Sqrt currently violates that principle, and I don't see any reasonable way to make that practice agree with principle. Most of the math functions are not like Sqrt, though.
@bradfitz good point. As for copied code that is then not lowered to opcodes (because of the name change), it might also not produce equal results for some float math. Sadly, math.Sqrt already seems to violate what you said. But I guess we can at least move math.Sqrt from a frontend intrinsic to the backend for cleanup, since removing its special treatment entirely would otherwise cause a performance regression. So for now I will have a look at lowering Sqrt as an intrinsic and leave the rest alone :) (unless other reasons and candidates surface to handle more math functions as opcodes directly). If needed for compiler or runtime performance, we could always have internal function names and lower only those.
@bradfitz how do we get value out of that, and how does that value compare to improved performance (which has also yet to be measured)? The cases I know where people want to replace libraries wholesale do not usually also require exact performance compatibility; in my experience it's for debugging (have done it myself), and performance is not critical. That said, it is plausible that we could build some flow-graph pattern-matchers for the things we wish to substitute (have also done that myself).
Also, sin and cosine are not candidates for this unless their inputs are range-reduced, and they're also x87 FP, not "modern" Intel FP. Very large inputs merely set a flag and return their input unmodified, so sin(2^64) returns 2^64. The Intel hardware transcendentals use a 68-bit approximation of Pi (66 bits, but the first two missing bits are zero, so effectively 68), which means that sin(machinePi) (for a 53-bit-mantissa double) is incorrect in its 16th bit, or at least it was back on the x87. (I may have misremembered the number of bits, but the last hex digit is C and it is more than 64.)
I know this well, because Java's HotSpot compiler once replaced correct software sine with incorrect hardware sine after the test loop had been executed "enough" times. Moving the problematic case to the front of the iteration ran it through the software path, and the problem would disappear.
For max and min, math.Max and MAXSD have different semantics in the NaN case (at least on AMD64).
These break down into:
1. functions that already compile to good code,
2. functions that could be written so they inline well, and
3. functions that need hardware support to go faster (like Sqrt).
Only category 3 is worth turning into an intrinsic, and then only when it is a demonstrated hotspot in a plausible real program. We waited for years until we found a real demonstration that the call overhead for Sqrt mattered, and then we added the intrinsic. If any of category 2 shows up as a hotspot and isn't written in the best inlinable way, better to do that. The current Inf and NaN may be examples of that, but it would help to have a demonstration that the current performance is inadequate. I'm going to close this issue because it is so broad, but please feel free to open new specific ones for Inf, NaN, or any of the other specific cases, provided there's a plausible path to optimization and a plausible need. |
Currently math.Sqrt (OSQRT) is lowered to OpSqrt; constant arguments are precomputed, and otherwise a hardware opcode may be generated (SQRTSD on amd64).
#15543
Other math functions like NaN, Inf, ... are "only" inlined.
I started experimenting with a CL to intrinsify NaN, Inf, ... due to the idea in #17063.
At a first quick glance it seems the compiler would be able to optimize code better if we lowered the precomputable values to constants (NaN, ...) or, in some cases, directly emitted efficient code in the backend when the function can be replaced by 1-2 opcodes.
More simplifications and general constant folding can then be handled, and the infrastructure for them extended, in later stages of the backend.
Possible functions that look (at first glance) easy or useful to lower, or that have hardware opcodes (on amd64):
Abs
Ceil
Copysign
Cos
Exp
Exp2
Float32bits
Float32frombits
Float64bits
Float64frombits
Floor
IsInf
IsNaN
Log
Log10
Log2
Max
Min
Mod
Modf
NaN
Pow
Pow10
Remainder
Signbit
Sin
Sqrt (OSQRT opcode could possibly be removed from frontend when handled as intrinsic)
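Several of the listed functions reduce to one or two integer bit operations once expressed through Float64bits. A sketch for Signbit and Copysign, using the standard sign-bit mask (local reimplementations for illustration, not the stdlib source):

```go
package main

import (
	"fmt"
	"math"
)

// signMask selects the sign bit of an IEEE 754 float64.
const signMask uint64 = 1 << 63

// signbit is a single mask-and-test on the raw bits.
func signbit(x float64) bool {
	return math.Float64bits(x)&signMask != 0
}

// copysign splices the sign bit of y onto the magnitude of x:
// an AND-NOT, an AND, and an OR on integer registers.
func copysign(x, y float64) float64 {
	return math.Float64frombits(math.Float64bits(x)&^signMask | math.Float64bits(y)&signMask)
}

func main() {
	fmt.Println(signbit(-1.0))   // true
	fmt.Println(signbit(2.0))    // false
	fmt.Println(copysign(3, -1)) // -3
}
```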
If this finds support, I would like to assign myself to investigate this further and send CLs that intrinsify some of the above functions (depending on popularity, usefulness, ease of integration into the backend, ...).