cmd/compile: intrinsify some math functions #17069

martisch · 2016-09-12T05:50:58Z

Currently math.sqrt (OSQRT) is lowered to OpSqrt and in case of constants precomputed and otherwise a hardware opcode may be generated (SQRTSD on amd64).
#15543

Other math functions like NaN,Inf,... are "only" inlined.

I started experimenting with a CL to intrinsify NaN, Inf, ... due to the idea in #17063.

At a first quick glance it seems the compiler is able to optimize code better if we would lower the precomputeable values to constants (NaN,...) or in some cases just directly emit efficient code in the backend if the function can be replaced by 1-2 opcodes.

More simplifications and general constant folding can then be handled and infrastructure for it extended in later stages of the backend.

Possible functions that look (at first glance) easy or useful to be lowered or have hardware opcodes (on Amd64):

Abs
Ceil
Copysign
Cos
Exp
Exp2
Float32bits
Float32frombits
Float64bits
Float64frombits
Floor
IsInf
IsNaN
Log
Log10
Log2
Max
Min
Mod
Modf
NaN
Pow
Pow10
Remainder
Signbit
Sin
Sqrt (OSQRT opcode could possibly be removed from frontend when handled as intrinsic)

If this finds support i would like to assign myself to start investigate this further and start CL's that would intrinsify some of the above functions (depending on popularity, usefulness, ease of integration into the backend,...).

josharian · 2016-09-12T13:25:07Z

cc @randall77

See also #13095.

martisch · 2016-09-12T13:48:03Z

#13095 sounds like we should not lower them but instead improve the inlining and load+stores. We could still consider cases were we have hardware opcodes that can replace these functions such as Sqrt as these can not directly be simplified from go code to the opcode.

bradfitz · 2016-09-12T13:52:56Z

When intrinsifying more things in the compiler, please do not base the decision on the (package name, symbol name) alone. If I copy/vendor/fork the stdlib elsewhere, the identical package "math" code elsewhere should be just as fast.

... unless the package is some "guts" package like runtime or reflect. But normal packages which are pure Go (or could be pure Go) should not be blessed by the compiler based on whether they live in std or golang.org/x/* or github.

josharian · 2016-09-12T14:09:19Z

@bradfitz agreed in principle, although it is worth noting that Sqrt currently violates that principle, and I don't see any reasonable way to make that practice agree with principle. Most of the math functions are not like Sqrt, though.

martisch · 2016-09-12T14:12:35Z

@bradfitz good point. As for copying code that is then not lowered to opcodes (because of name change) it might also be the case that it does not produce equal results for some float math.

Sadly math.sqrt already seems to violate what you said. But i guess we can at least move math.Sqrt from frontend as intrinsic to the backend for cleanup since removing its special treatment would otherwise cause a performance regression.

So for now i have a look at lowering sqrt as an intrinsic and leave the rest alone :) (unless other reasons and candidates surface to handle more math functions as opcodes directly). If needed for compiler or runtime performance we could always have internal function names and lower only those.

dr2chase · 2016-09-12T14:16:22Z

@bradfitz how do we get value out of that, and how does that value compare to improved performance (which has also yet to be measured)? The cases I know where people want to replace libraries wholesale do not usually also require exact performance compatibility; in my experience it's for debugging (have done it myself) and performance is not critical.

That said, it plausible that we could build some flow-graph pattern-matchers for the things we wish to substitute (have also done that myself).

dr2chase · 2016-09-12T14:58:17Z

Also, sin and cosine are not candidates for this unless their inputs are range-reduced, and they're also x87 FP, not "modern" Intel FP. Very large inputs merely set a flag and return their input unmodified, so sin(2-to-the-64) returns 2-to-the-64. Intel hardware transcendentals use a 68-bit approximation of Pi (66-bits, but the first two missing bits are zero, so 68) which means that sin(machinePi) (for 53-bit mantissa double) is incorrect in its 16th bit, or at least it was back in the x87. (I may have misremembered the number of bits, but the last hex digit is C and it is more than 64)

// use number[m:n] notation to indicate value with same exponent as number, mantissa zero
// except for bits m through n-1 inclusive, where they are equal to their corresponding bits in
// number.  The first bit of the mantissa has index zero.
hwPi = realPi - realpi[68:]
swPi = realPi - realPi[53:]

// mathematical result -- goal is it get correctly rounded 53 bits of this.
sin (swPi) = sin (realPi - realPi[53:]) = sin(realPi[53:]) approx= realPi[53:]

// machine result "sin" = machine sine, sin = mathematical sine
"sin" (swPi) = "sin" (realPi - realPi[53:]) = - sin (realPi - realPi[53:] - hwPi) =
   - sin (realPi - realPi[53:] - (realPi - realpi[68:])) = -sin(realpi[68:]-realPi[53:]) =
  sin(realpi[53:]-realPi[68:]) = sin(realpi[53:68]) approx = realpi[53:68]
// i.e. 15 bits, then zeroes, where the true answer would have its 16th bit set.

I know this well, because Java's hotspot compiler once replaced correct software sine with incorrect hardware sine after the test loop had been executed "enough". Moving the problematic case to the front of the iteration ran it through the software path, and the problem would disappear.

TocarIP · 2016-09-12T15:42:00Z

For max and min math.Max and maxsd have different semantics in NAN case (at least on AMD64).
For ceil and floor math package currently uses roundsd instruction(sse4) + cpuid check, while compiler is limited to sse2, so this may actually cause slowdown.

rsc · 2016-10-21T03:55:31Z

These break down into:

Functions that sound trivial but are actually complicated once you include the special cases they need to get right (Abs, Ceil, ...) or the complexity of the implementation (Cos, Sin, ...).
Functions that can be written in Go and inline or should inline fine (Copysign, Float32bits, IsInf, IsNaN, ...).
Functions that have trivial, correct, not-possible-to-write-in-Go assembly implementations (Sqrt).

Only category 3 is worth turning into an intrinsic, and then only when it is a demonstrated hotspot in a plausible real program. We waited for years until we found a real demonstration that the call overhead for Sqrt mattered, and then we added the intrinsic.

If any of category 2 shows up as a hotspot and isn't written in the best inlinable way, better to do that. The current Inf and NaN may be examples of that, but it would help to have a demonstration that the current performance is inadequate.

I'm going to close this issue because it is so broad, but please feel free to open new specific ones for Inf, NaN, or any of the other specific cases, provided there's a plausible path to optimization and a plausible need.

josharian added the Performance label Sep 12, 2016

quentinmit added this to the Go1.8 milestone Sep 13, 2016

nigeltao mentioned this issue Sep 24, 2016

cmd/compile: optimize f32Slice[i] += 1 across math.Float32bits, Float32frombits #17220

Open

quentinmit added the NeedsFix label Oct 11, 2016

rsc closed this as completed Oct 21, 2016

golang locked and limited conversation to collaborators Oct 21, 2017

gopherbot added the FrozenDueToAge label Oct 21, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: intrinsify some math functions #17069

cmd/compile: intrinsify some math functions #17069

martisch commented Sep 12, 2016 •

edited

Loading

josharian commented Sep 12, 2016

martisch commented Sep 12, 2016

bradfitz commented Sep 12, 2016

josharian commented Sep 12, 2016

martisch commented Sep 12, 2016 •

edited

Loading

dr2chase commented Sep 12, 2016

dr2chase commented Sep 12, 2016

TocarIP commented Sep 12, 2016

rsc commented Oct 21, 2016

cmd/compile: intrinsify some math functions #17069

cmd/compile: intrinsify some math functions #17069

Comments

martisch commented Sep 12, 2016 • edited Loading

josharian commented Sep 12, 2016

martisch commented Sep 12, 2016

bradfitz commented Sep 12, 2016

josharian commented Sep 12, 2016

martisch commented Sep 12, 2016 • edited Loading

dr2chase commented Sep 12, 2016

dr2chase commented Sep 12, 2016

TocarIP commented Sep 12, 2016

rsc commented Oct 21, 2016

martisch commented Sep 12, 2016 •

edited

Loading

martisch commented Sep 12, 2016 •

edited

Loading