cmd/compile: strength reduce floating point #19827

randall77 · 2017-04-03T16:36:39Z

In cases where it doesn't affect correctness, we could strength reduce some floating-point ops.

x * 2.0 -> x + x
x / 2.0 -> x * 0.5

For the divide->multiply reduction, we could do any value whose reciprocal is representable exactly. Is that just powers of two?

We already do x1,x-1,x/1,x/-1.

See issue #19819

gopherbot · 2017-04-03T17:30:28Z

CL https://golang.org/cl/39295 mentions this issue.

robpike · 2017-04-03T18:00:38Z

Let me be ignorant and ask whether turning floating point multiplication into addition is indeed a performance win. One requires touching an exponent; the other requires touching every bit and normalizing.

philhofer · 2017-04-03T20:33:38Z

For amd64, the Agner Fog tables say that turning an fmul into an fadd might be a small improvement. On Haswell, for example, both are 1 micro-op, but fadd has a 3-cycle result latency, whereas fmul has a 5-cycle result latency. However, these instructions use different ports, so code with interleaved fmul and fadd ops that gets turned into just a series of fadd ops will not execute as quickly.

Conversely, on arm64, there may not be a win. Both FADD and FMUL have a 5-cycle result latency on the Cortex A57 (provided that denormals are flushed to zero).

Division, on the other hand, appears to be consistently more expensive (dozens of cycles of result latency... up to 64 on the A57).

randall77 · 2017-04-03T20:47:41Z

x*2 -> x+x is almost certainly a win.
+ requires futzing with an exponent, * requires a carry-save adder chain of some sort. But that's neither here nor there, in the end we just accept the cycle latency the hardware gives us. The processors I've checked have + faster than or equal to *. In addition, with this rewrite we no longer have to materialize the constant 2.

x*-2 -> -(x+x) is less clear. I could be convinced either way.

randall77 · 2017-04-03T21:06:52Z

name       old time/op  new time/op  delta
Mul2-8      626ns ± 0%   382ns ± 1%  -38.96%  (p=0.000 n=8+10)
MulNeg2-8   635ns ± 1%   759ns ± 1%  +19.57%  (p=0.000 n=9+10)

At least on amd64, *2 is a win. I'll remove the *-2 code from the CL.

robpike · 2017-04-03T21:19:04Z

Thanks for checking.

x*2 -> x+x x/c, c power of 2 -> x*(1/c) Fixes golang#19827 Change-Id: I74c9f0b5b49b2ed26c0990314c7d1d5f9631b6f1 Reviewed-on: https://go-review.googlesource.com/39295 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>

randall77 added the Performance label Apr 3, 2017

randall77 added this to the Go1.9 milestone Apr 3, 2017

randall77 self-assigned this Apr 3, 2017

gopherbot closed this as completed in 63a72fd Apr 3, 2017

golang locked and limited conversation to collaborators Apr 3, 2018

gopherbot added the FrozenDueToAge label Apr 3, 2018

rsc unassigned randall77 Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: strength reduce floating point #19827

cmd/compile: strength reduce floating point #19827

randall77 commented Apr 3, 2017

gopherbot commented Apr 3, 2017

robpike commented Apr 3, 2017

philhofer commented Apr 3, 2017

randall77 commented Apr 3, 2017 •

edited

randall77 commented Apr 3, 2017 •

edited

robpike commented Apr 3, 2017

cmd/compile: strength reduce floating point #19827

cmd/compile: strength reduce floating point #19827

Comments

randall77 commented Apr 3, 2017

gopherbot commented Apr 3, 2017

robpike commented Apr 3, 2017

philhofer commented Apr 3, 2017

randall77 commented Apr 3, 2017 • edited

randall77 commented Apr 3, 2017 • edited

robpike commented Apr 3, 2017

randall77 commented Apr 3, 2017 •

edited

randall77 commented Apr 3, 2017 •

edited