math: Pow(x,y) optimization for cases: constant integer `y`. #29456

Konstantin8105 · 2018-12-29T19:46:52Z

math.Pow(x,y) optimization for cases: constant integer y. For example:

... math.Pow(length, 2)...
... math.Pow(length, 3.0)...

What version of Go are you using (`go version`)?

$ go version
go version go1.11.4 linux/amd64

Does this issue reproduce with the latest release?

yes for go 1.6

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/konstantin/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/konstantin/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build700464290=/tmp/go-build -gno-record-gcc-switches"

What did you do?

package main

import (
        "math"
        "testing"
)

func BenchmarkMathPow(b *testing.B) {
        x := 42.0
        for i := 0; i < b.N; i++ {
                _ = math.Pow(x, 2.0)
        }
}

func BenchmarkInline(b *testing.B) {
        x := 42.0
        for i := 0; i < b.N; i++ {
                _ = x * x
        }
}

What did you expect to see?

same or preliminary same performance between that 2 cases.

What did you see instead?

run benchmark

> go test -bench=.
goos: linux
goarch: amd64
pkg: tmp
BenchmarkMathPow-4   	30000000	        49.6 ns/op
BenchmarkInline-4    	2000000000	         0.68 ns/op
PASS
ok  	tmp	2.992s

Different more 70 times.

Note

Changing function pow in math/pow.go is not enough for solving performance problem.

func BenchmarkPowModify(b *testing.B) {
	x := 42.0
	for i := 0; i < b.N; i++ {
		_ = pow(x, 2.0)
	}
}

func pow(x, y float64) float64 {
	switch {
	case y == 0 || x == 1:
		return 1

// changes for cases with integer `y`.
	case float64(int8(y)) == y:
		if y < 0 {
			return 1.0 / pow(x, -y)
		}
		n := int(y)
		r := 1.0
		for i := 0; i < n; i++ {
			r *= x
		}
		return r

	case math.IsNaN(x) || math.IsNaN(y):
		return math.NaN()

... other like in file math/pow.go ....

result is better, but not the best:

BenchmarkMathPow-4     	30000000	        49.6 ns/op
BenchmarkPowModify-4   	200000000	         7.56 ns/op
BenchmarkInline-4      	2000000000	         0.69 ns/op
PASS
ok  	tmp	5.268s

Have we any optimization place for change AST from function expressions math.Pow with parameter y = 2 to expression x*x? (Like I understood ssa is not a right place).

The text was updated successfully, but these errors were encountered:

gopherbot · 2018-12-29T20:37:49Z

Change https://golang.org/cl/155921 mentions this issue: math: Pow(x,y) optimization for cases: constant integer y.

martisch · 2018-12-29T20:59:27Z

Note that for:

func BenchmarkInline(b *testing.B) {
        x := 42.0
        for i := 0; i < b.N; i++ {
                _ = x * x
        }
}

The newer Go Gc compilers are smart enough to infer that the loop body computation is never used and optimize the loop body away. Therefore this is likely benchmarking an empty loop.
https://godbolt.org/z/ot9w2G

To avoid some issues with dead code elimination and constant folding code like below can be used:

var Sink float64
var x = 42.0

func BenchmarkInline(b *testing.B) {
        for i := 0; i < b.N; i++ {
                Sink = x * x
        }
}

https://godbolt.org/z/Nt9OWg

Some question for the optimization:

how do the benchmark numbers compare for large constant y?
is there a loss of precision for large constant y vs the current implementation?

I do not think rewriting to x*x at AST level is worth it. I would expect programmers that need the speed improvement already use the simpler and faster x*x version instead of math.Pow. It would also have the bad side effect of copying the math code resulting in a slow down. Generally these kind of package specific rewrite optimizations are avoided in the compiler and should if possible be covered by general optimizations that match code patterns instead. Alternatively these special cases can be covered by if statements in the function and with improving inlining and constant propagation will have reduced overhead than currently in the future.

Konstantin8105 · 2018-12-29T22:08:12Z

Dear @martisch ,
After changing BenchmarkInline as in your example for go1.11.4 on my laptop result is same 0.7...1.03 ns/op. So, I checked.
About your questions:

in my practice of FE(finite element) modeling, I use y = 2 or y = 3. In practice of gonum : y=7... So value is not big, so I check for y=-127...127 - that is enough for my tasks. If in my tasks need more, then after approve that design of code, you can modify for more y value.
present implementation of math.Pow and that PR have identical result.

I agree with your point, if y > 3, but for cases y = 2 and y = 3 it is ok by myself.
Now, I found any optimization for math library with constants, for example if we have code like that :

a := math.Max(b,12.7)

no need all checks of y in function math.Max

martisch · 2018-12-29T22:54:45Z

After changing BenchmarkInline as in your example for go1.11.4 on my laptop result is same
present implementation of math.Pow and that PR have identical result.

Thanks for checking and clarification.

no need all checks of y in function math.Max

Yes and the compiler with better inlining support could determine that generally (also outside of the math package) and only use and inline the checks that are relevant. There are also examples where this would help in the strings package e.g. Count with string to count of length 1. Its something that can be handled better generally without the need for hardcoded AST optimizations.

MichaelTJones · 2018-12-31T12:02:32Z

I'm watching the changes here. If we're going to do this may I recommend using a simple better method? For background, the choices are:

celeste:power7 mtj$ go test -v -bench=.
=== RUN TestSanityPowerM
--- PASS: TestSanityPowerM (0.03s)
=== RUN TestSanityPowerBinary
--- PASS: TestSanityPowerBinary (0.03s)
goos: darwin
goarch: amd64
pkg: power7
BenchmarkPowerMath-8 30000000 52.9 ns/op
BenchmarkPowerBinary-8 200000000 9.09 ns/op
BenchmarkPowerM-8 300000000 5.33 ns/op
BenchmarkPowerMathFixed-8 50000000 34.9 ns/op
BenchmarkPowerBinaryFixed-8 200000000 7.08 ns/op
BenchmarkPowerMFixed-8 300000000 4.95 ns/op
BenchmarkPowerMDirect124-8 1000000000 2.56 ns/op

The math lib approach, which benchmarks for me at an average of 52.9 ns/op across a range of exponents.

The iterative method under review, which is ~37s for the specific case of x**124 and 24-40ns generally.

The well-known binary powering method (below) which is 9.09 ns/op in general and 7.08 ns/op for the x**124 case.

The tricky set of auto-generated magic optimal powering trees as generated by my program (and Knuth's, and lots of other people's). With this you get 5.33 ns/op overall and 4.95 ns/op for the specific case of x**124. (The direct case does not apply here; it is for the situation where you know what the exponent is at compile time and call the specific routine directly. This would only happen if exponentiation were an operator in Go, which it is not.)

The universal solution for powering using the binary powering algorithm is as follows. I suggest that you incorporate this into your inner loop.

// Compute a**b using binary powering algorithm
// See Donald Knuth, The Art of Computer Programming, Volume 2, Section 4.6.3
func PowerBinary(a float64, b int) (p float64) {
for p = 1; b > 0; b >>= 1 {
if b&1 != 0 {
p *= a
}
a *= a
}
return
}

The tricky version, which was decided here long ago as "too much code for the problem's importance and saving 2-3 ns/call over the binary method" looks like this...

// PowerM124 computes x**124 with 9 multiplications (version 311 of 315)
func PowerM124(x1 float64) float64 {
x2 := x1 * x1
x3 := x2 * x1
x6 := x3 * x3
x7 := x6 * x1
x12 := x6 * x6
x19 := x12 * x7
x31 := x19 * x12
x62 := x31 * x31
x124 := x62 * x62
return x124
}

...and has a companion dispatcher...

// PowerM computes x**y using minimal multiplications
func PowerM(x float64, y int) float64 {
if 1 <= y && y <= 256 {
return callPowerMy
}
return math.Pow(x, float64(y))
}

var callPowerM = []func(float64) float64{
nil,
PowerM1,
PowerM2,
PowerM3,
PowerM4,
PowerM5,
PowerM6,
PowerM7,
PowerM8,
...

If this decision is ever changed, I have the optimal code for all the powers in the range 2..256 and the program to generate them arbitrarily. My suggestion here is simply to use the binary powering algorithm precisely as shown above. It's rounding error, and other aspects are fine.

9 ns is quite a bit less than 30 ns which in turn is less than 52 ns.

Konstantin8105 · 2018-12-31T12:47:26Z

Dear @martisch ,
Yes, if we choose AST hardcoding way - it also not clear for other go tool like debug or ...
As I understood - you also agree with point - "we have not default way for that optimization".

Dear @MichaelTJones ,
I agree with any optimization )). As I understood we can use your design also for [-256 , -1] with little modifications. If you want, then you can prepare PR with your design for detail analyze.

One more think about present design of function math.Pow, we have that part:

...
 	case y == 0.5:
 		return Sqrt(x)
 	case y == -0.5:
 		return 1 / Sqrt(x)
...

I think we can optimize that part too, for example after switch add:

// optimization for cases:
// y = 1/(2^r) , where integer value r = 1 ... 127 ...
if n := int8(1.0/y); n%2 == 0 && float64(n) == 1.0/y{
   // we have 
   // y = 0.5      n = 2
   // y = 0.25     n = 4
   // ...

   // convert `n` to `r` if possible. if not possible then default alorithm(example n = 6, y=0.16666... it is not 1/(2^r))

   // calculation
   result := x
   for i:= 0;i<r;i++{
      result = math.Sqrt(result)
   }
   return result
}

But I haven't idea for convert n to r.

katiehockman · 2019-01-02T19:21:50Z

/cc @rsc (as an owner for math)

bmkessler · 2019-01-03T23:33:26Z

Note that there is already an open issue #25270 for the current Pow(x, y) implementation being inaccurate (including for integer y inputs) because it doesn't perform the intermediate calculations to high enough accuracy. The standard library functions should ensure accuracy first before optimizing for speed. Also, I think most of the proposed speed gains are due to avoiding the Frexp and Ldexp calls which may be unnecessary in the original code.

don4get · 2024-03-15T10:45:12Z

Hello everyone,

What is the current status of this optimisation?
Last comment is in 2019 and I am not sure I understand why the lack of precision of Pow justify the fact we cannot actually improve its speed, with no deterioration of its precision.

I replicated the benchmark and it seems that in 1.21.4 there is still a huge gap between a*a and Pow(a, 2).
Maybe the benchmark I did is wrong, I don't see how.

package main

import (
	"fmt"
	"math"
	"time"
)

func compute_square_with_pow(a float64) float64 {
	return math.Pow(a, 2)
}

func compute_square(a float64) float64 {
	return a * a
}

func main() {
	length := 1000000000 // Adjust the length as needed for your benchmark

	// Benchmark compute_square_with_pow
	start := time.Now()
	for i := 0; i < length; i++ {
		compute_square_with_pow(2.5)
	}
	durationWithPow := time.Since(start)
	fmt.Printf("compute_square_with_pow took %v\n", durationWithPow)

	// Benchmark compute_square
	start = time.Now()
	for i := 0; i < length; i++ {
		compute_square(2.5)
	}
	duration := time.Since(start)
	fmt.Printf("compute_square took %v\n", duration)

	// Comparing the results and calculating relative difference
	var relativeDifference float64
	if duration < durationWithPow {
		absoluteDifference := durationWithPow - duration
		relativeDifference = (absoluteDifference.Seconds() / durationWithPow.Seconds()) * 100
		fmt.Printf("Direct multiplication is faster by %v (relative difference: %.2f%%)\n", absoluteDifference, relativeDifference)
	} else if duration > durationWithPow {
		absoluteDifference := duration - durationWithPow
		relativeDifference = (absoluteDifference.Seconds() / duration.Seconds()) * 100
		fmt.Printf("math.Pow is faster by %v (relative difference: %.2f%%)\n", absoluteDifference, relativeDifference)
	} else {
		fmt.Println("Both methods took the same amount of time")
	}
}

go run test/basics/power_optimisation.go 
compute_square_with_pow took 11.365080585s
compute_square took 193.561528ms
Direct multiplication is faster by 11.171519057s (relative difference: 98.30%)

odeke-em changed the title ~~math: math.Pow(x,y) optimization for cases: constant integer y.~~ math: Pow(x,y) optimization for cases: constant integer y. Dec 29, 2018

Konstantin8105 mentioned this issue Dec 29, 2018

math: Pow(x,y) optimization for cases: constant integer y. #29457

Closed

katiehockman added this to the Go1.13 milestone Jan 2, 2019

katiehockman added the NeedsDecision label Jan 2, 2019

andybons modified the milestones: Go1.13, Go1.14 Jul 8, 2019

rsc modified the milestones: Go1.14, Backlog Oct 9, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

math: Pow(x,y) optimization for cases: constant integer `y`. #29456

math: Pow(x,y) optimization for cases: constant integer `y`. #29456

Konstantin8105 commented Dec 29, 2018

gopherbot commented Dec 29, 2018

martisch commented Dec 29, 2018 •

edited

Loading

Konstantin8105 commented Dec 29, 2018

martisch commented Dec 29, 2018 •

edited

Loading

MichaelTJones commented Dec 31, 2018

Konstantin8105 commented Dec 31, 2018 •

edited

Loading

katiehockman commented Jan 2, 2019

bmkessler commented Jan 3, 2019

don4get commented Mar 15, 2024

math: Pow(x,y) optimization for cases: constant integer y. #29456

math: Pow(x,y) optimization for cases: constant integer y. #29456

Comments

Konstantin8105 commented Dec 29, 2018

What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

What operating system and processor architecture are you using (go env)?

What did you do?

What did you expect to see?

What did you see instead?

Note

gopherbot commented Dec 29, 2018

martisch commented Dec 29, 2018 • edited Loading

Konstantin8105 commented Dec 29, 2018

martisch commented Dec 29, 2018 • edited Loading

MichaelTJones commented Dec 31, 2018

Konstantin8105 commented Dec 31, 2018 • edited Loading

katiehockman commented Jan 2, 2019

bmkessler commented Jan 3, 2019

don4get commented Mar 15, 2024

math: Pow(x,y) optimization for cases: constant integer `y`. #29456

math: Pow(x,y) optimization for cases: constant integer `y`. #29456

What version of Go are you using (`go version`)?

What operating system and processor architecture are you using (`go env`)?

martisch commented Dec 29, 2018 •

edited

Loading

martisch commented Dec 29, 2018 •

edited

Loading

Konstantin8105 commented Dec 31, 2018 •

edited

Loading