Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: optimize comparisons to min/max (u)ints by checking for overflow #41664

Closed
josharian opened this issue Sep 27, 2020 · 2 comments
Closed
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@josharian
Copy link
Contributor

package p

import "math"

func f(x int64) bool {
	return x == math.MinInt64
}

On amd64, the core of this compiles to:

	0x0005 00005 (z.go:6)	MOVQ	$-9223372036854775808, CX
	0x000f 00015 (z.go:6)	CMPQ	AX, CX
	0x0012 00018 (z.go:6)	SETEQ	"".~r1+16(SP)

It would be cheaper and smaller instead to decrement CX and check the flags for underflow. A similar trick can be used for checking min and max ints and uints of all sizes. It might also be useful in the division fix-up code, where we must check for min int divisor.

cc @randall77 @dr2chase @martisch @mundaym

@josharian josharian added this to the Unplanned milestone Sep 27, 2020
@andybons andybons added the NeedsFix The path to resolution is known, but the work has not been done. label Sep 29, 2020
@quasilyte
Copy link
Contributor

I tried to implement this. The idea looks interesting, at least on the surface.

Unfortunately, I get 5-8% slowdown in comparison with the original ("unoptimized") code. It does generate smaller code, but it runs slower.

The benchmarking code:

package foo

import (
	"testing"
	"math"
)

//go:noinline
func checkInt64(i int64, u uint64) int {
	if i == math.MaxInt64 {
		if u != math.MaxUint64 {
			return 10
		}
		return 1
	}
	if i == math.MinInt64 {
		return 2
	}
	return 0
}

func BenchmarkCheckInt(b *testing.B) {
	for i := 0; i < b.N; i++ {
		checkInt64(0, 0)
		checkInt64(math.MaxInt64, 0)
		checkInt64(math.MinInt64, 0)
		checkInt64(math.MaxInt64, math.MaxUint64)
		checkInt64(math.MinInt64, math.MaxUint64)
	}
}

Optimization rules:

// Convert `x == int64max` to increment+overflow check,
// for int64min test do a decrement+overflow check.
// For unsigned values, check for the carry flag instead.
//
// TODO: we could emit INC/DEC instead of ADDQ, but that would require a separate
// flag-aware SSA op: INC/DEC do not set carry flag, so we can't produce them for ADDQconstcarry.
// This rewrite needs overflow flags that are modified by both ADD and INC instructions.
((EQ|NE) (CMPQ <flags> x intmin:(MOVQconst [c])) yes no) && c == math.MinInt64 && intmin.Uses == 1 =>
  ((OS|OC) (Select1 <flags> (ADDQconstcarry [-1] x)) yes no)
((SETEQ|SETNE) (CMPQ <flags> x intmin:(MOVQconst [c]))) && c == math.MinInt64 && intmin.Uses == 1 =>
  ((SETO|SETNO) (Select1 <flags> (ADDQconstcarry [-1] x)))
((EQ|NE) (CMPQ <flags> x intmax:(MOVQconst [c])) yes no) && c == math.MaxInt64 && intmax.Uses == 1 =>
  ((OS|OC) (Select1 <flags> (ADDQconstcarry [1] x)) yes no)
((SETEQ|SETNE) (CMPQ <flags> x intmax:(MOVQconst [c]))) && c == math.MaxInt64 && intmax.Uses == 1 =>
  ((SETO|SETNO) (Select1 <flags> (ADDQconstcarry [1] x)))
// For the reference:
// SETB is "overflow flag is set" (SETCS in asm)
// SETAE is "overflow flag is not set" (SETCC in asm); also known as SETNB
// UGE is AJCS in asm
// ULT is AJCC in asm
((EQ|NE) intmax:(CMPQconst <flags> x [c]) yes no) && uint64(c) == math.MaxUint64 && intmax.Uses == 1 =>
  ((ULT|UGE) (Select1 <flags> (ADDQconstcarry [1] x)) yes no)
((SETEQ|SETNE) (CMPQ <flags> x intmin:(MOVQconst [c]))) && uint64(c) == math.MaxUint64 && intmin.Uses == 1 =>
  ((SETB|SETAE) (Select1 <flags> (ADDQconstcarry [1] x)))

I can submit a CL with tests if anyone is interested in seeing the whole thing.

Note: I haven't tried changing ADDQ 1/-1 to INC/DEC. I'm not sure it can help there.

Note: gcc/clang for C and C++ seem not to use this optimization idea either. Maybe there is a good reason not to do this.

I see two ways right now:

  1. Figuring out what's wrong with the results (using different bench or reading the CPU counters for more info) or the implementation.
  2. Marking this idea as wontfix with this issue being closed.

This is better than someone else spending their evening (or weekend) pursuing something that will result in disappointment. :)

@josharian
Copy link
Contributor Author

Thanks so much for digging into this! Sounds like we shouldn't do this.

@golang golang locked and limited conversation to collaborators Nov 29, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests

4 participants