Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: emit (MAX|MIN)SD instead of UCOMISD when possible #31272

Closed
agnivade opened this issue Apr 5, 2019 · 4 comments
Closed

cmd/compile: emit (MAX|MIN)SD instead of UCOMISD when possible #31272

agnivade opened this issue Apr 5, 2019 · 4 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@agnivade
Copy link
Contributor

agnivade commented Apr 5, 2019

What version of Go are you using (go version)?

$ go version
go version devel +4091cf972a Sun Mar 31 23:35:35 2019 +0000 linux/amd64

Consider this:

func max(x, y float64) float64 {
	if x > y {
		return x
	}
	return y
}

This gets compiled to

0x0000 00000 (max.go:5) MOVSD   "".x+8(SP), X0
0x0006 00006 (max.go:5) MOVSD   "".y+16(SP), X1
0x000c 00012 (max.go:5) UCOMISD X1, X0
0x0010 00016 (max.go:5) JLS     25
0x0012 00018 (max.go:6) MOVSD   X0, "".~r2+24(SP)
0x0018 00024 (max.go:6) RET
0x0019 00025 (max.go:8) MOVSD   X1, "".~r2+24(SP)
0x001f 00031 (max.go:8) RET

Whereas, it could have been written as

MOVSD x+8(SP), X0
MOVSD y+16(SP), X1
MAXSD X1, X0
MOVSD X0, ret+24(SP)
RET

Benchmarks show a 12.6% improvement, when the assembly call overhead is ignored.

I am wondering if it is possible to detect patterns like these and insert a (MAX/MIN)SD instead of a UCOMISD and a Jump ? Might also be possible to use a FCMOV, but that is also not implemented yet.

Is this possible with a simple rewrite rule or does it require deeper compiler knowledge to achieve this ? Happy to work on it if it is the former :)

FYI, the SSA during the lower pass is

b1:-
v1 (?) = InitMem <mem>
v2 (?) = SP <uintptr>
v6 (?) = LEAQ <*float64> {~r2} v2
v7 (4) = Arg <float64> {x} (x[float64])
v8 (4) = Arg <float64> {y} (y[float64])
v11 (+5) = UCOMISD <flags> v7 v8
v10 (+5) = SETGF <bool> v11
v12 (+5) = TESTB <flags> v10 v10
UGT v11 → b3 b2 (5)
b2: ← b1-
v17 (8) = VarDef <mem> {~r2} v1
v18 (+8) = MOVSDstore <mem> {~r2} v2 v8 v17
Ret v18 (+8)
b3: ← b1-
v13 (6) = VarDef <mem> {~r2} v1
v14 (+6) = MOVSDstore <mem> {~r2} v2 v7 v13
Ret v14 (+6)

@josharian @randall77 @martisch

@agnivade agnivade added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance labels Apr 5, 2019
@agnivade agnivade added this to the Unplanned milestone Apr 5, 2019
@josharian
Copy link
Contributor

The change probably isn’t too intricate, although one can never be sure without actually implementing.

However, floating point operations have a bunch of weird corner cases. It’d be good to double-check that those instructions handle all the corner cases the same way Go does. @dr2chase might be able to weigh in usefully here.

@dr2chase
Copy link
Contributor

dr2chase commented Apr 5, 2019

The corner cases involve Nan and -0, and using the instruction probably will get us closer to best practices (in general, NaN op x is the same as x op NaN is NaN, and I did not say "equals" because NaN == NaN is false).

Oops -- if we have a defined Max function, yes, the instruction is probably better. But as a replacement here, nope, NaNs I think mess things up.

Double oops, that instruction is weird. I'm sure there's a reason, but it's not IEEE.

@agnivade
Copy link
Contributor Author

agnivade commented Apr 5, 2019

So you are suggesting not to use this instruction due to NaN complications ?

The instruction does have some documentation on how it handles NaN:

If the values being compared are both 0.0s (of either sign), the value in the second source operand is returned. If a value in the second source operand is an SNaN, that SNaN is returned unchanged to the destination (that is, a QNaN version of the SNaN is not returned).

If only one value is a NaN (SNaN or QNaN) for this instruction, the second source operand, either a NaN or a valid floating-point value, is written to the result. If instead of this behavior, it is required that the NaN of either source operand be returned, the action of MAXSD can be emulated using a sequence of instructions, such as, a comparison followed by AND, ANDN and OR.

https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf (4-18 Vol2B)

It seems for the NaN cases, always the second operand is returned (which may or may not be a Nan). So it looks like we don't want such a behavior ?

@dr2chase
Copy link
Contributor

dr2chase commented Apr 5, 2019

For legal programs, optimizations are supposed to not change behavior, so no.

@agnivade agnivade closed this as completed Apr 6, 2019
@golang golang locked and limited conversation to collaborators Apr 5, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

4 participants