Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: it is not possible to prevent FMA with complex values #36971

Open
kortschak opened this issue Feb 1, 2020 · 16 comments
Open

cmd/compile: it is not possible to prevent FMA with complex values #36971

kortschak opened this issue Feb 1, 2020 · 16 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kortschak
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.13.6 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="arm64"
GOBIN="/home/user/bin"
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/user"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build515689865=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Examine the assembly generated for the code at https://play.golang.org/p/JuTC-BPAIJN with GOARCH=arm64

What did you expect to see?

No FMA/FMS instructions emitted.

What did you see instead?

	0x00ac 00172 (/home/user/c.go:19)	PCDATA	ZR, $5
	0x00ac 00172 (/home/user/c.go:19)	FMOVD	8(R8), F4
	0x00b0 00176 (/home/user/c.go:19)	FMSUBD	F1, F3, F4, F3
	0x00b4 00180 (/home/user/c.go:19)	SCVTFD	R2, F5
	0x00b8 00184 (/home/user/c.go:19)	FADDD	F5, F3, F3
	0x00bc 00188 (/home/user/c.go:19)	FMOVD	F3, (R0)(R6)
	0x00c0 00192 (/home/user/c.go:19)	FMULD	F4, F0, F0
	0x00c4 00196 (/home/user/c.go:19)	FMADDD	F1, F0, F2, F0
	0x00c8 00200 (/home/user/c.go:19)	FMOVD	ZR, F1
	0x00cc 00204 (/home/user/c.go:19)	FADDD	F0, F1, F0

No amount of wrapping the operands in complex128 prevents this AFAICS.

@smasher164 smasher164 added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 2, 2020
@agnivade
Copy link
Contributor

agnivade commented Feb 2, 2020

Does setting "GODEBUG=cpu.fma=off" help ?

@kortschak
Copy link
Contributor Author

The output remains identical with that env var setting.

@agnivade
Copy link
Contributor

agnivade commented Feb 2, 2020

I think that's a runtime setting. Maybe @martisch can help.

@randall77
Copy link
Contributor

I think you're right that there's no way to turn this off for arm64.
@agnivade, the GODEBUG setting should only be for x86 (and x86 only uses fma instructions for math.FMA).

There's nothing in the spec that specifies how complex arithmetic is calculated, so I don't think we're breaking anything.

I'm curious as to what is breaking as a result of this. It's not like you can control the rounding in a complex multiply regardless (there are 3 roundings in a non-fma complex multiply, and just 2 roundings in a fma-enabled complex multiply).

@smasher164
Copy link
Member

smasher164 commented Feb 2, 2020

@agnivade cpu.fma is for runtime feature detection that is necessary for math.FMA. In this issue, the explicit cast described in #17895 to prevent the optimizer from automatically folding x*y+z into fma(x,yz), does not work for the complex128 expression complex128(ev[k]*w[k]) + complex(float64(i), 0).

Edit: accidentally restated @randall77's comment

@kortschak
Copy link
Contributor Author

kortschak commented Feb 2, 2020

I'm in the process of attempting to get gonum's lapack implementation working on arm64. So far, the failures have been due to FMA/FMS. I have gone through the code to attempt to remove the autogeneration of FMA and FMS to get to a point where I can debug the failures. I'm instead getting to the point where I am not going to bother trying to support arm64.

@agnivade
Copy link
Contributor

agnivade commented Feb 2, 2020

I see. Thanks for the clarification Keith and Akhil.

@josharian
Copy link
Contributor

josharian commented Feb 2, 2020

I don't think we're breaking anything.

We might be breaking golden tests that attempt to get the same output (perhaps within some tolerance) across different architectures.

I have gone through the code to attempt to remove the autogeneration of FMA and FMS to get to a point where I can debug the failures.

It'd be pretty straightforward to add a compiler command line flag to disable FMA. And to add a flag that logs where FMA is inserted. This would be in keeping with many other compiler flags. Would that help?

(I still think it is also worth having a conversation about whether using explicit rounding should prevent FMA on a case-by-case basis with complex values.)

@kortschak
Copy link
Contributor Author

We might be breaking golden tests that attempt to get the same output (perhaps within some tolerance) across different architectures.

I suspect that it's worse than that. We already work around this kind of thing by providing sets of acceptable golden values and arch-dependent golden values, but we are working on problems where the intrinsic behaviour of routines as a whole appear to be broken by automatic FMA/FMS emission (I can't guarantee that this is true, because I can't prevent the compiler from emitting all FMA/FMS instructions - hence this issue). The case that brought this up is Eigen decomposition, which is an inherently numerically unstable operation in some cases, I think that the differential precision by fused v non-fused operations is causing amplification of the instability and so the complete (i.e. not just a matter of increasing tolerance) failure on arm64.

@kortschak
Copy link
Contributor Author

kortschak commented Feb 2, 2020

With help from @josharian I've temporarily made these changes and I think my hypothesis is incorrect.

I do still think though that users should be able to prevent fused operations with complex values.

@josharian
Copy link
Contributor

This case was noted in passing at #17895 (comment) and the following comment.

@randall77
Copy link
Contributor

Try computing x*y (as complex128s) using:

func cmul(x, y complex128) complex128 {
	return complex(float64(real(x)*real(y))-float64(imag(x)*imag(y)), float64(real(x)*imag(y))+float64(imag(x)*real(y)))
}

That should inhibit fma computation of floating-point multiply.

@kortschak
Copy link
Contributor Author

Yes, I'm aware of that. At that point I should just use struct{ re, im float64 }.

@martisch
Copy link
Contributor

martisch commented Feb 4, 2020

I think that's a runtime setting. Maybe @martisch can help.

As @randall77 pointed out FMA setting doesnt exist for arm64 as we assume its always present:

options = []option{

There is no condition for arm64 in the compiler currently to not generate FMA or make it conditional on the GODEBUG setting at runtime:

sys.ARM64, sys.PPC64, sys.S390X)

@kortschak
Copy link
Contributor Author

The original issues that prompted this have been resolved, in part by changing the compiler to not emit fused operation instructions. The original issues covered the range of differences in precision causing changes in output precision, differences in precision changing gross behaviour and an actual real bug in our code (~50 issues were rolled up in the original problem).

Because of the complexity of the issues we had (and the number of sites where fused operations can be constructed), it would have been very helpful to be able to tell the compiler to not emit fused operation instructions in order to exclude that as a cause. I'll note that discordance between precision at different sites has historically been a cause of catastrophic failures (I hope no-one is using Go for missile guidance).

@matthinrichsen-wf
Copy link

I found my way here via #53297. In this case, FMA also provided us with incorrect values (with float64 instead of complex128).

I was able to rework the statement to not cause the compiler to optimize it into a FMA, but it would have been much better to have a compiler option to disable it entirely.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@seankhliao seankhliao added this to the Unplanned milestone Aug 27, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
Status: Triage Backlog
Development

No branches or pull requests

9 participants