cmd/compile: it is not possible to prevent FMA with complex values #36971
Comments
Does setting "GODEBUG=cpu.fma=off" help?
The output remains identical with that env var setting.
I think that's a runtime setting. Maybe @martisch can help.
I think you're right that there's no way to turn this off for arm64. There's nothing in the spec that specifies how complex arithmetic is calculated, so I don't think we're breaking anything. I'm curious as to what is breaking as a result of this. It's not like you can control the rounding in a complex multiply regardless (there are 3 roundings in a non-fma complex multiply, and just 2 roundings in a fma-enabled complex multiply).
@agnivade cpu.fma is for runtime feature detection, which is necessary for math.FMA. In this issue, the explicit cast described in #17895 to prevent the optimizer from automatically folding x*y+z into fma(x, y, z) does not work for the complex128 expression. Edit: accidentally restated @randall77's comment
I'm in the process of attempting to get gonum's lapack implementation working on arm64. So far, the failures have been due to FMA/FMS. I have gone through the code to attempt to remove the autogeneration of FMA and FMS to get to a point where I can debug the failures. I'm instead getting to the point where I am not going to bother trying to support arm64.
I see. Thanks for the clarification, Keith and Akhil.
We might be breaking golden tests that attempt to get the same output (perhaps within some tolerance) across different architectures.
It'd be pretty straightforward to add a compiler command line flag to disable FMA. And to add a flag that logs where FMA is inserted. This would be in keeping with many other compiler flags. Would that help? (I still think it is also worth having a conversation about whether using explicit rounding should prevent FMA on a case-by-case basis with complex values.)
I suspect that it's worse than that. We already work around this kind of thing by providing sets of acceptable golden values and arch-dependent golden values, but we are working on problems where the intrinsic behaviour of routines as a whole appears to be broken by automatic FMA/FMS emission (I can't guarantee that this is true, because I can't prevent the compiler from emitting all FMA/FMS instructions - hence this issue). The case that brought this up is Eigen decomposition, which is an inherently numerically unstable operation in some cases; I think that the differential precision of fused v non-fused operations is causing amplification of the instability, and so the complete (i.e. not just a matter of increasing tolerance) failure on arm64.
With help from @josharian I've temporarily made these changes and I think my hypothesis is incorrect. I do still think though that users should be able to prevent fused operations with complex values.
This case was noted in passing at #17895 (comment) and the following comment. |
Try computing x*y (as complex128s) using:
That should inhibit fma computation of floating-point multiply. |
Yes, I'm aware of that. At that point I should just use
As @randall77 pointed out, the FMA setting doesn't exist for arm64 as we assume it's always present: go/src/internal/cpu/cpu_arm64.go Line 44 in 03ef105
There is currently no condition in the compiler for arm64 to not generate FMA, or to make it conditional on the GODEBUG setting at runtime: go/src/cmd/compile/internal/gc/ssa.go Line 3574 in a50c3ff
The original issues that prompted this have been resolved, in part by changing the compiler to not emit fused operation instructions. The original issues covered the range of differences in precision causing changes in output precision, differences in precision changing gross behaviour, and a real bug in our code (~50 issues were rolled up in the original problem). Because of the complexity of the issues we had (and the number of sites where fused operations can be constructed), it would have been very helpful to be able to tell the compiler to not emit fused operation instructions in order to exclude that as a cause. I'll note that discordance between precision at different sites has historically been a cause of catastrophic failures (I hope no-one is using Go for missile guidance).
I found my way here via #53297. In this case, FMA also provided us with incorrect values (with float64 instead of complex128). I was able to rework the statement to not cause the compiler to optimize it into an FMA, but it would have been much better to have a compiler option to disable it entirely.
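For the plain float64 case, the spec-guaranteed rework is to insert an explicit conversion around the product. A sketch of that idiom (my illustration, not the commenter's actual code; the function names are mine):

```go
package main

import "fmt"

// madd computes x*y + z directly. The compiler is allowed to fuse
// this into a single FMA instruction on architectures such as arm64.
func madd(x, y, z float64) float64 {
	return x*y + z
}

// maddNoFMA forces two separate roundings: the Go spec says an
// explicit float64 conversion rounds to the target precision and
// prevents fusion that would discard that rounding.
func maddNoFMA(x, y, z float64) float64 {
	return float64(x*y) + z
}

func main() {
	x := 1 + 0x1p-27
	// With separate roundings the 2^-54 term of the exact square is
	// lost, so the result is exactly 2^-26 on every architecture.
	fmt.Println(maddNoFMA(x, x, -1) == 0x1p-26) // true
	// madd(x, x, -1) is 2^-26 + 2^-54 where an FMA is emitted,
	// and 2^-26 where it is not.
	fmt.Println(madd(x, x, -1))
}
```

This works per statement, which is exactly why commenters above want a global compiler switch instead: the conversion has to be applied by hand at every site where fusion might occur, and there is no analogous cast for a complex128 multiply.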
What version of Go are you using (go version)?

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output

What did you do?

Examine the assembly generated for the code at https://play.golang.org/p/JuTC-BPAIJN with GOARCH=arm64.

What did you expect to see?

No FMA/FMS instructions emitted.

What did you see instead?

No amount of wrapping the operands in complex128 prevents this AFAICS.