Discuss: FP accuracy loss by performance improvement on ARM64 #24033

benshi001 · 2018-02-22T11:45:22Z

My last change, https://go-review.googlesource.com/c/go/+/94901, improved FP computation performance by about
9% on ARM64, but introduced a small accuracy loss.

The main idea is to pack a pair of FMUL/FADD instructions into a single FMADD. Its benefits are:

save a register for the intermediate mul result
save CPU ticks

However, an accuracy loss is also introduced. For example:

float32(0.6046603 * 0.9405091) + 0.6645601, expected 1.2332485, got 1.2332486
float32(0.67908466 * 0.21855305) + 0.20318687, expected 0.3516029, got 0.35160288
...

The test case go/src/cmd/compile/internal/gc/testdata/fp.go failed.

There are two possible solutions:

Roll back to the less optimized FMUL/FADD
Modify the test case to use something like pattern matching:

float32(0.6046603 * 0.9405091) + 0.6645601 == 1.2332485
float32(0.6046603 * 0.9405091) + 0.6645601 == 1.233248*

What is your opinion?


benshi001 · 2018-02-22T11:53:42Z

I suggest the second one. FP accuracy loss is a common issue; we usually check
"fp0 - fp1 < loss" rather than "fp0 == fp1".
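
For illustration, the tolerance-based comparison suggested here could look roughly like the sketch below. The helper name approxEqual and the tolerance 1e-6 are assumptions made for this example and are not taken from fp.go; the sketch also takes the absolute value of the difference, which the shorthand above leaves out.

```go
package main

import (
	"fmt"
	"math"
)

// approxEqual is a hypothetical helper: it reports whether a and b
// differ by at most tol, using the absolute value of the difference.
func approxEqual(a, b float32, tol float64) bool {
	return math.Abs(float64(a)-float64(b)) <= tol
}

func main() {
	// The two results reported in the issue differ only in the last digit.
	var want, got float32 = 1.2332485, 1.2332486

	fmt.Println(approxEqual(want, got, 1e-6)) // true
	fmt.Println(approxEqual(1.0, 1.1, 1e-6))  // false
}
```

A check like this accepts both the fused and the unfused result, so it cannot detect whether an intermediate rounding step was skipped.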

mundaym · 2018-02-22T11:55:46Z

The bitwise equality check in the test is deliberate. The results you are seeing with the new optimizations are probably more accurate, not less. I suspect you just need to introduce LoweredRound ops, adding rules like this:

(Round32F x) -> (LoweredRound32F x)
(Round64F x) -> (LoweredRound64F x)

This is needed because the Go spec says that float64(x*y)+c requires an intermediate rounding stage after the multiplication and so can't generally be implemented using a fused multiply-add instruction. Round32F and Round64F ops are inserted so that this is visible to the optimization rules. Most architectures just ignore them because they don't have fused multiply-add rules. See #17895 for more detailed discussion.

cherrymui · 2018-02-22T13:33:41Z

Yes, I think @mundaym is right. We should not use fused FP operations if there is an explicit conversion. Sorry I forgot it in the review.

benshi001 · 2018-02-22T13:46:42Z

Thank you. @mundaym I see it.

gopherbot · 2018-02-22T14:05:14Z

Change https://golang.org/cl/96355 mentions this issue: cmd/compile: fix FP accuracy issue introduced by FMA optimization on ARM64

gopherbot closed this as completed in 7113d3a Feb 22, 2018

golang locked and limited conversation to collaborators Feb 22, 2019

gopherbot added the FrozenDueToAge label Feb 22, 2019
