Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discuss: FP accuracy loss by performance improvement on ARM64 #24033

Closed
benshi001 opened this issue Feb 22, 2018 · 5 comments
Closed

Discuss: FP accuracy loss by performance improvement on ARM64 #24033

benshi001 opened this issue Feb 22, 2018 · 5 comments

Comments

@benshi001
Copy link
Member

benshi001 commented Feb 22, 2018

My last https://go-review.googlesource.com/c/go/+/94901 improved FP computation performance by about
9% on ARM64, but introduced a little accuracy lost.

The main idea is packing a pair of FMUL/FADD instructions into a single FMADD, and its benefits

  1. save a register for the intermediate mul result
  2. save CPU ticks

How ever accuracy loss also be introduced. Such as

float32(0.6046603 * 0.9405091) + 0.6645601, expected 1.2332485, got 1.2332486
float32(0.67908466 * 0.21855305) + 0.20318687, expected 0.3516029, got 0.35160288
...

The test case go/src/cmd/compile/internal/gc/testdata/fp.go failed.

There are two solutions

  1. Roll back to the less optimized fmul/fadd

  2. Modify the test case, something like pattern matching

    float32(0.6046603 * 0.9405091) + 0.6645601 == 1.2332485
    float32(0.6046603 * 0.9405091) + 0.6645601 == 1.233248*

What is your opinion?

@benshi001
Copy link
Member Author

I suggest the second one. FP accuracy loss is a common issue, we usually do
"fp0 - fp1 < loss" than "fp0 == fp1".

@mundaym
Copy link
Member

mundaym commented Feb 22, 2018

The bitwise equality check in the test is deliberate. The results you are seeing with the new optimizations are probably more accurate, not less. I suspect you just need to introduce LoweredRound ops, adding rules like this:

(Round32F x) -> (LoweredRound32F x)
(Round64F x) -> (LoweredRound64F x)

This is needed because the Go spec says that float64(x*y)+c requires an intermediate rounding stage after the multiplication and so can't generally be implemented using a fused multiply-add instruction. Round32F and Round64F ops are inserted so that this is visible to the optimization rules. Most architectures just ignore them because they don't have fused multiply-add rules. See #17895 for more detailed discussion.

@cherrymui
Copy link
Member

Yes, I think @mundaym is right. We should not use fused FP operations if there is an explicit conversion. Sorry I forgot it in the review.

@benshi001
Copy link
Member Author

Thank you. @mundaym I see it.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/96355 mentions this issue: cmd/compile: fix FP accuracy issue introduced by FMA optimization on ARM64

@golang golang locked and limited conversation to collaborators Feb 22, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants