math/big: rounding to denormal float32/64 still incorrect #14651

Closed
griesemer opened this issue Mar 4, 2016 · 2 comments

@griesemer

This is a follow-up to issue #14553. When a big.Float value is smaller than the smallest denormal but should be rounded up to it, the rounding does not happen for values x with 0.5 * 2**-149 (0.1000p-149) < x < 0.75 * 2**-149 (0.1100p-149) for float32 (and analogously for float64).

Since the compiler is using this code, for these numbers we get the wrong bit patterns when converting/rounding at compile-time (constant evaluation):

package main

import (
    "fmt"
    "math"
)

const p149 = 1.0 / (1 << 149) // 1p-149

const (
    m0000 = 0x0 / 16.0 * p149 // = 0.0000p-149
    m1000 = 0x8 / 16.0 * p149 // = 0.1000p-149
    m1001 = 0x9 / 16.0 * p149 // = 0.1001p-149
    m1011 = 0xb / 16.0 * p149 // = 0.1011p-149
    m1100 = 0xc / 16.0 * p149 // = 0.1100p-149
)

func main() {
    print(float32(m0000), f32(m0000))
    print(float32(m1000), f32(m1000))
    print(float32(m1001), f32(m1001))
    print(float32(m1011), f32(m1011))
    print(float32(m1100), f32(m1100))
}

func f32(x float64) float32 {
    return float32(x)
}

func print(a, b float32) {
    fmt.Printf("%016x  %016x\n", math.Float32bits(a), math.Float32bits(b))
}

produces

0000000000000000  0000000000000000
0000000000000000  0000000000000000
0000000000000000  0000000000000001
0000000000000000  0000000000000001
0000000000000001  0000000000000001

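(The left column, produced by compile-time constant conversion, is incorrect in rows 3 and 4: it should be 1, as in the right column.)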

The problem in this case seems to be with rounding per se, and not so much the Float32/64 conversions.
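The same cases can be checked against math/big directly, without going through the compiler's constant evaluation. The following is a minimal sketch, not part of the original report: the helper name denormalBits is hypothetical, and the expected values in the comments assume a Go release that already includes the fix for this issue.

```go
package main

import (
	"fmt"
	"math"
	"math/big"
)

// denormalBits (hypothetical helper) builds the big.Float value
// (sixteenths/16) * 2**-149, converts it to float32 with
// big.Float.Float32, and returns the resulting bit pattern.
func denormalBits(sixteenths int64) uint32 {
	// sixteenths/16 is exactly representable as a float64.
	mant := big.NewFloat(float64(sixteenths) / 16)
	// SetMantExp sets x = mant * 2**-149.
	x := new(big.Float).SetMantExp(mant, -149)
	f, _ := x.Float32()
	return math.Float32bits(f)
}

func main() {
	// Same mantissas as m0000, m1000, m1001, m1011, m1100 above.
	for _, n := range []int64{0x0, 0x8, 0x9, 0xb, 0xc} {
		fmt.Printf("%#x/16 * 2**-149 -> %08x\n", n, denormalBits(n))
	}
}
```

With correct rounding, 0x0 and 0x8 give 00000000 (0x8 is an exact tie and rounds to even, which is zero), while 0x9, 0xb, and 0xc give 00000001, the smallest denormal.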

@gopherbot

CL https://golang.org/cl/20816 mentions this issue.

@gopherbot

CL https://golang.org/cl/20818 mentions this issue.

gopherbot pushed a commit that referenced this issue Mar 21, 2016
Converting a big.Float value x to a float32/64 value did not correctly
round x up to the smallest denormal float32/64 if x was smaller than the
smallest denormal float32/64, but larger than 0.5 of a smallest denormal
float32/64.

Handle this case explicitly and simplify some code in the process.

For #14651.

Change-Id: I025e24bf8f0e671581a7de0abf7c1cd7e6403a6c
Reviewed-on: https://go-review.googlesource.com/20816
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alan Donovan <adonovan@google.com>
@golang golang locked and limited conversation to collaborators Mar 22, 2017