strconv: ParseFloat should accept 'p' notation for binary exponents #12518

kortschak · 2015-09-06T00:35:16Z

See https://groups.google.com/d/topic/golang-dev/oIB-wBj3ufw/discussion.

The language specification never mentioned binary exponent float representation, but it was previously included in the gc implementation and it is included as a formatting option via strconv.AppendFloat with the 'b' fmt argument. However, it now lives on as a parsing option only in the compiler and test code in strconv.

The capacity to represent exact float values in a clear human-readable way is valuable in numeric code, for example here, where otherwise comments are required to explain the magic hex.

It is not clear how this should be included, since parsing a string is failable at runtime and these values are likely to nearly always be compile time constants.

/cc @griesemer

griesemer · 2015-09-06T04:29:24Z

Some comments:

The compiler is using the mechanism to write (export) and read (import) exported float constants, written using the p exponent, because it permits an easy and lossless representation of a float in decimal form (decimal mantissa, exponent to power of 2, but written in decimal form). Note that the need for this format is likely going away since a binary representation of the exported data is more compact, just as precise, and faster to read and write (I'm working on a respective change).
The strconv.AppendFloat representation of the 'b' format requires the bit size of the argument (32 or 64). It simply interprets the mantissa as a large decimal, and then prints the exponent. For instance, a float64 0.0, using 'b' format, formats as "0p-1074", which is somewhat odd as it requires understanding of the 64-bit float format to explain how the result was obtained. Similarly, 1.0 is printed as 4503599627370496p-52; that is, the float64 mantissa (53 bits) is interpreted as a decimal and printed (4503599627370496, same as 1<<52), followed by the exponent (http://play.golang.org/p/Rt0SIFzzHi). Again, it requires the mantissa size to explain the output. (0p0 and 1p0 would be just as valid, but more expensive to derive: basically it's the same mantissa with trailing 0's removed and the exponent adjusted, a canonical form.)
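The 'b' output described above can be reproduced with strconv.FormatFloat. This is a minimal sketch; formatB is a hypothetical wrapper name introduced only for illustration and is not part of strconv.

```go
package main

import (
	"fmt"
	"strconv"
)

// formatB is a hypothetical helper (not part of strconv). It formats x
// using the 'b' format for the given bit size (32 or 64).
func formatB(x float64, bitSize int) string {
	return strconv.FormatFloat(x, 'b', -1, bitSize)
}

func main() {
	fmt.Println(formatB(0.0, 64)) // 0p-1074
	fmt.Println(formatB(1.0, 64)) // 4503599627370496p-52
	fmt.Println(formatB(1.0, 32)) // 8388608p-23
}
```

The same value 1.0 yields a different mantissa and exponent depending on the bit size, which is why the output cannot be explained without knowing the mantissa width.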

And some questions:

What are you proposing? (Is this a proposal?)
Are you arguing that this format should be acceptable syntax in the language?

kortschak · 2015-09-06T04:51:43Z

This is not a proposal, I wanted to sound things out first.

I don't think that it needs to be part of the language; the utility outside numerics is limited. At most, making strconv.ParseFloat handle it is what I am thinking.

griesemer · 2015-09-10T05:37:28Z

Having strconv.ParseFloat handle the format sounds reasonable to me. I think the next step would be to define the exact format (syntax). It's probably something like:

number   = [sign] mantissa 'p' [sign] exponent .
sign     = '+' | '-' .
mantissa = decimalDigit {decimalDigit} .
exponent = decimalDigit {decimalDigit} .

Questions:

Should both 'p' and 'P' be permitted? Why/why not?
Can the mantissa be hexadecimal? Why/why not?

kortschak · 2015-09-10T05:45:41Z

It seems to me that permitting a single case for 'p' makes sense: it is (marginally) simpler to look for instances, both visually and mechanically, when there is only one thing to look for and that thing is visually distinct from a digit (in the decimal case). Allowing the mantissa optionally as hex makes moderate sense, since a bit pattern may be what is being specified.

griesemer · 2015-09-10T21:17:53Z

Permitting only 'p' sounds good. Perhaps for a start, also leave out hexadecimal notation. Thus, a float in p notation is essentially a signed integer followed by a 'p' exponent.

Venture to send a change list? (strconv/atof.go)
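To make the format agreed on so far concrete (a signed decimal integer followed by a lowercase 'p' and a signed decimal exponent), here is a minimal sketch of a parser. parseP is a hypothetical name, not an strconv function, and the sketch makes simplifying assumptions that a real implementation would have to revisit: it rejects mantissas above 2**53 so the integer-to-float conversion is exact, it reports overflow to infinity as an error, and it does not treat underflow specially.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseP is a hypothetical helper (not part of strconv). It parses
// [sign] digits 'p' [sign] digits and returns sign * mantissa * 2**exponent.
func parseP(s string) (float64, error) {
	i := strings.IndexByte(s, 'p')
	if i < 0 {
		return 0, fmt.Errorf("parseP: missing 'p' in %q", s)
	}
	mant, exp := s[:i], s[i+1:]

	// Optional sign on the mantissa.
	neg := false
	if mant != "" && (mant[0] == '+' || mant[0] == '-') {
		neg = mant[0] == '-'
		mant = mant[1:]
	}

	// ParseUint accepts only digits, so "1.5" or "" is rejected here.
	m, err := strconv.ParseUint(mant, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parseP: bad mantissa in %q: %v", s, err)
	}
	if m > 1<<53 {
		// Assumption of this sketch: keep float64(m) exact.
		return 0, fmt.Errorf("parseP: mantissa too large in %q", s)
	}

	// ParseInt accepts an optional leading sign.
	e, err := strconv.ParseInt(exp, 10, 32)
	if err != nil {
		return 0, fmt.Errorf("parseP: bad exponent in %q: %v", s, err)
	}

	x := math.Ldexp(float64(m), int(e))
	if math.IsInf(x, 0) {
		return 0, fmt.Errorf("parseP: value out of range in %q", s)
	}
	if neg {
		x = -x
	}
	return x, nil
}

func main() {
	for _, s := range []string{"4503599627370496p-52", "0p-1074", "-3p+1", "1.5p2"} {
		x, err := parseP(s)
		fmt.Println(s, x, err)
	}
}
```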

kortschak · 2015-09-10T22:33:55Z

Yeah, I'll look into that. Just an initial observation though; it seems that fmt.Scan* handle binary exponent float representations judging by the test cases that exist (though also with dot rather than int-only mantissa). big.Float also handles these cases, but also includes hex input (including non-int).

Before I add to the variety, I'd like to get input on that.

griesemer · 2015-09-10T23:04:02Z

I haven't looked at fmt.Scan. Permitting a decimal point for a decimal mantissa is tricky. The point of the 'p' notation is 100% lossless conversion with a fast and simple algorithm. In general that's not true anymore once a decimal point is permitted.

big.Float uses a different format: the mantissa is represented by a hex number which corresponds to the bits after (to the right of) the "decimal" point; that is, the mantissa value m satisfies 0.5 <= m < 1.0. It's essentially used for testing (and could possibly be changed).

Given a sign s (-1, +1), a mantissa m that is simply a decimal unsigned integer, and a binary exponent b, the floating point value x is x = s * m * 2**b . No further explanation needed.

There are design decisions to be made when printing using a binary exponent: The mantissa may be scaled arbitrarily. Currently, printing simply prints the float32/float64 mantissa bits as if they were int32 or int64 numbers (with appropriate exponent). This requires knowing the bit size of the type to reproduce. Another option would be to always print a canonical form; for instance such that the mantissa is the smallest possible value before requiring a decimal point. That is equivalent to having no trailing 0 bits in the mantissa (that is, the mantissa being odd, except for x == 0).

But for parsing it doesn't matter.

More generally: it seems that strconv conversion routines should parse the numbers that they can print.
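The canonical form mentioned above (an odd mantissa, obtained by removing trailing zero bits and adjusting the exponent) can be sketched as follows. canonicalB is a hypothetical name; strconv does not provide such a function. The sketch writes the exponent with an explicit sign, which is an assumption made here for uniformity.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// canonicalB is a hypothetical helper (not part of strconv). It prints x
// as an odd decimal mantissa followed by 'p' and a binary exponent, so
// that x == mantissa * 2**exponent with the smallest possible mantissa.
func canonicalB(x float64) string {
	if math.IsNaN(x) || math.IsInf(x, 0) {
		return fmt.Sprint(x)
	}
	if x == 0 {
		return "0p+0"
	}
	// |x| == frac * 2**exp with 0.5 <= frac < 1.
	frac, exp := math.Frexp(math.Abs(x))
	// Scaling by 2**53 is exact and yields an integer mantissa.
	m := uint64(frac * (1 << 53))
	exp -= 53
	// Drop trailing zero bits and compensate in the exponent.
	tz := bits.TrailingZeros64(m)
	m >>= uint(tz)
	exp += tz

	sign := ""
	if x < 0 {
		sign = "-"
	}
	return fmt.Sprintf("%s%dp%+d", sign, m, exp)
}

func main() {
	for _, x := range []float64{0, 1, 0.75, -6} {
		fmt.Println(x, canonicalB(x))
	}
}
```

Unlike the current 'b' output, this form does not depend on whether the value started life as a float32 or a float64.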

kortschak · 2015-09-10T23:49:16Z

Agreed. Just getting clarification.

rsc · 2015-11-25T20:58:39Z

@kortschak, regarding your initial comment:

The capacity to represent exact float values in a clear human-readable way is valuable in numeric code, for example here, where otherwise comments are required to explain the magic hex.

And the code says:

var (
    // dlamchE is the machine epsilon. For IEEE this is 2^-53.
    dlamchE = math.Float64frombits(0x3ca0000000000000)

    // dlamchP is 2 * eps
    dlamchP = math.Float64frombits(0x3cb0000000000000)

    // dlamchS is the "safe min", that is, the lowest number such that 1/sfmin does
    // not overflow. The Netlib code for calculating this number is not correct --
    // it overflows. Found by trial and error, it is equal to (1/math.MaxFloat64) * (1+ 6*eps)
    dlamchS = math.Float64frombits(0x4000000000001)

    ...
)

I want to make the point, unrelated to what we do in strconv, that this is unnecessary in Go. This kind of thing - specifying floating point constants in hexadecimal - is rampant in C because C compilers have historically been quite bad at reading floating point inputs. Using hex was the only way to guarantee the compiler arrived at the number you intended. But modern practice has improved, and Go gets this right. There are any number of ways you could write the above code using plain floating point constants, but the most direct is:

var (
    // dlamchE is the machine epsilon. For IEEE this is 2^-53.
    dlamchE = 1.1102230246251565e-16

    // dlamchP is 2 * eps
    dlamchP = 2.220446049250313e-16

    // dlamchS is the "safe min", that is, the lowest number such that 1/sfmin does
    // not overflow. The Netlib code for calculating this number is not correct --
    // it overflows. Found by trial and error, it is equal to (1/math.MaxFloat64) * (1+ 6*eps)
    dlamchS = 5.56268464626801e-309

    ...
)

This is guaranteed to have the same effect as the math.Float64frombits calls.
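The equivalence claimed above can be checked by comparing bit patterns. This is a small sketch; sameBits is a hypothetical helper name, and the constants and hex patterns are copied from the two listings above.

```go
package main

import (
	"fmt"
	"math"
)

var (
	dlamchE = 1.1102230246251565e-16
	dlamchP = 2.220446049250313e-16
	dlamchS = 5.56268464626801e-309
)

// sameBits is a hypothetical helper. It reports whether x has exactly
// the IEEE 754 binary64 bit pattern want.
func sameBits(x float64, want uint64) bool {
	return math.Float64bits(x) == want
}

func main() {
	fmt.Println("dlamchE:", sameBits(dlamchE, 0x3ca0000000000000))
	fmt.Println("dlamchP:", sameBits(dlamchP, 0x3cb0000000000000))
	fmt.Println("dlamchS:", sameBits(dlamchS, 0x4000000000001))
}
```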

rsc · 2015-11-25T21:02:19Z

Postponing the strconv work.

kortschak · 2015-11-25T21:59:21Z

This misses the point. If we were able to say

var (
    // dlamchE is the machine epsilon. For IEEE this is 2^-53.
    dlamchE = 1p-53 // or package-provided equivalent.

    // dlamchP is 2 * eps
    dlamchP = 2*dlamchE

    // dlamchS is the "safe min", that is, the lowest number such that 1/sfmin does
    // not overflow. The Netlib code for calculating this number is not correct --
    // it overflows. Found by trial and error.
    dlamchS = (1/math.MaxFloat64) * (1+ 6*dlamchE)

    ...
)

then I would agree, but we can't. The capacity to express exact float values is less than half the problem.

griesemer · 2015-12-11T07:37:02Z

@kortschak FWIW, in Go we can express 1p-53 quite elegantly as the constant expression 1.0/(1<<53). All the various forms agree with the value computed from the bit pattern: http://play.golang.org/p/VjkVDA8PrL .

Or more generally, any float constant of the form xxxp+exp or xxxp-exp can be expressed as xxx<<exp or xxx.0/(1<<exp) . Thus, the need for the p notation in code is diminished.
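A short sketch of the constant-expression technique described above. The constant names and the pform helper are hypothetical, chosen only for this illustration; pform computes m * 2**exp with math.Ldexp as a reference value.

```go
package main

import (
	"fmt"
	"math"
)

const (
	eps     = 1.0 / (1 << 53) // 1p-53
	threeP4 = 3 << 4          // 3p+4
	fivePm3 = 5.0 / (1 << 3)  // 5p-3
)

// pform is a hypothetical helper returning m * 2**exp.
func pform(m float64, exp int) float64 {
	return math.Ldexp(m, exp)
}

func main() {
	fmt.Println(eps == pform(1, -53))
	fmt.Println(eps == math.Float64frombits(0x3ca0000000000000))
	fmt.Println(threeP4 == pform(3, 4))
	fmt.Println(fivePm3 == pform(5, -3))
}
```

Because these are untyped constant expressions, they are evaluated exactly at compile time and only rounded when converted to float64.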

kortschak · 2015-12-11T07:47:04Z

@griesemer Thanks for that tip. This covers our use case better than a strconv parser, though I feel probably the last sentence in #12518 (comment) justifies this addition for other uses.

rsc · 2016-01-04T17:12:41Z

FWIW I agree that since strconv can generate the p form it should also accept the p form. That said, doing so correctly at the boundaries is tricky. It's not a 1-liner.

rsc · 2016-10-10T19:38:02Z

Decision is in my previous comment above: yes, it's fine to do this; just get the corner cases right, please.

kortschak · 2016-10-11T22:24:43Z

In looking into this I have found that fmt.Scan does binary exponent format float parsing (as should be expected from the documentation). However, it is less restrictive than the documentation in fmt suggests it should be (for example, this works although the documentation specifies a "decimalless scientific notation"). It is even more relaxed than would be acceptable for strconv.ParseFloat, since the routines backing the scan functions do not error when the exponent is out of range (instead setting the value to ±Inf). So I guess a question here is whether the scan float binary exponent functionality should be backed by a new strconv binary exponent float parser (keeping the behaviour that overflows are silently converted to Infs by discarding the out of range error), and whether the Scan behaviour with decimal mantissas should be brought into the strconv.ParseFloat function (and documented in fmt).
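For comparison, this is how strconv.ParseFloat reports an out-of-range decimal exponent: it returns ±Inf together with an error whose underlying cause is strconv.ErrRange. The sketch below uses hypothetical helper names (parseChecked, isPosInf) and only exercises the decimal path; it says nothing about how fmt.Scan behaves.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parseChecked is a hypothetical helper. It parses s as a float64 and
// reports whether ParseFloat flagged the value as out of range.
func parseChecked(s string) (v float64, outOfRange bool) {
	v, err := strconv.ParseFloat(s, 64)
	if ne, ok := err.(*strconv.NumError); ok && ne.Err == strconv.ErrRange {
		return v, true
	}
	return v, false
}

// isPosInf is a hypothetical helper reporting whether v is +Inf.
func isPosInf(v float64) bool {
	return math.IsInf(v, 1)
}

func main() {
	v, outOfRange := parseChecked("1e400")
	fmt.Println(v, outOfRange)

	v, outOfRange = parseChecked("1e300")
	fmt.Println(v, outOfRange)
}
```

A caller that discards the error keeps the infinity, which is the silent-overflow behaviour the comment above describes.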

rsc · 2016-10-21T01:30:17Z

The fmt scan overflow to inf behavior for binary exponents is a bug. Note that decimal exponents are handled right: https://play.golang.org/p/SHc0zAdyhx. It's just one more corner case that makes this non-trivial. It would be fine to support 1.2p4 as fmt.Scan does. In fact that's probably important. But it adds more corner cases.

I think we should probably postpone this to Go 1.9 since it's not urgent and there's little time left in Go 1.8. I'm interested to see this happen though.

kortschak · 2017-03-22T21:37:08Z

@griesemer

The comment here shows a technique for representing these constants using shifts; however, this only really handles a minority of cases easily. Anything that is not a multiple of a power of 2 is difficult, and (more troublingly) values where the exponent is greater than the spec minimum integer constant representation width cannot easily be expressed either. An example of this is the LAPACK DLAMCH("S") value, which is 1p-1022 and which cannot be expressed using the expression 1.0 / (1 << 1022), requiring instead 1.0 / (1 << 256) / (1 << 256) / (1 << 256) / (1 << 254) to shoehorn float exponents into the integer model.
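The chained-division workaround can be checked against math.Ldexp and against the bit pattern of the smallest normal float64. In this sketch, safeMin and isSmallestNormal are hypothetical names, and it assumes a compiler that accepts 1 << 256 as an integer constant, as the expression quoted above requires.

```go
package main

import (
	"fmt"
	"math"
)

// safeMin is 1p-1022 written as a chain of divisions (256*3 + 254 = 1022).
const safeMin = 1.0 / (1 << 256) / (1 << 256) / (1 << 256) / (1 << 254)

// isSmallestNormal is a hypothetical helper. It reports whether x equals
// 2**-1022, the smallest normal float64 (exponent field 1, zero fraction).
func isSmallestNormal(x float64) bool {
	return x == math.Ldexp(1, -1022) &&
		math.Float64bits(x) == 0x0010000000000000
}

func main() {
	fmt.Println(isSmallestNormal(safeMin))
}
```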

griesemer · 2017-03-22T21:41:40Z

@kortschak Ack. Are you proposing the p notation in the language?

kortschak · 2017-03-22T21:50:14Z

At this stage, that's just a data point that I had not noticed before.

The issue here is that I would suppose that only a very small group of Go programmers would need that (and Go compiler authors, who have that in internal code as far as I remember). For the use that we have (1 case in two locations), I am suggesting that we use the longer expression above. It would be nice to be able to express these values simply, but whether that is worth your (pl.) time is not clear to me.

griesemer · 2017-05-09T18:45:09Z

Not happening for 1.9.

griesemer · 2017-08-15T13:20:34Z

I think the agreement here is that we accept the 'p' notation for binary exponents with strconv.ParseFloat and eventually update fmt.Scan to match its behavior with respect to error handling or corner cases.

@martisch Is this something you might be interested in looking into (starting with ParseFloat)? If so, feel free to assign this issue to yourself.

martisch · 2017-08-15T15:37:23Z

I am interested and added myself.

odeke-em · 2018-03-05T08:59:00Z

/cc @ericlagergren

griesemer · 2018-09-14T00:38:31Z

Moving this off 1.12. This takes a dedicated careful effort. We get to it when we get to it.

kortschak · 2022-06-03T05:46:05Z

This looks like it was fixed in 0771724.
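A minimal check for readers on a recent toolchain, assuming Go 1.13 or later, where strconv.ParseFloat accepts hexadecimal floating-point input with a 'p' exponent such as 0x1p-53. parse64 and pow2 are hypothetical helper names. The decimal-mantissa spelling 1p-53 is included in main only to print whatever the installed version returns for it.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
)

// parse64 is a hypothetical wrapper around strconv.ParseFloat.
func parse64(s string) (float64, error) {
	return strconv.ParseFloat(s, 64)
}

// pow2 is a hypothetical helper returning 2**n as a float64.
func pow2(n int) float64 {
	return math.Ldexp(1, n)
}

func main() {
	for _, s := range []string{"0x1p-53", "0x1p-1022", "1p-53"} {
		v, err := parse64(s)
		fmt.Println(s, v, err)
	}
}
```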

rsc self-assigned this Oct 23, 2015

rsc added this to the Go1.6 milestone Oct 23, 2015

rsc added the Thinking label Oct 23, 2015

rsc modified the milestones: Go1.7, Go1.6 Nov 25, 2015

rsc modified the milestones: Go1.8, Go1.7 May 18, 2016

quentinmit added the NeedsDecision label Oct 10, 2016

rsc added NeedsFix and removed NeedsDecision Thinking labels Oct 10, 2016

rsc modified the milestones: Go1.8Maybe, Go1.8 Oct 10, 2016

rsc removed this from the Go1.8Maybe milestone Oct 21, 2016

griesemer self-assigned this Oct 21, 2016

griesemer modified the milestones: Go1.10, Go1.9 May 9, 2017

griesemer changed the title ~~strconv: add capacity to parse binary exponent float representations~~ strconv: strconv.ParseFloat should accept 'p' notation for binary exponents Aug 15, 2017

martisch self-assigned this Aug 15, 2017

griesemer modified the milestones: Go1.10, Go1.11 Nov 3, 2017

odeke-em changed the title ~~strconv: strconv.ParseFloat should accept 'p' notation for binary exponents~~ strconv: ParseFloat should accept 'p' notation for binary exponents Mar 5, 2018

griesemer modified the milestones: Go1.11, Go1.12 May 1, 2018

sparkprime mentioned this issue May 30, 2018

Hexfloat support for parser; formatter google/jsonnet#517

Closed

griesemer modified the milestones: Go1.12, Unplanned Sep 14, 2018

seebs mentioned this issue Nov 29, 2018

proposal: Go 2: hexadecimal floats #29008

Closed

kortschak closed this as completed Jun 3, 2022

rsc unassigned rsc, martisch and griesemer Jun 23, 2022

golang locked and limited conversation to collaborators Jun 23, 2023

gopherbot added the FrozenDueToAge label Jun 23, 2023
