-
Notifications
You must be signed in to change notification settings - Fork 18k
strconv: ParseFloat should accept 'p' notation for binary exponents #12518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Some comments:
And some questions:
|
This is not a proposal, I wanted to sound things out first. I don't think that it needs to be part of the language, the utility outside numerics is limited. At the most making strconv.ParseFloat handle it is what I am thinking. |
Having strconv.ParseFloat handle the format sounds reasonable to me. I think the next step would be to define the exact format (syntax). It's probably something like:
Questions:
|
It seems to me that a single case for 'p' makes sense because it is a (marginally) simpler thing to look for instances (visually and mechanically) when there is only one thing to look for and that thing is visually distinct from a digit (in the decimal case). Mantissa optionally as hex makes moderate sense since a bit pattern may be what is being specified. |
Permitting only 'p' sounds good. Perhaps for a start, also leave away hexadecimal notation. Thus, a float in p notation is essentially a signed integer followed by a 'p' exponent. Venture to send a change list? (strconv/atof.go) |
Yeah, I'll look into that. Just an initial observation though; it seems that fmt.Scan* handle binary exponent float representations judging by the test cases that exist (though also with dot rather than int-only mantissa). big.Float also handles these cases, but also includes hex input (including non-int). Before I add to the variety, I'd like to get input on that. |
I haven't looked at fmt.Scan. Permitting a decimal point for a decimal mantissa is tricky. The point of the 'p' notation is 100% lossless conversion with a fast and simple algorithm. In general that's not true anymore once a decimal point is permitted. big.Float uses a different format: the mantissa is represented by a hex number which corresponds to the bits after (to the right) of the "decimal" point - that is, that mantissa value m is 0.5 <= m < 1.0. It's essentially used for testing (and could possibly be changed). Given a sign s (-1, +1), a mantissa m that is simply a decimal unsigned integer, and a binary exponent b, the floating point value x is x = s * m * 2**b . No further explanation needed. There are design decisions to be made when printing using a binary exponent: The mantissa may be scaled arbitrarily. Currently, printing simply prints the float32/float64 mantissa bits like if they were int32 or int64 bit numbers (with appropriate exponent). This requires knowing the bit size of the type to reproduce. Another option would be to always print a canonical form; for instance such that the mantissa is the smallest possible value before requiring a decimal point. That is equivalent to having no trailing 0's in the mantissa (or the mantissa being odd, except for x == 0). But for parsing it doesn't matter. More generally: it seems that strconv conversion routines should parse numbers that it can print. |
Agreed. Just getting clarification. |
@kortschak, regarding your initial comment:
And the code says:
I want to make the point, unrelated to what we do in strconv, that this is unnecessary in Go. This kind of thing - specifying floating point constants in hexadecimal - is rampant in C because C compilers have historically been quite bad at reading floating point inputs. Using hex was the only way to guarantee the compiler arrived at the number you intended. But modern practice has improved, and Go gets this right. There are any number of ways you could write the above code using plain floating point constants, but the most direct is:
This is guaranteed to have the same effect as the math.Float64frombits calls. |
Postponing the strconv work. |
This misses the point. If we were able to say
then I would agree, but we can't. The capacity to express exact float values is less than half the problem. |
@kortschak FWIW, in Go we can express 1p-53 quite elegantly as the constant expression Or more generally, any float constant of the form xxxp+exp or xxxp-exp can be expressed as xxx<<exp or xxx.0/(1<<exp) . Thus, the need for the p notation in code is diminished. |
@griesemer Thanks for that tip. This covers our use case better than a strconv parser, though I feel probably the last sentence in #12518 (comment) justifies this addition for other uses. |
FWIW I agree that since strconv can generate the p form it should also accept the p form. That said, doing so correctly at the boundaries is tricky. It's not a 1-liner. |
Decision is in my previous comment above: yes, it's fine to do this just get the corner cases right please. |
In looking into this I have found that fmt.Scan does binary exponent format float parsing (as should be expected from the documentation). However, it is less restrictive than the documentation in fmt suggests it should be (for example, this works although the documentation specifies a "decimalless scientific notation"). It is even more relaxed than would be acceptable for strconv.ParseFloat since the routines backing the scan functions do not error when the exponent is out of range (instead setting the value to ±Inf). So I guess a question here is whether the scan float binary exponent functionality should be backed by a new strconv binary exponent float parser (keeping the behaviour that overflows are silently converted to Infs by discarding the out of range error), and whether the the Scan behaviour with decimal mantissas should be brought into the strconv.ParseFloat function (and documented in fmt). |
The fmt scan overflow to inf behavior for binary exponents is a bug. Note that decimal exponents are handled right: https://play.golang.org/p/SHc0zAdyhx. It's just one more corner case that makes this non-trivial. It would be fine to support 1.2p4 as fmt.Scan does. In fact that's probably important. But it adds more corner cases. I think we should probably postpone this to Go 1.9 since it's not urgent and there's little time left in Go 1.8. I'm interested to see this happen though. |
The comment here shows a technique for representing these constants using shifts, however this only really handles a minority of cases easily; anything that is not a multiple of a power of 2 is difficult and (more troublingly) values where the exponent is greater than the spec minimum integer constant representation width cannot easily either. An example of this is the value LAPACK |
@kortschak Ack. Are you proposing the p notation in the language? |
At this stage, that's just a data point that I had not noticed before. The issue here is that I would suppose that a very small group of Go programmer would need that (and go compiler authors who have that in internal code as far as I remember). For the use that we have (1 case in two locations), I am suggesting for use to use the longer expression above. It would be nice to be able to express these values simply, but whether that is worth your (pl.) time is not clear to me. |
Not happening for 1.9. |
I think the agreement here is that we accept the 'p' notation for binary exponents with @martisch Is this something you might be interested in looking into (starting with |
I am interested and added myself. |
/cc @ericlagergren |
Moving this off 1.12. This takes a dedicated careful effort. We get to it when we get to it. |
This looks like it was fixed in 0771724. |
See https://groups.google.com/d/topic/golang-dev/oIB-wBj3ufw/discussion.
The language specification never mentioned binary exponent float representation, but it was previously included in the gc implementation and it is included as a formatting option via strconv.AppendFloat with the 'b' fmt argument. However, it now lives on as a parsing option only in the compiler and test code in strconv.
The capacity to represent exact float values in a clear human-readable way is valuable in numeric code, for example here, where otherwise comments are required to explain the magic hex.
It is not clear how this should be included, since parsing a string is failable at runtime and these values are likely to nearly always be compile time constants.
/cc @griesemer
The text was updated successfully, but these errors were encountered: