proposal: strconv: add ParseFloatPrefix #53340

benhoyt · 2022-06-12T10:12:54Z

The strconv.ParseFloat function parses a floating-point value from a string, but it requires that the floating-point value is the entire string, otherwise it returns an error. In many types of parsing you want to know if a string starts with a valid floating-point value, and then continue parsing after that.

For example, you could imagine code that parsed <float> <op> <float>. Currently you'd either have to write a full scanner for floats and pass the result to ParseFloat, which means a lot of tricky code and doing work that ParseFloat will then do again.

There's an existing, unexported function called parseFloatPrefix, with the following signature (the returned int is the number of bytes of s parsed):

func parseFloatPrefix(s string, bitSize int) (float64, int, error) { ... }

If this was exported as ParseFloatPrefix, you could us it to parse a float and skip over the number of bytes it returns. (In fact, parseFloatPrefix is used by ParseComplex, which is basically parsing <float>±<float>.)

Real-world examples other than ParseComplex:

I would use ParseFloatPrefix in my GoAWK project, because the implicit conversion that AWK does when you ask for a number but the input is a string is allowed to be a prefix, like "1.5xyz" yields 1.5. I currently implement this with my own pre-scanner, and then pass the result to ParseFloat. (I believe I used to use text/scanner, but it was rather heavy and slow.) I ran into this because I wanted to add hexadecimal floating point support, which is allowed by the POSIX AWK spec.
strconv.parseFloatPrefix is used by the Vitess database project in some expression-parsing code. They pull in the actual strconv.parseFloatPrefix using a //go:linkname hack.

In the golang-dev thread where I asked about this, @rsc mentioned that strconv.QuotedPrefix was added recently, and said "I think we could reasonably add FloatPrefix, although at that point perhaps we should also consider IntPrefix, UintPrefix, BoolPrefix, and ComplexPrefix". Two things from that:

I'd argue that we need [Parse]FloatPrefix because it's quite complex and error-prone to write, whereas IntPrefix and UintPrefix are relatively simple (at least with a fixed base), BoolPrefix is trivial, and ComplexPrefix becomes relatively simple once we have [Parse]FloatPrefix.
I'm not a fan of how QuotedPrefix doesn't return the unescaped string, because then (in most use cases) you have to call strconv.Unquote later, which will loop over the bytes a second time to unescape it -- rather inefficient! So I'd much rather see ParseFloatPrefix, or a similar signature which returns the parsed floating-point value as well. In my GoAWK use case, efficiency is important, because it's not used just for parsing source code, but may be used on each field in the data passed to the AWK script, so I don't want to scan the bytes twice.

The text was updated successfully, but these errors were encountered:

adonovan · 2022-06-12T17:26:10Z

I agree that operators of this kind are useful; personally I've only wanted the string variant (QuotedPrefix) but it is frustrating when these complex (and evolving, albeit very slowly) routines exist but are inaccessible, and could be made accessible at no dynamic cost and with minimal maintenance cost.

I'm not a fan of how QuotedPrefix doesn't return the unescaped string, because then (in most use cases) you have to call strconv.Unquote later, which will loop over the bytes a second time to unescape it -- rather inefficient!

True, though Unquote is pretty cheap for literals that don't contain metacharacters, since it doesn't need to allocate.

dsnet · 2022-06-12T23:49:24Z

#45033 added QuotedPrefix, which returns a valid quoted string that is at the start of the string. The decision was made that it does not actually perform the unquote operation, which must be done with a subsequent call to Unquote.

Following the existing precedence, I would expect the following be added:

BoolPrefix
ComplexPrefix
FloatPrefix
IntPrefix
UintPrefix

where you would need to perform a subsequent call to ParseXXX after you obtain the leading prefix.

I think we should remain consistent.

benhoyt · 2022-06-13T02:34:39Z

I also like consistency, but here's what I don't like about FloatPrefix not returning the number (benchmark code here, with FloatPrefix implemented in terms of the existing readFloat function):

cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
BenchmarkParseFloatPrefix-8           	28839720	        38.73 ns/op
BenchmarkFloatPrefix-8                	35758686	        30.83 ns/op
BenchmarkFloatPrefixAndParseFloat-8   	16890924	        67.09 ns/op

It's doing almost twice as much work when you need the float value (which I think will be common). It goes against the grain to be redoing work like this -- especially painful because parseFloatPrefix already exists and returns the number.

rsc · 2022-06-15T18:35:54Z

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

rsc · 2022-06-29T17:35:30Z

This still seems like a very complex function. What does it do with "1.2e+A"? Does it return "1.2"? Or does it report an error? It still seems like a text parser should be deciding this kind of error behavior for itself.

dsnet · 2022-06-29T18:09:29Z

Another consideration: the function might not be forwards compatible if we change the grammar of numbers.

Prior to Go1.13, strconv.ParseFloatPrefix("123_456") would have returned 123.
After Go1.13 where we permit "underscore" separators, strconv.ParseFloatPrefix("123_456") would now return 123456.

This would break anyone who was relying on _ as a delimiter between numbers.

benhoyt · 2022-06-29T21:00:03Z

@rsc That's a good point. I don't think it'd be too hard to address, by just documenting whatever the current parseFloatPrefix behaviour is for that kind of case (which is to return an error, parsing "1.2e+A": invalid syntax).

@dsnet True -- another good point. Though that's very rare, and usually that would result in another error further up in someone's parsing.

Still, points taken. I'm definitely having second thoughts about whether this is a good idea, or whether people should continue to define their own pre-parsers specific to their domain (which is what I'm currently doing for GoAWK).

rsc · 2022-07-13T17:29:19Z

@dsnet, thanks. That's a great example of why I'm hesitant about this change. It's quite a bit more subtle than QuotedPrefix.

rsc · 2022-07-13T18:41:22Z

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

rsc · 2022-07-20T18:03:21Z

No change in consensus, so declined.
— rsc for the proposal review group

benhoyt added the Proposal label Jun 12, 2022

gopherbot added this to the Proposal milestone Jun 12, 2022

rsc added the Proposal-FinalCommentPeriod label Jul 13, 2022

rsc removed the Proposal-FinalCommentPeriod label Jul 20, 2022

rsc closed this as completed Jul 20, 2022

rsc moved this to Declined in Proposals Aug 10, 2022

rsc added this to Proposals Aug 10, 2022

golang locked and limited conversation to collaborators Jul 20, 2023

gopherbot added the FrozenDueToAge label Jul 20, 2023

rsc removed this from Proposals Jul 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: strconv: add ParseFloatPrefix #53340

proposal: strconv: add ParseFloatPrefix #53340

benhoyt commented Jun 12, 2022 •

edited

Loading

adonovan commented Jun 12, 2022

dsnet commented Jun 12, 2022 •

edited

Loading

benhoyt commented Jun 13, 2022

rsc commented Jun 15, 2022

rsc commented Jun 29, 2022

dsnet commented Jun 29, 2022 •

edited

Loading

benhoyt commented Jun 29, 2022

rsc commented Jul 13, 2022

rsc commented Jul 13, 2022

rsc commented Jul 20, 2022

proposal: strconv: add ParseFloatPrefix #53340

proposal: strconv: add ParseFloatPrefix #53340

Comments

benhoyt commented Jun 12, 2022 • edited Loading

adonovan commented Jun 12, 2022

dsnet commented Jun 12, 2022 • edited Loading

benhoyt commented Jun 13, 2022

rsc commented Jun 15, 2022

rsc commented Jun 29, 2022

dsnet commented Jun 29, 2022 • edited Loading

benhoyt commented Jun 29, 2022

rsc commented Jul 13, 2022

rsc commented Jul 13, 2022

rsc commented Jul 20, 2022

benhoyt commented Jun 12, 2022 •

edited

Loading

dsnet commented Jun 12, 2022 •

edited

Loading

dsnet commented Jun 29, 2022 •

edited

Loading