Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: strconv: add ParseFloatPrefix #53340

Closed
benhoyt opened this issue Jun 12, 2022 · 10 comments
Closed

proposal: strconv: add ParseFloatPrefix #53340

benhoyt opened this issue Jun 12, 2022 · 10 comments

Comments

@benhoyt
Copy link
Contributor

benhoyt commented Jun 12, 2022

The strconv.ParseFloat function parses a floating-point value from a string, but it requires that the floating-point value is the entire string, otherwise it returns an error. In many types of parsing you want to know if a string starts with a valid floating-point value, and then continue parsing after that.

For example, you could imagine code that parsed <float> <op> <float>. Currently you'd either have to write a full scanner for floats and pass the result to ParseFloat, which means a lot of tricky code and doing work that ParseFloat will then do again.

There's an existing, unexported function called parseFloatPrefix, with the following signature (the returned int is the number of bytes of s parsed):

func parseFloatPrefix(s string, bitSize int) (float64, int, error) { ... }

If this was exported as ParseFloatPrefix, you could us it to parse a float and skip over the number of bytes it returns. (In fact, parseFloatPrefix is used by ParseComplex, which is basically parsing <float>±<float>.)

Real-world examples other than ParseComplex:

  • I would use ParseFloatPrefix in my GoAWK project, because the implicit conversion that AWK does when you ask for a number but the input is a string is allowed to be a prefix, like "1.5xyz" yields 1.5. I currently implement this with my own pre-scanner, and then pass the result to ParseFloat. (I believe I used to use text/scanner, but it was rather heavy and slow.) I ran into this because I wanted to add hexadecimal floating point support, which is allowed by the POSIX AWK spec.
  • strconv.parseFloatPrefix is used by the Vitess database project in some expression-parsing code. They pull in the actual strconv.parseFloatPrefix using a //go:linkname hack.

In the golang-dev thread where I asked about this, @rsc mentioned that strconv.QuotedPrefix was added recently, and said "I think we could reasonably add FloatPrefix, although at that point perhaps we should also consider IntPrefix, UintPrefix, BoolPrefix, and ComplexPrefix". Two things from that:

  • I'd argue that we need [Parse]FloatPrefix because it's quite complex and error-prone to write, whereas IntPrefix and UintPrefix are relatively simple (at least with a fixed base), BoolPrefix is trivial, and ComplexPrefix becomes relatively simple once we have [Parse]FloatPrefix.
  • I'm not a fan of how QuotedPrefix doesn't return the unescaped string, because then (in most use cases) you have to call strconv.Unquote later, which will loop over the bytes a second time to unescape it -- rather inefficient! So I'd much rather see ParseFloatPrefix, or a similar signature which returns the parsed floating-point value as well. In my GoAWK use case, efficiency is important, because it's not used just for parsing source code, but may be used on each field in the data passed to the AWK script, so I don't want to scan the bytes twice.
@gopherbot gopherbot added this to the Proposal milestone Jun 12, 2022
@adonovan
Copy link
Member

I agree that operators of this kind are useful; personally I've only wanted the string variant (QuotedPrefix) but it is frustrating when these complex (and evolving, albeit very slowly) routines exist but are inaccessible, and could be made accessible at no dynamic cost and with minimal maintenance cost.

I'm not a fan of how QuotedPrefix doesn't return the unescaped string, because then (in most use cases) you have to call strconv.Unquote later, which will loop over the bytes a second time to unescape it -- rather inefficient!

True, though Unquote is pretty cheap for literals that don't contain metacharacters, since it doesn't need to allocate.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jun 12, 2022
@dsnet
Copy link
Member

dsnet commented Jun 12, 2022

#45033 added QuotedPrefix, which returns a valid quoted string that is at the start of the string. The decision was made that it does not actually perform the unquote operation, which must be done with a subsequent call to Unquote.

Following the existing precedence, I would expect the following be added:

  • BoolPrefix
  • ComplexPrefix
  • FloatPrefix
  • IntPrefix
  • UintPrefix

where you would need to perform a subsequent call to ParseXXX after you obtain the leading prefix.

I think we should remain consistent.

@benhoyt
Copy link
Contributor Author

benhoyt commented Jun 13, 2022

I also like consistency, but here's what I don't like about FloatPrefix not returning the number (benchmark code here, with FloatPrefix implemented in terms of the existing readFloat function):

cpu: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
BenchmarkParseFloatPrefix-8           	28839720	        38.73 ns/op
BenchmarkFloatPrefix-8                	35758686	        30.83 ns/op
BenchmarkFloatPrefixAndParseFloat-8   	16890924	        67.09 ns/op

It's doing almost twice as much work when you need the float value (which I think will be common). It goes against the grain to be redoing work like this -- especially painful because parseFloatPrefix already exists and returns the number.

@rsc
Copy link
Contributor

rsc commented Jun 15, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Jun 15, 2022
@rsc
Copy link
Contributor

rsc commented Jun 29, 2022

This still seems like a very complex function. What does it do with "1.2e+A"? Does it return "1.2"? Or does it report an error? It still seems like a text parser should be deciding this kind of error behavior for itself.

@dsnet
Copy link
Member

dsnet commented Jun 29, 2022

Another consideration: the function might not be forwards compatible if we change the grammar of numbers.

Prior to Go1.13, strconv.ParseFloatPrefix("123_456") would have returned 123.
After Go1.13 where we permit "underscore" separators, strconv.ParseFloatPrefix("123_456") would now return 123456.

This would break anyone who was relying on _ as a delimiter between numbers.

@benhoyt
Copy link
Contributor Author

benhoyt commented Jun 29, 2022

@rsc That's a good point. I don't think it'd be too hard to address, by just documenting whatever the current parseFloatPrefix behaviour is for that kind of case (which is to return an error, parsing "1.2e+A": invalid syntax).

@dsnet True -- another good point. Though that's very rare, and usually that would result in another error further up in someone's parsing.

Still, points taken. I'm definitely having second thoughts about whether this is a good idea, or whether people should continue to define their own pre-parsers specific to their domain (which is what I'm currently doing for GoAWK).

@rsc
Copy link
Contributor

rsc commented Jul 13, 2022

@dsnet, thanks. That's a great example of why I'm hesitant about this change. It's quite a bit more subtle than QuotedPrefix.

@rsc
Copy link
Contributor

rsc commented Jul 13, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Jul 13, 2022
@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Jul 20, 2022
@rsc
Copy link
Contributor

rsc commented Jul 20, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Jul 20, 2022
@golang golang locked and limited conversation to collaborators Jul 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

5 participants