Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strconv: optimize Parse for []byte arguments #42429

Closed
dsnet opened this issue Nov 6, 2020 · 11 comments
Closed

strconv: optimize Parse for []byte arguments #42429

dsnet opened this issue Nov 6, 2020 · 11 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Milestone

Comments

@dsnet
Copy link
Member

dsnet commented Nov 6, 2020

In 2016, #2632 was closed with the explanation that:

I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs.

In summary, it's argued that 1) the compiler should address this, and 2) people can vendor and modify the strconv package.

It's been 4 years since the issue was closed (and 9 years since the issue was first reported), and there there hasn't been a compiler optimization that permits calling the Parse functions with a []byte without it creating garbage. That is:

var b []byte = ...
v, err := strconv.ParseXXX(string(b), ...)

always (as of Go1.15) allocates and copies the input []byte.

Forking strconv is an unfavorable workaround because it means that the vendor fails to benefit from future optimizations to strconv (e.g., cl/187957 or cl/260858) and fixes to the implementation (e.g., #29491) and may go against company guidelines that forbid forks. Personally, I haven't seen anyone fork strconv. Rather, in most cases I see users cast a []byte to a string using unsafe. This is what we do in the protobuf module. However, some use-cases are constrained to avoid using unsafe (e.g., encoding/json) and so continue to suffer from performance losses.

Given the passage of time and the lack of a natural solution to this, I think we should revisit this issue. It seems that we should either:

  1. Modify the compiler to detect cases where a string variable does not escape and that it's use involves no synchronization points, so that it functionally performs an allocation-free cast of []byte to string in the example snippet above. If such a compiler optimization existed, we would probably also need to modify strconv to avoid escaping the string input through the error value by copying the input string. Note that this will slow down cases where parsing fails.
  2. Accept that a compiler optimization is not happening, in which case we add ParseBoolBytes, ParseComplexBytes, ParseFloatBytes, ParseIntBytes, and ParseUintBytes, which are identical to their counterparts without the Bytes suffix, but operate on a []byte as the input instead of a string. (We can add Bytes equivalents for the QuoteXXX functions as well, but those seem to be rarely with a []byte.)

Obviously option 1 is preferred, but if its not going to happen, but perhaps it's time to do option 2. It's been almost a decade.

\cc @mvdan @rogpeppe @mdlayher

@mvdan mvdan added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance labels Nov 6, 2020
@cespare
Copy link
Contributor

cespare commented Nov 6, 2020

Rather, in most cases I see users cast a []byte to a string using unsafe. This is what we do in the protobuf module.

At my company we have shared functions that do this too. They see a reasonable amount of use in high-performance code.

It's not hard to write the unsafe code, but it's not completely trivial either -- it's easy to overlook the problem of the string being retained unsafely inside the NumError.

@bcmills
Copy link
Contributor

bcmills commented Nov 6, 2020

Don't forget option (3): wait for a decision on generics, and then provide an updated strconv library with functions that are polymorphic over indexable byte-containers.

@rogpeppe
Copy link
Contributor

rogpeppe commented Nov 6, 2020

If there's a new strconv library that's generic over []byte and string then any new call we add to strconv now can presumably be trivially implemented in terms of that

@mvdan
Copy link
Member

mvdan commented Nov 13, 2020

If something like #2205 (comment) was implemented, perhaps we could get the compiler to do the work for us after all. That said, I'm not in favor of simply waiting another few years for that to happen.

@cristaloleg
Copy link

Similar to option (3): byte view that can represent []byte or string #5376 (however it it's covered by generics, probably)

@rogpeppe
Copy link
Contributor

If there was a compiler optimisation for this, it would be good if it could be generalised to support string to []byte conversions too. For example, at the moment, there's no way to calculate the hash of a string without allocations, even with devirtualisation, because the call to Write([]byte(s)) will always incur an allocation.

@go101
Copy link

go101 commented Nov 23, 2020

Maybe, it is not a too bad idea to let slices comparable. Comparing slices is just like comparing strings.

@ianlancetaylor
Copy link
Contributor

@go101 That is #23725 or #38377. This is not the right place to discuss those ideas.

@go101
Copy link

go101 commented Nov 23, 2020

Sorry, I really misunderstood @rogpeppe's last comment. In my case, I often want structs with byte slice fields are hashable and can be used as map keys but I have to convert the slices to strings in other structs to get hashable values.

[update]: I have ever made a proposal to solve the problem mentioned in rogpeppe's last comment: https://github.com/go101/go101/wiki/A-proposal-to-avoid-duplicating-underlying-bytes-when-using-strings-as-read-only-%5B%5Dbyte-arguments

@seebs
Copy link
Contributor

seebs commented Jul 28, 2021

zeebo in Gopher Slack made a patch that eliminates most of the allocations:

https://gist.github.com/zeebo/aeb001d35ca3a6e0ebd85eb52e3ef4c2

@gopherbot
Copy link

Change https://golang.org/cl/345488 mentions this issue: strconv: optimize Parse for []byte arguments

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Aug 14, 2023
@dmitshur dmitshur added this to the Go1.20 milestone Aug 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done. Performance
Projects
None yet
Development

No branches or pull requests