Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strconv: add equivalents of Parsexxx() with []byte arguments #2632

Closed
remyoudompheng opened this issue Dec 29, 2011 · 49 comments
Closed

strconv: add equivalents of Parsexxx() with []byte arguments #2632

remyoudompheng opened this issue Dec 29, 2011 · 49 comments
Labels
FrozenDueToAge GarbageCollector NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance Thinking
Milestone

Comments

@remyoudompheng
Copy link
Contributor

Hello,

Just like we have FormatFloat(...) string and AppendFloat(...) []byte, it would be nice
to avoid a costly string conversion when parsing numbers from a byteslice.

Simple code like:

func main() {
  s := "123.45"
  r := bytes.NewBufferString(s)
  runtime.UpdateMemStats()
  println(runtime.MemStats.Mallocs)
  var v float64
  fmt.Fscanf(r, "%f", &v)
  runtime.UpdateMemStats()
  println(runtime.MemStats.Mallocs)
}

says it's doing 7 allocations in the Scanf.
@adg
Copy link
Contributor

adg commented Jan 4, 2012

Comment 1:

Sounds like a nice idea.
Naming suggestions? "ParseFloatBytes"?

Labels changed: added priority-later, removed priority-triage.

Status changed to HelpWanted.

@rsc
Copy link
Contributor

rsc commented Jan 9, 2012

Comment 2:

I would like to think more about this.
It doubles everything, and we may be able to avoid
it with careful compilation.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 5, 2012

Comment 3:

Regarding "careful compilation", see my comment in the duplicate issue #3197.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 5, 2012

Comment 4:

Issue #3197 has been merged into this issue.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 5, 2012

Comment 5:

From issue #3197:
I attempted to convert strconv's Parse internals to use []byte instead, and then make
the existing string versions make an unsafe []byte of the string's memory using
reflect.SliceHeader / reflect.StringHeader, but the reflect package already depends on
strconv, so there's an import loop.

@remyoudompheng
Copy link
Contributor Author

Comment 6:

In the meanwhile, when I had to parse many numbers and save some allocations, I would
convert a big []byte to a big string and slice from it, instead of slicing from the
[]byte and string-ifying each chunk.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 5, 2012

Comment 7:

I wonder if reflect.SliceHeader / StringHeader should be mirrored in package runtime.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 6, 2012

Comment 8:

Implemented here, for discussion at some point:
http://golang.org/cl/5759057/
The byte view approach may be considered either gross or lovely. Not sure. The grossness
is isolated, well tested, and can't escape to callers at least.

@rsc
Copy link
Contributor

rsc commented Mar 6, 2012

Comment 9:

I am much less worried about the implementation than the expansion of the API.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 6, 2012

Comment 10:

True but this API was already extended with Append* for []byte efficiency, and this just
mirrors that.
And we already have nearly-duplicate strings and bytes packages.  This []byte/string
duplicity is unfortunate, but this is far from the only place.
Still, Later.

@rsc
Copy link
Contributor

rsc commented Mar 6, 2012

Comment 11:

Like I said in #2, we may be able to do better with Parse
than with Format/Append.  The distinction is that Parse
accepts strings as input, while Format returns strings as
output.  It is easier to analyze the former than the latter.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 6, 2012

Comment 12:

Yeah, there's a whole bunch of string<->[]byte optimizations that would be nice.
issue #2204 comes to mind.  And in another project, I want to be able to do a lookup in a
map[string]T with a []byte key. It'd be nice if:
      var m map[string]T = ...
      var key []byte = ...
      v, ok := m[string(key)]
...  didn't generate garabage too.

@remyoudompheng
Copy link
Contributor Author

Comment 13:

I think we have an issue about "escape analysis of strings" already. If a string([]byte)
does not escape, it won't be stored so further modifications to the []byte are safe even
if the string was created zero-copishly. Ah it's issue #2205, but it's about
[]byte(string).
The string([]byte) case does not require read-only flagging, I think, only ordinary
escape analysis.

@bradfitz
Copy link
Contributor

Comment 14:

Labels changed: added performance.

@bradfitz
Copy link
Contributor

Comment 15:

Labels changed: added garbage.

@bradfitz
Copy link
Contributor

Comment 16:

This issue was just causing performance problems for a user inside Google. They had a
large sstable with float64 []byte values. Each strconv.ParseFloat(string(value), 64) was
generating garbage (which ended up being a lot of garbage)
As a medium-term hack, I showed they could do:
    func unsafeString(b []byte) string {
       return *(*string)(unsafe.Pointer(&b))
    }
    freq, err := strconv.ParseFloat(unsafeString(val), 64)
But I felt bad even pointing that out.
I'd really like issue #2205 to be the answer to this issue, so we don't have to add new
API.
That is, I'd like this to be garbage-free:
     var b []byte = ....
     freq, err := strconv.ParseFloat(string(b), 64)
The compiler should do the equivalent unsafeString thing, if it can statically determine
that the string never escapes strconv.ParseFloat.
Yes, the strconv.ParseFloat could then observe concurrent mutations of b via its string,
like this:
     var b []byte = ....
     go mutateBytes(b)
     freq, err := strconv.ParseFloat(string(b), 64)
But that was already a data race.

@DanielMorsing
Copy link
Contributor

Comment 17:

If we implement no-op byte to string conversions, there are a couple of library changes
that would have to be made to use it effectively. Right now, ParseFloat leaks its string
parameter because it's captured in the error handling. With the change, it would be
advantageous to copy the string instead.
Coincidentally, I don't think we have a good way to copy strings. We've never had the
need to before.

@minux
Copy link
Member

minux commented Apr 27, 2013

Comment 18:

doesn't append([]byte{}, str...) do the trick for copying the string?
(into a []byte, of course, we can't really copy a string.)
if the optimization is implemented, just convert the copied []byte
to string will achieve the real copy-string effect (well, maybe
it copies the string multiple times, but it doesn't matter much IMO).

@davecheney
Copy link
Contributor

Comment 19:

http://play.golang.org/p/ghP85_uea9 appears to copy the backing contents of a string.

@bradfitz
Copy link
Contributor

Comment 20:

Dave, you're looking at the address of the StringHeader, not its backing contents.  For
instance: http://play.golang.org/p/-AI1pPNIwW shows the addresses are different, even
though their backing contents are the same.
You really need unsafe to determine whether two strings alias the same memory.
But </tangent>, yes: the strconv package would have to take care to not retain its
input strings, even in error messages.  And we'd need to write tests to guarantee that
strconv never regresses and starts allocating. But that's easy and we do that in a
number of other packages.

@rsc
Copy link
Contributor

rsc commented Jul 30, 2013

Comment 21:

Status changed to Thinking.

@rsc
Copy link
Contributor

rsc commented Nov 27, 2013

Comment 22:

Labels changed: added go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 23:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 24:

Labels changed: added repo-main.

@griesemer
Copy link
Contributor

@bradfitz The point I am making is that the API would have to be []byte oriented in the first place for the (relatively simple) compiler optimization to work. That is, maybe we need to bite the bullet and add these functions. The string versions would be trivial wrappers (which in turn could benefit from the compiler optimization once present).

@rogpeppe
Copy link
Contributor

rogpeppe commented Oct 1, 2015

@griesemer re: the other way around is more tricky:
ISTM that for the string case you should be able to discount other goroutines as long as there are no synchronisation points, as then there would be a race condition (the read to convert string to []byte vs the write to the slice), and with a race condition we don't guarantee correctness so it may still be a viable technique.

You'd also have to make sure that the function in question didn't make any external calls to code that might modify the slice.

@griesemer
Copy link
Contributor

@rogpeppe It's much harder to know statically that there's (globally) no goroutines with synchronization than to know whether a single function doesn't modify an incoming bytes. External calls by that single function could tell (via export data) whether they modify or store incoming byte arguments elsewhere.

@ianlancetaylor
Copy link
Contributor

As I was arguing somewhere else, we don't have to know globally that there are no goroutines with synchronization. We only have to know locally that the function we are calling has no synchronization points. We can't know that if there are any calls through an interface method, but otherwise we can.

@griesemer
Copy link
Contributor

@ianlancetaylor good point!

@rogpeppe
Copy link
Contributor

rogpeppe commented Oct 2, 2015

@ianlancetaylor exactly.

@gopherbot
Copy link

CL https://golang.org/cl/23787 mentions this issue.

@adg adg modified the milestones: Go1.8, Unplanned Jun 6, 2016
@adg
Copy link
Contributor

adg commented Jun 6, 2016

Can't happen until Go 1.8.

@odeke-em
Copy link
Member

@dsnet raised a great point in https://go-review.googlesource.com/#/c/23787
screen shot 2016-08-21 at 1 08 31 am

Unless the compiler guys say this ain't happening, I'm personally still willing to wait for the compiler to address this.

I kindly wanted to ask @josharian @randall77 @griesemer and other compiler folks what y'all think about this; do y'all think we should wait for the compiler optimizations or should we be biting the bullet and implement these functions as suggested by @griesemer here #2632 (comment), then later the string implementation could use these optimizations?

@bradfitz great point in #2632 (comment), I was checking with sample usage for the number of allocations when writing the CL, we should in deed include tests to make sure the strconv package never regresses.

@josharian
Copy link
Contributor

Also cc @dr2chase @dvyukov

@randall77
Copy link
Contributor

I don't think anyone is working on this at the moment. It's doable though. We'd have to add some data to the export info about whether any synchronization is done inside functions.

I'd recommend we wait and do this in the compiler. But that's not a promise on timeline...

@bradfitz
Copy link
Contributor

/cc @griesemer for export data.

@griesemer
Copy link
Contributor

@randall77 It's trivial to add additional information about functions in the export data, even in a backward-compatible way. We just need to have that information.

@odeke-em
Copy link
Member

From the current direction, it seems to me that this issue is a subset or perhaps even duplicate of #2205.

@quentinmit quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 10, 2016
@rsc
Copy link
Contributor

rsc commented Oct 10, 2016

I would like to avoid doing this. I still think we might be able to do something to make these conversions free and avoid 2x API bloat everywhere. I'd rather hold out for that. People who are super-sensitive to these allocations today can easily copy+modify the current code and ship that in their programs.

@rsc rsc closed this as completed Oct 10, 2016
@bradfitz
Copy link
Contributor

@hirochachacha
Copy link
Contributor

FWIW, here is working (but tricky) patches to make ParseXXX no escaping.
string([]byte(s)) is used just for cloning strings. I don't know better way to do this though.
(I realized that extending escape analysis isn't an easy path for this.)

diff --git a/src/strconv/atoi.go b/src/strconv/atoi.go
index 66df149..d6f645a 100644
--- a/src/strconv/atoi.go
+++ b/src/strconv/atoi.go
@@ -24,11 +24,11 @@ func (e *NumError) Error() string {
 }

 func syntaxError(fn, str string) *NumError {
-   return &NumError{fn, str, ErrSyntax}
+   return &NumError{fn, string([]byte(str)), ErrSyntax}
 }

 func rangeError(fn, str string) *NumError {
-   return &NumError{fn, str, ErrRange}
+   return &NumError{fn, string([]byte(str)), ErrRange}
 }

 const intSize = 32 << (^uint(0) >> 63)
@@ -134,7 +134,7 @@ func ParseUint(s string, base int, bitSize int) (uint64, error) {
    return n, nil

 Error:
-   return n, &NumError{"ParseUint", s, err}
+   return n, &NumError{"ParseUint", string([]byte(s)), err}
 }

 // ParseInt interprets a string s in the given base (2 to 36) and
@@ -180,7 +180,7 @@ func ParseInt(s string, base int, bitSize int) (i int64, err error) {
    un, err = ParseUint(s, base, bitSize)
    if err != nil && err.(*NumError).Err != ErrRange {
        err.(*NumError).Func = fnParseInt
-       err.(*NumError).Num = s0
+       err.(*NumError).Num = string([]byte(s0))
        return 0, err
    }
    cutoff := uint64(1 << uint(bitSize-1))

@bradfitz
Copy link
Contributor

bradfitz commented Nov 1, 2016

@hirochachacha, what are you trying to accomplish there?

@hirochachacha
Copy link
Contributor

@bradfitz I'm not sure this is deserved to send a CL.

given:

package main

import "strconv"

func main() {
    s := []byte("1234")
    strconv.ParseFloat(string(s[:2]), 64)
    strconv.Atoi(string(s[2:4]))
}

$ go build -gcflags -m a.go

before:

./a.go:7: string(s[:2]) escapes to heap
./a.go:8: string(s[2:4]) escapes to heap
./a.go:6: main ([]byte)("1234") does not escape

after:

./a.go:6: main ([]byte)("1234") does not escape
./a.go:7: main string(s[:2]) does not escape
./a.go:8: main string(s[2:4]) does not escape

@dsnet
Copy link
Member

dsnet commented Nov 1, 2016

I believe @hirochachacha is making the same point I made in issuecomment-135883394. Suppose there was some compiler optimization that could prove that a function never mutates an input []byte and that there are no synchronization events, then it could use a string in its place. This optimization still wouldn't be helpful to the strconv package since the input escapes through the error value (and thus making it really difficult for the compiler to prove anything about the input).

Since this compiler optimization doesn't exist yet, I vote that we don't address this right now. Always copying the input in the error case may lead to unexpectedly bad performance in situations where a string is parsed in a tight loop and expected to fail.

@hirochachacha
Copy link
Contributor

@dsnet I agree with you at all points. Sorry for the duplication, I didn't notice that.

@golang golang locked and limited conversation to collaborators Nov 2, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge GarbageCollector NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Performance Thinking
Projects
None yet
Development

No branches or pull requests