Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/gc: optimized map[string] lookup from []byte key #3512

Closed
bradfitz opened this issue Apr 11, 2012 · 22 comments
Closed

cmd/gc: optimized map[string] lookup from []byte key #3512

bradfitz opened this issue Apr 11, 2012 · 22 comments

Comments

@bradfitz
Copy link
Contributor

key := []byte("some key")  // from network, etc

var m map[string]T
....
v, ok := m[string(key)]  // It would be nice if this were optimized away and didn't
actually make a copy & garbage
....
@rsc
Copy link
Contributor

rsc commented Apr 11, 2012

Comment 1:

In general this might introduce races.  If you pass in a []byte to the key lookup,
there is no guarantee that something is not editing the bytes during the lookup.
But it could probably be done so that it would just fail the lookup instead of 
causing corruption.

Labels changed: added priority-later, removed priority-triage.

Owner changed to @rsc.

Status changed to Thinking.

@bradfitz
Copy link
Contributor Author

Comment 2:

Failing the lookup in that case would be fine & expected.
In fact, I think the naive implementation would get that for free: if the memory is
scanned once for the hashcode and once for the equality, it seems like it just might not
match if the []byte changed.  I don't see how it could do something worse.  The []byte
couldn't be freed by GC and reclaimed to the OS in the meantime because the faked
non-escaping stringheader would still pin it.

@rsc
Copy link
Contributor

rsc commented Sep 12, 2012

Comment 3:

Labels changed: added go1.1maybe.

@robpike
Copy link
Contributor

robpike commented Mar 7, 2013

Comment 4:

Labels changed: removed go1.1maybe.

@bradfitz
Copy link
Contributor Author

Comment 5:

Answering Dmitry's question from https://golang.org/issue/5160?c=4
here:
Consider:
m := make(map[string]T)
var b []byte = slice of some buffer
go someFunc(b)
t, ok := m[string(b)] 
I would like:
1) string(b) to not make a copy. It should instead make a non-escaping fake String
header to pass to the map lookup function, using the same byte data & length as the
[]byte b.
2) the map lookup to be guaranteed to not crash, even if b is concurrently mutated.
3) t to be valid, iff ok == true.
I don't think that would violate any  of the current Go semantics, since any ordering
between mutations from "go someFunc" and string(b) are already not specified by the
memory model.
Correct?

@dvyukov
Copy link
Member

dvyukov commented Mar 31, 2013

Comment 6:

How is it related to interning? There is no sense to intern temps.
Are talking only about map indexing m[string(b)], or string(b) temps in general?

@bradfitz
Copy link
Contributor Author

Comment 7:

This bug has nothing to do with interning except that if I were to implement interning
myself in pure Go, without the help of the runtime (which is issue #5160), then I'd want
to lookup interned strings from []byte from the network, without making copies.
That is, I'd have:
  var internTable map[string]string  // identity map. from key to same value.
But I'd want to access it like:
   var b []byte = slice of bytes from the network in a bytes.Buffer
   _ = internTable[string(b)]
... reusing memory from another similar request (e.g. caching HTTP Request MIME headers)
I'm only talking about indexing m[string(b)] in this bug, for Go->runtime maps.
For Go->Go, that is issue #2205 (cmd/gc: read-only escape analysis and avoiding string ->
[]byte copies)

@dvyukov
Copy link
Member

dvyukov commented Mar 31, 2013

Comment 8:

Ah, I see, so you want on intern []byte and get string as the result.
Yes, I think creating a temp string from []byte in this context should be safe even in
complex statements like:
m[f(...)] = g(h(...), k(intern(b)))

@bradfitz
Copy link
Contributor Author

Comment 9:

Please, ignore everything about interning.  That was just one example of a user of this.
I think you're making this sound more complicated than it is.  And I think one of us is
using the wrong vocabulary.  (I don't want to "intern []byte" because I can't intern a
mutable thing)
I just want m[string(byteSlice)] to avoid a copy and to instead create a fake String
header temporary.  It never escapes the hashmap functions, so it's safe.
And no language semantics change which weren't already racy and undefined.
Example:
m := map[string]int{
   "X": 1,
   "Y": 2,
}
mutate := func(b []byte) { b[0] = 'Y' }
key := []byte{'X'}
go mutate(key)
value, ok := m[string(key)]
What is value & ok?
  1, true?
  2, true?
  0, false?
It's already undefined.

@bradfitz
Copy link
Contributor Author

bradfitz commented Apr 1, 2013

Comment 10:

Patch at https://golang.org/cl/8090046

Owner changed to ---.

@bradfitz
Copy link
Contributor Author

Comment 11:

Labels changed: added performance, garbage.

@rsc
Copy link
Contributor

rsc commented Nov 27, 2013

Comment 12:

Labels changed: added go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 13:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Copy link
Contributor

rsc commented Dec 4, 2013

Comment 14:

Labels changed: added repo-main.

@bradfitz
Copy link
Contributor Author

bradfitz commented Dec 5, 2013

Comment 15:

I still want this, and it would improve Camlistore start-up time when we slurp a big
on-disk index into memory.
Is this definitely off the table for Go 1.3?

@ianlancetaylor
Copy link
Contributor

Comment 16:

This can be seen as a compiler optimization, a specialized form of escape analysis. 
Given a []byte b, a map lookup of string(b) can use the bytes directly without making a
copy.  In general, we can do this optimization for any function call where the compiler
knows the complete code of the function (the function does not make any calls to unknown
functions) and where the string does not escape the function (this is not something we
normally check for) and where the function does not execute any synchronization
operations (also something we do not normally check for, but given the lack of function
calls this means no channel operations).  Map lookup is a special case of a function
call where these attributes are known.
We could also view it as a language extension, but I don't think that is a good idea. 
It would be similar to the way copy/append accept string as well as []byte, but we could
only permit it for a map lookup, not for a map insertion.
I agree that this optimization doesn't introduce any races that were not already
present, but we still need to consider what should happen for a racy program.  Right now
a racy program will yield an unpredictable string, but the code to convert from []byte
to string is quite simple and will not crash or introduce memory corruption.  We would
need to preserve that property.  Looking at the strhash and strequal functions, I think
we would be OK.
So I don't see any strong objections to this optimization, though of course somebody
would have to actually do the work.

Labels changed: removed priority-later.

Status changed to Accepted.

@bradfitz
Copy link
Contributor Author

bradfitz commented Dec 5, 2013

Comment 17:

Thanks for the reply.
> So I don't see any strong objections to this optimization, 
Russ has previously argued that performance optimizations are de-facto language changes
that people come to rely on and must then be implemented widely.  I argue App Engine's
lack of unsafe + gccgo vs gc already introduce three big worlds of differing performance
characteristics.
> though of course somebody would have to actually do the work.
I did it earlier. See comment #10 above. I barely close to no feedback on it, though.
What I'd like is a decision that we can actually do this.
I very much don't want a language change.  In my CL above, I just looked for
m[string(b)] and made a fake stringheader from b.

@bradfitz
Copy link
Contributor Author

Comment 18:

Ping. Can we do this?
Can I get a decision about whether this is acceptable for Go 1.3?

@bradfitz
Copy link
Contributor Author

Comment 19:

Ping.
Is this thing on?

@rsc
Copy link
Contributor

rsc commented Jan 22, 2014

Comment 20:

In September, I wrote back to a mail thread with you (Brad) and Rob (on 9/23/2013;
subject "m[string(byteSlice))] CL"):
---
I have thought a bit more about this. I believe that we could do it more generally, but
that we should not do it just for this special case.
Specifically, if we special case x = m[string(b)], then 
s := string(b)
x = m[s]
y = m2[s]
will actually be slower than
x = m[string(b)]
y = m2[string(b)]
and that bothers me quite a bit. However, it should be possible to do something a little
like escape analysis to see that between the definition of s and its uses, there are no
synchronization points and no assignments that might legitimately allow s to contain
different bytes from b, at which point the optimization would apply. If we did something
like that, it would make things faster while still keeping performance a bit more
predictable.
I don't know when this would happen. Maybe for Go 1.3.
---
You asked why it bothered me and I wrote:
---
because pulling repeated computed subexpressions into a variable should be faster than
not.
i don't know of any case today where that's not true.
---
I don't think that's "barely close to no feedback". My position is still what it was in
September. 
I apologize for not seeing this issue until today. I did not see this issue due to
over-aggressive mail filtering, which I have fixed[*]. I went looking for the issue
because Ian asked me about it off-issue. (So no, this thing was not on.)
I will look into how hard it is to do the analysis I wrote about in September, but I
can't make any promises.
[*] Because I was auto-CC'ed on every bug, I filtered them all away; I've removed my
auto-CC and am now on golang-bugs, which I filter away, but if I'm explicitly CC'ed on a
bug it will land in my inbox.

@rsc
Copy link
Contributor

rsc commented Apr 3, 2014

Comment 21:

This issue was closed by revision f5f5a8b.

Status changed to Fixed.

@bradfitz
Copy link
Contributor Author

bradfitz commented Apr 7, 2014

Comment 22:

This issue was updated by revision 8072f46.

LGTM=rsc
R=rsc, dave
CC=golang-codereviews, iant
https://golang.org/cl/84230043

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants