Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: don't allocate when putting a bool into an interface #17725

Closed
bradfitz opened this issue Nov 1, 2016 · 32 comments
Closed

runtime: don't allocate when putting a bool into an interface #17725

bradfitz opened this issue Nov 1, 2016 · 32 comments

Comments

@bradfitz
Copy link
Contributor

bradfitz commented Nov 1, 2016

Shoving a bool into an interface probably doesn't need to allocate. I imagine most binaries already have a static 1 byte and a static 0 byte somewhere whose address we could use in the interface's data pointer.

bradfitz@gdev:~$ cat alloc.go
package main

import "testing"

func main() {
        var x interface{}
        x = true
        println(x)
        x = false
        println(x)
        x = true
        println(x)
        
        println(testing.AllocsPerRun(5000, func() {
                x = true
        }))
}

bradfitz@gdev:~$ go run ~/alloc.go
(0x49b960,0xc42003bf4f)
(0x49b960,0xc42003bf4e)
(0x49b960,0xc42003bf4d)
+1.000000e+000

/cc @josharian @randall77 @mdempsky

@bradfitz bradfitz added this to the Unplanned milestone Nov 1, 2016
@bradfitz bradfitz changed the title cmd/compile: cmd/compile: don't allocate when putting a bool into an interface Nov 1, 2016
@mdempsky
Copy link
Member

mdempsky commented Nov 1, 2016

Relatedly, I've been thinking that converting len-1 []byte to string shouldn't need to allocate either.

@mdempsky mdempsky changed the title cmd/compile: don't allocate when putting a bool into an interface runtime: don't allocate when putting a bool into an interface Nov 1, 2016
@mdempsky
Copy link
Member

mdempsky commented Nov 1, 2016

This can already be done within the runtime in convT2I. I don't believe it needs compiler assistance.

@dr2chase
Copy link
Contributor

dr2chase commented Nov 1, 2016

How do we feel about smallish integers? Seems like we could profile in convT2I to see how this would pay off. One possible value for smallish is {2,1,0,-1}

@bradfitz
Copy link
Contributor Author

bradfitz commented Nov 1, 2016

I feel like last time smallish integers was brought up, the concern was that it would be inconsistent and thus surprising. But with bools at least it would be consistent.

Then again, the better the compiler and runtime get, the more that performance things get surprising, so maybe it's okay nowadays.

I agree some convT2I stats for all this would be interesting.

@mdempsky
Copy link
Member

mdempsky commented Nov 1, 2016

I was thinking we just have a 256-byte slab of "\x00\x01...\xff", and we can index into that for any read-only 1-byte value that needs to be converted to a pointer (e.g., converting a len-1 []byte to string, or storing a bool/byte/int8/uint8 into an interface).

If we instead used [...]int64{..., -3, -2, -1, 0, 1, 2, 3, ...} for a certain range of integers, we could avoid heap allocs for a wider variety of values.

Agreed collecting stats on recognizable patterns would be good.

@josharian
Copy link
Contributor

I was thinking we just have a 256-byte slab of "\x00\x01...\xff"

That was my exact reaction as well.

@randall77
Copy link
Contributor

I had a CL a long time ago which did [...]int64{0..1024}. We ended up not using it because the performance cliff from 1024->1025 would be weird.

The 256-byte one would not have that problem, although it would weirdly encourage people to cram their data into 1 byte chunks.

@josharian
Copy link
Contributor

To spare others the search, that was CL 2500, which in turn also points to #8618. I'd still like to see this get done.

@griesemer
Copy link
Contributor

I'm also still in favor of this. I think the benefits outweigh the oddity in performance behavior.

@bradfitz
Copy link
Contributor Author

bradfitz commented Nov 1, 2016

We ended up not using it because the performance cliff from 1024->1025 would be weird.

Clearly we just need to smooth out that cliff with some fastrand(), slowly increasing the chance of allocating from 0 to 1024, where 1025 is 100%! /s

@mrjrieke
Copy link

mrjrieke commented Nov 1, 2016

@mdempsky would slow compiler tiny bit.... of course, @bradfitz presents a difficult challenge for prediction with fastrand(). cqtm

@josharian
Copy link
Contributor

Byte-sized values handled in CL 35555. I went the compiler route because it was simple and meant that byte-size values required no runtime call at all, shrinking ns/op from ~5 to ~1.

After handling byte-sized values and constant values (#18704), I instrumented the runtime to dump what integer-ish values made it into convT2E. During make.bash, I got this (unsurprising) distribution, top 30 values only:

 percent cnt  bits  uint value
 28.83%  3995 64bit 0
  2.11%   292 64bit 1
  1.45%   201 64bit 4
  1.35%   187 64bit 3
  1.28%   177 64bit 2
  0.87%   120 64bit 17
  0.84%   117 64bit 6
  0.81%   112 64bit 5
  0.71%    99 64bit 7
  0.71%    98 64bit 47
  0.66%    91 64bit 8
  0.48%    66 64bit 13
  0.43%    60 32bit 118
  0.38%    53 64bit 12
  0.36%    50 64bit 9
  0.32%    45 64bit 42
  0.23%    32 64bit 39
  0.22%    30 64bit 44
  0.21%    29 64bit 10
  0.20%    28 64bit 30
  0.17%    24 64bit 16
  0.17%    23 64bit 14
  0.16%    22 32bit 106
  0.16%    22 64bit 11
  0.15%    21 32bit 101
  0.15%    21 64bit 85
  0.13%    18 32bit 281
  0.13%    18 32bit 395
  0.12%    17 32bit 398
  0.12%    16 32bit 123

That says to me we might get the best bang for our buck by inserting a check for integer-ish values of size <= uintptr in the generated code like:

  if val == 0 {
    e = {itab, &zerobase}
  } else {
    e = convT2E(...)
  }

@minux
Copy link
Member

minux commented Jan 22, 2017 via email

@randall77
Copy link
Contributor

It could be done if we increased the size of interfaces to 3 words. Otherwise, I'm afraid that there are a host of complications (mostly in the garbage collector) which make sometimes-a-pointer-sometimes-a-scalar fields hard. I can elaborate more if you wish.

@josharian
Copy link
Contributor

Lots of prior discussion in #8405.

@josharian
Copy link
Contributor

CL 35563 implements the strategy "use zerobase for small pointer-free zero values". It could probably be optimized a bit, but it's a first look at the cost/benefit of doing it entirely in the runtime. Checking if val == 0 in the calling code should be more efficient, because the == 0 check will be more efficient, but it's also considerably more of a pain to implement. :)

While running the stdlib tests, CL 35563 eliminated roughly 170,000 allocations. Increasing the size of zerobase did not show any improvement; larger pointer-free zero values sent to convT2E/convT2I are rare.

@josharian
Copy link
Contributor

And with that, I'm going to pause my hacking on interface conversions. I hope that making the code and numbers concrete will help the conversation about the correct direction to go from here.

@gopherbot
Copy link

CL https://golang.org/cl/35555 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/35563 mentions this issue.

@minux
Copy link
Member

minux commented Jan 25, 2017 via email

gopherbot pushed a commit that referenced this issue Feb 2, 2017
… allocation

Based in part on khr's CL 2500.

Updates #17725
Updates #18121

Change-Id: I744e1f92fc2104e6c5bd883a898c30b2eea8cc31
Reviewed-on: https://go-review.googlesource.com/35555
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@ancaemanuel
Copy link

How can an an static enum from 0x00 to 0xFF can help ?
in 0358367
Can anyone explain ?

@josharian
Copy link
Contributor

Bools are represented as 0 or 1, so they can use the static values for 0 and 1.

@ancaemanuel
Copy link

I know that, and thanks for your reply, but I still do not understand the theory behind this optimisation.
Why do we need to list all the values of one byte ?

@josharian
Copy link
Contributor

I plan to write a blog post (commaok.xyz) sometime in the next few days that explains a lot of this.

@josharian
Copy link
Contributor

I just want to finish CL 35563 first, and that will take some experimentation.

@rasky
Copy link
Member

rasky commented Feb 4, 2017

@ancaemanuel an interface is two words: a pointer to a type and a pointer to a value. To store "12" in an interface, you need to have a position in memory that holds the value "12", to be pointed by the interface's second word. Up until now, that position in memory would be allocated on the heap, every time anew. With this patch, for special values between 0 and 255, the compiler knows that there is a location in memory that already holds those values and use it, rather than allocating on the heap.

@josharian
Copy link
Contributor

@ancaemanuel perhaps http://commaok.xyz/post/interface-allocs/ will also provide some useful background and context, though you're probably better off reading https://research.swtch.com/interfaces :)

@ancaemanuel
Copy link

Thank you.
I have an ideea: the compiler knows the constants used for interfaces in a program. How about generating an static array (with the constants already in the program) at compile time ?

@josharian
Copy link
Contributor

@ancaemanuel if I understand correctly, that is https://golang.org/issue/18704, which is done. (This is also discussed a bit more readably at http://commaok.xyz/post/interface-allocs/.)

@ancaemanuel
Copy link

I read the article.
This is an optimisation for small numbers.
If an program uses 500 constants, they are already in the compiled program. How about using that array (or map) ?

@josharian
Copy link
Contributor

I'm not sure I understand. Would you file a new issue with some more details and cc me and we can discuss there? Thanks!

josharian added a commit to josharian/go that referenced this issue Feb 14, 2017
Fixes golang#17725

name                old time/op    new time/op    delta
ConvT2EInt/const-8    0.90ns ± 4%    0.90ns ± 2%      ~     (p=0.623 n=20+15)
ConvT2EInt/zero-8     21.5ns ± 2%     7.4ns ± 6%   -65.33%  (p=0.000 n=19+18)
ConvT2EInt/one-8      21.6ns ± 2%    22.8ns ± 2%    +5.21%  (p=0.000 n=20+19)

name                old alloc/op   new alloc/op   delta
ConvT2EInt/const-8     0.00B          0.00B           ~     (all equal)
ConvT2EInt/zero-8      8.00B ± 0%     0.00B       -100.00%  (p=0.000 n=20+20)
ConvT2EInt/one-8       8.00B ± 0%     8.00B ± 0%      ~     (all equal)

name                old allocs/op  new allocs/op  delta
ConvT2EInt/const-8      0.00           0.00           ~     (all equal)
ConvT2EInt/zero-8       1.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
ConvT2EInt/one-8        1.00 ± 0%      1.00 ± 0%      ~     (all equal)

Change-Id: I5b71f9e44e3de8b8f2284a3821c5176c13ab2c61
@randall77
Copy link
Contributor

This is fixed with 03583675

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests