runtime: remove unnecessary allocations in convT2E #8892

dvyukov · 2014-10-07T12:29:10Z

This was discussed at length in:
https://golang.org/issue/8405
and previously on golang-dev:
https://groups.google.com/d/msg/golang-dev/pwUh0BVFpY0/zqJInvU3NkQJ

Namely, we should allocate a heap block for a scalar iff the scalar looks like a pointer
into the heap (otherwise the GC will ignore it anyway).

This will allow us to have 1-bit/word GC pointer type info *and* avoid allocating additional
memory for scalars in interfaces in most cases.
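
To make the proposed rule concrete, here is a minimal sketch of the store-side decision. It is not runtime code: `heapRange` and `needsBox` are hypothetical names, and the heap is modeled as a single contiguous address range, which is a simplifying assumption.

```go
package main

import "fmt"

// heapRange is a hypothetical stand-in for the address range managed by the GC.
// A real heap is not necessarily one contiguous range; this is a simplification.
type heapRange struct{ lo, hi uint64 }

// needsBox reports whether a scalar, if stored directly in the interface data
// word, would be mistaken for a pointer into the heap. Only such scalars would
// need a separate heap block under the proposed rule.
func (h heapRange) needsBox(scalar uint64) bool {
	return scalar >= h.lo && scalar < h.hi
}

func main() {
	h := heapRange{lo: 0xc000000000, hi: 0xc000400000}
	for _, v := range []uint64{42, 0xc000001000, 0xc000400000} {
		fmt.Printf("%#x needs box: %v\n", v, h.needsBox(v))
	}
}
```

Under this model most scalars (small integers, flags, lengths) fall outside the range and would be stored without allocating.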

randall77 · 2014-10-07T17:32:11Z

Comment 1:

So if we're allocating scalars sometimes in the data word and sometimes pointed to by
the data word, then users of the interface need to distinguish those cases.  So, for
example, assertI2T for scalars must check if the result looks like a pointer into the
heap and, if so, dereference it.

dvyukov · 2014-10-07T17:34:41Z

Comment 2:

Right. If it's not intended to be a pointer but looks like a pointer, dereference it.
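
The store and load sides discussed above can be sketched together on a simulated heap. Everything here is an assumption for illustration: `simHeap`, `store`, `load`, the base address, and the fixed-size contiguous range are hypothetical and do not correspond to runtime APIs.

```go
package main

import "fmt"

const (
	heapBase  = 0xc000000000 // hypothetical start address of the simulated heap
	wordSize  = 8
	heapCells = 1024 // fixed capacity, so the heap range is known up front
	heapEnd   = heapBase + heapCells*wordSize
)

// simHeap models heap memory as a slice of 8-byte cells.
type simHeap struct{ cells []uint64 }

// inHeap reports whether a raw word looks like a pointer into the simulated heap.
func inHeap(w uint64) bool { return w >= heapBase && w < heapEnd }

// alloc stores v in a fresh cell and returns the cell's simulated address.
func (h *simHeap) alloc(v uint64) uint64 {
	if len(h.cells) == heapCells {
		panic("simulated heap is full")
	}
	h.cells = append(h.cells, v)
	return heapBase + uint64(len(h.cells)-1)*wordSize
}

// store returns the data word for scalar v: the scalar itself unless it looks
// like a heap pointer, in which case it is boxed and the box address is returned.
func (h *simHeap) store(v uint64) uint64 {
	if inHeap(v) {
		return h.alloc(v)
	}
	return v
}

// load recovers the scalar: a word that looks like a heap pointer can only be
// a box address (store never leaves such a scalar inline), so dereference it.
func (h *simHeap) load(w uint64) uint64 {
	if inHeap(w) {
		return h.cells[(w-heapBase)/wordSize]
	}
	return w
}

func main() {
	h := &simHeap{}
	for _, v := range []uint64{7, heapBase + 16} {
		w := h.store(v)
		fmt.Printf("value %#x stored as word %#x, loaded back as %#x\n", v, w, h.load(w))
	}
}
```

The sketch shows the invariant both comments rely on: every data word that looks like a heap pointer really is one, so the reader can decide how to decode it from the word alone.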

rsc · 2014-10-07T19:41:47Z

Comment 3:

I believe we should not do this. It's too clever, and it will come back to bite us later.

ysmolski · 2018-11-06T10:06:32Z

@dvyukov, does this issue look relevant to you?

dvyukov · 2018-11-06T13:55:31Z

@ysmolsky I don't understand the question. What do you mean?

ianlancetaylor · 2018-11-06T14:36:34Z

@dvyukov I think he is asking whether you think this issue should still be open.

Really this is a question for @aclements and @RLH. Is this a feasible optimization with the current GC?

dvyukov · 2020-04-25T09:46:25Z

It should still be a feasible optimization because we still allow Go pointers to point into the C heap, right?

Another potential implementation would work for value types that are smaller than pointer size (e.g. uint16, uint32 on 64-bit, bool, small structs, etc.). Namely: say we are on amd64, where the pointer size is 64 bits, and we are storing a uint32 into an interface{}. When storing the value, we put 0xffffffff into the high 32 bits. This makes the value invalid/outside-of-heap if treated as a pointer. Extraction won't require any special code: we just take the low 32 bits.
This would require the compiler to calculate whether the optimization is applicable to a type (based on type size and arch), and the runtime/compiler to add some high bits.
Here is a quick proof-of-concept:

--- a/src/cmd/compile/internal/gc/subr.go
+++ b/src/cmd/compile/internal/gc/subr.go
@@ -1857,6 +1857,10 @@ func isdirectiface(t *types.Type) bool {
                return t.NumFields() == 1 && isdirectiface(t.Field(0).Type)
        }
 
+       if t.Size() == 2 && t.Align == 2 {
+               return true
+       }
+
        return false
 }
 
--- a/src/runtime/iface.go
+++ b/src/runtime/iface.go
 func convT16(val uint16) (x unsafe.Pointer) {
-       if val < uint16(len(staticuint64s)) {
-               x = unsafe.Pointer(&staticuint64s[val])
-               if sys.BigEndian {
-                       x = add(x, 6)
-               }
-       } else {
-               x = mallocgc(2, uint16Type, false)
-               *(*uint16)(x) = val
-       }
-       return
+       return unsafe.Pointer(uintptr(val) | 1<<63)
 }

This passed bootstrap and almost passed go tool dist test. I guess it also needs some changes in the reflect package.
And at this point we obviously want to inline this conversion wholesale because there is no point in the function call now.

Besides avoiding the mallocgc call and not generating garbage, this also makes accesses faster (no indirection).
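
The tagging arithmetic can be sketched on plain `uint64` words. This models a 64-bit data word only; it does not touch `unsafe.Pointer` or the runtime, and the function names are hypothetical. It also assumes that no heap memory is ever mapped at addresses with these high bits set. The prose describes filling the high 32 bits with 0xffffffff, while the proof-of-concept diff sets only bit 63; both variants are shown.

```go
package main

import "fmt"

// tagHigh32 is the pattern from the prose: all ones in the high 32 bits.
const tagHigh32 = uint64(0xffffffff) << 32

// packUint32 builds a data word for a uint32 by filling the high 32 bits.
func packUint32(v uint32) uint64 { return tagHigh32 | uint64(v) }

// unpackUint32 extracts the value by keeping only the low 32 bits.
func unpackUint32(w uint64) uint32 { return uint32(w) }

// packUint16 mirrors the proof-of-concept diff: only the top bit is set.
func packUint16(v uint16) uint64 { return uint64(v) | 1<<63 }

// unpackUint16 extracts the value by keeping only the low 16 bits.
func unpackUint16(w uint64) uint16 { return uint16(w) }

func main() {
	w32 := packUint32(70000)
	fmt.Printf("uint32 70000 -> word %#x -> %d\n", w32, unpackUint32(w32))

	w16 := packUint16(513)
	fmt.Printf("uint16 513 -> word %#x -> %d\n", w16, unpackUint16(w16))
}
```

In both variants extraction is a plain truncation, which is why no special decoding code is needed on the read side.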

Thoughts?

@dr2chase @mknyszek

dvyukov · 2020-04-25T09:48:38Z

This is inspired by a real use case. I need to store lots of integers in interfaces, but these won't fit into staticuint64s (up to thousands/tens of thousands). But they perfectly fit into uint32 and that's what I use as the type.

josharian · 2020-04-25T16:20:59Z

I’m on my phone, but you probably also need to adjust ifaceData in the compiler. And ifaceeq and efaceeq in the runtime. (Also, it’d be more impressive to get uint8/uint32 working; uint16s are not heavily used.)

Another option, which is less likely to cause rare breakage (e.g. in assembly) is to keep an atomic counter in the slow path of convT32 for smallish values and, when used enough, persistent alloc and populate a staticuint64s-like array with a larger range.
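
A user-space analogue of the counter-plus-table idea might look like the sketch below. It is not runtime code: `lazyBoxer`, its fields, and the threshold policy are hypothetical, and an ordinary slice stands in for persistently allocated memory.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// lazyBoxer counts slow-path conversions of smallish values and, once they are
// frequent enough, builds a table of pre-boxed values that later calls reuse.
type lazyBoxer struct {
	hits      int64        // slow-path counter, accessed atomically (kept first for alignment)
	limit     uint32       // values below limit are eligible for the table
	threshold int64        // slow-path hits required before the table is built
	once      sync.Once    // ensures the table is built only once
	table     atomic.Value // holds a []interface{} once populated
}

// box converts v to an interface value, reusing a pre-boxed entry when available.
func (b *lazyBoxer) box(v uint32) interface{} {
	if v >= b.limit {
		return v // outside the table's range: ordinary conversion
	}
	if t, ok := b.table.Load().([]interface{}); ok {
		return t[v] // fast path: no new allocation
	}
	if atomic.AddInt64(&b.hits, 1) >= b.threshold {
		b.once.Do(func() {
			t := make([]interface{}, b.limit)
			for i := range t {
				t[i] = uint32(i)
			}
			b.table.Store(t)
		})
	}
	return v // slow path: ordinary conversion
}

func main() {
	b := &lazyBoxer{limit: 50000, threshold: 1000}
	for i := 0; i < 2000; i++ {
		_ = b.box(uint32(i))
	}
	fmt.Println("table built:", b.table.Load() != nil)
	fmt.Println("boxed value:", b.box(12345))
}
```

The table costs memory proportional to `limit`, so a policy like this only pays off when the same range of values is boxed repeatedly.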

Or keep a per-P cache of recently allocated values so that we can re-use them. (I tried this for string interning, and it was pretty straightforward.)
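
The cache idea can be sketched as a small direct-mapped cache of boxed values. The runtime's per-P state is not reachable from user code, so this models one P's cache as a plain struct that is not safe for concurrent use. `boxCache`, its size, and the `hits` counter are hypothetical.

```go
package main

import "fmt"

const cacheSlots = 64 // hypothetical cache size

// boxCache remembers recently boxed uint32 values so repeated conversions of
// the same value can reuse the earlier interface value.
type boxCache struct {
	keys [cacheSlots]uint32
	vals [cacheSlots]interface{}
	hits int // number of conversions served from the cache
}

// box returns an interface value holding v, reusing a cached one when the
// slot for v currently holds the same key.
func (c *boxCache) box(v uint32) interface{} {
	i := v % cacheSlots
	if c.vals[i] != nil && c.keys[i] == v {
		c.hits++
		return c.vals[i]
	}
	var boxed interface{} = v // ordinary conversion on a miss
	c.keys[i], c.vals[i] = v, boxed
	return boxed
}

func main() {
	var c boxCache
	for _, v := range []uint32{7000, 7000, 7064, 7000} {
		_ = c.box(v)
	}
	fmt.Println("cache hits:", c.hits)
}
```

A direct-mapped layout keeps lookups cheap but lets values that share a slot evict each other, as 7000 and 7064 do here.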

dr2chase · 2020-04-27T14:37:01Z

There's an endianness issue in that code, which is a minor problem. Also, the description of the pointer smash (0xffffffff in the upper 32 bits) doesn't match what the code does (uintptr(val) | 1<<63). Otherwise this is interesting, though I am worried about assembly language.
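
The comment does not say which endianness issue is meant. One place where byte order visibly matters is the `add(x, 6)` adjustment in the removed convT16 code: the low 16 bits of an 8-byte word sit at a different byte offset depending on byte order. The sketch below illustrates only that general fact using `encoding/binary`; the helper names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// offsetOfLow16 returns the byte offset within an 8-byte word at which the
// low 16 bits of the word are stored under the given byte order.
func offsetOfLow16(order binary.ByteOrder) int {
	var buf [8]byte
	order.PutUint64(buf[:], 0xABCD)
	for off := 0; off <= 6; off += 2 {
		if order.Uint16(buf[off:]) == 0xABCD {
			return off
		}
	}
	return -1
}

// littleEndianOffset and bigEndianOffset are convenience wrappers.
func littleEndianOffset() int { return offsetOfLow16(binary.LittleEndian) }
func bigEndianOffset() int    { return offsetOfLow16(binary.BigEndian) }

func main() {
	fmt.Println("little-endian offset:", littleEndianOffset())
	fmt.Println("big-endian offset:", bigEndianOffset())
}
```

Code that reads a small value through a pointer to the data word has to account for this offset, whereas truncating the word's integer value does not.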


dvyukov added accepted Performance labels Oct 7, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015

zephyrtronium mentioned this issue Apr 10, 2021

cmd/compile, runtime: GOEXPERIMENT to add two non-pointer words to iface/eface #45494

Open
