runtime: need a better itab table #20505

randall77 · 2017-05-26T17:41:07Z

The runtime keeps a map from [interface type, concrete type] to the itab for that pair. Most interface->interface conversions need to look in this table to find the itab to use. (concrete type->interface conversions don't need this table any more - see CL 20901, CL 20902, and others.)
It is implemented as a hash table with 1009 entries and chained (singly-linked list) buckets.

We've run into servers inside Google that have 43,540 entries in this table. That leads to lots of CPU getting wasted in walking long linked lists in convI2I calls.

Part of the problem is that we're now loading this table at startup with all the itabs that we build at compile time (introduced in the CLs referenced above), instead of just keeping dynamically-used entries in this table. Part of the problem may just be the sheer size of the code for the server in question (lots of protobufs, I think).

We should come up with a better way to organize the itab table. Maybe it would be enough to just implement move-to-front on the bucket lists? Or maybe keep a hot table and a cold table, put the itabs in the cold table at startup, and only move them to the hot table when referenced dynamically? Other ideas welcome.

The current implementation is ~append-only which makes the multithreading easy. I don't want to lose that property by locking on every call.

@walken-google @cherrymui

bcmills · 2017-05-26T17:51:48Z

Other ideas welcome.

This sounds like pretty much the same use-case that sync.Map is optimized for, although I suspect you'd have to reimplement sync.Map at a much lower level to be able to use it that deeply within the runtime.

huguesb · 2017-05-26T18:10:24Z

Other ideas welcome

Could the compiler generate a perfect hash for this data? That would save CPU by avoiding walks altogether and initialization at startup. The downside being longer compile time, larger object size, and it might not work well with plugins.

randall77 · 2017-05-26T18:17:17Z

@huguesb It is possible to perfect hash the startup data, but that seems overkill. Just sizing the table appropriately would be enough. We'd need a dynamic table anyway, as not all of the data is available at startup. If we move itabs to the dynamic table on first lookup (the hot/cold table idea), the speed of the static tables is ~irrelevant.

dr2chase · 2017-05-26T19:28:50Z

How do you feel about graph coloring? For a type T that implements interfaces I1, I2, I3, create a clique on I1, I2, I3, to indicate that they must have different colors.

Color the resulting graph, and construct tables (slices) mapping colors (now integers) to itabs.
Each itab/table entry also needs to indicate the interface it is for.

To convert Ix to Iy, extract Cx from Ix, then
(itab, iface) := Cx.iitable[color(Iy)]
if iface = Iy, success, otherwise failure.

Problem with this is that the graph coloring could be large-ish, I don't really know. It could happen at link time, might be interesting to figure out if there was a way to do it on-demand.

Yes, I know it's an NP-complete problem. So is register allocation.

randall77 · 2017-05-26T20:19:17Z

@dr2chase I totally don't understand what you are suggesting. What is this graph of? Interfaces? Why are two different interfaces interfering? Is this just to make a small index out of an interface name?

We need to do this lookup even when the compiler never sees any of these types together (they are in different packages, say). So I don't think we could construct a complete interference graph at compile time anyway. Maybe link time, but then plugins break it.

dr2chase · 2017-05-26T20:38:44Z

Goal is to make a small index out of an interface name. Two interfaces I1 and I2 interfere if type T implements both of them (they can't both have the same index in T's table).

Plugins, the risk is either a new type TP collides two previously non-colliding interfaces, thus extending all the tables by one. A new interface IP also potentially extends all the tables.

Conceivably, "failure" can continue the probe by some other means.

How much of an event is loading a plugin?

More on the graph:

graph nodes are interfaces, graph edges indicate that two interfaces interfere (cannot share the same color) because they are both implemented by the same type. This is much the same as register allocation graph coloring, where nodes are variables, and an edge exists whenever two variables have overlapping lifetimes. In both cases, coloring is used to find and reuse small indices.

ianlancetaylor · 2017-05-26T21:05:08Z

I think it would be worth doing some dynamic examination of real programs to see what dynamic types actually appear during specific interface-to-interface conversions. Where are all the interface-to-interface conversions coming from? How many are the fmt package checking for Formatter, GoStringer, and Stringer? Is it worth optimizing those specific cases, by, for example, having special hash tables just for those types?

I note that a type with methods always an uncommonType struct as part of its type descriptor. I note that uncommonType has 48 unused bits. That gives us plenty of room to identify the method index of the standard String method, if that is indeed one of the most common methods to look up.

In general, my guess is that the great majority of interface-to-interface conversions are converting to an interface with a single method. Perhaps we could add a pointer to the type descriptor of an interface type with a single method that points to a hash table mapping concrete types to the appropriate itab or nil if the type does not support the interface. The idea here is to simply split up the hash table.

josharian · 2017-05-28T18:30:19Z

Once we understand the nature of the table use (per Ian), I think a good next step would be to construct a benchmark program that reproduces the problem at different scales, so we can easily profile and experiment.

randall77 · 2017-05-29T19:01:37Z

Of the static entries, there were more than 30,000 proto.Message, plus another 3000 or so helpers, like the additional Oneof types.
I don't have any dynamic counts, unfortunately. That requires being able to run the binary, which isn't easy.
I do have a benchmark which demonstrates the problem. I'll send it out tomorrow.

walken-google · 2017-05-30T09:27:57Z

I think it would make sense to dynamically resize the itab hash table so that we can address both the dynamic and static itab entries cases in one go.

I think this can be simpler than sync.Map: the hash table would need to be a slice (instead of a fixed size array) and there would be an atomic pointer to the hash table slice. Lookups would follow the same "look twice - once without lock, once with" pattern as today. If the entry is not found the second time, we proceed to the insertion case which could be made to resize the table when above a given load factor.

During runtime init we may not be able to handle memory allocation (?); if that is a problem we can just disable hash resizing during early init.

Another issue is that the itab hash modulo would not be a constant anymore. We may want to avoid having an integer divide on that path. Is there a strong requirement for the prime hash size ? I suspect power of two hash sizes could be made to work too...

rasky · 2017-05-30T11:25:22Z

Another option is to precalc the prime sequence and the magic factors (libdivide style)

gopherbot · 2017-05-30T17:44:59Z

CL https://golang.org/cl/44338 mentions this issue.

gopherbot · 2017-05-30T17:54:16Z

CL https://golang.org/cl/44339 mentions this issue.

gopherbot · 2017-05-30T20:18:44Z

CL https://golang.org/cl/44341 mentions this issue.

gopherbot · 2017-05-31T18:59:28Z

CL https://golang.org/cl/44472 mentions this issue.

I noticed that we don't set an itab's function pointers at compile time. Instead, we currently do it at executable startup. Set the function pointers at compile time instead. This shortens startup time. It has no effect on normal binary size. Object files will have more relocations, but that isn't a big deal. For PIE there are additional pointers that will need to be adjusted at load time. There are already other pointers in an itab that need to be adjusted, so the cache line will already be paged in. There might be some binary size overhead to mark these pointers. The "go test -c -buildmode=pie net/http" binary is 0.18% bigger. Update #20505 Change-Id: I267c82489915b509ff66e512fc7319b2dd79b8f7 Reviewed-on: https://go-review.googlesource.com/44341 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com>

randall77 added the Performance label May 26, 2017

randall77 added this to the Go1.10 milestone May 26, 2017

gopherbot closed this as completed in 3d1699e Aug 15, 2017

golang locked and limited conversation to collaborators Aug 15, 2018

gopherbot added the FrozenDueToAge label Aug 15, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

runtime: need a better itab table #20505

runtime: need a better itab table #20505

randall77 commented May 26, 2017 •

edited

bcmills commented May 26, 2017

huguesb commented May 26, 2017

randall77 commented May 26, 2017

dr2chase commented May 26, 2017

randall77 commented May 26, 2017

dr2chase commented May 26, 2017 •

edited

ianlancetaylor commented May 26, 2017

josharian commented May 28, 2017

randall77 commented May 29, 2017

walken-google commented May 30, 2017

rasky commented May 30, 2017

gopherbot commented May 30, 2017

gopherbot commented May 30, 2017

gopherbot commented May 30, 2017

gopherbot commented May 31, 2017

runtime: need a better itab table #20505

runtime: need a better itab table #20505

Comments

randall77 commented May 26, 2017 • edited

bcmills commented May 26, 2017

huguesb commented May 26, 2017

randall77 commented May 26, 2017

dr2chase commented May 26, 2017

randall77 commented May 26, 2017

dr2chase commented May 26, 2017 • edited

ianlancetaylor commented May 26, 2017

josharian commented May 28, 2017

randall77 commented May 29, 2017

walken-google commented May 30, 2017

rasky commented May 30, 2017

gopherbot commented May 30, 2017

gopherbot commented May 30, 2017

gopherbot commented May 30, 2017

gopherbot commented May 31, 2017

randall77 commented May 26, 2017 •

edited

dr2chase commented May 26, 2017 •

edited