Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: need a better itab table #20505

Closed
randall77 opened this issue May 26, 2017 · 15 comments
Closed

runtime: need a better itab table #20505

randall77 opened this issue May 26, 2017 · 15 comments

Comments

@randall77
Copy link
Contributor

randall77 commented May 26, 2017

The runtime keeps a map from [interface type, concrete type] to the itab for that pair. Most interface->interface conversions need to look in this table to find the itab to use. (concrete type->interface conversions don't need this table any more - see CL 20901, CL 20902, and others.)
It is implemented as a hash table with 1009 entries and chained (singly-linked list) buckets.

We've run into servers inside Google that have 43,540 entries in this table. That leads to lots of CPU getting wasted in walking long linked lists in convI2I calls.

Part of the problem is that we're now loading this table at startup with all the itabs that we build at compile time (introduced in the CLs referenced above), instead of just keeping dynamically-used entries in this table. Part of the problem may just be the sheer size of the code for the server in question (lots of protobufs, I think).

We should come up with a better way to organize the itab table. Maybe it would be enough to just implement move-to-front on the bucket lists? Or maybe keep a hot table and a cold table, put the itabs in the cold table at startup, and only move them to the hot table when referenced dynamically? Other ideas welcome.

The current implementation is ~append-only which makes the multithreading easy. I don't want to lose that property by locking on every call.

@walken-google @cherrymui

@randall77 randall77 added this to the Go1.10 milestone May 26, 2017
@bcmills
Copy link
Contributor

bcmills commented May 26, 2017

Other ideas welcome.

This sounds like pretty much the same use-case that sync.Map is optimized for, although I suspect you'd have to reimplement sync.Map at a much lower level to be able to use it that deeply within the runtime.

@huguesb
Copy link
Contributor

huguesb commented May 26, 2017

Other ideas welcome

Could the compiler generate a perfect hash for this data? That would save CPU by avoiding walks altogether and initialization at startup. The downside being longer compile time, larger object size, and it might not work well with plugins.

@randall77
Copy link
Contributor Author

@huguesb It is possible to perfect hash the startup data, but that seems overkill. Just sizing the table appropriately would be enough. We'd need a dynamic table anyway, as not all of the data is available at startup. If we move itabs to the dynamic table on first lookup (the hot/cold table idea), the speed of the static tables is ~irrelevant.

@dr2chase
Copy link
Contributor

How do you feel about graph coloring? For a type T that implements interfaces I1, I2, I3, create a clique on I1, I2, I3, to indicate that they must have different colors.

Color the resulting graph, and construct tables (slices) mapping colors (now integers) to itabs.
Each itab/table entry also needs to indicate the interface it is for.

To convert Ix to Iy, extract Cx from Ix, then
(itab, iface) := Cx.iitable[color(Iy)]
if iface = Iy, success, otherwise failure.

Problem with this is that the graph coloring could be large-ish, I don't really know. It could happen at link time, might be interesting to figure out if there was a way to do it on-demand.

Yes, I know it's an NP-complete problem. So is register allocation.

@randall77
Copy link
Contributor Author

@dr2chase I totally don't understand what you are suggesting. What is this graph of? Interfaces? Why are two different interfaces interfering? Is this just to make a small index out of an interface name?

We need to do this lookup even when the compiler never sees any of these types together (they are in different packages, say). So I don't think we could construct a complete interference graph at compile time anyway. Maybe link time, but then plugins break it.

@dr2chase
Copy link
Contributor

dr2chase commented May 26, 2017

Goal is to make a small index out of an interface name. Two interfaces I1 and I2 interfere if type T implements both of them (they can't both have the same index in T's table).

Plugins, the risk is either a new type TP collides two previously non-colliding interfaces, thus extending all the tables by one. A new interface IP also potentially extends all the tables.

Conceivably, "failure" can continue the probe by some other means.

How much of an event is loading a plugin?

More on the graph:

graph nodes are interfaces, graph edges indicate that two interfaces interfere (cannot share the same color) because they are both implemented by the same type. This is much the same as register allocation graph coloring, where nodes are variables, and an edge exists whenever two variables have overlapping lifetimes. In both cases, coloring is used to find and reuse small indices.

@ianlancetaylor
Copy link
Contributor

I think it would be worth doing some dynamic examination of real programs to see what dynamic types actually appear during specific interface-to-interface conversions. Where are all the interface-to-interface conversions coming from? How many are the fmt package checking for Formatter, GoStringer, and Stringer? Is it worth optimizing those specific cases, by, for example, having special hash tables just for those types?

I note that a type with methods always an uncommonType struct as part of its type descriptor. I note that uncommonType has 48 unused bits. That gives us plenty of room to identify the method index of the standard String method, if that is indeed one of the most common methods to look up.

In general, my guess is that the great majority of interface-to-interface conversions are converting to an interface with a single method. Perhaps we could add a pointer to the type descriptor of an interface type with a single method that points to a hash table mapping concrete types to the appropriate itab or nil if the type does not support the interface. The idea here is to simply split up the hash table.

@josharian
Copy link
Contributor

Once we understand the nature of the table use (per Ian), I think a good next step would be to construct a benchmark program that reproduces the problem at different scales, so we can easily profile and experiment.

@randall77
Copy link
Contributor Author

Of the static entries, there were more than 30,000 proto.Message, plus another 3000 or so helpers, like the additional Oneof types.
I don't have any dynamic counts, unfortunately. That requires being able to run the binary, which isn't easy.
I do have a benchmark which demonstrates the problem. I'll send it out tomorrow.

@walken-google
Copy link
Contributor

I think it would make sense to dynamically resize the itab hash table so that we can address both the dynamic and static itab entries cases in one go.

I think this can be simpler than sync.Map: the hash table would need to be a slice (instead of a fixed size array) and there would be an atomic pointer to the hash table slice. Lookups would follow the same "look twice - once without lock, once with" pattern as today. If the entry is not found the second time, we proceed to the insertion case which could be made to resize the table when above a given load factor.

During runtime init we may not be able to handle memory allocation (?); if that is a problem we can just disable hash resizing during early init.

Another issue is that the itab hash modulo would not be a constant anymore. We may want to avoid having an integer divide on that path. Is there a strong requirement for the prime hash size ? I suspect power of two hash sizes could be made to work too...

@rasky
Copy link
Member

rasky commented May 30, 2017

Another option is to precalc the prime sequence and the magic factors (libdivide style)

@gopherbot
Copy link

CL https://golang.org/cl/44338 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/44339 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/44341 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/44472 mentions this issue.

gopherbot pushed a commit that referenced this issue Aug 15, 2017
I noticed that we don't set an itab's function pointers at compile
time. Instead, we currently do it at executable startup.

Set the function pointers at compile time instead. This shortens
startup time. It has no effect on normal binary size. Object files
will have more relocations, but that isn't a big deal.

For PIE there are additional pointers that will need to be adjusted at
load time. There are already other pointers in an itab that need to be
adjusted, so the cache line will already be paged in. There might be
some binary size overhead to mark these pointers. The "go test -c
-buildmode=pie net/http" binary is 0.18% bigger.

Update #20505

Change-Id: I267c82489915b509ff66e512fc7319b2dd79b8f7
Reviewed-on: https://go-review.googlesource.com/44341
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
@golang golang locked and limited conversation to collaborators Aug 15, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants