runtime: efficient access to thread-local data #8884

dvyukov · 2014-10-07T10:38:35Z

Currently we have 3 performance issues with accesses to thread-local data (g/m/p):
1. Accesses require non-inlinable function calls.
2. The only thread-local var is now g, while the most frequently accessed data is in m. So
most of the accesses have an additional indirection.
3. We do lots of duplicate loads of g/m.
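To make problems 2 and 3 concrete, here is a minimal Go model. It assumes heavily simplified, hypothetical `g` and `m` structs and a package-level `tlsG` variable standing in for the real thread-local slot (the actual runtime reaches it through a segment register or assembly, not an ordinary Go global). It only shows the pointer chain the source code asks for; it makes no claim about which instructions the compiler actually emits.

```go
package main

import "fmt"

// Hypothetical, heavily simplified stand-in for the runtime's M.
type m struct {
	locks int32 // hot per-M field, reached through g in the current scheme
}

// Hypothetical, heavily simplified stand-in for the runtime's G.
type g struct {
	stackguard0 uintptr // hot per-G field, checked on every call
	m           *m      // the M this goroutine is running on
}

// tlsG models the single thread-local slot, which currently holds g.
// In the real runtime this is not an ordinary variable.
var tlsG *g

// getg models reading the thread-local slot (problem 1: today this
// read goes through a call that cannot be inlined).
func getg() *g { return tlsG }

// bumpReloading follows the path as written: every use re-reads the
// TLS slot and then g.m before reaching locks (problems 2 and 3).
func bumpReloading() int32 {
	getg().m.locks++
	n := getg().m.locks
	getg().m.locks--
	return n
}

// bumpCached does the same work with a single read of the TLS slot
// and of g.m; eliminating the duplicate loads automatically is what
// the compiler change in the plan is about.
func bumpCached() int32 {
	mp := getg().m
	mp.locks++
	n := mp.locks
	mp.locks--
	return n
}

func main() {
	tlsG = &g{m: &m{}}
	fmt.Println(bumpReloading(), bumpCached(), tlsG.m.locks) // 1 1 0
}
```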

We need to:
1. Make the thread-local var m (instead of g).
2. Move the stack guard of the current g into m (that's the only hot data in g).
3. Declare a runtime.curm variable in the runtime, and teach the compiler to recognize it
and turn it into a TLS access.
4. Teach the compiler not to do unnecessary duplicate loads of curm (as in
https://golang.org/issue/4946).
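A minimal sketch of the proposed layout, under similar assumptions: hypothetical simplified structs, a package-level `curm` standing in for the `runtime.curm` variable that the compiler would lower to a TLS access, and an arbitrary hypothetical `stackGuardSlack` constant in place of the runtime's real guard offset. It illustrates the trade-off: the per-call stack check reads only m, but a goroutine switch now also has to install the incoming g's guard into m.

```go
package main

import "fmt"

// stackGuardSlack is a hypothetical reserve above the stack's low bound;
// the real runtime uses a different constant.
const stackGuardSlack = 256

// Hypothetical G after the proposal: it keeps its stack bounds but no
// longer holds the hot guard word.
type g struct {
	stackLo, stackHi uintptr
}

// Hypothetical M after the proposal: it carries the guard of whichever
// g is currently running on it, next to the other hot per-M data.
type m struct {
	stackguard uintptr
	curg       *g
	locks      int32
}

// curm stands in for runtime.curm; in the proposal the compiler would
// turn every reference to it into a thread-local access.
var curm *m

// switchTo models a goroutine switch: besides updating curg, it must
// recompute m.stackguard for the incoming g.
func switchTo(gp *g) {
	curm.curg = gp
	curm.stackguard = gp.stackLo + stackGuardSlack
}

// needMoreStack models the function-prologue check, which now touches
// only curm (one TLS read plus one field load).
func needMoreStack(sp uintptr) bool {
	return sp <= curm.stackguard
}

func main() {
	curm = &m{}
	switchTo(&g{stackLo: 0x1000, stackHi: 0x2000})
	fmt.Println(needMoreStack(0x1100), needMoreStack(0x1800)) // true false
}
```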

rsc · 2014-10-07T21:16:46Z

Comment 1:

I believe that changing from g to m is a mistake.
The most frequently accessed thread-local data is g->stackguard0, which is in g. It is
accessed once per function call. g is also much easier to reason about in programs,
because it cannot change from line to line as a particular function executes.
Eventually I would like to put g back into a dedicated register on amd64, like we do on
arm. Then getting at g->stackguard0 will be just one load, and getting at m will be just
one load too.
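For comparison, a minimal sketch of the alternative described here, assuming hypothetical simplified structs and modeling the dedicated g register as an explicit `gp` parameter (Go code cannot name a register directly). With g already in hand, the stack check and the path to m are each a single load. The `sp <= stackguard0` comparison is only a rough model of a real function prologue.

```go
package main

import "fmt"

// Hypothetical, simplified M.
type m struct {
	locks int32
}

// Hypothetical G: the stack guard stays in g, as in the current runtime.
type g struct {
	stackguard0 uintptr
	m           *m
}

// needMoreStack models the per-call prologue check. gp plays the role of
// the dedicated register, so the check needs one load: gp.stackguard0.
func needMoreStack(gp *g, sp uintptr) bool {
	return sp <= gp.stackguard0
}

// currentM is likewise a single load from the register-held g.
func currentM(gp *g) *m {
	return gp.m
}

func main() {
	gp := &g{stackguard0: 0x1100, m: &m{}}
	fmt.Println(needMoreStack(gp, 0x1000), needMoreStack(gp, 0x1800), currentM(gp) == gp.m) // true false true
}
```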

dvyukov · 2014-10-08T09:04:34Z

Comment 2:

> The most frequently accessed thread-local data is g->stackguard0, which is in g.
Yes, it's the most frequently accessed, that's why I propose to move it to M. But there are
also m->mcache, m->locks, m->p and m->ptr/scalarargs. Duplicating them in G looks bad
because it would bloat G and open the door to bugs, whereas what is now called stackguard0
can be moved to M rather than duplicated.
> g is also much easier to reason about in programs, because it cannot change from line
> to line as a particular function executes.
It's true that m can change, but I don't see how naming things differently changes
anything. It can change regardless of whether you call it 'm' or 'g->m'. If you want to
prevent m from changing, you do 'm->locks++' or 'g->m->locks++'. No difference (other
than the additional indirection).
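The equivalence claimed here can be sketched by putting two hypothetical models of the thread-local slot side by side: `tlsG` for the current scheme and `tlsM` for the proposed one. Both are plain globals standing in for real thread-local storage, and the function names are made up for illustration. Pinning works the same way in either scheme, by incrementing `locks` on the current m; only the number of pointer hops needed to reach it differs.

```go
package main

import "fmt"

// Hypothetical, simplified M.
type m struct {
	locks int32 // per the discussion, incrementing this keeps m from changing
}

// Hypothetical, simplified G.
type g struct {
	m *m
}

// Hypothetical models of each scheme's thread-local slot.
var (
	tlsG *g // current scheme: TLS holds g
	tlsM *m // proposed scheme: TLS holds m
)

// pinViaG is 'g->m->locks++': TLS -> g -> g.m -> locks.
func pinViaG() *m {
	mp := tlsG.m
	mp.locks++
	return mp
}

// pinViaM is 'm->locks++': TLS -> m -> locks, one hop fewer.
func pinViaM() *m {
	mp := tlsM
	mp.locks++
	return mp
}

// unpin undoes either pin.
func unpin(mp *m) {
	mp.locks--
}

func main() {
	mp := &m{}
	tlsG, tlsM = &g{m: mp}, mp
	a, b := pinViaG(), pinViaM()
	fmt.Println(a == b, mp.locks) // true 2
	unpin(a)
	unpin(b)
	fmt.Println(mp.locks) // 0
}
```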

dvyukov added accepted Performance labels Oct 8, 2014

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed release-none labels Apr 10, 2015
