|
|
Descriptioncmd/cc, cmd/gc, runtime: emit bitmaps for scanning locals.
Previously, all word aligned locations in the local variables
area were scanned as conservative roots. With this change, a
bitmap is generated describing the locations of pointer values
in local variables.
With this change the argument bitmap information has been
changed to only store information about arguments. The locals
member, has been removed. In its place, the bitmap data for
local variables is now used to store the size of locals. If
the size is negative, the magnitude indicates the size of the
local variables area.
Patch Set 1 #Patch Set 2 : diff -r d656f46cfb72 https://code.google.com/p/go/ #Patch Set 3 : diff -r d656f46cfb72 https://code.google.com/p/go/ #
Total comments: 15
Patch Set 4 : diff -r 4a3d328fe8e9 https://code.google.com/p/go/ #Patch Set 5 : diff -r a14c507d13e9 https://code.google.com/p/go/ #
Total comments: 4
Patch Set 6 : diff -r a14c507d13e9 https://code.google.com/p/go/ #Patch Set 7 : diff -r 6058f523a441 https://code.google.com/p/go/ #
MessagesTotal messages: 17
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://code.google.com/p/go/
Sign in to reply to this message.
Looks pretty good. I was hoping that there would be just a single FUNCDATA for garbage collection information. It should be straightforward to put the data in a single symbol, and doing so will keep the pcln table entries small, since they will not need pointers to multiple symbols. What do you think?
Sign in to reply to this message.
https://codereview.appspot.com/12328044/diff/5001/src/cmd/gc/pgen.c File src/cmd/gc/pgen.c (left): https://codereview.appspot.com/12328044/diff/5001/src/cmd/gc/pgen.c#oldcode187 src/cmd/gc/pgen.c:187: off = 0; Specifically, if dumpgcargs and dumpgclocals return the number of words they wrote, you can do: off = 8; n = dumpgcargs(gcsym, off, curfn); duint32(gcsym, 0, n); // uint32 numargs off += 4*n; n = dumpgclocals(gcsym, off, curfn); duint32(gcsym, 4, n); // uint32 numlocals off += 4*n; ggloblsym(gcsym, off, 0, 1); And it is still possible to add a pointer to a shared bitmap if we start coalescing.
Sign in to reply to this message.
If I understand your comment in src/cmd/gc/pgen.c correctly, the data would end up looking like this struct gcsym { int32 args uint32 data[args] int32 locals uint32 moredata[] } Is that right? I think the opportunities for coalescing are better when the data is kept separate. However, combining the data as you describe is certainly an easy change to make.
Sign in to reply to this message.
On Fri, Aug 2, 2013 at 9:53 PM, <cshapiro@google.com> wrote: > If I understand your comment in src/cmd/gc/pgen.c correctly, the data > would end up looking like this > > struct gcsym { > int32 args > uint32 data[args] > int32 locals > uint32 moredata[] > } > > Is that right? > Almost. The data structure in my comment would end up being int32 numargs int32 numlocals uint32 args[numargs] uint32 locals[numlocals] off starts at 8 to reserve space for the first two, and then they are filled in as they are learned. I do understand that coalescing will be slightly more difficult, but when we get to pc-dependent bitmaps I don't believe we should emit a funcdata for each bitmap, because it could require the entire space of identifiers. Instead I expect that this struct will evolve to include pointers to other symbols, still using a single funcdata for the struct. Russ
Sign in to reply to this message.
On Fri, Aug 2, 2013 at 7:06 PM, Russ Cox <rsc@golang.org> wrote: > Almost. The data structure in my comment would end up being > > int32 numargs > int32 numlocals > uint32 args[numargs] > uint32 locals[numlocals] > > off starts at 8 to reserve space for the first two, and then they are > filled in as they are learned. > Got it. > I do understand that coalescing will be slightly more difficult, but when > we get to pc-dependent bitmaps I don't believe we should emit a funcdata > for each bitmap, because it could require the entire space of identifiers. > The data structure in that change is quite different. A single bitmap is used that is an image of the entire frame. It covers both local variables and arguments. A single funcdata is used that contains the concatenation of all bitmaps for a function at each program counter location. To retrieve a bitmap the corresponding pcdata is multiplied by the bitmap size to become an index into the array of bitmap values. I am not sure my decision to combine the local variables and arguments was a good one. Because the argument data varies less over time than the local variable data, splitting the data structure in two has a strong advantage when coalescing. Instead I expect that this struct will evolve to include pointers to other > symbols, still using a single funcdata for the struct. > I understand the concern. I believe there will not be a need for more than two funcdata structures.
Sign in to reply to this message.
https://codereview.appspot.com/12328044/diff/5001/src/cmd/gc/pgen.c File src/cmd/gc/pgen.c (right): https://codereview.appspot.com/12328044/diff/5001/src/cmd/gc/pgen.c#newcode172 src/cmd/gc/pgen.c:172: dumpgcargs(fn, gcargssym); can this move down next to dumpglocals, so that all the gc stuff is together? https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/funcdata.h File src/pkg/runtime/funcdata.h (right): https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/funcdata.h#... src/pkg/runtime/funcdata.h:12: #define FUNCDATA_GCARGS 0 /* garbage collector blocks */ GCArgs GCLocals (to match Go naming for the suffix, and PCDATA_ArgSize) https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1395: typedef struct BitVector BitVector; move above struct definition (that's just more common in this code) https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1399: scanbitvector(byte* scanp, BitVector *bv) s/byte* /byte */ Even better, though, use uintptr *scanp and then scanp increments as you need. I think the loop can be significantly tighter: j, remptrs, and scanp are operating in lock step and can be reduced to just scanp. The bit b can be removed by shifting w right instead of shifting b left. uint32 i; uintptr *endw, *end; i = 0; end = scanp + bv->n; while(scanp < end) { // Process up to 32 pointers using next uint32 in bitmap. w = bv->data[i++]; if(end - scanp < 32) endw = end; else endw = scanp + 32; for(; scanp < endw; scanp++) { if(w & 1) addroot((Obj){scanp, PtrSize, 0}); w >>= 1; } } I believe this loop also fixes a bug in the current code: remptrs = bv->n implies that bv->n is the number of bits, while i < bv->n implies that bv->n is the number of uint32s. I think it is the number of bits, so the current test should be i*32 < bv->n. But that goes away in the rewrite. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1428: intptr sz; intptr is not declared on all of our operating systems, and it's hard to use correctly on 32-bit systems, although in this case i don't think you need to worry about 2 GB frames. Still, avoid it. Here you can replace sz > 0 with varp > sp, and then compute sz inside the if. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1439: } else if(locals->n >= 0) { Please flip this so that the tiny cases are earlier. It just makes it a little easier to read because I don't have to remember the condition across the long body here. There's also a good opportunity to reuse sz, to make the cases clearly similar. if(locals == nil) { addroot((Obj){frame->varp - sz, sz, 0}); } else if(locals->n < 0) { // No bitmap, only size of local block. sz = -locals->n; addroot((Obj){frame->varp - sz, sz, 0}); } else { // Bitmap present. ... } https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1452: if(f->args > 0 && args != nil && args->n > 0) { Can we drop f->args > 0 ? I'd like the option to remove that from the Func entirely.
Sign in to reply to this message.
PTAL https://codereview.appspot.com/12328044/diff/5001/src/cmd/gc/pgen.c File src/cmd/gc/pgen.c (right): https://codereview.appspot.com/12328044/diff/5001/src/cmd/gc/pgen.c#newcode172 src/cmd/gc/pgen.c:172: dumpgcargs(fn, gcargssym); On 2013/08/03 03:15:31, rsc wrote: > can this move down next to dumpglocals, so that all the gc stuff is together? Done. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/funcdata.h File src/pkg/runtime/funcdata.h (right): https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/funcdata.h#... src/pkg/runtime/funcdata.h:12: #define FUNCDATA_GCARGS 0 /* garbage collector blocks */ On 2013/08/03 03:15:31, rsc wrote: > GCArgs > GCLocals > > (to match Go naming for the suffix, and PCDATA_ArgSize) Done. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1395: typedef struct BitVector BitVector; On 2013/08/03 03:15:31, rsc wrote: > move above struct definition (that's just more common in this code) Done. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1399: scanbitvector(byte* scanp, BitVector *bv) I have rewritten this function to improve its brevity. I originally used your code as is but it generated more object code. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1428: intptr sz; On 2013/08/03 03:15:31, rsc wrote: > intptr is not declared on all of our operating systems, and it's hard to use > correctly on 32-bit systems, although in this case i don't think you need to > worry about 2 GB frames. Still, avoid it. Here you can replace sz > 0 with varp > > sp, and then compute sz inside the if. Done. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1439: } else if(locals->n >= 0) { On 2013/08/03 03:15:31, rsc wrote: > Please flip this so that the tiny cases are earlier. It just makes it a little > easier to read because I don't have to remember the condition across the long > body here. There's also a good opportunity to reuse sz, to make the cases > clearly similar. > > if(locals == nil) { > addroot((Obj){frame->varp - sz, sz, 0}); > } else if(locals->n < 0) { > // No bitmap, only size of local block. > sz = -locals->n; > addroot((Obj){frame->varp - sz, sz, 0}); > } else { > // Bitmap present. > ... > } Done. https://codereview.appspot.com/12328044/diff/5001/src/pkg/runtime/mgc0.c#newc... src/pkg/runtime/mgc0.c:1452: if(f->args > 0 && args != nil && args->n > 0) { On 2013/08/03 03:15:31, rsc wrote: > Can we drop f->args > 0 ? > I'd like the option to remove that from the Func entirely. Done.
Sign in to reply to this message.
LGTM with changes I assume the next step (in a different CL) is to emit code at the beginning of Go functions to zero the pointer slots in the stack frame. At that point we should not have any uninitialized data being treated as pointers in Go frames. https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/mgc0.c#new... src/pkg/runtime/mgc0.c:1408: for(i = MIN(remptrs, 32); i > 0; i--) { Please just use a variable and an if statement instead of a macro. Macros make code harder to read and maintain. You've already got the variable. https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/runtime.h File src/pkg/runtime/runtime.h (right): https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/runtime.h#... src/pkg/runtime/runtime.h:546: #define MIN(x, y) ((x)<(y)?(x):(y)) More #define macros are not okay here. See the comment above. Even I don't have super-gopher-guru privilege to add to this list.
Sign in to reply to this message.
I will submit this change shortly. https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/mgc0.c File src/pkg/runtime/mgc0.c (right): https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/mgc0.c#new... src/pkg/runtime/mgc0.c:1408: for(i = MIN(remptrs, 32); i > 0; i--) { On 2013/08/06 17:11:40, rsc wrote: > Please just use a variable and an if statement instead of a macro. > Macros make code harder to read and maintain. > You've already got the variable. > Done. https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/runtime.h File src/pkg/runtime/runtime.h (right): https://codereview.appspot.com/12328044/diff/21001/src/pkg/runtime/runtime.h#... src/pkg/runtime/runtime.h:546: #define MIN(x, y) ((x)<(y)?(x):(y)) On 2013/08/06 17:11:40, rsc wrote: > More #define macros are not okay here. See the comment above. > Even I don't have super-gopher-guru privilege to add to this list. Done.
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=d91993212194 *** cmd/cc, cmd/gc, runtime: emit bitmaps for scanning locals. Previously, all word aligned locations in the local variables area were scanned as conservative roots. With this change, a bitmap is generated describing the locations of pointer values in local variables. With this change the argument bitmap information has been changed to only store information about arguments. The locals member, has been removed. In its place, the bitmap data for local variables is now used to store the size of locals. If the size is negative, the magnitude indicates the size of the local variables area. R=rsc CC=golang-dev https://codereview.appspot.com/12328044
Sign in to reply to this message.
Message was sent while issue was closed.
On 2013/08/07 19:47:04, cshapiro1 wrote: > *** Submitted as https://code.google.com/p/go/source/detail?r=d91993212194 *** > > cmd/cc, cmd/gc, runtime: emit bitmaps for scanning locals. > > Previously, all word aligned locations in the local variables > area were scanned as conservative roots. With this change, a > bitmap is generated describing the locations of pointer values > in local variables. > > With this change the argument bitmap information has been > changed to only store information about arguments. The locals > member, has been removed. In its place, the bitmap data for > local variables is now used to store the size of locals. If > the size is negative, the magnitude indicates the size of the > local variables area. > > R=rsc > CC=golang-dev > https://codereview.appspot.com/12328044 I think CL is triggering a large memory allocation in the compiler. Some of the arm builders, running without swap, are failing.
Sign in to reply to this message.
On Wed, Aug 7, 2013 at 5:50 PM, <dave@cheney.net> wrote: > I think CL is triggering a large memory allocation in the compiler. Some > of the arm builders, running without swap, are failing. > Thanks for the report, I will look into this. How much memory do these machines typically have free?
Sign in to reply to this message.
My panda has ~800 mb free, the rpi on to its right < 400 [2196103.666107] Out of memory: Kill process 31904 (5g) score 843 or sacrifice child [2196103.674072] Killed process 31904 (5g) total-vm:846788kB, anon-rss:843692kB, file-rss:8kB On Thu, Aug 8, 2013 at 11:03 AM, Carl Shapiro <cshapiro@google.com> wrote: > On Wed, Aug 7, 2013 at 5:50 PM, <dave@cheney.net> wrote: >> >> I think CL is triggering a large memory allocation in the compiler. Some >> of the arm builders, running without swap, are failing. > > > Thanks for the report, I will look into this. How much memory do these > machines typically have free?
Sign in to reply to this message.
On Wed, Aug 7, 2013 at 6:05 PM, Dave Cheney <dave@cheney.net> wrote: > My panda has ~800 mb free, the rpi on to its right < 400 > > [2196103.666107] Out of memory: Kill process 31904 (5g) score 843 or > sacrifice child > [2196103.674072] Killed process 31904 (5g) total-vm:846788kB, > anon-rss:843692kB, file-rss:8kB > Interesting. The ARM machine I use for testing has about 1.3gb free which might explain why I have not observe an OOM. I will take a close look at this later tonight. Hopefully, something will jump out.
Sign in to reply to this message.
I think the other panda builder has some swap. Combined with the linux overcommit behavior, means you can allocate gigabytes. On Thu, Aug 8, 2013 at 11:12 AM, Carl Shapiro <cshapiro@google.com> wrote: > On Wed, Aug 7, 2013 at 6:05 PM, Dave Cheney <dave@cheney.net> wrote: >> >> My panda has ~800 mb free, the rpi on to its right < 400 >> >> [2196103.666107] Out of memory: Kill process 31904 (5g) score 843 or >> sacrifice child >> [2196103.674072] Killed process 31904 (5g) total-vm:846788kB, >> anon-rss:843692kB, file-rss:8kB > > > Interesting. The ARM machine I use for testing has about 1.3gb free which > might explain why I have not observe an OOM. I will take a close look at > this later tonight. Hopefully, something will jump out. >
Sign in to reply to this message.
Looks pretty bad, this isn't an arm only thing. https://code.google.com/p/go/issues/detail?id=6077 On Thu, Aug 8, 2013 at 11:26 AM, Dave Cheney <dave@cheney.net> wrote: > I think the other panda builder has some swap. Combined with the linux > overcommit behavior, means you can allocate gigabytes. > > On Thu, Aug 8, 2013 at 11:12 AM, Carl Shapiro <cshapiro@google.com> wrote: >> On Wed, Aug 7, 2013 at 6:05 PM, Dave Cheney <dave@cheney.net> wrote: >>> >>> My panda has ~800 mb free, the rpi on to its right < 400 >>> >>> [2196103.666107] Out of memory: Kill process 31904 (5g) score 843 or >>> sacrifice child >>> [2196103.674072] Killed process 31904 (5g) total-vm:846788kB, >>> anon-rss:843692kB, file-rss:8kB >> >> >> Interesting. The ARM machine I use for testing has about 1.3gb free which >> might explain why I have not observe an OOM. I will take a close look at >> this later tonight. Hopefully, something will jump out. >>
Sign in to reply to this message.
|