cmd/compile: killed while compiling gogoprotobuf #15537
Sometimes strace shows failure with ENOMEM; other times I just see SIGKILLs as the oom-killer steps in. dmesg shows the oom-killer firing.
Specifically, strace sometimes shows system calls failing with ENOMEM.
ps ax shows a pile of compile processes; watching these with htop shows them growing rapidly to the point where they use upwards of 1GB each and finally trigger the oom-killer.
Possible reason: the generated casttype.pb.go files contain lots (1000+) of anonymous function invocations like this:
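Roughly this pattern (a hypothetical reconstruction based on the benchmark Josh posts later in this thread; the real generated code uses gogoproto cast types rather than *int):

package p

// Each element of the generated composite literal is built by an
// immediately invoked anonymous function returning a pointer to its
// argument.
var casts = []*int{
	func(v int) *int { return &v }(1),
	func(v int) *int { return &v }(2),
	func(v int) *int { return &v }(3),
	// ...and so on, 1000+ times per generated file
}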
A memory profile shows:
Thanks for profiling. If you have the binary and profile still around, will you re-run pprof with the -lines argument added as well? Thanks!
Does your machine have a swap partition or swap file allocated? Even though you may have huge amounts of physical memory available, Linux behaves very conservatively if no swap is available.
I just reproduced here. (Not the killing, but high memory usage.) I'm investigating, although no promises. An initial observation is that we're generating way too many OpCopies during SSA construction.
@dr2chase: if this is phi insertion, your CL might help.
I think I have a simpler fix. Hang tight.
Sparse-phi code looks like a big help. On my laptop (8GB, OS X) it appears to max out at 4 simultaneous compiles, each over 2GB, when sparse-phi is activated, but it runs to completion with only a brief instant of red in the system memory monitor. With the sparse-phi code deactivated, at an earlier stage one of the compiles grows to 6.7GB (at least -- I killed it; it did this twice in a row, so it is repeatable enough for me) alongside three others, one of those over 2GB and two over 1GB, and the system memory monitor goes red and sticks there.
CL https://golang.org/cl/22790 mentions this issue.
@dr2chase if I want to benchmark the sparse phi change locally, do I just patch in the CL (which again?) or do I also need to fiddle with any settings? I'm curious how my little optimization compares.
Not sure if this is still useful, but here are profile results with -lines. CL 22790 helps a lot. Before the CL:
After the CL:
Your optimization helps a lot on Dave Cheney's types.go compilation benchmark, and helps a good bit on this one, but the sparse phi change does notably better. I need to check one other slowed-way-down build. If you patch in the CL (https://go-review.googlesource.com/#/c/22342/), there is a cutover point based on program size (the product of #vars and #blocks) that is normally 2,500,000, but you can change it on the command line.
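Expressed as a sketch (invented names; the real check lives in the CL's phi-insertion code), the cutover is just a size test:

package main

import "fmt"

// Hypothetical sketch of the cutover described above: use the sparse
// phi-placement algorithm only when program size, measured as the
// product of variable count and block count, exceeds a threshold.
const defaultSparsePhiCutoff = 2500000

func useSparsePhi(nVars, nBlocks, cutoff int) bool {
	if cutoff <= 0 {
		cutoff = defaultSparsePhiCutoff // overridable, e.g. from the command line
	}
	return nVars*nBlocks > cutoff
}

func main() {
	fmt.Println(useSparsePhi(200, 500, 0))   // false: 100,000 is well under the cutoff
	fmt.Println(useSparsePhi(4000, 1000, 0)) // true: 4,000,000 exceeds it
}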
This is the other one: #14934
Checking against the 38x slowdown -- your change gets most of it (down to 15s user on my laptop); sparse manages to get one second faster, not enough to matter for that one. This is actually the only one I've seen where it seems to make a real difference. Keith was wondering if you ought to decorate s.defvar for some of the blocks on your recursive walk -- not nearly all, but enough to prevent quadratic badness if you had a bunch of uses far from the first def. Say, you could iterate instead of recurse, count as you go, and then stick in a def halfway to the end of the iteration if the count was more than 3-ish.
Thanks, @dr2chase. It's probable that both the sparse phi change and this optimization should go in; this one is good for small and general cases, and sparse is good for large ones. I'm going to run some more benchmarks (last time used a low iteration count and still had a browser etc. running) and take one more look. I'm not 100% sure I follow the decoration suggestion, but I'll read it again tomorrow. (Off to take my kid to the playground while some benchmarks run.) I will observe that we were actually generating O(n^2) values to avoid an O(n^2) recursion, so we are going from expensive quadratic to cheap quadratic. If you want things in sooner for freeze reasons, feel free to just submit my change and build atop it. Oh, and reminder to self: there's also still some stackAlloc to look into here. :)
Ok, I think I understand the decoration suggestion. I don't think "halfway if > 3" is the right answer -- I suspect something more like every 100 iterations is probably going to yield better results. I will do some experimenting; a sketch of the idea follows the benchmark below. I have a nice, simple benchmark case to use for this:

package p

func f() []*int {
	x := [...]*int{
		func(v int) *int { return &v }(1),
		func(v int) *int { return &v }(2),
		func(v int) *int { return &v }(3),
		func(v int) *int { return &v }(4),
		func(v int) *int { return &v }(5),
		func(v int) *int { return &v }(6),
		func(v int) *int { return &v }(7),
		func(v int) *int { return &v }(8),
		// and so on, as desired, to create quadratic behavior
	}
	return x[:]
}

(Note to self: Why aren't these getting inlined, anyway?)
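For reference, a minimal, self-contained sketch of the iterate-and-decorate idea (toy types and invented names, not the compiler's actual code): walk back through single-predecessor blocks in a loop, and on the way out record the found value in every 100th block visited, so that later lookups starting anywhere along the chain terminate quickly.

package main

import "fmt"

// block is a toy stand-in for an SSA basic block with at most one
// predecessor; defs records the variables defined in the block.
type block struct {
	pred *block
	defs map[string]int
}

// lookup walks predecessors iteratively until it finds a definition of v,
// then "decorates" every 100th block along the path with that definition.
func lookup(b *block, v string) int {
	const decorateEvery = 100
	var path []*block
	for {
		if val, ok := b.defs[v]; ok {
			for i := decorateEvery - 1; i < len(path); i += decorateEvery {
				path[i].defs[v] = val
			}
			return val
		}
		path = append(path, b)
		b = b.pred // assumes v is defined somewhere up the chain
	}
}

func main() {
	def := &block{defs: map[string]int{"x": 42}}
	b := def
	for i := 0; i < 250; i++ {
		b = &block{pred: b, defs: map[string]int{}}
	}
	fmt.Println(lookup(b, "x")) // 42; the walk also cached the value along the way
}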
If b has exactly one predecessor, as happens frequently with static calls, we can make lookupVarOutgoing generate less garbage. Instead of generating a value that is just going to be an OpCopy and then get eliminated, loop. This can lead to lots of looping. However, this loop is way cheaper than generating lots of ssa.Values and then eliminating them.

For a subset of the code in #15537:

Before: 28.31 real, 36.17 user, 1.68 sys, 2282450944 maximum resident set size
After: 9.63 real, 11.66 user, 0.51 sys, 638144512 maximum resident set size

Updates #15537.

Excitingly, it appears that this also helps regular code:

name      old time/op  new time/op  delta
Template  288ms ± 6%   276ms ± 7%   -4.13%  (p=0.000 n=21+24)
Unicode   143ms ± 8%   141ms ±10%   ~       (p=0.287 n=24+25)
GoTypes   932ms ± 4%   874ms ± 4%   -6.20%  (p=0.000 n=23+22)
Compiler  4.89s ± 4%   4.58s ± 4%   -6.46%  (p=0.000 n=22+23)
MakeBash  40.2s ±13%   39.8s ± 9%   ~       (p=0.648 n=23+23)

name      old user-ns/op   new user-ns/op   delta
Template  388user-ms ±10%  373user-ms ± 5%  -3.80%  (p=0.000 n=24+25)
Unicode   203user-ms ± 6%  202user-ms ± 7%  ~       (p=0.492 n=22+24)
GoTypes   1.29user-s ± 4%  1.17user-s ± 4%  -9.67%  (p=0.000 n=25+23)
Compiler  6.86user-s ± 5%  6.28user-s ± 4%  -8.49%  (p=0.000 n=25+25)

name      old alloc/op  new alloc/op  delta
Template  51.5MB ± 0%   47.6MB ± 0%   -7.47%   (p=0.000 n=22+25)
Unicode   37.2MB ± 0%   37.1MB ± 0%   -0.21%   (p=0.000 n=25+25)
GoTypes   166MB ± 0%    138MB ± 0%    -16.83%  (p=0.000 n=25+25)
Compiler  756MB ± 0%    628MB ± 0%    -16.96%  (p=0.000 n=25+23)

name      old allocs/op  new allocs/op  delta
Template  450k ± 0%      445k ± 0%      -1.02%  (p=0.000 n=25+25)
Unicode   356k ± 0%      356k ± 0%      ~       (p=0.374 n=24+25)
GoTypes   1.31M ± 0%     1.25M ± 0%     -4.18%  (p=0.000 n=25+25)
Compiler  5.29M ± 0%     5.02M ± 0%     -5.15%  (p=0.000 n=25+23)

It also seems to help in other cases in which phi insertion is a pain point (#14774, #14934).

Change-Id: Ibd05ed7b99d262117ece7bb250dfa8c3d1cc5dd2
Reviewed-on: https://go-review.googlesource.com/22790
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
With Josh's fix, and sparse layered on top of that, the big memory pigs seem to be in register and stack allocation (regalloc and stackalloc); this is a typical memory profile from one of the fattest compilations:
Thank you guys - the next hurdle seems to be the compilation of:
@dr2chase thinking out loud, I wonder whether instead of storing liveness as block+value pairs (which can grow quadratically), we could store it as value+[]range, where each range is an sdom entry/exit pair representing a subset of the CFG. In theory this could also be quadratic, if a value pops in and out of liveness, but that seems less likely.
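Something like the following, perhaps (a hypothetical sketch of the proposed representation; all names invented). Blocks are identified by the entry/exit numbers a depth-first walk of the dominator tree assigns, so one range covers a whole subtree of the CFG:

package main

import "fmt"

// sdomRange names a dominator-tree subtree by DFS entry/exit numbers:
// a block with entry number e lies inside the range iff entry <= e <= exit.
type sdomRange struct{ entry, exit int }

// liveness stores, for one value, the CFG subsets in which it is live --
// a handful of ranges rather than one (block, value) pair per live block.
type liveness struct{ ranges []sdomRange }

func (l liveness) liveAt(blockEntry int) bool {
	for _, r := range l.ranges {
		if r.entry <= blockEntry && blockEntry <= r.exit {
			return true
		}
	}
	return false
}

func main() {
	v := liveness{ranges: []sdomRange{{entry: 2, exit: 9}, {entry: 14, exit: 14}}}
	fmt.Println(v.liveAt(5), v.liveAt(12)) // true false
}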
@josharian Maybe. I tried plain first+last postorder numbers for stack allocation (this is essentially linear scan) and didn't get happy results, but couldn't tell if I had made some mistake with self-conflict (there are issues with loop phi functions and uses transmitted through backedges). Can you give the sparse CL a look for style/explanation, etc.? @gpaul, what's the exact recipe for that example? I'm here:
and I see (using sparse CL still under review):
There's a huge difference in memory allocation between the two, though still plenty of room for additional improvement in register/stack allocation.
@dr2chase will take a look at the sparse phi CL on Monday. (Mother's Day today.) Sorry that I've been a delinquent reviewer in general. Juggling too many things. A few other random, somewhat related thoughts:

The self-conflict bit is a fly in the ointment indeed. On my simple benchmark case in the comment above, most of the allocation is the two append statements building s.interfere, so I looked into representing interference as sets of equivalence classes (see the sketch after this comment). I know that the interferes relationship is not transitive in general, but it frequently is, so I was going to model it with multiple equivalence classes. The interferes relationship is also not reflexive, but I was going to assume it was anti-reflexive and compensate...boom. :(

I noticed in the list of SSA stack size regressions that most of the really big increases are init functions, which generally look a lot like this code -- building a large composite literal. But the components of a composite literal can always be computed independently of each other. I wonder whether a better short-term fix might be teaching the compiler that fact. We still need to fix the quadratic behavior, but reducing spills from composite literals would (a) generate better code anyway and (b) whack a whole class of problem cases.

It scares me that Type is used as a map key in stackalloc, given that everywhere else we use .Compare == CMPeq for equality. Given that, I wonder whether we should use == for the related cases in stackalloc. For 1.8, I plan to switch us to using the GC pattern instead (onebitwalktype). That'll make the interference sets bigger, though, exacerbating the performance problems here.
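For concreteness, a tiny illustration (hypothetical, not from the thread) of the non-transitivity that sinks a pure equivalence-class representation: modeling values as live ranges, a can interfere with b and b with c while a and c are disjoint.

package main

import "fmt"

// liveRange models a value's lifetime as a half-open interval of
// instruction indices.
type liveRange struct{ start, end int }

// interferes reports whether two live ranges overlap.
func interferes(a, b liveRange) bool {
	return a.start < b.end && b.start < a.end
}

func main() {
	a := liveRange{0, 4}
	b := liveRange{3, 8}
	c := liveRange{7, 10}
	// a-b and b-c overlap, but a-c do not, so {a, b, c} cannot be
	// captured by equivalence classes alone.
	fmt.Println(interferes(a, b), interferes(b, c), interferes(a, c)) // true true false
}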
Scratch the comment about composite literal independence. The evaluation of the components may have side-effects. Duh. There's still some independence, in that we can safely write each component into the final destination as calculated, but I'm not sure that that is enough to help.
@dr2chase: The sparse phi CL lets the combos/both package compile in isolation. Building its siblings concurrently still leads to OOM issues with 8 cores and 16GB of memory:

go install -v github.com/gogo/protobuf/test/theproto3/combos/...
I hate to say "works for me," but it actually does: 4 cores and 8GB, OS X, and I did shut down Chrome first. OS X is playing games with compressed memory, not sure what that means, but this is a screenshot taken seconds before the memory-consumption peak:

I'll see what I can do to get the sparse CL in -- it may still need polish -- but I'm not sure what else can be done in 1.7.
@dr2chase again thinking out loud, if we can predict/detect when liveness info has stabilized for a block, we could interleave liveness and interference for stackalloc, since interference works an arbitrary block at a time, which would (probably) allow us to store less liveness info. Reactions?
@dr2chase 'works for me' is totally valid. It's open source, after all. Unfortunately I'm not so lucky. For this specific issue we may just remove the gogoprotobuf ./test packages from our repo. It seems like a pretty good test for the compiler, and since it's quite a popular library, it would be great if you and the rest of the compiler team would continue working towards pre-SSA memory usage and compilation times for it. We have several large swaths of generated code internally which I haven't tried to compile yet. If they trip up the compiler, we'll extract some test cases early in the 1.8 cycle. The performance improvements we're seeing with at least one high-performance component compiled with tip are pretty amazing (~30% improvement over 1.6), so I hope we can figure something out.
CL https://golang.org/cl/22342 mentions this issue.
For golang#15537

Compiling github.com/gogo/protobuf/test/combos/both

Before: 69.83 real, 82.72 user, 2.01 sys, 4206456832 maximum resident set size
After: 62.94 real, 74.74 user, 1.92 sys, 3832225792 maximum resident set size

name      old time/op  new time/op  delta
Template  279ms ± 4%   277ms ± 4%   ~       (p=0.529 n=20+20)
Unicode   142ms ±10%   143ms ± 6%   ~       (p=0.461 n=20+20)
GoTypes   850ms ± 1%   846ms ± 2%   ~       (p=0.175 n=19+20)
Compiler  4.43s ± 2%   4.38s ± 1%   -1.12%  (p=0.000 n=19+20)

name      old user-ns/op   new user-ns/op   delta
Template  376user-ms ± 4%  378user-ms ± 3%  ~  (p=0.755 n=19+20)
Unicode   203user-ms ± 3%  205user-ms ± 8%  ~  (p=0.314 n=15+20)
GoTypes   1.15user-s ± 2%  1.15user-s ± 3%  ~  (p=0.799 n=20+20)
Compiler  6.14user-s ± 3%  6.12user-s ± 2%  ~  (p=0.547 n=20+20)

name      old alloc/op  new alloc/op  delta
Template  47.6MB ± 0%   46.9MB ± 0%   -1.41%  (p=0.000 n=19+20)
Unicode   37.1MB ± 0%   37.1MB ± 0%   ~       (p=0.134 n=20+20)
GoTypes   138MB ± 0%    136MB ± 0%    -1.65%  (p=0.000 n=20+20)
Compiler  632MB ± 0%    614MB ± 0%    -2.82%  (p=0.000 n=20+20)

name      old allocs/op  new allocs/op  delta
Template  445k ± 0%      444k ± 0%      -0.17%  (p=0.000 n=20+20)
Unicode   356k ± 0%      356k ± 0%      ~       (p=0.204 n=20+20)
GoTypes   1.25M ± 0%     1.25M ± 0%     -0.41%  (p=0.000 n=20+19)
Compiler  5.03M ± 0%     4.99M ± 0%     -0.87%  (p=0.000 n=19+20)

Change-Id: Iee7b56ede4c319e4194160502f0796f478a9b73f
CL https://golang.org/cl/23211 mentions this issue.
CL https://golang.org/cl/24904 mentions this issue.
What version of Go are you using (go version)?

go version devel +ba6765c Wed May 4 05:55:07 2016 +0000 linux/amd64

What operating system and processor architecture are you using (go env)?

linux x86_64

What did you expect to see?

The build should succeed.

What did you see instead?

A couple of signal: killed errors:

This is on a pretty decent laptop with 16GB of RAM that is doing little else.