cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map #13554

pindamonhangaba · 2015-12-09T20:38:21Z

On Ubuntu 14.04, Go 1.4.1 64 bits code compiles fine, with 1.5.2 64 bits I get the following error:

    # command-line-arguments
    fatal error: runtime: out of memory

    runtime stack:
    runtime.throw(0x843f10, 0x16)
            /usr/local/go/src/runtime/panic.go:527 +0x90
    runtime.sysMap(0xc8fc0c0000, 0xf6d0000, 0x44a600, 0xdab818)
            /usr/local/go/src/runtime/mem_linux.go:203 +0x9b
    runtime.mHeap_SysAlloc(0xc95400, 0xf6d0000, 0x0)
            /usr/local/go/src/runtime/malloc.go:426 +0x160
    runtime.mHeap_Grow(0xc95400, 0x7b68, 0x0)
            /usr/local/go/src/runtime/mheap.go:628 +0x63
    runtime.mHeap_AllocSpanLocked(0xc95400, 0x7b63, 0x0)
            /usr/local/go/src/runtime/mheap.go:532 +0x5f1
    runtime.mHeap_Alloc_m(0xc95400, 0x7b63, 0x100000000, 0xc81982ccff)
            /usr/local/go/src/runtime/mheap.go:425 +0x1ac
    runtime.mHeap_Alloc.func1()
            /usr/local/go/src/runtime/mheap.go:484 +0x41
    runtime.systemstack(0xc82003dee0)
            /usr/local/go/src/runtime/asm_amd64.s:278 +0xab
    runtime.mHeap_Alloc(0xc95400, 0x7b63, 0x100000000, 0xc820022720)
            /usr/local/go/src/runtime/mheap.go:485 +0x63
    runtime.largeAlloc(0xf6c6000, 0x3, 0x0)
            /usr/local/go/src/runtime/malloc.go:748 +0xb3
    runtime.mallocgc.func3()
            /usr/local/go/src/runtime/malloc.go:637 +0x33
    runtime.systemstack(0xc820021500)
            /usr/local/go/src/runtime/asm_amd64.s:262 +0x79
    runtime.mstart()
            /usr/local/go/src/runtime/proc1.go:668

    goroutine 1 [running]:
    runtime.systemstack_switch()
            /usr/local/go/src/runtime/asm_amd64.s:216 fp=0xc8b55892c8 sp=0xc8b55892c0
    runtime.mallocgc(0xf6c6000, 0x0, 0x3, 0xc8efa66000)
            /usr/local/go/src/runtime/malloc.go:638 +0x9c4 fp=0xc8b5589398 sp=0xc8b55892c8
    runtime.rawmem(0xf6c6000, 0xc8b55893d8)
            /usr/local/go/src/runtime/malloc.go:791 +0x32 fp=0xc8b55893c0 sp=0xc8b5589398
    runtime.growslice(0x785c20, 0xc8efa66000, 0xc56a000, 0xc56a000, 0xc56a001, 0x0, 0x0, 0x0)
            /usr/local/go/src/runtime/slice.go:92 +0x233 fp=0xc8b5589430 sp=0xc8b55893c0
    cmd/internal/obj.Symgrow(0xc8200aa000, 0xc8309ec1e0, 0xc56a001)
            /usr/local/go/src/cmd/internal/obj/data.go:52 +0x23b fp=0xc8b55894f0 sp=0xc8b5589430
    cmd/internal/obj.Setuintxx(0xc8200aa000, 0xc8309ec1e0, 0xc56a000, 0x0, 0x1, 0xc56a000)
            /usr/local/go/src/cmd/internal/obj/data.go:126 +0x76 fp=0xc8b5589548 sp=0xc8b55894f0
    cmd/compile/internal/gc.duintxx(0xc8309a5100, 0xc56a000, 0x0, 0x1, 0xc56a000)
            /usr/local/go/src/cmd/compile/internal/gc/obj.go:178 +0x97 fp=0xc8b5589630 sp=0xc8b5589548
    cmd/compile/internal/gc.duint8(0xc8309a5100, 0xc56a000, 0x12b00, 0xc56a000)
            /usr/local/go/src/cmd/compile/internal/gc/obj.go:182 +0x3e fp=0xc8b5589660 sp=0xc8b5589630
    cmd/compile/internal/gc.onebitwritesymbol(0xc8bcc82000, 0x63df, 0x7600, 0xc8309a5100)
            /usr/local/go/src/cmd/compile/internal/gc/plive.go:1747 +0x19b fp=0xc8b55896e8 sp=0xc8b5589660
    cmd/compile/internal/gc.liveness(0xc8201fcc60, 0xc8214de000, 0xc8309a5080, 0xc8309a5100)
            /usr/local/go/src/cmd/compile/internal/gc/plive.go:1815 +0x3ba fp=0xc8b55897c8 sp=0xc8b55896e8
    cmd/compile/internal/gc.compile(0xc8201fcc60)
            /usr/local/go/src/cmd/compile/internal/gc/pgen.go:528 +0xe07 fp=0xc8b5589a38 sp=0xc8b55897c8
    cmd/compile/internal/gc.funccompile(0xc8201fcc60)
            /usr/local/go/src/cmd/compile/internal/gc/dcl.go:1484 +0x1c9 fp=0xc8b5589ab0 sp=0xc8b5589a38
    cmd/compile/internal/gc.Main()
            /usr/local/go/src/cmd/compile/internal/gc/lex.go:473 +0x1f08 fp=0xc8b5589db0 sp=0xc8b5589ab0
    cmd/compile/internal/amd64.Main()
            /usr/local/go/src/cmd/compile/internal/amd64/galign.go:127 +0x58d fp=0xc8b5589e18 sp=0xc8b5589db0
    main.main()
            /usr/local/go/src/cmd/compile/main.go:26 +0x189 fp=0xc8b5589ef0 sp=0xc8b5589e18
    runtime.main()
            /usr/local/go/src/runtime/proc.go:111 +0x2b0 fp=0xc8b5589f40 sp=0xc8b5589ef0
    runtime.goexit()
            /usr/local/go/src/runtime/asm_amd64.s:1721 +0x1 fp=0xc8b5589f48 sp=0xc8b5589f40

As a side note, after compiling this example I saw that the executable had around 500 MB in size! So I realized that I should've just made it a literal map, and that brought it back to 8MB. And it compiled in both 1.4.1 and 1.5.2.

noram.go

The text was updated successfully, but these errors were encountered:

bradfitz · 2015-12-09T20:42:10Z

/cc @aclements for triage.

ianlancetaylor · 2015-12-09T20:42:58Z

This is the compiler running out of memory, not the runtime.

aclements · 2015-12-10T20:51:41Z

I don't think this is new in 1.5.2. Both 1.5.2 and 1.4.1 use ~3.5GB of memory when compiling this.

$ /usr/bin/time $(gover run 1.5.2 go tool -n compile noram.go)
28.48user 8.36system 0:31.88elapsed 115%CPU (0avgtext+0avgdata 3512724maxresident)k
25224inputs+506144outputs (108major+800295minor)pagefaults 0swaps
$ du -h noram.o
248M    noram.o

$ /usr/bin/time $(gover run 1.4.1 go tool -n 6g noram.go) 
17.16user 2.46system 0:20.93elapsed 93%CPU (0avgtext+0avgdata 3701620maxresident)k
248inputs+983576outputs (1major+922620minor)pagefaults 0swaps
$ du -h noram.o
248M    noram.o

Tip (b8b65c1) is basically the same as 1.5.2. Since the memory use of the C version of the compiler is similar to the Go version of the compiler (a little worse, actually), this is certainly just the compiler using a lot of memory and not a runtime memory management issue.

randall77 · 2016-04-29T05:01:21Z

This appears somewhat better now. It only used ~2GB to compile at tip.
I'm going to punt this to 1.8 for now. Almost freeze and no pending changes I know of will help this issue.

josharian · 2017-05-17T05:02:08Z

Comparing 1.7.5, 1.8.1, and tip at c20e545 shows a mixed bag. Compile time is up, but memory use is down, and object file size is way down.

name \ time/op       17           18           tip
Pkg                   18.1s ± 3%   32.3s ± 3%   53.1s ± 2%

name \ user-time/op  17           18           tip
Pkg                   20.9s ± 4%   35.2s ± 3%   55.0s ± 2%

name \ alloc/op      17           18           tip
Pkg                  2.58GB ± 0%  4.14GB ± 0%  2.04GB ± 0%

name \ allocs/op     17           18           tip
Pkg                   6.34M ± 0%   6.99M ± 0%   6.21M ± 0%

name \ object-bytes  17           18           tip
Pkg                    252M ± 0%    193M ± 0%     28M ± 0%

name \ export-bytes  17           18           tip
Pkg                     176 ± 0%     181 ± 0%     208 ± 0%

I suspect that the memory usage decrease is due to CL 35114.

Compiling this file now spends a whopping 64% of its CPU time on line fuse.go:123, which is v.Block = c from:

	// move all of b's values to c.
	for _, v := range b.Values {
		v.Block = c
	}

Looks like fuseBlockPlain is accidentally quadratic. In a long series of plain blocks, every value gets moved along one at a time, block by block. I'll think about whether there's a low risk fix for that.

gopherbot · 2017-05-17T14:54:01Z

CL https://golang.org/cl/43570 mentions this issue.

gopherbot · 2017-05-17T14:54:02Z

CL https://golang.org/cl/43571 mentions this issue.

josharian · 2017-05-17T14:58:58Z

There were actually two quadratic bits. CLs 43570 and 43571 contain fairly minimal fixes for them. Up to @randall77 and @cherrymui whether they want them for 1.9 or 1.10, and whether @cherrymui would like to do 43571 differently.

After those CLs, this code now compiles in a somewhat acceptable 8s, although memory usage is still high at 2gb. I don't see any obvious next fixes for the memory usage. The next likely source of speed-ups would be to fix #20393.

fuseBlockPlain was accidentally quadratic. If you had plain blocks b1 -> b2 -> b3 -> b4, each containing single values v1, v2, v3, and v4 respectively, fuseBlockPlain would move v1 from b1 to b2 to b3 to b4, then v2 from b2 to b3 to b4, etc. There are two obvious fixes. * Look for runs of blocks in fuseBlockPlain and handle them in a single go. * Fuse from end to beginning; any given value in a run of blocks to fuse then moves only once. The latter is much simpler, so that's what this CL does. Somewhat surprisingly, this change does not pass toolstash-check. The resulting set of blocks is the same, and the values in them are the same, but the order of values in them differ, and that order of values (while arbitrary) is enough to change the compiler's output. This may be due to #20178; deadstore is the next pass after fuse. Adding basic sorting to the beginning of deadstore is enough to make this CL pass toolstash-check: for _, b := range f.Blocks { obj.SortSlice(b.Values, func(i, j int) bool { return b.Values[i].ID < b.Values[j].ID }) } Happily, this CL appears to result in better code on average, if only by accident. It cuts 4k off of cmd/go; go1 benchmarks are noisy as always but don't regress (numbers below). No impact on the standard compilebench benchmarks. For the code in #13554, this speeds up compilation dramatically: name old time/op new time/op delta Pkg 53.1s ± 2% 12.8s ± 3% -75.92% (p=0.008 n=5+5) name old user-time/op new user-time/op delta Pkg 55.0s ± 2% 14.9s ± 3% -73.00% (p=0.008 n=5+5) name old alloc/op new alloc/op delta Pkg 2.04GB ± 0% 2.04GB ± 0% +0.18% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Pkg 6.21M ± 0% 6.21M ± 0% ~ (p=0.222 n=5+5) name old object-bytes new object-bytes delta Pkg 28.4M ± 0% 28.4M ± 0% +0.00% (p=0.008 n=5+5) name old export-bytes new export-bytes delta Pkg 208 ± 0% 208 ± 0% ~ (all equal) Updates #13554 go1 benchmarks: name old time/op new time/op delta BinaryTree17-8 2.29s ± 2% 2.26s ± 2% -1.43% (p=0.000 n=48+50) Fannkuch11-8 2.74s ± 2% 2.79s ± 2% +1.63% (p=0.000 n=50+49) FmtFprintfEmpty-8 36.6ns ± 3% 34.6ns ± 4% -5.29% (p=0.000 n=49+50) FmtFprintfString-8 58.3ns ± 3% 59.1ns ± 3% +1.35% (p=0.000 n=50+49) FmtFprintfInt-8 62.4ns ± 2% 63.2ns ± 3% +1.19% (p=0.000 n=49+49) FmtFprintfIntInt-8 95.1ns ± 2% 96.7ns ± 3% +1.61% (p=0.000 n=49+50) FmtFprintfPrefixedInt-8 118ns ± 3% 113ns ± 2% -4.00% (p=0.000 n=50+49) FmtFprintfFloat-8 191ns ± 2% 192ns ± 2% +0.40% (p=0.034 n=50+50) FmtManyArgs-8 419ns ± 2% 420ns ± 2% ~ (p=0.228 n=49+49) GobDecode-8 5.26ms ± 3% 5.19ms ± 2% -1.33% (p=0.000 n=50+49) GobEncode-8 4.12ms ± 2% 4.15ms ± 3% +0.68% (p=0.007 n=49+50) Gzip-8 198ms ± 2% 197ms ± 2% -0.50% (p=0.018 n=48+48) Gunzip-8 31.9ms ± 3% 31.8ms ± 3% -0.47% (p=0.024 n=50+50) HTTPClientServer-8 64.4µs ± 0% 64.0µs ± 0% -0.55% (p=0.000 n=43+46) JSONEncode-8 10.6ms ± 2% 10.6ms ± 3% ~ (p=0.543 n=49+49) JSONDecode-8 43.3ms ± 3% 43.1ms ± 2% ~ (p=0.079 n=50+50) Mandelbrot200-8 3.70ms ± 2% 3.70ms ± 2% ~ (p=0.553 n=47+50) GoParse-8 2.70ms ± 2% 2.71ms ± 3% ~ (p=0.843 n=49+50) RegexpMatchEasy0_32-8 70.5ns ± 4% 70.4ns ± 4% ~ (p=0.867 n=48+50) RegexpMatchEasy0_1K-8 162ns ± 3% 162ns ± 2% ~ (p=0.739 n=48+48) RegexpMatchEasy1_32-8 66.1ns ± 5% 66.2ns ± 4% ~ (p=0.970 n=50+50) RegexpMatchEasy1_1K-8 297ns ± 7% 296ns ± 7% ~ (p=0.406 n=50+50) RegexpMatchMedium_32-8 105ns ± 5% 105ns ± 5% ~ (p=0.702 n=50+50) RegexpMatchMedium_1K-8 32.3µs ± 4% 32.2µs ± 3% ~ (p=0.614 n=49+49) RegexpMatchHard_32-8 1.75µs ±18% 1.74µs ±12% ~ (p=0.738 n=50+48) RegexpMatchHard_1K-8 52.2µs ±14% 51.3µs ±13% ~ (p=0.230 n=50+50) Revcomp-8 366ms ± 3% 367ms ± 3% ~ (p=0.745 n=49+49) Template-8 48.5ms ± 4% 48.5ms ± 4% ~ (p=0.824 n=50+48) TimeParse-8 263ns ± 2% 256ns ± 2% -2.98% (p=0.000 n=48+49) TimeFormat-8 265ns ± 3% 262ns ± 3% -1.35% (p=0.000 n=48+49) [Geo mean] 41.1µs 40.9µs -0.48% Change-Id: Ib35fa15b54282abb39c077d150beee27f610891a Reviewed-on: https://go-review.googlesource.com/43570 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org>

The writebarrier pass processes WB ops from beginning to end, replacing them by other values. But it also checks whether there are more ops to process by walking from beginning to end. This is quadratic, so walk from end to beginning instead. This speeds up compiling the code in issue 13554: name old time/op new time/op delta Pkg 11.9s ± 2% 8.3s ± 3% -29.88% (p=0.000 n=18+17) Updates #13554 Passes toolstash-check. Change-Id: I5f8a872ddc4b783540220d89ea2ee188a6d2b2ff Reviewed-on: https://go-review.googlesource.com/43571 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>

gopherbot · 2018-05-01T06:19:54Z

Change https://golang.org/cl/110560 mentions this issue: cmd/compile: optimize bvec routines

The recent improvements to the prove pass make it possible to provide bounds hints to the compiler in some bvec routines. This speeds up the compilation of the code in name old time/op new time/op delta Pkg 7.93s ± 4% 7.69s ± 3% -2.98% (p=0.000 n=29+26) While we're here, clean up some C-isms. Updates #13554 Updates #20393 Change-Id: I47a0ec68543a9fc95c5359c3f37813fb529cb4f0 Reviewed-on: https://go-review.googlesource.com/110560 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>

gopherbot · 2018-11-27T17:55:02Z

Change https://golang.org/cl/151497 mentions this issue: cmd/compile: don't generate algorithms for slice literal backing store

gopherbot · 2018-11-27T20:58:55Z

Change https://golang.org/cl/151498 mentions this issue: cmd/compile: eliminate write barriers when writing non-heap ptrs

We don't need a write barrier if: 1) The location we're writing to doesn't hold a heap pointer, and 2) The value we're writing isn't a heap pointer. The freshly returned value from runtime.newobject satisfies (1). Pointers to globals, and the contents of the read-only data section satisfy (2). This is particularly helpful for code like: p := []string{"abc", "def", "ghi"} Where the compiler generates: a := new([3]string) move(a, statictmp_) // eliminates write barriers here p := a[:] For big slice literals, this makes the code a smaller and faster to compile. Update #13554. Reduces the compile time by ~10% and RSS by ~30%. Change-Id: Icab81db7591c8777f68e5d528abd48c7e44c87eb Reviewed-on: https://go-review.googlesource.com/c/151498 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Cherry Zhang <cherryyz@google.com>

odeke-em · 2019-02-03T20:52:09Z

@aclements, you added "wait-release" to @randall77's CL https://go-review.googlesource.com/c/go/+/151497/ but this issue is still triaged for Go1.12.
Should we perhaps move this to Go1.13? After all this isn't necessarily a new regression as it has existed since Go1.4*

randall77 · 2019-05-06T19:42:38Z

Tip uses only about 1.2GB to compile this example now.
I'm going to close as fixed.
(I'm sure there's more we could do, but we can track that work on other issues.)

bradfitz assigned aclements Dec 9, 2015

bradfitz changed the title ~~1.5.2 out of memory when allocating 12k + items in a map at init(), works when defined as a literal map~~ runtime: 1.5.2 out of memory when allocating 12k + items in a map at init(), works when defined as a literal map Dec 9, 2015

ianlancetaylor changed the title ~~runtime: 1.5.2 out of memory when allocating 12k + items in a map at init(), works when defined as a literal map~~ cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map Dec 9, 2015

ianlancetaylor added this to the Go1.7 milestone Dec 9, 2015

aclements assigned randall77 and unassigned aclements Dec 10, 2015

randall77 modified the milestones: Go1.8, Go1.7 Apr 29, 2016

quentinmit added the NeedsFix label Oct 11, 2016

rsc modified the milestones: Go1.9, Go1.8 Oct 21, 2016

josharian modified the milestones: Go1.10, Go1.9 May 29, 2017

mdempsky modified the milestones: Go1.10, Go1.11 Nov 30, 2017

josharian modified the milestones: Go1.11, Go1.12 May 1, 2018

andybons modified the milestones: Go1.12, Go1.13 Feb 12, 2019

randall77 closed this as completed May 6, 2019

golang locked and limited conversation to collaborators May 5, 2020

gopherbot added the FrozenDueToAge label May 5, 2020

rsc unassigned randall77 Jun 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map #13554

cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map #13554

pindamonhangaba commented Dec 9, 2015

bradfitz commented Dec 9, 2015

ianlancetaylor commented Dec 9, 2015

aclements commented Dec 10, 2015

randall77 commented Apr 29, 2016

josharian commented May 17, 2017

gopherbot commented May 17, 2017

gopherbot commented May 17, 2017

josharian commented May 17, 2017

gopherbot commented May 1, 2018

gopherbot commented Nov 27, 2018

gopherbot commented Nov 27, 2018

odeke-em commented Feb 3, 2019

randall77 commented May 6, 2019

cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map #13554

cmd/compile: runs out of memory allocating 12k + items in a map at init, works as literal map #13554

Comments

pindamonhangaba commented Dec 9, 2015

bradfitz commented Dec 9, 2015

ianlancetaylor commented Dec 9, 2015

aclements commented Dec 10, 2015

randall77 commented Apr 29, 2016

josharian commented May 17, 2017

gopherbot commented May 17, 2017

gopherbot commented May 17, 2017

josharian commented May 17, 2017

gopherbot commented May 1, 2018

gopherbot commented Nov 27, 2018

gopherbot commented Nov 27, 2018

odeke-em commented Feb 3, 2019

randall77 commented May 6, 2019