
runtime: Heap fragmentation causing memory leaks #11035


Closed
timtadh opened this issue Jun 2, 2015 · 24 comments

@timtadh

timtadh commented Jun 2, 2015

  1. What version of Go are you using (go version)?

    $ go version
    go version go1.4.2 linux/amd64
    $ go version
    go version devel +70cf735 Tue Jun 2 13:55:40 2015 +0000 linux/amd64
    
  2. What operating system and processor architecture are you using?

    $ uname -a
    Linux nitric 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    $ cat /proc/cpuinfo 
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 58
    model name      : Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz
    stepping        : 9
    microcode       : 0x17
    cpu MHz         : 2000.000
    cache size      : 4096 KB
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 2
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
    bogomips        : 4988.67
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    $ cat /proc/meminfo 
    MemTotal:        7730856 kB
    
  3. What did you do?

    run https://github.com/timtadh/fsm on a large graph (see explanation below)

  4. What did you expect to see?

    memory usage to remain relatively stable

  5. What did you see instead?

    Memory usage grew throughout the run of the program, sometimes extremely rapidly.

Explanation

I have been working on a frequent subgraph mining algorithm in Go. In
case you are not familiar with frequent subgraph mining, it involves
looking in a very large graph for repeated subgraphs. This is an
exponential problem and can require a lot of memory.

To solve the memory issue, I wrote a memory-mapped B+Tree,
fs2/bptree, which stores the candidate
subgraphs. The B+Tree works fine and generally has very little overhead
in terms of memory usage.

What does have overhead is generating the candidate graphs. Basically
the process goes something like this:

  1. A partition of subgraphs is loaded from the B+Tree. Each graph in the
    partition is isomorphic to every other graph in the partition. These
    are known as the embeddings.
  2. If there are not enough subgraphs in the partition the partition is
    discarded (and therefore eventually garbage collected).
  3. Otherwise, each embedding is extended by one edge to create
    candidates for the next round of mining. The embeddings are extended
    multiple times, once for each edge in the surrounding context.
  4. The extensions are then stored in the B+Tree.
  5. The extensions are garbage collected, as is the partition. (A rough
    Go sketch of this loop follows the list.)
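
To make the shape of that loop concrete, here is a minimal sketch in Go.
Every identifier below (Partition, mineRound, extendByOneEdge, the store
callback) is a hypothetical stand-in, not the actual fsm API.

// Hypothetical sketch of the candidate-generation loop described above.
// None of these identifiers come from the real fsm codebase.
type SubGraph struct {
    // vertices, edges, etc. omitted; stand-in only
}

type Partition []*SubGraph // isomorphic embeddings of one candidate

func mineRound(partitions <-chan Partition, minSupport int, store func(*SubGraph)) {
    for part := range partitions {
        // Steps 1-2: load a partition; discard it if it has too few embeddings.
        if len(part) < minSupport {
            continue // the partition becomes garbage and is eventually collected
        }
        // Step 3: extend every embedding by one edge to form the next candidates.
        for _, emb := range part {
            for _, ext := range extendByOneEdge(emb) {
                store(ext) // step 4: persist the extension in the B+Tree
            }
        }
        // Step 5: the extensions and the partition are now unreachable, so in
        // theory heap usage should stay flat from round to round.
    }
}

// extendByOneEdge stands in for the real extension step.
func extendByOneEdge(sg *SubGraph) []*SubGraph { return nil }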

In theory, the overall memory usage (outside of the B+Trees) should not
grow in this process. However, it does grow. I believe it is due to the
heap becoming fragmented. I am willing to do considerable legwork to
both show this conclusively and to help fix the problem.

This may not be fixable

Before we go into my reasons for believing this is a fragmentation
issue, let me first note that this may not be fixable. The solutions I
have outlined at the bottom of the post may be impractical. However, I
think it would be worth trying to fix it.

In Depth

I have been profiling my program extensively for several months to try
and get rid of memory allocations as much as possible. However, there
are some limits to my ability to completely get rid of allocations
during the mining process. It also becomes difficult to reason about
ownership of particular bits of memory (like the subgraph objects) when
they are flying around between different goroutines.

One piece of evidence I have gathered for this being a fragmentation
problem is the statistics reported by GODEBUG=gctrace=1. Here is a
snapshot after the program has been running for a while:

gc7993(4): 8+9+1060+6 us, 2 -> 4 MB, 77327 (431357683-431280356) objects, 33 goroutines, 779/676/0 sweeps, 17(1319) handoff, 6(35) steal, 69/0/0 yields
gc7994(4): 7+9+1556+21 us, 2 -> 4 MB, 76847 (431409057-431332210) objects, 33 goroutines, 779/679/0 sweeps, 19(891) handoff, 10(67) steal, 118/60/8 yields
gc7995(4): 17+6+1151+12 us, 2 -> 4 MB, 76512 (431459886-431383374) objects, 33 goroutines, 779/633/0 sweeps, 22(1467) handoff, 9(70) steal, 103/32/3 yields
gc7996(4): 6+7+1082+122 us, 2 -> 4 MB, 76689 (431510921-431434232) objects, 33 goroutines, 779/634/0 sweeps, 30(1316) handoff, 6(49) steal, 114/37/4 yields
gc7997(4): 7+9+1155+112 us, 2 -> 4 MB, 76474 (431561707-431485233) objects, 33 goroutines, 779/627/0 sweeps, 13(704) handoff, 11(76) steal, 89/12/1 yields
gc7998(4): 8+10+1341+5 us, 2 -> 4 MB, 77134 (431613156-431536022) objects, 33 goroutines, 779/644/0 sweeps, 23(1502) handoff, 9(65) steal, 70/0/0 yields
gc7999(4): 7+13+1188+5 us, 2 -> 4 MB, 76252 (431663721-431587469) objects, 33 goroutines, 779/642/0 sweeps, 19(1183) handoff, 7(43) steal, 124/40/3 yields
gc8000(4): 16+6+1055+4 us, 2 -> 4 MB, 77080 (431715341-431638261) objects, 33 goroutines, 779/667/0 sweeps, 27(2229) handoff, 13(94) steal, 110/21/0 yields
gc8001(4): 8+9+1099+4 us, 2 -> 4 MB, 76757 (431766506-431689749) objects, 33 goroutines, 779/615/0 sweeps, 37(2424) handoff, 13(80) steal, 112/11/0 yields
gc8002(4): 6+6+1046+10 us, 2 -> 4 MB, 76458 (431817363-431740905) objects, 33 goroutines, 779/673/0 sweeps, 24(1090) handoff, 8(54) steal, 99/30/3 yields
gc8003(4): 7+9+981+4 us, 2 -> 4 MB, 77446 (431869376-431791930) objects, 33 goroutines, 779/657/0 sweeps, 18(1567) handoff, 12(84) steal, 71/0/0 yields
gc8004(4): 9+11+956+4 us, 2 -> 4 MB, 76508 (431920315-431843807) objects, 33 goroutines, 779/675/0 sweeps, 35(2390) handoff, 7(35) steal, 122/11/0 yields
gc8005(4): 7+1530+270+3 us, 2 -> 4 MB, 76762 (431971504-431894742) objects, 33 goroutines, 779/648/0 sweeps, 18(1058) handoff, 11(76) steal, 72/30/3 yields
gc8006(4): 7+7+1151+5 us, 2 -> 4 MB, 77295 (432023032-431945737) objects, 33 goroutines, 779/549/0 sweeps, 21(1517) handoff, 11(78) steal, 96/30/3 yields
gc8007(4): 9+13+1447+5 us, 2 -> 4 MB, 75554 (432072776-431997222) objects, 33 goroutines, 779/692/0 sweeps, 22(1054) handoff, 6(42) steal, 132/35/3 yields
gc8008(4): 17+26+1594+21 us, 2 -> 4 MB, 76957 (432124020-432047063) objects, 33 goroutines, 779/421/0 sweeps, 18(1065) handoff, 12(82) steal, 127/43/4 yields
gc8009(4): 7+7+1206+5 us, 2 -> 4 MB, 77113 (432175501-432098388) objects, 33 goroutines, 779/639/0 sweeps, 42(2590) handoff, 10(66) steal, 153/31/3 yields

One thing to notice: the number of live objects stays relatively
constant throughout the execution of the program. It stays in the
neighborhood of 70k-80k on a 4-core machine. Inspecting the memory (as
reported in /proc/<pid>/maps) shows that the Go heap does continue to
grow. Allowing the program to run for a very long time essentially
causes the B+Tree memory maps to be evicted from memory to make space
for the ever-growing heap.

Sources of Memory Allocations

go tool pprof -png bin/fsm mem.pprof > mem.png

[image: mem (memory profile graph)]

go tool pprof -png -alloc_space bin/fsm mem.pprof > mem.png

[image: mem (alloc_space memory profile graph)]

As you can see, the sources of allocations are what you would expect:

  1. De-serializing the subgraphs
  2. Creating new ones (the canonSubGraph)
  3. Creating the cursor objects for iterating through the B+Trees

I am obviously experimenting with ways to reduce the allocations
performed by these operations. However, at some point I may just reach
the limits of what is easily achievable in Go and will have to
switch to either C++ or Rust (which did not really exist when I started
this project in 2012).

Does Go 1.5 Help?

I compiled and ran master yesterday. Unfortunately, my program does not
currently execute on master. I get lots of weird errors. Maybe this is
because I do a lot of unsafe things in my B+Tree implementation? I don't
know. I can post those errors if you want to see them, but I think they
are a separate issue.

Possible Solutions

The best idea I can think of would be a runtime routine that stops
the world and does a heap compaction. This would never be triggered
automatically (that would be catastrophic in, say, an HTTP server).
However, a program could call runtime.HeapCompact(), which would cause
the mutator to stop and a full garbage collection plus compaction to run.

The way this could work is to have a second garbage collector which is
either a copying collector or a mark-and-compact collector. The
basic idea of both is to find all of the objects live at the time of
collection and then, one by one, move each object to a new location and
rewrite all pointers into the object. This is obviously an expensive
operation.
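
For illustration only, here is a toy mark-compact pass in Go over an
imaginary array-backed heap. It shows the "find the live objects, move them,
rewrite the pointers" idea in miniature; it is not how the Go runtime
represents objects, and a real implementation would also have to deal with
spans, varying object sizes, stacks, finalizers, and cgo.

// Toy mark-compact over an imaginary heap of objects that reference each
// other by index. Illustrative only; not the Go runtime's object model.
type obj struct {
    marked bool
    refs   []int // indices of other objects this object points to
}

// compact returns a new heap containing only the objects reachable from
// roots, slid down to the front, plus a forwarding table (old index ->
// new index) so the caller can rewrite its own roots as well.
func compact(heap []obj, roots []int) ([]obj, map[int]int) {
    // Mark phase: flood-fill reachability from the roots.
    var mark func(i int)
    mark = func(i int) {
        if heap[i].marked {
            return
        }
        heap[i].marked = true
        for _, r := range heap[i].refs {
            mark(r)
        }
    }
    for _, r := range roots {
        mark(r)
    }
    // Compute forwarding addresses for the survivors.
    forward := make(map[int]int)
    for i := range heap {
        if heap[i].marked {
            forward[i] = len(forward)
        }
    }
    // Compact phase: move each survivor and rewrite its outgoing pointers.
    out := make([]obj, 0, len(forward))
    for i := range heap {
        if !heap[i].marked {
            continue
        }
        o := heap[i]
        o.marked = false
        for j, r := range o.refs {
            o.refs[j] = forward[r]
        }
        out = append(out, o)
    }
    return out, forward
}

The expensive part is exactly these full passes over the heap plus the
pointer rewrites, which is why a compactor like this would need to stop the
world.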

Drawbacks

  1. We can only find pointers from Go structures. This means any pointers
    passed to native code now point somewhere else.
  2. If we believe (conservatively) an object is being pointed to by a
    native structure we cannot move that object.

Benefits

  1. This operation only happens when the programmer asks for it, so
    they can potentially control negative side effects from pointers
    involving non-Go code.
  2. It can potentially perfectly compact the heap.
  3. Some algorithms can improve memory locality by moving frequently
    accessed objects close together.

Other Solutions

Incremental

Incremental compaction during GC or allocation. The heap would never be
fully compacted; however, this would limit the amount of work that is
done and make the process fully automatic:

Drawbacks

  1. Everything from a programmer-triggered compaction
  2. Breaking the expectation that objects stay in one place
  3. Imperfect compaction

Benefits

  1. Fully automatic
  2. No new runtime methods which need to be supported in the future

Improve the Allocator

The other idea is to simply improve the allocator to reduce the
fragmentation in the first place. Programs like mine which cause
fragmentation could be collected and their behavior studied to figure
out whether there are ways to reduce it.

Drawbacks

  1. No compaction, so the heap will be larger than the ideal

Benefits

  1. Fully automatic
  2. No changes in semantics of the Go memory system
  3. No new runtime methods

Conclusion

I believe Go currently has a heap fragmentation problem that impacts
long-running programs which do lots of small and medium-sized
allocations. A potential solution is to add a function,
runtime.HeapCompact(), which would trigger a stop-the-world compaction.

@bradfitz
Contributor

bradfitz commented Jun 2, 2015

No major work is going to be done on the 1.4 branch, so your best bet is making your program work on tip (Go 1.5) and discussing from there.

@timtadh
Author

timtadh commented Jun 2, 2015

@bradfitz Ok. I am definitely willing to do that, but I do not have any idea where errors such as:

2015/06/02 13:11:50 Size is too large. Cannot allocate an array that big
goroutine 127 [running]:
github.com/timtadh/fs2/errors.Errorf(0x6477b0, 0x34, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/errors/error.go:16 +0x8c
github.com/timtadh/fs2/bptree.(*Varchar).Alloc(0xc2087f00c0, 0xdeaddeaddeaddead, 0x0, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/varchar.go:189 +0x89
github.com/timtadh/fs2/bptree.(*BpTree).newVarcharKey(0xc2084a2640, 0x72000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:196 +0x1a1
github.com/timtadh/fs2/bptree.(*BpTree).leafInsert(0xc2084a2640, 0x72000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x4b5994, 0xc2084a2640, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:214 +0xb5
github.com/timtadh/fs2/bptree.(*BpTree).insert(0xc2084a2640, 0x72000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0x0, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:102 +0x1a4
github.com/timtadh/fs2/bptree.(*BpTree).internalInsert(0xc2084a2640, 0x91000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0x0, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:134 +0x163
github.com/timtadh/fs2/bptree.(*BpTree).insert(0xc2084a2640, 0x91000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0xc2085ecea8, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:100 +0x131
github.com/timtadh/fs2/bptree.(*BpTree).Add(0xc2084a2640, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:54 +0x311
github.com/timtadh/fsm/store.(*Fs2BpTree).Add(0xc2084a2660, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc208350380)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/store/fs.go:136 +0x106
github.com/timtadh/fsm/mine.(*Miner).collector(0xc208018410, 0x7f0ea41b7640, 0xc2084a2660, 0xc20822a480)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/mine/mine.go:331 +0xf2
created by github.com/timtadh/fsm/mine.(*Miner).makeCollectors
    /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/mine/mine.go:343 +0x1fb

are coming from. The error here is that somehow the bptree.(*Varchar).Alloc() method is getting 0xdeaddeaddeaddead as the size to allocate. That seems to be a "magic" value. All of my tests pass on 1.5, so I am really not sure why this happens. I should note that I do interface with some native code via cgo, if that makes any difference. Perhaps this error is coming from issues with the new garbage collector?

@ianlancetaylor
Member

That means that you have a dangling pointer. It's gotten easier to get those if you allocate memory in cgo and use a finalizer to free it. Or it could be cgo related in some other way.

@timtadh
Author

timtadh commented Jun 2, 2015

$ go version
go version devel +70cf735 Tue Jun 2 13:55:40 2015 +0000 linux/amd64

@bradfitz / @ianlancetaylor Something funky is definitely going on. The
pointer that was marked as 0xdead comes from a channel. Basically the
code looks like this:

type labelGraph struct {
    label []byte
    sg    *goiso.SubGraph
}

type Collector struct {
    ch   chan *labelGraph
    tree store.SubGraphs
}

func (c *Collector) send(sg *goiso.SubGraph) {
    label := sg.Label()
    c.ch <- &labelGraph{label, sg}
}

func (c *Collector) collect() {
    for lg := range c.ch {
        c.tree.Add(lg.label, lg.sg)
    }
}

func (c *Collector) close() {
    close(c.ch)
}

func otherThread() {
    go c.collect()
    for sg := range candidates {
        c.send(sg)
    }
    c.close()
}

The error was coming from deep inside the implementation of
tree.Add().

In send(sg) I added a check to make sure the values I was putting in
were sane. They always were. I added the same check inside of Add().
There the check would fail. I pushed the check up one level to be inside
of collect. It also failed there.

So basically, I was putting good values into the channel but getting
dead values out. Super weird. If I turn off using the mmap'ed B+Tree and
use a regular data structure, the errors go away.

I tried doing something which should have had no effect: I added
buffering to the channels.

    ch: make(chan *labelGraph, 1)

This made the error stop, which is great. However, now I get a new, even
more mysterious error:

unexpected fault address 0xc208246c90
fatal error: fault
[signal 0xb code=0x2 addr=0xc208246c90 pc=0xc208246c90]

goroutine 200 [running]:
runtime.throw(0x602db0, 0x5)
        /home/hendersont/srcs/go/src/runtime/panic.go:527 +0x96 fp=0xc208ef3dd8 sp=0xc208ef3dc0
runtime.sigpanic()
        /home/hendersont/srcs/go/src/runtime/sigpanic_unix.go:27 +0x2ae fp=0xc208ef3e28 sp=0xc208ef3dd8
created by github.com/timtadh/fsm/mine.(*Miner).filterAndExtend
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/mine/mine.go:235 +0x8e

This looks like it might be a segmentation fault. It does not happen every
run; the other error was happening consistently. This error seems to only
happen when I use anonymous memory maps. If they are backed by files it does
not occur (well, it has not occurred yet).

Whatever my program's problems are, the fragmentation problem in the runtime
also exists in 1.5. For instance, after a successful run I get this with
GODEBUG=gctrace=1:

gc #13181 @266.838s 18%: 0+0+0+10+0 ms clock, 0+0+0+0/10/0+1 ms cpu, 4->4->3 MB, 4 MB goal, 4 P
gc #13182 @266.869s 18%: 0+0+12+8+0 ms clock, 0+0+0+4/8/0+1 ms cpu, 4->6->4 MB, 4 MB goal, 4 P
gc #13183 @266.885s 18%: 0+0+0+13+0 ms clock, 0+0+0+0/13/0+2 ms cpu, 4->4->3 MB, 7 MB goal, 4 P
gc #13184 @266.905s 18%: 0+0+0+11+0 ms clock, 0+0+0+0/11/0+1 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13185 @266.929s 18%: 0+0+5+9+0 ms clock, 0+0+0+2/9/0+1 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13186 @266.949s 18%: 0+0+0+7+5 ms clock, 0+0+0+0/7/0+23 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc #13187 @266.988s 18%: 0+0+0+32+0 ms clock, 0+0+0+11/12/0+2 ms cpu, 4->7->4 MB, 4 MB goal, 4 P
gc #13188 @267.012s 18%: 0+0+3+14+0 ms clock, 0+0+0+0/14/0+2 ms cpu, 4->5->3 MB, 8 MB goal, 4 P
gc #13189 @267.033s 18%: 0+0+0+13+0 ms clock, 0+0+0+0/13/0+1 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13190 @267.050s 18%: 0+0+0+10+0 ms clock, 0+0+0+0/10/0+2 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13191 @267.082s 18%: 0+0+3+16+0 ms clock, 3+0+0+8/16/0+0 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13192 @267.095s 18%: 0+0+0+7+0 ms clock, 0+0+0+0/7/0+2 ms cpu, 4->4->2 MB, 6 MB goal, 4 P
gc #13193 @267.127s 18%: 4+1+4+12+0 ms clock, 18+1+0+0/12/0+1 ms cpu, 4->6->3 MB, 5 MB goal, 4 P
gc #13194 @267.145s 18%: 0+0+0+10+0 ms clock, 0+0+0+0/10/0+0 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13195 @267.170s 18%: 0+0+1+11+0 ms clock, 2+0+0+0/11/0+2 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13196 @267.192s 18%: 0+0+1+10+0 ms clock, 0+0+0+0/10/0+2 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13197 @267.215s 18%: 0+0+0+14+0 ms clock, 0+0+0+0/14/0+1 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13198 @267.240s 18%: 0+0+3+10+0 ms clock, 3+0+0+0/10/0+2 ms cpu, 4->5->3 MB, 4 MB goal, 4 P
gc #13199 @267.260s 18%: 0+0+0+14+0 ms clock, 0+0+0+1/14/0+2 ms cpu, 4->4->3 MB, 4 MB goal, 4 P
gc #13200 @267.287s 18%: 0+0+6+9+0 ms clock, 0+0+0+0/9/0+1 ms cpu, 4->5->3 MB, 4 MB goal, 4 P
gc #13201 @267.318s 18%: 0+0+10+13+0 ms clock, 0+0+0+0/13/0+1 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc #13202 @267.331s 18%: 0+0+0+9+0 ms clock, 0+0+0+0/9/0+2 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13203 @267.367s 18%: 2+0+7+13+0 ms clock, 11+0+0+0/13/0+0 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13204 @267.384s 18%: 0+0+0+9+0 ms clock, 0+0+0+0/9/0+1 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13205 @267.416s 18%: 0+0+7+12+0 ms clock, 0+0+0+3/12/0+1 ms cpu, 4->6->3 MB, 5 MB goal, 4 P

But the resident size was continuing to grow at a healthy clip!

[image: resource-usage]

This image shows the size of the memory-mapped files (as files rather than
anonymous mappings) versus the size of the heap, which keeps growing and is
actually outpacing the growth rate of the memory maps.

Conclusion

It looks like

  1. There might be a bug with respect to the unbuffered channels. It is
    difficult for me to tell if it is your bug or mine. Let me know if
    you have any suggestions for sorting that out.
  2. The fragmentation is not fixed by the new collector. The target heap
    size stays fairly constant at around 4-8 MB, but the heap grows at an
    unbounded rate.

I am interested in helping solve both problems. Let me know how I can
help.

@timtadh timtadh changed the title runtime: Heap fragmentation causing memory leaks in 1.4.2 runtime: Heap fragmentation causing memory leaks Jun 2, 2015
@ianlancetaylor
Member

Can you give us a standalone test case that we can use to reproduce the problem?

The error you are describing is consistent with what I suggested earlier: something is freeing C memory when Go thinks it is still active. Where is sg allocated?

@ianlancetaylor ianlancetaylor added this to the Go1.5 milestone Jun 3, 2015
@ianlancetaylor
Member

Setting milestone to 1.5 to make sure we figure out why the program crashes with 1.5, if possible.

@RLH
Contributor

RLH commented Jun 3, 2015

The Go GCed heap uses a segregated-size heap, so if the sizes of the objects
being freed are similar to those being allocated, fragmentation should not be
a problem.


@timtadh
Author

timtadh commented Jun 3, 2015

@RLH w.r.t.

so if the sizes of the objects being freed are similar to those being
allocated, fragmentation should not be a problem.

In Frequent Subgraph Mining the sizes of the subgraphs (and therefore
the allocations) grow as the program proceeds. This may be something of
a pathological case for a memory allocator. There are two things going
on:

  1. The program is allocating space for the subgraphs (and their
    serializations). The sizes of these allocations grow as the program
    proceeds.
  2. The program is allocating many small objects throughout the run of
    the program, such as function closures.

So, what I think tends to happen is that the space where the subgraphs
are allocated:

a. Becomes too small during the next round

b. Gets fragmented by other allocations

These are both hypotheses; I do not have data to back up these claims. I
will think about how to collect it.
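
One cheap way to start collecting that data (a rough sketch, not anything
fsm already does): sample runtime.MemStats while the miner runs and watch the
gap between the heap address space the runtime has obtained from the OS
(HeapSys) and the part actually occupied by in-use spans (HeapInuse). If
live data stays flat but the gap keeps widening, that is at least consistent
with fragmentation.

package main

import (
    "log"
    "runtime"
    "time"
)

// logHeapGap periodically prints how much heap address space the runtime
// holds versus how much is occupied by in-use spans. A HeapSys that keeps
// climbing while HeapInuse stays flat points at fragmentation (or at memory
// that simply has not been released back to the OS yet).
func logHeapGap(interval time.Duration) {
    var m runtime.MemStats
    for range time.Tick(interval) {
        runtime.ReadMemStats(&m)
        log.Printf("HeapSys=%dMB HeapInuse=%dMB HeapReleased=%dMB gap=%dMB",
            m.HeapSys>>20, m.HeapInuse>>20, m.HeapReleased>>20,
            (m.HeapSys-m.HeapInuse)>>20)
    }
}

func main() {
    go logHeapGap(10 * time.Second)
    select {} // stand-in for the real mining work
}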

@ianlancetaylor I can provide the program and a data set to run it on
immediately (as well as an appropriate configuration). Providing a
minimal test case may be harder as I am not sure what aspect of the
program is causing the segmentation fault/dangling pointer. I will try
and make a small program today that demonstrates it. I will also take a
shot at making a program which replicates the allocation behavior of my
program.

@timtadh
Author

timtadh commented Jun 3, 2015

@ianlancetaylor Also in response to your question

The error you are describing is consistent with what I suggested
earlier: something is freeing C memory when Go thinks it is still
active. Where is sg allocated?

Subgraphs are allocated in 2 functions. Both are normal Go allocations.
This part does not deal with unsafe code.

  1. canonSubGraph (single vertex case)
  2. canonSubGraph (general case)
  3. DeserializeSubGraph

Thanks for your help!

@dr2chase
Contributor

dr2chase commented Jun 3, 2015

Saw this, wanted to be sure that the contents of the mmap'd B+Tree were visible/known to the GC:

So basically, I was putting good values into the channel but getting
dead values out. Super weird. If I turn off using the mmap'ed B+Tree and
use a regular datastructure the errors go away

I'm still new to Go and the library, so I could be barking up the wrong tree. But if I'm not, that's a very likely cause of 0xdeaddeaddead -- hide pointers from the GC, run a GC, it reclaims the storage, and when you return pointers from your hiding place, they reference junk.
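
To make that failure mode concrete, here is a contrived sketch of what
"hiding a pointer from the GC" looks like. Nothing below is taken from
fs2/bptree; it only shows why a pointer stashed as raw bytes in memory the
collector does not scan (such as an mmap'd region) can end up referencing
reclaimed storage. This is the anti-pattern being warned about, not something
to imitate.

import (
    "encoding/binary"
    "unsafe"
)

type node struct{ payload [64]byte }

// hide stores the pointer's numeric address in a buffer the GC never scans
// (imagine buf lives inside an mmap'd file). The GC can no longer see that
// the *node is referenced.
func hide(buf []byte, n *node) {
    binary.LittleEndian.PutUint64(buf, uint64(uintptr(unsafe.Pointer(n))))
}

// unhide reads the address back. If a GC ran in between and this was the
// only reference, the returned pointer is dangling and may point at memory
// the runtime has reused or poisoned.
func unhide(buf []byte) *node {
    return (*node)(unsafe.Pointer(uintptr(binary.LittleEndian.Uint64(buf))))
}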

@timtadh
Author

timtadh commented Jun 3, 2015

@dr2chase The values being put into the channel are normal Go values.
See my comment above to @ianlancetaylor on where the subgraphs are
constructed.

For clarity, when things get put into the B+Tree they have to first be
serialized into a byte slice. The byte slice is ultimately copied into
the mmap area. In code (a rough sketch of the copy step follows the lists
below):

  1. Serialization
  2. BpTree.Add()
  3. Varchar.Alloc call (for values)
  4. Copy value into varchar
  5. Varchar.Alloc call (for keys)
  6. Copy key into varchar

In the case that keys/values are fixed size instead of variable sized
they get copied here:

  1. leaf nodes
  2. internal nodes
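
For what it is worth, only plain bytes ever cross into the mapping in the
flow above. A rough sketch of the copy step (with hypothetical names, not the
real fs2 Varchar API) looks something like this; the original Go objects stay
on the ordinary heap where the GC can see them:

// Hypothetical sketch of copying a serialized key/value into an mmap'd
// region. alloc stands in for fs2's Varchar allocation; only the byte copy
// of the serialization crosses into the mapping.
func storeSerialized(mapped []byte, alloc func(n int) (offset int), key, value []byte) (keyOff, valOff int) {
    valOff = alloc(len(value))
    copy(mapped[valOff:valOff+len(value)], value)

    keyOff = alloc(len(key))
    copy(mapped[keyOff:keyOff+len(key)], key)
    return keyOff, valOff
}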

timtadh added a commit to timtadh/fsm that referenced this issue Jun 3, 2015
See golang/go#11035

Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@timtadh
Author

timtadh commented Jun 3, 2015

@ianlancetaylor @bradfitz Here is how to reproduce the bug

Running fsm to reproduce the bug.

Prerequisites:

  1. A Linux system. I do not guarantee anything works on any other
    system. This software was written with Linux in mind!
  2. Git
  3. Go built from master (for reproducing the bug)
  4. A C++ compiler supporting C++11. I use gcc 4.8.2; later versions are
    also known to work. The C++ compiler is for compiling bliss in
    https://github.com/timtadh/goiso .

Note, this program is part of my research for my PhD. It is not super
friendly to run and only works on Linux. Sorry for the inconvenience
this may cause you!

Steps:

$ git clone git@github.com:timtadh/fsm
$ cd fsm
$ git checkout 2510774dfe2a578e94bc8ba52dbc3e467af001c4
$ git submodule init
$ git submodule update
$ cat .activate 
export GOPATH=$(pwd):$GOPATH
export PATH=$(pwd)/bin:$PATH
$ source .activate
$ go install github.com/timtadh/fsm

At this point you should have a working fsm program. Note that you may
get some gcc warnings about non-static data initializers.

Test that fsm is working

$ fsm -h

You should see a helpful message on how to use fsm.

Now you need to download the data from
http://tyrocine.case.edu/static/wiki-sort.gz . If you use wget you may
see a certificate error. This is due to the way CWRU signed the cert
(this is a server I run to host student code). If you download via
Firefox or Chrome the cert should correctly validate.

To run:

$ fsm -s 5 -m 8 --mem-profile=mem.pprof --mem-cache -o /tmp/wiki ./wiki-sort.gz

See the -h flag for definitions of each of the flags used. This
program is a research project so it is a bit rough around the edges!

The program should run for a while and then fail with a panic such as:

panic: Key was a bad value -2401018187971961171 [] 0xc208455fa0
0xc208965200

goroutine 110 [running]:
github.com/timtadh/fsm/mine.(*Miner).collector(0xc2084d8230, 0x7fb150290510, 0xc208449400, 0xc208f780c0)
        /tmp/fsm/src/github.com/timtadh/fsm/mine/mine.go:329 +0x321
created by github.com/timtadh/fsm/mine.(*Miner).makeCollectors
        /tmp/fsm/src/github.com/timtadh/fsm/mine/mine.go:354 +0x1fb

goroutine 1 [chan receive, 1 minutes]:
main.main()
        /tmp/fsm/src/github.com/timtadh/fsm/main.go:458 +0x232a

goroutine 17 [syscall, 1 minutes, locked to thread]:
runtime.goexit()

If you re-run with Go 1.4.2 you should no longer see the crash.

@dr2chase
Contributor

dr2chase commented Jun 3, 2015

Indeed, it does fail for me. I may not know enough about details of the GC to make quick sense of this, but it could be an educational adventure.

@timtadh
Author

timtadh commented Jun 3, 2015

@dr2chase The interesting thing (at least to me) is that it doesn't happen on every dataset. However, the wiki-sort graph seems to trigger it every single time. The fragmentation (which is what I hope I can get some feedback on) happens every time, though.

timtadh added a commit to timtadh/fsm that referenced this issue Jun 3, 2015
This commit stops the errors seen in

- golang/go#11035
- specifically golang/go#11035 (comment)

However, in general it is not ideal to make these channels buffered and
really they should not need to be buffered. Past experience has shown
that buffering the "send to collector" channel simply masks performance
problems until sufficient memory pressure is reached. It is far better
to distribute the work evenly across all the collectors (which was
implemented in 05fa9e9).

If the Go 1.5 runtime bug gets fixed this change can be safely reverted.

Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@timtadh
Author

timtadh commented Jun 3, 2015

@ianlancetaylor @dr2chase @bradfitz @RLH This commit
timtadh/fsm@0f87395
stops the errors I was seeing in master from occurring. I have not seen
any segmentation faults today, so perhaps that was fixed in a different
commit.

note:

$ go version
go version devel +0f6da89 Wed Jun 3 20:43:51 2015 +0000 linux/amd64

As I note in the commit message, I should not have to buffer the
channel. My guess is there is some sort of race condition where things
going through an unbuffered channel are briefly "invisible" to the
garbage collector. I think that exploring that hypothesis for the error
I was seeing is the best road.

Any thoughts on the fragmentation issue would be appreciated. It
definitely seems to be a real issue.

@dr2chase
Contributor

dr2chase commented Jun 3, 2015

We're exploring that hypothesis, because it looks suspiciously like that to
us (me, Rick) too. I'm collecting 4000 failure logs tonight.

@dr2chase
Contributor

dr2chase commented Jun 4, 2015

Saw this in chan.go, thought "Oh really?":

if elem.kind&kindNoPointers != 0 || size == 0 {
        // Allocate memory in one call.
        // Hchan does not contain pointers interesting for GC in this case:
        // buf points into the same allocation, elemtype is persistent.
        // SudoG's are referenced from their owning thread so they can't be collected.

@RLH

@RLH
Contributor

RLH commented Jun 4, 2015

Damien's email, #11053, seems more interesting.

#11053


@randall77
Contributor

I can reproduce the memory growth you are seeing (with 1.4.2). But it is not the Go heap. The Go heap remains constant-sized at around 8MB, as you can see in your gctrace=1 output. Something else is growing.

Diffing two instances of /proc/pid/maps, I see lots of instances like this:

< 7f6670000000-7f667238e000 rw-p 00000000 00:00 0 
< 7f667238e000-7f6674000000 ---p 00000000 00:00 0 
---
> 7f6670000000-7f66724e7000 rw-p 00000000 00:00 0 
> 7f66724e7000-7f6674000000 ---p 00000000 00:00 0 

which means larger portions of existing anonymous mappings are acquiring backing store. In this case, ~1.4MB extra in a 64MB mapping.

  1. Are you sure you are unmapping everything you mmap?
  2. Are you using lots of C stack? These mappings might be C stack, although I think the default Linux stack is 8MB, not 64MB.
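
One way to narrow down where the growth lives (a rough sketch, assuming
Linux): compare the runtime's own accounting against the process RSS from
/proc. If RSS keeps climbing run after run while the runtime's Sys total
stays flat, the growth is coming from outside the Go-managed heap (cgo
allocations, manual mmaps, C stacks).

package main

import (
    "fmt"
    "io/ioutil"
    "runtime"
    "strings"
)

// vmRSSkB parses VmRSS out of /proc/self/status (Linux only).
func vmRSSkB() (int64, error) {
    data, err := ioutil.ReadFile("/proc/self/status")
    if err != nil {
        return 0, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.HasPrefix(line, "VmRSS:") {
            var kb int64
            fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &kb)
            return kb, nil
        }
    }
    return 0, fmt.Errorf("VmRSS not found")
}

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    rss, err := vmRSSkB()
    if err != nil {
        panic(err)
    }
    // A growing RSS with a flat Sys means the extra memory is not under the
    // Go runtime's control (cgo, manual mmap, thread stacks, ...).
    fmt.Printf("Go Sys=%d kB, HeapSys=%d kB, process RSS=%d kB\n",
        m.Sys/1024, m.HeapSys/1024, rss)
}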

@timtadh
Author

timtadh commented Jun 5, 2015

@randall77 Thanks for the help!

I am going to investigate this today. I wrote most of the code that my
system uses but I am making use of a C++ library for graph isomorphism
that I did not write. My initial guess is a memory management error
there.

One note: if you run the program with --cache=/path/to/tmp/dir/ it
will use file-backed maps instead of anonymous maps for the B+Trees.
When I look at diffs of /proc/<pid>/maps using that option, I see the
memory maps for the B+Trees moving around (as I expect, because I use
mremap when necessary to expand the mapping with MREMAP_MAYMOVE).
However, I see a bunch of other maps in there moving around as well. I
am not allocating those maps, so there are two options:

  1. Go is allocating them as part of its memory management scheme. I know
    that by default the glibc allocator on Linux does this. Excerpt from
    man malloc:

    Normally, malloc() allocates memory from the heap, and
    adjusts the size of the heap as required, using sbrk(2).
    When allocating blocks of memory larger than MMAP_THRESHOLD
    bytes, the glibc malloc() implementation allocates the memory
    as a private anonymous mapping using mmap(2).
    MMAP_THRESHOLD is 128 kB by default, but is adjustable using
    mallopt(3). Allocations performed using mmap(2) are
    unaffected by the RLIMIT_DATA resource limit (see
    getrlimit(2))

  2. libbliss, which I use for graph isomorphism, is doing a bad job of
    managing its memory. To test this I will write a C program that
    creates lots of graphs and run it under valgrind.

Thanks for the insights.

@timtadh
Author

timtadh commented Jun 5, 2015

@randall77 Looks like libbliss has a memory leak when computing canonical forms. I am going to chase that down for now. I will report if I get any more unexpected errors in Go 1.5. Thanks for all of your help, everyone!

@timtadh
Author

timtadh commented Jun 5, 2015

@randall77 @dr2chase @ianlancetaylor @RLH The memory leak in libbliss has been plugged and that seems to have solved the heap growth problem! Thank you all for helping me out. I have been getting a few panics in a GC thread today; however, they have been too inconsistent to track down. If I can find a consistent way to trigger one I will open another bug. Feel free to rename or close this issue.

Thanks again.
-Tim

@dr2chase
Contributor

dr2chase commented Jun 6, 2015

It's suspected that some recent changes for concurrent GC may be involved in the panic. It reproduces quite nicely on my Linux box and is robust in the face of debugging printlns; my suspicion is that the reference is sometimes dropped during an unbuffered channel send/receive.

@timtadh
Author

timtadh commented Jun 6, 2015 via email

rsc added a commit that referenced this issue Jun 15, 2015

…hansend

A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)

I have been unable to reproduce this problem in a simple test program.

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
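
For context, the user-visible pattern the commit message describes is
essentially the Collector code from earlier in the thread: one goroutine
blocks receiving on an unbuffered channel and another sends it a pointer. A
minimal, hedged illustration (the labelGraph type is redefined here, in
simplified form, just to keep the snippet self-contained):

// A receiver blocked on an unbuffered channel has the sent pointer copied
// directly onto its stack by the runtime; that direct write to another
// goroutine's stack is the case the commit adds a stronger write barrier for.
type labelGraph struct {
    label []byte
}

func demo() {
    ch := make(chan *labelGraph) // unbuffered
    done := make(chan struct{})
    go func() {
        lg := <-ch // this goroutine is blocked here when the send happens
        _ = lg.label
        close(done)
    }()
    ch <- &labelGraph{label: []byte("k")} // send to the blocked receiver
    <-done
}
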
@golang golang locked and limited conversation to collaborators Jun 25, 2016