
runtime: Heap fragmentation causing memory leaks #11035


Closed
timtadh opened this issue Jun 2, 2015 · 24 comments

@timtadh

timtadh commented Jun 2, 2015

  1. What version of Go are you using (go version)?

    $ go version
    go version go1.4.2 linux/amd64
    $ go version
    go version devel +70cf735 Tue Jun 2 13:55:40 2015 +0000 linux/amd64
    
  2. What operating system and processor architecture are you using?

    $ uname -a
    Linux nitric 3.13.0-53-generic #89-Ubuntu SMP Wed May 20 10:34:39 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
    $ cat /proc/cpuinfo 
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 58
    model name      : Intel(R) Core(TM) i7-3537U CPU @ 2.00GHz
    stepping        : 9
    microcode       : 0x17
    cpu MHz         : 2000.000
    cache size      : 4096 KB
    physical id     : 0
    siblings        : 4
    core id         : 0
    cpu cores       : 2
    apicid          : 0
    initial apicid  : 0
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 13
    wp              : yes
    flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
    bogomips        : 4988.67
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 36 bits physical, 48 bits virtual
    power management:
    $ cat /proc/meminfo 
    MemTotal:        7730856 kB
    
  3. What did you do?

    run https://github.com/timtadh/fsm on a large graph (see explanation below)

  4. What did you expect to see?

    memory usage to remain relatively stable

  5. What did you see instead?

    Memory usage grew throughout the run of the program, sometimes extremely rapidly.

Explanation

I have been working on a frequent subgraph mining algorithm in Go. In
case you are not familiar with frequent subgraph mining, it involves
looking in a very large graph for repeated subgraphs. This is an
exponential problem and can require a lot of memory.

To solve the memory issue, I wrote a memory-mapped B+Tree,
fs2/bptree, which stores the candidate
subgraphs. The B+Tree works fine and generally has very little overhead
in terms of memory usage.

What does have overhead is generating the candidate graphs. Basically
the process goes something like this:

  1. A partition of subgraphs is loaded from the B+Tree. Each graph in the
    partition is isomorphic to every other graph in the partition. These
    are known as the embeddings.
  2. If there are not enough subgraphs in the partition the partition is
    discarded (and therefore eventually garbage collected).
  3. Otherwise, each embedding is extended by one edge to create
    candidates for the next round of mining. The embeddings are extended
    multiple times, once for each edge in the surrounding context.
  4. The extensions are then stored in the B+Tree.
  5. The extensions are garbage collected, as is the partition. (A rough
    Go sketch of this loop follows the list.)
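
To make the shape of that loop concrete, here is a minimal sketch in Go.
Every identifier below (Partition, mineRound, extendByOneEdge, the store
callback) is a hypothetical stand-in, not the actual fsm API.

// Hypothetical sketch of the candidate-generation loop described above.
// None of these identifiers come from the real fsm codebase.
type SubGraph struct {
    // vertices, edges, etc. omitted; stand-in only
}

type Partition []*SubGraph // isomorphic embeddings of one candidate

func mineRound(partitions <-chan Partition, minSupport int, store func(*SubGraph)) {
    for part := range partitions {
        // Steps 1-2: load a partition; discard it if it has too few embeddings.
        if len(part) < minSupport {
            continue // the partition becomes garbage and is eventually collected
        }
        // Step 3: extend every embedding by one edge to form the next candidates.
        for _, emb := range part {
            for _, ext := range extendByOneEdge(emb) {
                store(ext) // step 4: persist the extension in the B+Tree
            }
        }
        // Step 5: the extensions and the partition are now unreachable, so in
        // theory heap usage should stay flat from round to round.
    }
}

// extendByOneEdge stands in for the real extension step.
func extendByOneEdge(sg *SubGraph) []*SubGraph { return nil }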

In theory, the overall memory usage (outside of the B+Trees) should not
grow in this process. However, it does grow. I believe it is due to the
heap becoming fragmented. I am willing to do considerable legwork to
both show this conclusively and to help fix the problem.

This may not be fixable

Before we go into my reasons for believing this is a fragmentation
issue, let me first note that this may not be fixable. The solutions I
have outlined at the bottom of the post may be impractical. However, I
think it would be worth trying to fix it.

In Depth

I have been profiling my program extensively for several months to try
and get rid of memory allocations as much as possible. However, there
are some limits to my ability to completely get rid of allocations
during the mining process. It also becomes difficult to reason about
ownership of particular bits of memory (like the subgraph objects) when
they are flying around between different goroutines.

One piece of evidence I have gathered for this being a fragmentation
problem is the statistics reported by GODEBUG=gctrace=1. Here is a
snapshot after the program has been running for a while:

gc7993(4): 8+9+1060+6 us, 2 -> 4 MB, 77327 (431357683-431280356) objects, 33 goroutines, 779/676/0 sweeps, 17(1319) handoff, 6(35) steal, 69/0/0 yields
gc7994(4): 7+9+1556+21 us, 2 -> 4 MB, 76847 (431409057-431332210) objects, 33 goroutines, 779/679/0 sweeps, 19(891) handoff, 10(67) steal, 118/60/8 yields
gc7995(4): 17+6+1151+12 us, 2 -> 4 MB, 76512 (431459886-431383374) objects, 33 goroutines, 779/633/0 sweeps, 22(1467) handoff, 9(70) steal, 103/32/3 yields
gc7996(4): 6+7+1082+122 us, 2 -> 4 MB, 76689 (431510921-431434232) objects, 33 goroutines, 779/634/0 sweeps, 30(1316) handoff, 6(49) steal, 114/37/4 yields
gc7997(4): 7+9+1155+112 us, 2 -> 4 MB, 76474 (431561707-431485233) objects, 33 goroutines, 779/627/0 sweeps, 13(704) handoff, 11(76) steal, 89/12/1 yields
gc7998(4): 8+10+1341+5 us, 2 -> 4 MB, 77134 (431613156-431536022) objects, 33 goroutines, 779/644/0 sweeps, 23(1502) handoff, 9(65) steal, 70/0/0 yields
gc7999(4): 7+13+1188+5 us, 2 -> 4 MB, 76252 (431663721-431587469) objects, 33 goroutines, 779/642/0 sweeps, 19(1183) handoff, 7(43) steal, 124/40/3 yields
gc8000(4): 16+6+1055+4 us, 2 -> 4 MB, 77080 (431715341-431638261) objects, 33 goroutines, 779/667/0 sweeps, 27(2229) handoff, 13(94) steal, 110/21/0 yields
gc8001(4): 8+9+1099+4 us, 2 -> 4 MB, 76757 (431766506-431689749) objects, 33 goroutines, 779/615/0 sweeps, 37(2424) handoff, 13(80) steal, 112/11/0 yields
gc8002(4): 6+6+1046+10 us, 2 -> 4 MB, 76458 (431817363-431740905) objects, 33 goroutines, 779/673/0 sweeps, 24(1090) handoff, 8(54) steal, 99/30/3 yields
gc8003(4): 7+9+981+4 us, 2 -> 4 MB, 77446 (431869376-431791930) objects, 33 goroutines, 779/657/0 sweeps, 18(1567) handoff, 12(84) steal, 71/0/0 yields
gc8004(4): 9+11+956+4 us, 2 -> 4 MB, 76508 (431920315-431843807) objects, 33 goroutines, 779/675/0 sweeps, 35(2390) handoff, 7(35) steal, 122/11/0 yields
gc8005(4): 7+1530+270+3 us, 2 -> 4 MB, 76762 (431971504-431894742) objects, 33 goroutines, 779/648/0 sweeps, 18(1058) handoff, 11(76) steal, 72/30/3 yields
gc8006(4): 7+7+1151+5 us, 2 -> 4 MB, 77295 (432023032-431945737) objects, 33 goroutines, 779/549/0 sweeps, 21(1517) handoff, 11(78) steal, 96/30/3 yields
gc8007(4): 9+13+1447+5 us, 2 -> 4 MB, 75554 (432072776-431997222) objects, 33 goroutines, 779/692/0 sweeps, 22(1054) handoff, 6(42) steal, 132/35/3 yields
gc8008(4): 17+26+1594+21 us, 2 -> 4 MB, 76957 (432124020-432047063) objects, 33 goroutines, 779/421/0 sweeps, 18(1065) handoff, 12(82) steal, 127/43/4 yields
gc8009(4): 7+7+1206+5 us, 2 -> 4 MB, 77113 (432175501-432098388) objects, 33 goroutines, 779/639/0 sweeps, 42(2590) handoff, 10(66) steal, 153/31/3 yields

One thing to notice: the number of live objects stays relatively
constant throughout the execution of the program. It stays in the
neighborhood of 70k-80k on a 4-core machine. Inspecting the memory (as
reported in /proc/<pid>/maps) shows that the Go heap does continue to
grow. Allowing the program to run for a very long time essentially
causes the B+Tree memory maps to be evicted from memory to make space
for the ever-growing heap.

Sources of Memory Allocations

go tool pprof -png bin/fsm mem.pprof > mem.png

[image: mem (memory profile graph)]

go tool pprof -png -alloc_space bin/fsm mem.pprof > mem.png

[image: mem (alloc_space memory profile graph)]

As you can see, the sources of allocations are what you would expect:

  1. De-serializing the subgraphs
  2. Creating new ones (the canonSubGraph)
  3. Creating the cursor objects for iterating through the B+Trees

I am obviously experimenting with ways to reduce the allocations
performed by these operations. However, at some point I may just reach
the limits of what is easily achievable in Go and will have to
switch to either C++ or Rust (which did not really exist when I started
this project in 2012).

Does Go 1.5 Help?

I compiled and ran master yesterday. Unfortunately, my program does not
currently execute on master. I get lots of weird errors. Maybe this is
because I do a lot of unsafe things in my B+Tree implementation? I don't
know. I can post those errors if you want to see them, but I think they
are a separate issue.

Possible Solutions

The best idea I can think of would be a runtime routine that stops
the world and does a heap compaction. This would never be triggered
automatically (that would be catastrophic in, say, an HTTP server).
However, a program could call runtime.HeapCompact(), which would cause
the mutator to stop and a full garbage collection plus compaction to run.

The way this could work is to have a second garbage collector which is
either a copying collector or a mark-and-compact collector. The
basic idea of both is to find all of the objects live at the time of
collection and then, one by one, move each object to a new location and
rewrite all pointers into the object. This is obviously an expensive
operation.
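
For illustration only, here is a toy mark-compact pass in Go over an
imaginary array-backed heap. It shows the "find the live objects, move them,
rewrite the pointers" idea in miniature; it is not how the Go runtime
represents objects, and a real implementation would also have to deal with
spans, varying object sizes, stacks, finalizers, and cgo.

// Toy mark-compact over an imaginary heap of objects that reference each
// other by index. Illustrative only; not the Go runtime's object model.
type obj struct {
    marked bool
    refs   []int // indices of other objects this object points to
}

// compact returns a new heap containing only the objects reachable from
// roots, slid down to the front, plus a forwarding table (old index ->
// new index) so the caller can rewrite its own roots as well.
func compact(heap []obj, roots []int) ([]obj, map[int]int) {
    // Mark phase: flood-fill reachability from the roots.
    var mark func(i int)
    mark = func(i int) {
        if heap[i].marked {
            return
        }
        heap[i].marked = true
        for _, r := range heap[i].refs {
            mark(r)
        }
    }
    for _, r := range roots {
        mark(r)
    }
    // Compute forwarding addresses for the survivors.
    forward := make(map[int]int)
    for i := range heap {
        if heap[i].marked {
            forward[i] = len(forward)
        }
    }
    // Compact phase: move each survivor and rewrite its outgoing pointers.
    out := make([]obj, 0, len(forward))
    for i := range heap {
        if !heap[i].marked {
            continue
        }
        o := heap[i]
        o.marked = false
        for j, r := range o.refs {
            o.refs[j] = forward[r]
        }
        out = append(out, o)
    }
    return out, forward
}

The expensive part is exactly these full passes over the heap plus the
pointer rewrites, which is why a compactor like this would need to stop the
world.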

Drawbacks

  1. We can only find pointers from Go structures. This means any pointers
    passed to native code now point somewhere else.
  2. If we believe (conservatively) an object is being pointed to by a
    native structure we cannot move that object.

Benefits

  1. This operation only happens when the programmer asks for it, so
    they can potentially control negative side effects from pointers
    involving non-Go code.
  2. It can potentially perfectly compact the heap.
  3. Some algorithms can improve memory locality by moving frequently
    accessed objects close together.

Other Solutions

Incremental

Incremental compaction during GC or allocation. The heap would never be
fully compacted; however, this would limit the amount of work that is
done and make the process fully automatic:

Drawbacks

  1. Everything from a programmer-triggered compaction
  2. Breaking the expectation that objects stay in one place
  3. Imperfect compaction

Benefits

  1. Fully automatic
  2. No new runtime methods which need to be supported in the future

Improve the Allocator

The other idea is to simply improve the allocator to reduce the
fragmentation in the first place. Programs like mine which cause
fragmentation could be collected and their behavior studied to figure
out whether there are ways to reduce it.

Drawbacks

  1. No compaction, so the heap will be larger than the ideal

Benefits

  1. Fully automatic
  2. No changes in semantics of the Go memory system
  3. No new runtime methods

Conclusion

I believe Go currently has a heap fragmentation problem that impacts
long-running programs which do lots of small and medium-sized
allocations. A potential solution is to add a function,
runtime.HeapCompact(), which would trigger a stop-the-world compaction.

@bradfitz
Contributor

bradfitz commented Jun 2, 2015

No major work is going to be done on the 1.4 branch, so your best bet is making your program work on tip (Go 1.5) and discussing from there.

@timtadh
Author

timtadh commented Jun 2, 2015

@bradfitz Ok. I am definitely willing to do that, but I do not have any idea where errors such as:

2015/06/02 13:11:50 Size is too large. Cannot allocate an array that big
goroutine 127 [running]:
github.com/timtadh/fs2/errors.Errorf(0x6477b0, 0x34, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/errors/error.go:16 +0x8c
github.com/timtadh/fs2/bptree.(*Varchar).Alloc(0xc2087f00c0, 0xdeaddeaddeaddead, 0x0, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/varchar.go:189 +0x89
github.com/timtadh/fs2/bptree.(*BpTree).newVarcharKey(0xc2084a2640, 0x72000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0x0, 0x0, 0x0, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:196 +0x1a1
github.com/timtadh/fs2/bptree.(*BpTree).leafInsert(0xc2084a2640, 0x72000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x4b5994, 0xc2084a2640, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:214 +0xb5
github.com/timtadh/fs2/bptree.(*BpTree).insert(0xc2084a2640, 0x72000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0x0, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:102 +0x1a4
github.com/timtadh/fs2/bptree.(*BpTree).internalInsert(0xc2084a2640, 0x91000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0x0, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:134 +0x163
github.com/timtadh/fs2/bptree.(*BpTree).insert(0xc2084a2640, 0x91000, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0xc2085ecea8, ...)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:100 +0x131
github.com/timtadh/fs2/bptree.(*BpTree).Add(0xc2084a2640, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc2081ba040, 0x8, 0x8, 0x0, 0x0)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fs2/bptree/insert.go:54 +0x311
github.com/timtadh/fsm/store.(*Fs2BpTree).Add(0xc2084a2660, 0xc20842c740, 0xdeaddeaddeaddead, 0x4c, 0xc208350380)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/store/fs.go:136 +0x106
github.com/timtadh/fsm/mine.(*Miner).collector(0xc208018410, 0x7f0ea41b7640, 0xc2084a2660, 0xc20822a480)
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/mine/mine.go:331 +0xf2
created by github.com/timtadh/fsm/mine.(*Miner).makeCollectors
    /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/mine/mine.go:343 +0x1fb

are coming from. The error here is that somehow the bptree.(*Varchar).Alloc() method is getting 0xdeaddeaddeaddead as the size to allocate. That seems to be a "magic" value. All of my tests pass on 1.5, so I am really not sure why this happens. I should note that I do interface with some native code via cgo, if that makes any difference. Perhaps this error is coming from issues with the new garbage collector?

@ianlancetaylor
Member

That means that you have a dangling pointer. It's gotten easier to get those if you allocate memory in cgo and use a finalizer to free it. Or it could be cgo related in some other way.

@timtadh
Author

timtadh commented Jun 2, 2015

$ go version
go version devel +70cf735 Tue Jun 2 13:55:40 2015 +0000 linux/amd64

@bradfitz / @ianlancetaylor Something funky is definitely going on. The
pointer that was marked as 0xdead comes from a channel. Basically the
code looks like this:

type labelGraph struct {
    label []byte
    sg    *goiso.SubGraph
}

type Collector struct {
    ch   chan *labelGraph
    tree store.SubGraphs
}

func (c *Collector) send(sg *goiso.SubGraph) {
    label := sg.Label()
    c.ch <- &labelGraph{label, sg}
}

func (c *Collector) collect() {
    for lg := range c.ch {
        c.tree.Add(lg.label, lg.sg)
    }
}

func (c *Collector) close() {
    close(c.ch)
}

func otherThread() {
    go c.collect()
    for sg := range candidates {
        c.send(sg)
    }
    c.close()
}

The error was coming from deep inside the implementation of
tree.Add().

In send(sg) I added a check to make sure the values I was putting in
were sane. They always were. I added the same check inside of Add().
There the check would fail. I pushed the check up one level to be inside
of collect. It also failed there.

So basically, I was putting good values into the channel but getting
dead values out. Super weird. If I turn off using the mmap'ed B+Tree and
use a regular data structure, the errors go away.

I tried doing something which should have had no effect: I added
buffering to the channels.

    ch: make(chan *labelGraph, 1)

This made the error stop, which is great. However, now I get a new, even
more mysterious error:

unexpected fault address 0xc208246c90
fatal error: fault
[signal 0xb code=0x2 addr=0xc208246c90 pc=0xc208246c90]

goroutine 200 [running]:
runtime.throw(0x602db0, 0x5)
        /home/hendersont/srcs/go/src/runtime/panic.go:527 +0x96 fp=0xc208ef3dd8 sp=0xc208ef3dc0
runtime.sigpanic()
        /home/hendersont/srcs/go/src/runtime/sigpanic_unix.go:27 +0x2ae fp=0xc208ef3e28 sp=0xc208ef3dd8
created by github.com/timtadh/fsm/mine.(*Miner).filterAndExtend
        /home/hendersont/stuff/research/fsm/src/github.com/timtadh/fsm/mine/mine.go:235 +0x8e

This looks like it might be a segmentation fault. It does not happen every
run; the other error was happening consistently. This error seems to only
happen when I use anonymous memory maps. If they are backed by files it does
not occur (well, it has not occurred yet).

Whatever my program's problems are, the fragmentation problem in the runtime
also exists in 1.5. For instance, after a successful run I get this with
GODEBUG=gctrace=1:

gc #13181 @266.838s 18%: 0+0+0+10+0 ms clock, 0+0+0+0/10/0+1 ms cpu, 4->4->3 MB, 4 MB goal, 4 P
gc #13182 @266.869s 18%: 0+0+12+8+0 ms clock, 0+0+0+4/8/0+1 ms cpu, 4->6->4 MB, 4 MB goal, 4 P
gc #13183 @266.885s 18%: 0+0+0+13+0 ms clock, 0+0+0+0/13/0+2 ms cpu, 4->4->3 MB, 7 MB goal, 4 P
gc #13184 @266.905s 18%: 0+0+0+11+0 ms clock, 0+0+0+0/11/0+1 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13185 @266.929s 18%: 0+0+5+9+0 ms clock, 0+0+0+2/9/0+1 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13186 @266.949s 18%: 0+0+0+7+5 ms clock, 0+0+0+0/7/0+23 ms cpu, 4->4->2 MB, 5 MB goal, 4 P
gc #13187 @266.988s 18%: 0+0+0+32+0 ms clock, 0+0+0+11/12/0+2 ms cpu, 4->7->4 MB, 4 MB goal, 4 P
gc #13188 @267.012s 18%: 0+0+3+14+0 ms clock, 0+0+0+0/14/0+2 ms cpu, 4->5->3 MB, 8 MB goal, 4 P
gc #13189 @267.033s 18%: 0+0+0+13+0 ms clock, 0+0+0+0/13/0+1 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13190 @267.050s 18%: 0+0+0+10+0 ms clock, 0+0+0+0/10/0+2 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13191 @267.082s 18%: 0+0+3+16+0 ms clock, 3+0+0+8/16/0+0 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13192 @267.095s 18%: 0+0+0+7+0 ms clock, 0+0+0+0/7/0+2 ms cpu, 4->4->2 MB, 6 MB goal, 4 P
gc #13193 @267.127s 18%: 4+1+4+12+0 ms clock, 18+1+0+0/12/0+1 ms cpu, 4->6->3 MB, 5 MB goal, 4 P
gc #13194 @267.145s 18%: 0+0+0+10+0 ms clock, 0+0+0+0/10/0+0 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13195 @267.170s 18%: 0+0+1+11+0 ms clock, 2+0+0+0/11/0+2 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13196 @267.192s 18%: 0+0+1+10+0 ms clock, 0+0+0+0/10/0+2 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13197 @267.215s 18%: 0+0+0+14+0 ms clock, 0+0+0+0/14/0+1 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13198 @267.240s 18%: 0+0+3+10+0 ms clock, 3+0+0+0/10/0+2 ms cpu, 4->5->3 MB, 4 MB goal, 4 P
gc #13199 @267.260s 18%: 0+0+0+14+0 ms clock, 0+0+0+1/14/0+2 ms cpu, 4->4->3 MB, 4 MB goal, 4 P
gc #13200 @267.287s 18%: 0+0+6+9+0 ms clock, 0+0+0+0/9/0+1 ms cpu, 4->5->3 MB, 4 MB goal, 4 P
gc #13201 @267.318s 18%: 0+0+10+13+0 ms clock, 0+0+0+0/13/0+1 ms cpu, 4->5->4 MB, 5 MB goal, 4 P
gc #13202 @267.331s 18%: 0+0+0+9+0 ms clock, 0+0+0+0/9/0+2 ms cpu, 4->4->3 MB, 5 MB goal, 4 P
gc #13203 @267.367s 18%: 2+0+7+13+0 ms clock, 11+0+0+0/13/0+0 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13204 @267.384s 18%: 0+0+0+9+0 ms clock, 0+0+0+0/9/0+1 ms cpu, 4->5->3 MB, 5 MB goal, 4 P
gc #13205 @267.416s 18%: 0+0+7+12+0 ms clock, 0+0+0+3/12/0+1 ms cpu, 4->6->3 MB, 5 MB goal, 4 P

But the resident size was continuing to grow at a healthy clip!

[image: resource-usage]

This image shows the size of the memory-mapped files (as files rather than
anonymous mappings) versus the size of the heap, which keeps growing and is
actually outpacing the growth rate of the memory maps.

Conclusion

It looks like

  1. There might be a bug with respect to the unbuffered channels. It is
    difficult for me to tell if it is your bug or mine. Let me know if
    you have any suggestions for sorting that out.
  2. The fragmentation is not fixed by the new collector. The target heap
    size stays fairly constant at around 4-8 MB, but the heap grows at an
    unbounded rate.

I am interested in helping solve both problems. Let me know how I can
help.

@timtadh timtadh changed the title runtime: Heap fragmentation causing memory leaks in 1.4.2 runtime: Heap fragmentation causing memory leaks Jun 2, 2015
@ianlancetaylor
Member

Can you give us a standalone test case that we can use to reproduce the problem?

The error you are describing is consistent with what I suggested earlier: something is freeing C memory when Go thinks it is still active. Where is sg allocated?

@ianlancetaylor ianlancetaylor added this to the Go1.5 milestone Jun 3, 2015
@ianlancetaylor
Member

Setting milestone to 1.5 to make sure we figure out why the program crashes with 1.5, if possible.

@RLH
Contributor

RLH commented Jun 3, 2015

The Go GCed heap uses a segregated-size heap, so if the sizes of the objects
being freed are similar to those being allocated, fragmentation should not be
a problem.


@timtadh
Author

timtadh commented Jun 3, 2015

@RLH w.r.t.

so if the sizes of the objects being freed are similar to those being
allocated, fragmentation should not be a problem.

In Frequent Subgraph Mining the sizes of the subgraphs (and therefore
the allocations) grow as the program proceeds. This may be something of
a pathological case for a memory allocator. There are two things going
on:

  1. The program is allocating space for the subgraphs (and their
    serializations). The sizes of these allocations grow as the program
    proceeds.
  2. The program is allocating many small objects throughout the run of
    the program, such as function closures.

So, what I think tends to happen is that the space where the subgraphs
are allocated:

a. Becomes too small during the next round

b. Gets fragmented by other allocations

These are both hypotheses; I do not have data to back up these claims. I
will think about how to collect it.
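
One cheap way to start collecting that data (a rough sketch, not anything
fsm already does): sample runtime.MemStats while the miner runs and watch the
gap between the heap address space the runtime has obtained from the OS
(HeapSys) and the part actually occupied by in-use spans (HeapInuse). If
live data stays flat but the gap keeps widening, that is at least consistent
with fragmentation.

package main

import (
    "log"
    "runtime"
    "time"
)

// logHeapGap periodically prints how much heap address space the runtime
// holds versus how much is occupied by in-use spans. A HeapSys that keeps
// climbing while HeapInuse stays flat points at fragmentation (or at memory
// that simply has not been released back to the OS yet).
func logHeapGap(interval time.Duration) {
    var m runtime.MemStats
    for range time.Tick(interval) {
        runtime.ReadMemStats(&m)
        log.Printf("HeapSys=%dMB HeapInuse=%dMB HeapReleased=%dMB gap=%dMB",
            m.HeapSys>>20, m.HeapInuse>>20, m.HeapReleased>>20,
            (m.HeapSys-m.HeapInuse)>>20)
    }
}

func main() {
    go logHeapGap(10 * time.Second)
    select {} // stand-in for the real mining work
}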

@ianlancetaylor I can provide the program and a data set to run it on
immediately (as well as an appropriate configuration). Providing a
minimal test case may be harder as I am not sure what aspect of the
program is causing the segmentation fault/dangling pointer. I will try
and make a small program today that demonstrates it. I will also take a
shot at making a program which replicates the allocation behavior of my
program.

@timtadh
Author

timtadh commented Jun 3, 2015

@ianlancetaylor Also in response to your question

The error you are describing is consistent with what I suggested
earlier: something is freeing C memory when Go thinks it is still
active. Where is sg allocated?

Subgraphs are allocated in 2 functions. Both are normal Go allocations.
This part does not deal with unsafe code.

  1. canonSubGraph (single vertex case)
  2. canonSubGraph (general case)
  3. DeserializeSubGraph

Thanks for your help!

@dr2chase
Contributor

dr2chase commented Jun 3, 2015

Saw this, wanted to be sure that the contents of the mmap'd B+Tree were visible/known to the GC:

So basically, I was putting good values into the channel but getting
dead values out. Super weird. If I turn off using the mmap'ed B+Tree and
use a regular datastructure the errors go away

I'm still new to Go and the library, so I could be barking up the wrong tree. But if I'm not, that's a very likely cause of 0xdeaddeaddead -- hide pointers from the GC, run a GC, it reclaims the storage, and when you return pointers from your hiding place, they reference junk.
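
To make that failure mode concrete, here is a contrived sketch of what
"hiding a pointer from the GC" looks like. Nothing below is taken from
fs2/bptree; it only shows why a pointer stashed as raw bytes in memory the
collector does not scan (such as an mmap'd region) can end up referencing
reclaimed storage. This is the anti-pattern being warned about, not something
to imitate.

import (
    "encoding/binary"
    "unsafe"
)

type node struct{ payload [64]byte }

// hide stores the pointer's numeric address in a buffer the GC never scans
// (imagine buf lives inside an mmap'd file). The GC can no longer see that
// the *node is referenced.
func hide(buf []byte, n *node) {
    binary.LittleEndian.PutUint64(buf, uint64(uintptr(unsafe.Pointer(n))))
}

// unhide reads the address back. If a GC ran in between and this was the
// only reference, the returned pointer is dangling and may point at memory
// the runtime has reused or poisoned.
func unhide(buf []byte) *node {
    return (*node)(unsafe.Pointer(uintptr(binary.LittleEndian.Uint64(buf))))
}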

@timtadh
Author

timtadh commented Jun 3, 2015

@dr2chase The values being put into the channel are normal Go values.
See my comment above to @ianlancetaylor on where the subgraphs are
constructed.

For clarity, when things get put into the B+Tree they have to first be
serialized into a byte slice. The byte slice is ultimately copied into
the mmap area. In code (a rough sketch of the copy step follows the lists
below):

  1. Serialization
  2. BpTree.Add()
  3. Varchar.Alloc call (for values)
  4. Copy value into varchar
  5. Varchar.Alloc call (for keys)
  6. Copy key into varchar

In the case that keys/values are fixed size instead of variable sized
they get copied here:

  1. leaf nodes
  2. internal nodes
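
For what it is worth, only plain bytes ever cross into the mapping in the
flow above. A rough sketch of the copy step (with hypothetical names, not the
real fs2 Varchar API) looks something like this; the original Go objects stay
on the ordinary heap where the GC can see them:

// Hypothetical sketch of copying a serialized key/value into an mmap'd
// region. alloc stands in for fs2's Varchar allocation; only the byte copy
// of the serialization crosses into the mapping.
func storeSerialized(mapped []byte, alloc func(n int) (offset int), key, value []byte) (keyOff, valOff int) {
    valOff = alloc(len(value))
    copy(mapped[valOff:valOff+len(value)], value)

    keyOff = alloc(len(key))
    copy(mapped[keyOff:keyOff+len(key)], key)
    return keyOff, valOff
}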

timtadh added a commit to timtadh/fsm that referenced this issue Jun 3, 2015
See golang/go#11035

Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@timtadh
Author

timtadh commented Jun 3, 2015

@ianlancetaylor @bradfitz Here is how to reproduce the bug

Running fsm to reproduce the bug.

Prerequisites:

  1. A Linux system. I do not guarantee anything works on any other
    system. This software was written with Linux in mind!
  2. Git
  3. Go built from master (for reproducing the bug)
  4. A C++ compiler supporting C++11. I use gcc 4.8.2; later versions are
    also known to work. The C++ compiler is for compiling bliss in
    https://github.com/timtadh/goiso .

Note, this program is part of my research for my PhD. It is not super
friendly to run and only works on Linux. Sorry for the inconvenience
this may cause you!

Steps:

$ git clone git@github.com:timtadh/fsm
$ cd fsm
$ git checkout 2510774dfe2a578e94bc8ba52dbc3e467af001c4
$ git submodule init
$ git submodule update
$ cat .activate 
export GOPATH=$(pwd):$GOPATH
export PATH=$(pwd)/bin:$PATH
$ source .activate
$ go install github.com/timtadh/fsm

At this point you should have a working fsm program. Note that you may
get some gcc warnings about non-static data initializers.

Test that fsm is working

$ fsm -h

You should see a helpful message on how to use fsm.

Now you need to download the data from
http://tyrocine.case.edu/static/wiki-sort.gz . If you use wget you may
see a certificate error. This is due to the way CWRU signed the cert
(this is a server I run to host student code). If you download via
Firefox or Chrome the cert should correctly validate.

To run:

$ fsm -s 5 -m 8 --mem-profile=mem.pprof --mem-cache -o /tmp/wiki ./wiki-sort.gz

See the -h flag for definitions of each of the flags used. This
program is a research project so it is a bit rough around the edges!

The program should run for a while and then fail with a panic such as:

panic: Key was a bad value -2401018187971961171 [] 0xc208455fa0
0xc208965200

goroutine 110 [running]:
github.com/timtadh/fsm/mine.(*Miner).collector(0xc2084d8230, 0x7fb150290510, 0xc208449400, 0xc208f780c0)
        /tmp/fsm/src/github.com/timtadh/fsm/mine/mine.go:329 +0x321
created by github.com/timtadh/fsm/mine.(*Miner).makeCollectors
        /tmp/fsm/src/github.com/timtadh/fsm/mine/mine.go:354 +0x1fb

goroutine 1 [chan receive, 1 minutes]:
main.main()
        /tmp/fsm/src/github.com/timtadh/fsm/main.go:458 +0x232a

goroutine 17 [syscall, 1 minutes, locked to thread]:
runtime.goexit()

If you re-run with Go 1.4.2 you should no longer see the crash.

@dr2chase
Contributor

dr2chase commented Jun 3, 2015

Indeed, it does fail for me. I may not know enough about details of the GC to make quick sense of this, but it could be an educational adventure.

@timtadh
Author

timtadh commented Jun 3, 2015

@dr2chase The interesting thing (at least to me) is that it doesn't happen on every dataset. However, the wiki-sort graph seems to trigger it every single time. The fragmentation (which is what I hope I can get some feedback on) happens every time, though.

timtadh added a commit to timtadh/fsm that referenced this issue Jun 3, 2015
This commit stops the errors seen in

- golang/go#11035
- specifically golang/go#11035 (comment)

However, in general it is not ideal to make these channels buffered and
really they should not need to be buffered. Past experience has shown
that buffering the "send to collector" channel simply masks performance
problems until sufficient memory pressure is reached. It is far better
to distribute the work evenly across all the collectors (which was
implemented in 05fa9e9).

If the Go 1.5 runtime bug gets fixed this change can be safely reverted.

Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@timtadh
Author

timtadh commented Jun 3, 2015

@ianlancetaylor @dr2chase @bradfitz @RLH This commit
timtadh/fsm@0f87395
stops the errors I was seeing in master from occurring. I have not seen
any segmentation faults today, so perhaps that was fixed in a different
commit.

note:

$ go version
go version devel +0f6da89 Wed Jun 3 20:43:51 2015 +0000 linux/amd64

As I note in the commit message, I should not have to buffer the
channel. My guess is there is some sort of race condition where things
going through an unbuffered channel are briefly "invisible" to the
garbage collector. I think that exploring that hypothesis for the error
I was seeing is the best road.

Any thoughts on the fragmentation issue would be appreciated. It
definitely seems to be a real issue.

@dr2chase
Contributor

dr2chase commented Jun 3, 2015

We're exploring that hypothesis, because it looks suspiciously like that to
us (me, Rick) too. I'm collecting 4000 failure logs tonight.

@dr2chase
Contributor

dr2chase commented Jun 4, 2015

Saw this in chan.go, thought "Oh really?":

if elem.kind&kindNoPointers != 0 || size == 0 {
        // Allocate memory in one call.
        // Hchan does not contain pointers interesting for GC in this case:
        // buf points into the same allocation, elemtype is persistent.
        // SudoG's are referenced from their owning thread so they can't be collected.

@RLH

@RLH
Contributor

RLH commented Jun 4, 2015

Damien's email, #11053, seems more interesting.

#11053


@randall77
Contributor

I can reproduce the memory growth you are seeing (with 1.4.2). But it is not the Go heap. The Go heap remains constant-sized at around 8MB, as you can see in your gctrace=1 output. Something else is growing.

Diffing two instances of /proc/pid/maps, I see lots of instances like this:

< 7f6670000000-7f667238e000 rw-p 00000000 00:00 0 
< 7f667238e000-7f6674000000 ---p 00000000 00:00 0 
---
> 7f6670000000-7f66724e7000 rw-p 00000000 00:00 0 
> 7f66724e7000-7f6674000000 ---p 00000000 00:00 0 

which means larger portions of existing anonymous mappings are acquiring backing store. In this case, ~1.4MB extra in a 64MB mapping.

  1. Are you sure you are unmapping everything you mmap?
  2. Are you using lots of C stack? These mappings might be C stack, although I think the default Linux stack is 8MB, not 64MB.
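
One way to narrow down where the growth lives (a rough sketch, assuming
Linux): compare the runtime's own accounting against the process RSS from
/proc. If RSS keeps climbing run after run while the runtime's Sys total
stays flat, the growth is coming from outside the Go-managed heap (cgo
allocations, manual mmaps, C stacks).

package main

import (
    "fmt"
    "io/ioutil"
    "runtime"
    "strings"
)

// vmRSSkB parses VmRSS out of /proc/self/status (Linux only).
func vmRSSkB() (int64, error) {
    data, err := ioutil.ReadFile("/proc/self/status")
    if err != nil {
        return 0, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        if strings.HasPrefix(line, "VmRSS:") {
            var kb int64
            fmt.Sscanf(strings.TrimPrefix(line, "VmRSS:"), "%d", &kb)
            return kb, nil
        }
    }
    return 0, fmt.Errorf("VmRSS not found")
}

func main() {
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    rss, err := vmRSSkB()
    if err != nil {
        panic(err)
    }
    // A growing RSS with a flat Sys means the extra memory is not under the
    // Go runtime's control (cgo, manual mmap, thread stacks, ...).
    fmt.Printf("Go Sys=%d kB, HeapSys=%d kB, process RSS=%d kB\n",
        m.Sys/1024, m.HeapSys/1024, rss)
}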

@timtadh
Author

timtadh commented Jun 5, 2015

@randall77 Thanks for the help!

I am going to investigate this today. I wrote most of the code that my
system uses but I am making use of a C++ library for graph isomorphism
that I did not write. My initial guess is a memory management error
there.

One note: if you run the program with --cache=/path/to/tmp/dir/ it
will use file-backed maps instead of anonymous maps for the B+Trees.
When I look at diffs of /proc/<pid>/maps using that option, I see the
memory maps for the B+Trees moving around (as I expect, because I use
mremap when necessary to expand the mapping with MREMAP_MAYMOVE).
However, I see a bunch of other maps in there moving around as well. I
am not allocating those maps, so there are two options:

  1. Go is allocating them as part of its memory management scheme. I know
    that by default the glibc allocator on Linux does this. Excerpt from
    man malloc:

    Normally, malloc() allocates memory from the heap, and
    adjusts the size of the heap as required, using sbrk(2).
    When allocating blocks of memory larger than MMAP_THRESHOLD
    bytes, the glibc malloc() implementation allocates the memory
    as a private anonymous mapping using mmap(2).
    MMAP_THRESHOLD is 128 kB by default, but is adjustable using
    mallopt(3). Allocations performed using mmap(2) are
    unaffected by the RLIMIT_DATA resource limit (see
    getrlimit(2))

  2. libbliss, which I use for graph isomorphism, is doing a bad job of
    managing its memory. To test this I will write a C program that
    creates lots of graphs and run it under valgrind.

Thanks for the insights.

@timtadh
Author

timtadh commented Jun 5, 2015

@randall77 Looks like libbliss has a memory leak when computing canonical forms. I am going to chase that down for now. I will report if I get any more unexpected errors in Go 1.5. Thanks for all of your help, everyone!

@timtadh
Author

timtadh commented Jun 5, 2015

@randall77 @dr2chase @ianlancetaylor @RLH The memory leak in libbliss has been plugged and that seems to have solved the heap growth problem! Thank you all for helping me out. I have been getting a few panics in a GC thread today; however, they have been too inconsistent to track down. If I can find a consistent way to trigger one I will open another bug. Feel free to rename or close this issue.

Thanks again.
-Tim

@dr2chase
Contributor

dr2chase commented Jun 6, 2015

It's suspected that some recent changes for concurrent GC may be involved in the panic. It reproduces quite nicely on my Linux box and is robust in the face of debugging printlns; my suspicion is that the reference is sometimes dropped during an unbuffered channel send/receive.

@timtadh
Author

timtadh commented Jun 6, 2015 via email

rsc added a commit that referenced this issue Jun 15, 2015

…hansend

A send on an unbuffered channel to a blocked receiver is the only
case in the runtime where one goroutine writes directly to the stack
of another. The garbage collector assumes that if a goroutine is
blocked, its stack contains no new pointers since the last time it ran.
The send on an unbuffered channel violates this, so it needs an
explicit write barrier. It has an explicit write barrier, but not one that
can handle a write to another stack. Use one that can (based on type bitmap
instead of heap bitmap).

To make this work, raise the limit for type bitmaps so that they are
used for all types up to 64 kB in size (256 bytes of bitmap).
(The runtime already imposes a limit of 64 kB for a channel element size.)

I have been unable to reproduce this problem in a simple test program.

Could help #11035.

Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5
Reviewed-on: https://go-review.googlesource.com/10815
Reviewed-by: Austin Clements <austin@google.com>
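
For context, the user-visible pattern the commit message describes is
essentially the Collector code from earlier in the thread: one goroutine
blocks receiving on an unbuffered channel and another sends it a pointer. A
minimal, hedged illustration (the labelGraph type is redefined here, in
simplified form, just to keep the snippet self-contained):

// A receiver blocked on an unbuffered channel has the sent pointer copied
// directly onto its stack by the runtime; that direct write to another
// goroutine's stack is the case the commit adds a stronger write barrier for.
type labelGraph struct {
    label []byte
}

func demo() {
    ch := make(chan *labelGraph) // unbuffered
    done := make(chan struct{})
    go func() {
        lg := <-ch // this goroutine is blocked here when the send happens
        _ = lg.label
        close(done)
    }()
    ch <- &labelGraph{label: []byte("k")} // send to the blocked receiver
    <-done
}
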
@golang golang locked and limited conversation to collaborators Jun 25, 2016