runtime: Heap fragmentation causing memory leaks #11035
Comments
No major work is going to be done on the 1.4 branch, so your best bet is making your program work on tip (Go 1.5) and discussing from there. |
@bradfitz Ok. I am definitely willing to do that, but I do not have any idea where errors such as:
are coming from. The error here is somehow the |
That means that you have a dangling pointer. It's gotten easier to get those if you allocate memory in cgo and use a finalizer to free it. Or it could be cgo related in some other way. |
@bradfitz / @ianlancetaylor Something funky is definitely going on. The relevant code:

```go
type labelGraph struct {
	label []byte
	sg    *goiso.SubGraph
}

type Collector struct {
	ch   chan *labelGraph
	tree store.SubGraphs
}

func (c *Collector) send(sg *goiso.SubGraph) {
	label := sg.Label()
	c.ch <- &labelGraph{label, sg}
}

func (c *Collector) collect() {
	for lg := range c.ch {
		c.tree.Add(lg.label, lg.sg)
	}
}

func (c *Collector) close() {
	close(c.ch)
}

func otherThread(c *Collector, candidates <-chan *goiso.SubGraph) {
	go c.collect()
	for sg := range candidates {
		c.send(sg)
	}
	c.close()
}
```

The error was coming from deep inside the B+Tree's insert. So basically, I was putting good values into the channel but getting bad values out of it.

I tried doing something which should have had no effect: I added

```go
ch: make(chan *labelGraph, 1)
```

This made the error stop. Which is great. However, now I get a new, even stranger error, which looks like it might be a segmentation fault? These do not happen every time.

Whatever my program's problems are, the fragmentation problem in the runtime remains. But the resident size was continuing to grow at a healthy clip! This image shows the size of the memory mapped files (as files rather than resident memory).

Conclusion

It looks like there are two problems here.
I am interested in helping solve both problems. Let me know how I can help. |
Can you give us a standalone test case that we can use to reproduce the problem? The error you are describing is consistent with what I suggested earlier: something is freeing C memory when Go thinks it is still active. Where is sg allocated? |
Setting milestone to 1.5 to make sure we figure out why the program crashes with 1.5, if possible. |
The Go GCed heap uses a segregated size heap, so if the sizes of the objects vary the heap can become fragmented.
|
@RLH w.r.t. the segregated size heap:

In Frequent Subgraph Mining the sizes of the subgraphs (and therefore of the allocations) vary from round to round.

So, what I think tends to happen is the space where the subgraphs are allocated:

a. Becomes too small during the next round
b. Gets fragmented by other allocations

These are both hypotheses. I do not have data to back up those claims.

@ianlancetaylor I can provide the program and a data set to run it on. |
@ianlancetaylor Also in response to your question
Subgraphs are allocated in 2 functions. Both are normal Go allocations. Thanks for your help! |
Saw this, wanted to be sure that the contents of the mmap'd B+Tree were visible/known to the GC:
I'm still new to go and the library, so I could be barking up the wrong tree. But if I'm not, that's a very likely cause of 0xdeaddeaddead -- hide pointers from GC, run GC, it reclaims the storage, when you return pointers from your hiding place, they reference junk. |
@dr2chase The values being put into the channel are normal Go values. For clarity, when things get put into the B+Tree they first have to be serialized.
In the case that keys/values are fixed size instead of variable sized |
See golang/go#11035 Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@ianlancetaylor @bradfitz Here is how to reproduce the bug.

Prerequisites:

Note, this program is part of my research for my PhD. It is not super polished.

Steps:

At this point you should have a working fsm binary.

Test that fsm is working:

you should see a helpful message on how to use fsm. Now you need to download the data.

To run:

The program should run for a while and then fail with a panic such as:
If you re-run with go 1.4.2 you should no longer see the crash. |
Indeed, it does fail for me. I may not know enough about details of the GC to make quick sense of this, but it could be an educational adventure. |
@dr2chase The interesting thing (at least to me) is it doesn't happen on every dataset. However, the wiki-sort graph seems to trigger it every single time. The fragmentation (which is what I hope I can get some feedback on) happens every time however. |
This commit stops the errors seen in golang/go#11035, specifically golang/go#11035 (comment). However, in general it is not ideal to make these channels buffered and really they should not need to be buffered. Past experience has shown that buffering the "send to collector" channel simply masks performance problems until sufficient memory pressure is reached. It is far better to distribute the work evenly across all the collectors (which was implemented in 05fa9e9). If the Go 1.5 runtime bug gets fixed this change can be safely reverted. Signed-off-by: Tim Henderson <tim.tadh@gmail.com>
@ianlancetaylor @dr2chase @bradfitz @RLH This commit note:
As I note in the commit message, I should not have to buffer the channel. Any thoughts on the fragmentation issue would be appreciated. |
We're exploring that hypothesis, because it looks suspiciously like that to us.
|
Saw this in chan.go, thought "Oh really?":
|
Damien's email (#11053) seems more interesting.
|
I can reproduce the memory growth you are seeing (with 1.4.2). But it is not the Go heap. The Go heap remains constant-sized at around 8MB, as you can see in your gctrace=1 output. Something else is growing. Diffing two instances of /proc/pid/maps, I see lots of instances like this:
which means larger portions of existing anonymous mappings are acquiring backing store. In this case, ~1.4MB extra in a 64MB mapping.
|
@randall77 Thanks for the help! I am going to investigate this today. I wrote most of the code that my program depends on.

One note: if you run the program with
Thanks for the insights. |
@randall77 Looks like libbliss has a memory leak in its canonical-form computation. I am going to chase that down for now. I will report if I get any more unexpected errors in Go 1.5. Thanks for all of your help everyone! |
@randall77 @dr2chase @ianlancetaylor @RLH The memory leak in libbliss has been plugged and that seems to have solved the heap growth problem! Thank you all for helping me out. I have been getting a few sporadic panics in a GC thread today; however, they have been too inconsistent to track down. If I can find a consistent way to trigger them I will open another bug. Feel free to rename or close this issue. Thanks again. |
It's suspected that some recent changes for concurrent GC may be involved in the panic. It reproduces quite nicely on my Linux box, and is robust in the face of debugging printlns, and my suspicion is that the reference is sometimes dropped during unbuffered channel send/receive. |
Let's open a separate issue for the panic.
|
…hansend A send on an unbuffered channel to a blocked receiver is the only case in the runtime where one goroutine writes directly to the stack of another. The garbage collector assumes that if a goroutine is blocked, its stack contains no new pointers since the last time it ran. The send on an unbuffered channel violates this, so it needs an explicit write barrier. It has an explicit write barrier, but not one that can handle a write to another stack. Use one that can (based on type bitmap instead of heap bitmap). To make this work, raise the limit for type bitmaps so that they are used for all types up to 64 kB in size (256 bytes of bitmap). (The runtime already imposes a limit of 64 kB for a channel element size.) I have been unable to reproduce this problem in a simple test program. Could help #11035. Change-Id: I06ad994032d8cff3438c9b3eaa8d853915128af5 Reviewed-on: https://go-review.googlesource.com/10815 Reviewed-by: Austin Clements <austin@google.com>
What version of Go are you using (go version)?
What operating system and processor architecture are you using?
What did you do?
run https://github.com/timtadh/fsm on a large graph (see explanation below)
What did you expect to see?
memory usage remain relatively stable
What did you see instead?
Growth of memory throughout run of program, sometimes extremely rapid growth.
Explanation
I have been working on a frequent subgraph mining algorithm in Go. In
case you are not familiar with frequent subgraph mining, it involves
looking in a very large graph for repeated subgraphs. This is an
exponential problem and can require a lot of memory.
To solve the memory issue, I wrote a memory mapped B+Tree
fs2/bptree which stores the candidate
subgraphs. The B+Tree works fine and generally has very little overhead
in terms of using memory.
What does have overhead is generating the candidate graphs. Basically
the process goes something like this:

- Candidate subgraphs are grouped into partitions; every graph in a
  partition is isomorphic to every other graph in the partition. These
  are known as the embeddings.
- Infrequent candidates are discarded (and therefore eventually garbage
  collected).
- The embeddings of frequent candidates are extended to form the
  candidates for the next round of mining. The embeddings are extended
  multiple times, once for each edge in the surrounding context.
In theory, the overall memory usage (outside of the B+Trees) should not
grow in this process. However, it does grow. I believe it is due to the
heap becoming fragmented. I am willing to do considerable legwork to
both show this conclusively and to help fix the problem.
This may not be fixable
Before we go into my reasons for believing this is a fragmentation
issue, let me first note that this may not be fixable. The solutions I
have outlined at the bottom of the post may be impractical. However, I
think it would be worth trying to fix it.
In Depth
I have been profiling my program extensively for several months to try
and get rid of memory allocations as much as possible. However, there
are some limits to my ability to completely get rid of allocations
during the mining process. It also becomes difficult to reason about
ownership of particular bits of memory (like the subgraph objects) when
they are flying around between different goroutines.
One piece of evidence I have gathered for this being a fragmentation
problem is the statistics reported by GODEBUG=gctrace=1. Here is a
snapshot after the program has been running for a while:

One thing to notice: the number of live objects stays relatively
constant throughout the execution of the program, in the neighborhood
of 70k-80k on a 4 core machine. Inspecting the memory (as reported in
/proc/[pid]/maps) shows that the Go heap does continue to grow.
Allowing the program to run for a very long time essentially causes
the B+Tree's memory maps to be evicted from memory to make space for
the ever growing heap.
Sources of Memory Allocations
As you can see, the sources of allocations are what you would expect
(e.g. canonSubGraph).

I am obviously experimenting with ways to reduce the allocations
performed by these operations. However, at some point I may just reach
the limits with what is easily achievable in Go and I will have to
switch to either C++ or Rust (which did not really exist when I started
this project in 2012).
Does Go 1.5 Help?
I compiled and ran master yesterday. Unfortunately, my program does not
currently execute on master. I get lots of weird errors. Maybe this is
because I do a lot of unsafe things in my B+Tree implementation? I don't
know. I can post those errors if you want to see them but I think they
are a separate issue.
Possible Solutions
The best idea I can think of would be a runtime routine which would
stop the world and do a heap compaction. This would never be triggered
automatically (that would be catastrophic in, say, an http server).
However, a program could call runtime.HeapCompact(), which would cause
the mutator to stop and a full garbage collect + compaction to run.
The way this could work is to have a second garbage collector which is
either a copying-collector or a mark-and-compact collector. The
basic idea of both is to find all of the current objects active at the
time of collection. Then, to one by one move each object to a new
location and rewrite all pointers into the object. This is obviously an
expensive operation.
Drawbacks

- Pointers passed to native code now point somewhere else.
- If an object is referenced by a native structure we cannot move that
  object.

Benefits

- Programs choose when compaction happens, so they can potentially
  control negative side effects from pointers involving non Go code.
- Compaction can place frequently accessed objects close together.
Other Solutions
Incremental
Incremental compaction during GC or allocation. This would mean that
the heap would never be fully compacted; however, it would limit the
amount of work that is done and make the process fully automatic:
Drawbacks
Benefits
Improve the Allocator
The other idea is to simply improve the allocator to reduce the
fragmentation in the first place. Programs, like mine, which cause
fragmentation can be collected and their behavior can be studied to
figure out if there are ways to reduce the fragmentation.
Drawbacks
Benefits
Conclusion
I believe Go currently has a heap fragmentation problem that impacts
long running programs which do lots of small and medium sized
allocations. A potential solution is to add a function
runtime.HeapCompact() which would trigger a stop-the-world compaction.