
runtime: Odd crash with go1.5rc1 #12176

Closed
dgryski opened this issue Aug 18, 2015 · 9 comments

Comments

@dgryski
Contributor

dgryski commented Aug 18, 2015

I have an in-memory time-series data store ( https://github.com/dgryski/carbonmem ).

I had a crash with Go 1.5beta2, but due to a lack of logging I lost the reason. I upgraded to 1.5beta3, and had another panic, this time with

runtime/cgo: pthread_create failed: Resource temporarily unavailable

I upgraded to rc1, and today had another crash, this time with

fatal error: runtime: out of memory

It's highly unlikely this box is actually running out of memory. It has 384G of RAM, and monitoring the actual memory usage on the box doesn't show any sort of leak or spike before the crash.

The only connection I can see between the crashes is that they've each happened approximately 10 days apart.

I understand this is not a particularly useful bug report. I'll try to get some more information from the process (repeated dumps of /debug/vars, etc) and maybe we can track it down to something in our environment rather than a bug in the Go runtime.
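
For reference, exposing /debug/vars needs nothing more than the standard expvar package; a minimal sketch (the extra "goroutines" variable is only an illustration, not something cmserver necessarily publishes):

    package main

    import (
        "expvar" // importing expvar registers the /debug/vars handler
        "log"
        "net/http"
        "runtime"
    )

    func main() {
        // Publish a goroutine count next to the default memstats/cmdline vars.
        expvar.Publish("goroutines", expvar.Func(func() interface{} {
            return runtime.NumGoroutine()
        }))

        // GET /debug/vars now returns a JSON dump that can be captured
        // periodically (e.g. with curl in a loop) and diffed before a crash.
        log.Fatal(http.ListenAndServe(":8080", nil))
    }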

@ianlancetaylor ianlancetaylor changed the title Odd crash with go1.5rc1 runtime: Odd crash with go1.5rc1 Aug 18, 2015
@ianlancetaylor ianlancetaylor added this to the Go1.6 milestone Aug 18, 2015
@ianlancetaylor
Member

What processor? What OS?

@ianlancetaylor
Member

At least on GNU/Linux the "fatal error: runtime: out of memory" message can only occur when mmap returns ENOMEM.
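
For context, mmap can return ENOMEM even when plenty of physical RAM is free, for example once the process reaches the kernel's vm.max_map_count limit. A minimal sketch that provokes exactly that (alternating protections so the kernel cannot merge adjacent mappings into one VMA):

    package main

    import (
        "fmt"
        "syscall"
    )

    func main() {
        // Keep requesting small anonymous mappings until mmap fails.
        // Each call adds one entry toward vm.max_map_count, and once the
        // limit is hit mmap returns ENOMEM regardless of free RAM.
        prots := []int{syscall.PROT_READ, syscall.PROT_READ | syscall.PROT_WRITE}
        for i := 0; ; i++ {
            _, err := syscall.Mmap(-1, 0, 4096, prots[i%2],
                syscall.MAP_ANON|syscall.MAP_PRIVATE)
            if err != nil {
                fmt.Printf("mmap failed after %d mappings: %v\n", i, err)
                return
            }
        }
    }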

@dgryski
Contributor Author

dgryski commented Aug 18, 2015

24-core Linux box. CentOS, kernel 2.6.32-431.29.2.el6.x86_64

From /proc/cpuinfo:

model name : Intel(R) Xeon(R) CPU L5640 @ 2.27GHz

@dgryski
Contributor Author

dgryski commented Aug 18, 2015

I have the output from free -m and ps from at most one minute before it crashed:

             total       used       free     shared    buffers     cached
Mem:        387592      15402     372190          0        428       6836
-/+ buffers/cache:       8138     379454
Swap:         1023          0       1023

And the output from

    ps fax -ouser,pid,ppid,%cpu,%mem,vsz,rss,tty,stat,start,time,cmd k-vsz | grep -v USER 2>/dev/null;

(trimmed to remove everything other than the process)

dgryski   5685 21823 12.4  1.0 5256760 4130092 pts/1 Sl+    Aug 10 23:27:50      \_ ./cmserver.15rc1 -w=172800 -gp=42013 -p=8080 -e=3600 -prefix=5 -stdout

@dgryski
Contributor Author

dgryski commented Aug 18, 2015

I suppose I should also point out that this ran rock solid for months on 1.4.2.

@tsuna
Contributor

tsuna commented Aug 20, 2015

Are you sure that you're not running out of processes/threads (both are accounted the same way on Linux)? In Go 1.5, GOMAXPROCS has changed from 1 to 24 on your Linux box. Could this lead to many additional threads being created when you perform certain kinds of operations (especially blocking operations other than sleeping or socket I/O)? Do you do any file I/O?

What's the output of ulimit -a? Can you compare the output of ps -eLf with Go 1.4.2 and Go 1.5 to see how many threads there are for your application's PID before/after?
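
For what it's worth, the same numbers can be logged from inside the process to complement ps -eLf; a rough sketch (the one-minute interval is arbitrary):

    package main

    import (
        "fmt"
        "os"
        "runtime"
        "runtime/pprof"
        "time"
    )

    func main() {
        // Periodically report OS threads created, live goroutines, and
        // GOMAXPROCS; a steadily climbing thread count points at blocking
        // calls (file I/O, cgo, syscalls) tying up threads.
        for range time.Tick(time.Minute) {
            fmt.Fprintf(os.Stderr, "threads=%d goroutines=%d GOMAXPROCS=%d\n",
                pprof.Lookup("threadcreate").Count(),
                runtime.NumGoroutine(),
                runtime.GOMAXPROCS(0))
        }
    }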

@dgryski
Contributor Author

dgryski commented Aug 20, 2015

I'm going to see if this still fails with 1.5. I have reason to believe that after ~10 days of traffic, one of the metrics gets enough leaves that the radix tree supporting prefix queries is determined to be "full" and freed, leaving a very large amount of work for the garbage collector to do. If this is related to 3ae1704, that patch was not present in rc1 but is in 1.5 final, so the bug should not occur there. I will build the server with 1.5 final on Monday and report back in a few weeks.

@dgryski
Contributor Author

dgryski commented Sep 10, 2015

Crashed again with 1.5, after a little more than two weeks. Currently my assumption is that it might be an issue with one of the queries we process, which (for whatever reason) nobody had tried while running 1.4.2.

@dgryski
Contributor Author

dgryski commented Oct 12, 2015

Closing this for now, as I'm pretty sure this is a dup of #12233.

For a currently running server:

<dgryski@()memstore[~]> sysctl vm.max_map_count
vm.max_map_count = 65530
<dgryski@()memstore[~]> grep -c ^Size /proc/`pidof cmserver.151`/smaps 
55199
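
55199 mappings against a vm.max_map_count of 65530 leaves little headroom, so a further mmap from the runtime could fail with ENOMEM well before physical memory is exhausted. A minimal sketch of checking the same count from inside the process rather than via smaps:

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
    )

    // countMappings returns the number of entries in /proc/self/maps,
    // i.e. how close the process is to vm.max_map_count.
    func countMappings() (int, error) {
        f, err := os.Open("/proc/self/maps")
        if err != nil {
            return 0, err
        }
        defer f.Close()

        n := 0
        sc := bufio.NewScanner(f)
        for sc.Scan() {
            n++
        }
        return n, sc.Err()
    }

    func main() {
        n, err := countMappings()
        if err != nil {
            log.Fatal(err)
        }
        fmt.Printf("memory mappings in use: %d\n", n)
    }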

@dgryski dgryski closed this as completed Oct 12, 2015
@golang golang locked and limited conversation to collaborators Oct 12, 2016