runtime: munmap failure in program with very large heap (dup #12233?) #13227
Comments
The crash is because munmap is failing. When munmap fails, the code in sys_linux_amd64.s crashes the program. This is simple but not especially helpful, especially since the errno value is lost. Just from looking at the code I think it's suspicious that the call to sysFree in gcCopySpans uses len(work.spans). I think it needs to use cap(work.spans). I could imagine that that could cause this problem in certain circumstances. CC @aclements @RLH
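To make the len-versus-cap concern concrete, here is a minimal user-level sketch (this is not the runtime's gcCopySpans/sysFree code; the mapping size and names are illustrative): if a slice's backing store came from mmap, releasing only len(...) bytes leaves the tail of the mapping in place, so the size handed to munmap has to be the full allocation, i.e. the cap(...) analogue.

```go
// Illustrative only: a user-level analogue of freeing a slice's backing
// mapping. The runtime's sysFree takes an explicit size; passing the
// slice length instead of the allocation size would unmap too little.
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

func main() {
	const allocSize = 1 << 20 // hypothetical 1 MiB backing allocation

	mem, err := syscall.Mmap(-1, 0, allocSize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}

	// Analogue of work.spans: a slice whose len has shrunk below its cap.
	spans := mem[:allocSize/2]

	// The mapping is cap(spans) bytes long; unmapping only len(spans)
	// bytes would leave the second half of it mapped. The raw syscall is
	// used here because syscall.Munmap insists on the full slice.
	_, _, errno := syscall.Syscall(syscall.SYS_MUNMAP,
		uintptr(unsafe.Pointer(&spans[0])), uintptr(cap(spans)), 0)
	if errno != 0 {
		panic(errno)
	}
	fmt.Printf("released %d bytes (cap), not just %d (len)\n",
		cap(spans), len(spans))
}
```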
The user has re-run the above command with the same result, so this looks pretty reproducible. The input data file is large and closed, so I can't post it here, but I can try fixes when they become available.
I guess you could ask the user to run it under strace... Taking a quick squizz at the kernel source for munmap makes me think it might be a version of #12233 (especially if munmap is returning -ENOMEM).
Ah yeah, you could also ask the user to try with a version built from current tip?
Today I asked him to try the change Ian suggested; I'll post the results when I hear back. I'll also ask if he's able to try the things you've suggested (I'll probably go over and do those).
The change of len(work.spans) => cap(work.spans) makes no difference.
I believe this is a duplicate of #12233. Looking at the Linux kernel source, the primary failure mode of munmap is EINVAL for invalid arguments, but these arguments look fine. A secondary failure mode, however, is ENOMEM when too many VMA segments would be created, and that's exactly the problem #12233 was running into that @aclements fixed. That fix will be in Go 1.5.2. If you're feeling adventurous, you can 'git fetch && git checkout release-branch.go1.5' and make.bash to get Go 1.5.1 + some of the patches we've queued for 1.5.2, including the VMA fragmentation fix.
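As background for the VMA explanation above, here is a hedged, illustrative sketch (not the runtime's actual calls; the page count and the choice of MADV_NOHUGEPAGE are assumptions): per-VMA advice applied to alternating pages of one large anonymous mapping forces the kernel to split it into many separate VMAs, and a process that approaches the vm.max_map_count limit starts seeing ENOMEM from mmap and munmap.

```go
// Hedged sketch: show how per-VMA madvise calls fragment one big mapping
// into many VMAs. Linux-only; requires transparent-hugepage support for
// MADV_NOHUGEPAGE to be accepted.
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// vmaCount returns the number of entries in /proc/self/maps, one per VMA.
func vmaCount() int {
	data, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		return -1
	}
	return strings.Count(string(data), "\n")
}

func main() {
	pageSize := os.Getpagesize()
	const pages = 1024

	mem, err := syscall.Mmap(-1, 0, pages*pageSize,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		panic(err)
	}
	before := vmaCount()

	// MADV_NOHUGEPAGE is recorded per VMA, so advising every other page
	// splits the single mapping backing mem into many small VMAs.
	for i := 0; i < pages; i += 2 {
		page := mem[i*pageSize : (i+1)*pageSize]
		if err := syscall.Madvise(page, syscall.MADV_NOHUGEPAGE); err != nil {
			fmt.Println("madvise:", err)
			break
		}
	}
	fmt.Printf("VMAs before: %d, after: %d (limit is vm.max_map_count)\n",
		before, vmaCount())
}
```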
I've just started the run with strace, so I'll leave that going until failure. The failure takes more than six hours of run time, so I'll try the patches tomorrow. The user is being very patient with me.
I flipped through the munmap code and agree with Russ' assessment. The only other way I found for munmap to fail is for the kernel itself to run out of memory. Another way you can test this without patching in 1.5.2 is to run
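One rough way to check whether a running process is nearing that per-process mapping limit, without rebuilding the toolchain, is to compare its current VMA count against vm.max_map_count. The snippet below is purely illustrative and is not the command from the comment above; the 90% threshold is an arbitrary assumption.

```go
// Hypothetical diagnostic: compare this process's VMA count with the
// kernel's vm.max_map_count limit. Linux-only; the threshold is arbitrary.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

func main() {
	maps, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		panic(err)
	}
	vmas := strings.Count(string(maps), "\n")

	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		panic(err)
	}
	limit, err := strconv.Atoi(strings.TrimSpace(string(raw)))
	if err != nil {
		panic(err)
	}

	fmt.Printf("VMAs in use: %d of %d allowed\n", vmas, limit)
	if vmas*10 >= limit*9 {
		fmt.Println("warning: close to vm.max_map_count; mmap/munmap may fail with ENOMEM")
	}
}
```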
I'll wait for the report back from the user and then organise this for later today.
No munmap call returned an error. Here are the ~100 strace lines prior to the panic:
Thanks for the strace output. Unfortunately, it looks like you ran strace without -f, so it only got the system calls from one thread and it wasn't the thread that caused the crash :( In addition to -f, you may want to run with -e mmap,mmap2,munmap to filter down to just those calls. Otherwise your log may get very large after a few hours. Alternatively, since it seems like this is fully reproducible (but somewhat painful) and it's highly likely 1.5.2 fixes this, you could just try 1.5.2.
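If it helps, here is a small hedged wrapper showing the kind of strace invocation described above; the binary name and output path are placeholders, and normally you would just run strace directly from the shell. (mmap2 exists only on 32-bit platforms, so on amd64 tracing mmap and munmap is enough.)

```go
// Hedged sketch: invoke strace with -f (follow threads) and a filter for
// the mmap-family system calls. "./yourprogram" and "trace.out" are
// placeholders, not names from this issue.
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("strace",
		"-f",                      // trace all threads, not just the first
		"-e", "trace=mmap,munmap", // record only these system calls
		"-o", "trace.out",         // write the trace to a file
		"./yourprogram",           // placeholder for the real binary
	)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```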
Yes, don't worry about strace. Just run against release-branch.go1.5 (or wait for 1.5.2).
Closing as dup of #12233.
Sorry about that; I don't often use strace. The program is running under the release branch as we speak; I will post back when I know the results.
The go1.5 release-branch patches fix the problem.
One of my users reported this panic to me.
Expected normal successful completion, got "fatal error: unexpected signal during runtime execution".
Go version is go1.5.1.
See https://groups.google.com/d/topic/golang-nuts/Men8KFhERN0/discussion
OS: 3.2.0-26-generic #41-Ubuntu x86_64 GNU/Linux