runtime: rare SIGBUS in runtime.handoff #16705
I don't suppose you were lucky enough to get core files or any other debug info from these? I'm rather perplexed by what could specifically cause a SIGBUS here. A SIGSEGV would mean we just followed a bad pointer, but a SIGBUS indicates that we somehow wound up in a truncated file mapping. Furthermore, the pointer has the proper 4K alignment + 16 bytes in both tracebacks, which is unlikely to happen by accident, further indicating that these were valid workbuf pointers. It almost seems like a file got mapped over our workbuf allocation, but AFAIK nothing in the standard library or commands ever mmaps a file.
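The alignment argument above can be sketched as a small check. This is only an illustration: the 4K alignment and the 16-byte offset come from the comment above, while the helper name and the sample addresses are hypothetical and are not taken from the attached tracebacks.

```go
package main

import "fmt"

const (
	pageSize    = 4096 // the 4K alignment mentioned in the comment above
	fieldOffset = 16   // the "+ 16 bytes" offset mentioned in the comment above
)

// looksLikeWorkbufAccess reports whether addr sits exactly fieldOffset bytes
// past a 4K-aligned base. Hypothetical helper, not part of the Go runtime.
func looksLikeWorkbufAccess(addr uint64) bool {
	return addr%pageSize == fieldOffset
}

func main() {
	// Made-up addresses, for illustration only.
	for _, addr := range []uint64{0x2b0000001010, 0x2b0000001234} {
		fmt.Printf("%#x -> %v\n", addr, looksLikeWorkbufAccess(addr))
	}
}
```

A random bad pointer would satisfy this check only about once in 4096 tries, which is why two matching tracebacks suggest the pointers were valid when they were created.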
More observations:
Sorry, no core dumps.
@colincross, too bad. Do you know what kernel version these were running on (approximately if you don't know exactly)?
Kernel version for panic1.txt was 3.13.0-85-generic #129-Ubuntu SMP.
Just got a 3rd one:
Where would the 0x2xxxxxxxxxxx addresses come from in persistentalloc? Code inspection and strace locally suggest that persistentalloc addresses should be more like 0xc4xxxxxxxx.
The most recent panic was associated with a kernel BUG message, so this is likely not a Go issue. I'll close this unless we find some evidence that Go is tickling a kernel bug.
[10181793.153768] BUG: Bad page map in process soong_build pte:f000dee8f000dee8 pmd:00000000
0xc4... is where the Go heap lives, but persistentalloc uses whatever the kernel returns for a non-MAP_FIXED mapping.
Huh. I was spiraling in on this being a kernel bug, but that's not an accusation I wanted to make lightly. :) Glad you found the BUG in the kernel log! I found some other reports of page table corruption with kernels around 3.13, such as https://lkml.org/lkml/2014/4/8/173, so I suspect this has been fixed in the kernel. If this turns out to be a real problem, I suspect we can track down the exact path through the kernel that caused this using the print_bad_pte output from your kernel log.
Go version (go version): go1.7rc1
OS and architecture (go env): linux amd64
Android updated its Go prebuilts to go1.7rc1 on July 8th. Since then we have seen two SIGBUS panics in runtime.handoff on our build servers. One occurred while running the Go compiler:
The other was when running our ninja manifest generator:
Full panics attached:
panic1.txt
panic2.txt