runtime/trace: TestTraceSymbolize fails on solaris #12056
Comments
The problem here seems to be that the stack trace gathered by the tracing logic is truncated. The one being looked for is:
and the one that's available is:
We've lost a few frames somehow. Solaris is special in that it makes all its system calls through asmcgocall, but that should be okay. In fact nearly all the traces are missing frames. If I change syscall.read to syscall.readXXX and run the test on Linux, I get:
Notice all the frames that Solaris does not have. It seems to be consistently missing the top three frames. @4ad do you have any ideas what might be special about Solaris for gathering stack traces? |
Very interesting. I am afraid I am on vacation right now, very far away from any computer where I might take a look at this. Off the top of my head, there should not be anything special about Solaris. The most magic things I can think of are some trampolines that jump into unexported parts of the runtime. |
This is unfortunate but Solaris is not one of the first-class ports, so it won't hold up the Go 1.5 release. |
I would like to help contribute to the solution, but I have limited time right now. Can you recommend where I should start looking in the source, other than the Solaris-specific files under src/runtime? Any other sage words of advice? It would be nice to have this fixed in time for 1.5.1. Thanks,
|
Okay, I've figured out the cause of this, will post a root cause. |
The various trace* functions found in src/runtime/trace.go take a skip parameter that elides "uninteresting" frames from the recorded stack. In particular, traceGoSysCall specifies a skip of 4, which seems to match what @rsc noted about Solaris consistently missing the top three frames (yes, it's off by one, but read on). If we zero out the use of skip in traceEvent on Linux and then run this test again, we can see the extra frames that are normally elided:
Understandably, we'd generally want to elide the calls to Syscall, entersyscall, reentersyscall, and mcall when tracing. On Solaris without any other changes, when we do the same thing, we see:
At first glance, it seems like the right fix would be to simply change traceGoSysCall() to specify a skip of 1 instead of 4 on Solaris, and that would "fix" the test. However, what bothered me was that this wasn't happening on other platforms, and as @rsc noted, while Solaris is special in that it makes its system calls through asmcgocall, that should be okay. So I reverted that change. Digging deeper, we find that on every other platform, system calls pass through functions that invoke
However, on Solaris, system calls generally pass through syscall_sysvicall6, which invokes
If we change sysvicall6() to invoke entersyscall() instead of entersyscallblock(), then the test passes and we see this trace stack instead:
Now that stack is nearly identical to the Linux one and importantly, the number of items to skip is the same. So the cause of the issue appears to be the use of entersyscallblock() vs. entersyscall(). What I don't know yet is how entersyscallblock() is broken. If we only change the Linux version of Go to use
So this is not a problem unique to Solaris; any platform that uses entersyscallblock() will hit the same issues when tracing. I would like to suggest this bug be retitled as "tracing broken when using entersyscallblock()". I don't know what the right fix is here at the moment. |
So the cause of the issue is ultimately the placement and invocation of Notably, normally when
This causes three additional functions to be recorded in the trace stack on Linux (with respect to the invocation of syscall.Syscall itself):
However, when using
Because of where the invocation happens and how it is done, all three calls to the runtime methods are omitted. So it looks like there are two possible solutions. The first is to simply change how the
The second would be to change
Of the two options, the latter seems preferable since we'll be recording less information which we're going to throw away anyway. However, this is not my area of expertise and I would appreciate advice as to the best solution. With either of my proposed solutions, the test passes on Solaris. |
CL https://golang.org/cl/13861 mentions this issue. |
Running on OmniOS 151015, using go 1.4.2 to bootstrap.
I have the following environment variables set:
GOROOT_BOOTSTRAP=/usr/local/go
GOROOT_FINAL=/usr/local/go_1.5
GOARCH=amd64
GOOS=solaris
CGO=1
PATH=/usr/local/go/bin:/opt/gcc-5.1.0/bin:/opt/omni/bin:/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin
This fails on both release-branch.go1.5 and master.