Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/trace: TestTraceSymbolize fails on solaris #12056

Closed
kimshrier opened this issue Aug 6, 2015 · 8 comments
Closed

runtime/trace: TestTraceSymbolize fails on solaris #12056

kimshrier opened this issue Aug 6, 2015 · 8 comments

Comments

@kimshrier
Copy link

Running on OmniOS 151015, using go 1.4.2 to bootstrap.

I have the following environment variables set:

GOROOT_BOOTSTRAP=/usr/local/go
GOROOT_FINAL=/usr/local/go_1.5
GOARCH=amd64
GOOS=solaris
CGO=1

PATH=/usr/local/go/bin:/opt/gcc-5.1.0/bin:/opt/omni/bin:/usr/gnu/bin:/usr/bin:/usr/sbin:/sbin

This fails on both release-branch.go1.5 and master.

$ cd src
$ /bin/bash all.bash
##### Building Go bootstrap tool.
cmd/dist

##### Building Go toolchain using /usr/local/go.
bootstrap/internal/obj
...
bootstrap/compile

##### Building go_bootstrap for host, solaris/amd64.
runtime
...
cmd/go

##### Building packages and commands for solaris/amd64.
runtime
...
cmd/pprof

##### Testing packages.
ok      archive/tar     0.010s
...
ok      runtime 34.324s
ok      runtime/debug   0.073s
ok      runtime/pprof   6.531s
--- FAIL: TestTraceSymbolize (0.09s)
    trace_stack_test.go:273: did not match event GoSysCall at syscall.read:0
    trace_stack_test.go:274: seen the following events of this type:
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:42
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).listenStream:156
    trace_stack_test.go:280:   net.socket:76
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:76
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:76
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.ListenTCP:329
    trace_stack_test.go:280:   net.Listen:395
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).accept:400
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:92
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:92
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   os.(*File).Read:95
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
    trace_stack_test.go:282: ---
    trace_stack_test.go:282: ---
    trace_stack_test.go:282: ---
    trace_stack_test.go:282: ---
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:38
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:42
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).dial:137
    trace_stack_test.go:280:   net.socket:89
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).accept:400
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).dial:137
    trace_stack_test.go:280:   net.socket:89
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.socket:89
    trace_stack_test.go:280:   net.internetSocket:160
    trace_stack_test.go:280:   net.dialTCP:171
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).accept:400
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).accept:400
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).accept:400
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.newTCPConn:51
    trace_stack_test.go:280:   net.dialTCP:207
    trace_stack_test.go:280:   net.dialSingle:364
    trace_stack_test.go:280:   net.dialSerial.func1:336
    trace_stack_test.go:280:   net.dial:40
    trace_stack_test.go:280:   net.dialSerial:338
    trace_stack_test.go:280:   net.(*Dialer).Dial:232
    trace_stack_test.go:280:   net.Dial:186
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).Close:199
    trace_stack_test.go:280:   net.(*conn).Close:156
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:122
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.newTCPConn:51
    trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:258
    trace_stack_test.go:280:   net.(*TCPListener).Accept:264
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   net.(*netFD).Close:199
    trace_stack_test.go:280:   net.(*conn).Close:156
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:90
    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   os.(*File).Write:139
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:124
    trace_stack_test.go:280:   testing.tRunner:456
    trace_stack_test.go:282: ---
FAIL
FAIL    runtime/trace   19.589s
ok      sort    0.083s
...
ok      cmd/vet 4.671s
2015/08/06 11:15:31 Failed: exit status 1
@rsc rsc added this to the Go1.5Maybe milestone Aug 6, 2015
@rsc
Copy link
Contributor

rsc commented Aug 6, 2015

The problem here seems to be that the stack trace gathered by the tracing logic is truncated. The one being looked for is:

        eventDesc{trace.EvGoSysCall, []frame{
            frame{"syscall.read", 0},
            frame{"syscall.Read", 0},
            frame{"os.(*File).read", 0},
            frame{"os.(*File).Read", 0},
            frame{"runtime/trace_test.TestTraceSymbolize.func11", 101},
        }},

and the one that's available is:

trace_stack_test.go:282: ---
trace_stack_test.go:280:   os.(*File).Read:95
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
trace_stack_test.go:282: ---

We've lost a few frames somehow. Solaris is special in that it makes all its system calls through asmcgocall, but that should be okay.

In fact nearly all the traces are missing frames. If I change syscall.read to syscall.readXXX and run the test on Linux, I get:

trace_stack_test.go:273: did not match event GoSysCall at syscall.readXXX:0
trace_stack_test.go:274: seen the following events of this type:
trace_stack_test.go:280:   syscall.setsockopt:1655
trace_stack_test.go:280:   syscall.SetsockoptInt:267
trace_stack_test.go:280:   net.setDefaultSockopts:20
trace_stack_test.go:280:   net.socket:42
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.ListenTCP:329
trace_stack_test.go:280:   net.Listen:395
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.setsockopt:1655
trace_stack_test.go:280:   syscall.SetsockoptInt:267
trace_stack_test.go:280:   net.setDefaultListenerSockopts:25
trace_stack_test.go:280:   net.(*netFD).listenStream:156
trace_stack_test.go:280:   net.socket:76
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.ListenTCP:329
trace_stack_test.go:280:   net.Listen:395
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.bind:1604
trace_stack_test.go:280:   syscall.Bind:214
trace_stack_test.go:280:   net.(*netFD).listenStream:162
trace_stack_test.go:280:   net.socket:76
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.ListenTCP:329
trace_stack_test.go:280:   net.Listen:395
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.Listen:1340
trace_stack_test.go:280:   net.(*netFD).listenStream:166
trace_stack_test.go:280:   net.socket:76
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.ListenTCP:329
trace_stack_test.go:280:   net.Listen:395
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:81
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.accept4:1593
trace_stack_test.go:280:   syscall.Accept4:453
trace_stack_test.go:280:   net.accept:54
trace_stack_test.go:280:   net.(*netFD).accept:400
trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
trace_stack_test.go:280:   net.(*TCPListener).Accept:264
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.read:783
trace_stack_test.go:280:   syscall.Read:160
trace_stack_test.go:280:   os.(*File).read:211
trace_stack_test.go:280:   os.(*File).Read:95
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
trace_stack_test.go:282: ---
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.setsockopt:1655
trace_stack_test.go:280:   syscall.SetsockoptInt:267
trace_stack_test.go:280:   net.setDefaultSockopts:20
trace_stack_test.go:280:   net.socket:42
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.dialTCP:171
trace_stack_test.go:280:   net.dialSingle:364
trace_stack_test.go:280:   net.dialSerial.func1:336
trace_stack_test.go:280:   net.dial:40
trace_stack_test.go:280:   net.dialSerial:338
trace_stack_test.go:280:   net.(*Dialer).Dial:232
trace_stack_test.go:280:   net.Dial:186
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.connect:1614
trace_stack_test.go:280:   syscall.Connect:222
trace_stack_test.go:280:   net.(*netFD).connect:75
trace_stack_test.go:280:   net.(*netFD).dial:137
trace_stack_test.go:280:   net.socket:89
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.dialTCP:171
trace_stack_test.go:280:   net.dialSingle:364
trace_stack_test.go:280:   net.dialSerial.func1:336
trace_stack_test.go:280:   net.dial:40
trace_stack_test.go:280:   net.dialSerial:338
trace_stack_test.go:280:   net.(*Dialer).Dial:232
trace_stack_test.go:280:   net.Dial:186
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.getsockopt:1645
trace_stack_test.go:280:   syscall.GetsockoptInt:237
trace_stack_test.go:280:   net.(*netFD).connect:117
trace_stack_test.go:280:   net.(*netFD).dial:137
trace_stack_test.go:280:   net.socket:89
trace_stack_test.go:280:   net.internetSocket:160
trace_stack_test.go:280:   net.dialTCP:171
trace_stack_test.go:280:   net.dialSingle:364
trace_stack_test.go:280:   net.dialSerial.func1:336
trace_stack_test.go:280:   net.dial:40
trace_stack_test.go:280:   net.dialSerial:338
trace_stack_test.go:280:   net.(*Dialer).Dial:232
trace_stack_test.go:280:   net.Dial:186
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.setsockopt:1655
trace_stack_test.go:280:   syscall.SetsockoptInt:267
trace_stack_test.go:280:   net.setNoDelay:19
trace_stack_test.go:280:   net.newTCPConn:51
trace_stack_test.go:280:   net.dialTCP:207
trace_stack_test.go:280:   net.dialSingle:364
trace_stack_test.go:280:   net.dialSerial.func1:336
trace_stack_test.go:280:   net.dial:40
trace_stack_test.go:280:   net.dialSerial:338
trace_stack_test.go:280:   net.(*Dialer).Dial:232
trace_stack_test.go:280:   net.Dial:186
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:118
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.Close:296
trace_stack_test.go:280:   net.(*netFD).destroy:135
trace_stack_test.go:280:   net.(*netFD).decref:153
trace_stack_test.go:280:   net.(*netFD).Close:199
trace_stack_test.go:280:   net.(*conn).Close:156
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:122
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.write:1064
trace_stack_test.go:280:   syscall.Write:176
trace_stack_test.go:280:   os.(*File).write:232
trace_stack_test.go:280:   os.(*File).Write:139
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize:124
trace_stack_test.go:280:   testing.tRunner:456
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.accept4:1593
trace_stack_test.go:280:   syscall.Accept4:453
trace_stack_test.go:280:   net.accept:54
trace_stack_test.go:280:   net.(*netFD).accept:400
trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:254
trace_stack_test.go:280:   net.(*TCPListener).Accept:264
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.setsockopt:1655
trace_stack_test.go:280:   syscall.SetsockoptInt:267
trace_stack_test.go:280:   net.setNoDelay:19
trace_stack_test.go:280:   net.newTCPConn:51
trace_stack_test.go:280:   net.(*TCPListener).AcceptTCP:258
trace_stack_test.go:280:   net.(*TCPListener).Accept:264
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:86
trace_stack_test.go:282: ---
trace_stack_test.go:280:   syscall.Close:296
trace_stack_test.go:280:   net.(*netFD).destroy:135
trace_stack_test.go:280:   net.(*netFD).decref:153
trace_stack_test.go:280:   net.(*netFD).Close:199
trace_stack_test.go:280:   net.(*conn).Close:156
trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func10:90
trace_stack_test.go:282: ---

Notice all the frames that Solaris does not have. It seems to be consistently missing the top three frames.

@4ad do you have any ideas what might be special about Solaris for gathering stack traces?

@4ad
Copy link
Member

4ad commented Aug 7, 2015

Very interesting.

I am afraid I am in vacation right now, very far away from any computer where I might take a look at this. Off the top of my head there should not be anything special about Solaris. The most magic things I can think off are some trampolines that jump into unexported parts of the runtime.

@rsc rsc modified the milestones: Go1.6Early, Go1.5Maybe Aug 7, 2015
@rsc
Copy link
Contributor

rsc commented Aug 7, 2015

This is unfortunate but Solaris is not one of the first-class ports, so it won't hold up the Go 1.5 release.

@kimshrier
Copy link
Author

I would like to help contribute to the solution but I have limited time right now. Can you recommend where I should start looking in the source other than looking at the solaris specific files under src/runtime? Any other sage words of advice? It would be nice to have this fixed in time for 1.5.1.

Thanks,
Kim

On Aug 7, 2015, at 11:55 AM, Russ Cox notifications@github.com wrote:

This is unfortunate but Solaris is not one of the first-class ports, so it won't hold up the Go 1.5 release.


Reply to this email directly or view it on GitHub.

@binarycrusader
Copy link
Contributor

Okay, I've figured out the cause of this, will post a root cause.

@binarycrusader
Copy link
Contributor

The various trace* functions found in src/runtime/trace.go provide a skip parameter to elide "uninteresting" bits from the results of traces that is used by traceEvent().

In particular, traceGoSysCall specifies a skip of 4 which seems to match what @rsc noted about Solaris missing the top three frames consistently (yes, it's off-by-one, but read on).

If we zero out the use of skip in traceEvent on Linux and then run this test again, we can see the extra bits that are normally elided:

    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   runtime.mcall:207
    trace_stack_test.go:280:   runtime.reentersyscall:1826
    trace_stack_test.go:280:   runtime.entersyscall:1879
    trace_stack_test.go:280:   syscall.Syscall:18
    trace_stack_test.go:280:   syscall.read:783
    trace_stack_test.go:280:   syscall.Read:160
    trace_stack_test.go:280:   os.(*File).read:211
    trace_stack_test.go:280:   os.(*File).Read:95
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
    trace_stack_test.go:280:   runtime.goexit:1696
    trace_stack_test.go:282: ---

Understandably, we'd generally want to elide the calls to Syscall, entersyscall, reentersyscall, and mcall when tracing.

On Solaris without any other changes, when we do the same thing, we see:

        trace_stack_test.go:282: ---
        trace_stack_test.go:280:   runtime.syscall_sysvicall6:36
        trace_stack_test.go:280:   syscall.read:680
        trace_stack_test.go:280:   syscall.Read:160
        trace_stack_test.go:280:   os.(*File).read:211
        trace_stack_test.go:280:   os.(*File).Read:95
        trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
        trace_stack_test.go:280:   runtime.goexit:1696
        trace_stack_test.go:282: ---

At first glance, it seems like the right fix would be to simply change traceGoSysCall() to specify a skip of 1 instead of 4 on Solaris, and that would "fix" the test.

However, what bothered me was that this wasn't happening on other platforms, and as @rsc noted, while Solaris is special in that it makes its system calls through asmcgocall, that should be okay. So I reverted that change.

Digging deeper, we find that on every other platform, system calls pass through functions that invoke runtime.entersyscall(), usually like this:

 17 TEXT    ·Syscall(SB),NOSPLIT,$0-56
 18         CALL    runtime·entersyscall(SB)

However, on Solaris, system calls generally pass through syscall_sysvicall6, which invokes runtime.entersyscallblock() instead:

 29 //go:nosplit
 30 func syscall_sysvicall6(fn, nargs, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr) {
 31         call := libcall{
 32                 fn:   fn,
 33                 n:    nargs,
 34                 args: uintptr(unsafe.Pointer(&a1)),
 35         }
 36         entersyscallblock(0)
 37         asmcgocall(unsafe.Pointer(&asmsysvicall6), unsafe.Pointer(&call))
 38         exitsyscall(0)
 39         return call.r1, call.r2, call.err
 40 }

If we change sysvicall6() to invoke entersyscall() instead of entersyscallblock(), then the test passes and we see this trace stack instead:

        trace_stack_test.go:282: ---
        trace_stack_test.go:280:   runtime.mcall:207
        trace_stack_test.go:280:   runtime.reentersyscall:1826
        trace_stack_test.go:280:   runtime.entersyscall:1879
        trace_stack_test.go:280:   runtime.syscall_sysvicall6:36
        trace_stack_test.go:280:   syscall.read:680
        trace_stack_test.go:280:   syscall.Read:160
        trace_stack_test.go:280:   os.(*File).read:211
        trace_stack_test.go:280:   os.(*File).Read:95
        trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
        trace_stack_test.go:280:   runtime.goexit:1696
        trace_stack_test.go:282: ---

Now that stack is nearly identical to the Linux one and importantly, the number of items to skip is the same.

So the cause of the issue appears to be the use of entersyscallblock() vs. entersyscall(). What I don't know yet is how entersyscallblock() is broken.

If we only change the Linux version of Go to use runtime.entersyscallblock() for its system call invocations as well, we see the same problem as Solaris:

    trace_stack_test.go:282: ---
    trace_stack_test.go:280:   os.(*File).Read:95
    trace_stack_test.go:280:   runtime/trace_test.TestTraceSymbolize.func11:101
    trace_stack_test.go:282: ---

So this is not a problem unique to Solaris; any platform that uses entersyscallblock() will hit the same issues when tracing.

I would like to suggest this bug be retitled as "tracing broken when using entersyscallblock()". I don't know what the right fix is here at the moment.

@binarycrusader
Copy link
Contributor

So the cause of the issue is ultimately the placement and invocation of traceGoSysCall when using entersyscallblock() instead of entersyscall().

Notably, normally when entersyscall() starts, the first thing it does it trigger traceGoSysCall:

1804 // Syscall tracing:
1805 // At the start of a syscall we emit traceGoSysCall to capture the stack trace.    
...
1825         if trace.enabled {
1826                 systemstack(traceGoSysCall)
1827         }
...

This causes three additional functions to be recorded in the trace stack on Linux (with respect to the invocation of syscall.Syscall itself):

    trace_stack_test.go:280:   runtime.mcall:207
    trace_stack_test.go:280:   runtime.reentersyscall:1826
    trace_stack_test.go:280:   runtime.entersyscall:1879
    trace_stack_test.go:280:   syscall.Syscall:18

However, when using entersyscallblock(), it waits until almost before the function is done and then invokes systemstack(entersyscallblock_handoff) which in turn invokes traceGoSysCall():

1952 func entersyscallblock_handoff() {                                                                              
1953         if trace.enabled {                                                                                      
1954                 traceGoSysCall() 
1955                 traceGoSysBlock(getg().m.p.ptr())                                                               
1956         }       
1957         handoffp(releasep())
1958 }       

Because of where the invocation happens and how it is done, all three calls to the runtime methods are omitted.

So it looks like there are two possible solutions. The first is to simply change how the traceGoSysCall is emitted / invoked to match entersyscall():

diff --git a/src/runtime/proc1.go b/src/runtime/proc1.go
index 09cb775..3d1fc93 100644
--- a/src/runtime/proc1.go
+++ b/src/runtime/proc1.go
@@ -1911,6 +1911,10 @@ func entersyscall_gcwait() {
 func entersyscallblock(dummy int32) {
        _g_ := getg()

+       if trace.enabled {
+               traceGoSysCall()
+       }
+
        _g_.m.locks++ // see comment in entersyscall
        _g_.throwsplit = true
        _g_.stackguard0 = stackPreempt // see comment in entersyscall
@@ -1951,7 +1955,6 @@ func entersyscallblock(dummy int32) {

 func entersyscallblock_handoff() {
        if trace.enabled {
-               traceGoSysCall()
                traceGoSysBlock(getg().m.p.ptr())
        }
        handoffp(releasep())

The second would be to change entersyscall() to delay the invocation of traceGoSysCall to later just as entersyscallblock does and change runtime/trace.go to only skip 1 event:

diff --git a/src/runtime/proc1.go b/src/runtime/proc1.go
index 09cb775..b7c7698 100644
--- a/src/runtime/proc1.go
+++ b/src/runtime/proc1.go
@@ -1822,10 +1822,6 @@ func reentersyscall(pc, sp uintptr) {
        // but can have inconsistent g->sched, do not let GC observe it.
        _g_.m.locks++

-       if trace.enabled {
-               systemstack(traceGoSysCall)
-       }
-
        // Entersyscall must not call any function that might split/grow the stack.
        // (See details in comment above.)
        // Catch calls that might, by replacing the stack guard with something that
@@ -1845,6 +1841,10 @@ func reentersyscall(pc, sp uintptr) {
                })
        }

+       if trace.enabled {
+               systemstack(traceGoSysCall)
+       }
+
        if atomicload(&sched.sysmonwait) != 0 { // TODO: fast atomic
                systemstack(entersyscall_sysmon)
                save(pc, sp)
diff --git a/src/runtime/trace.go b/src/runtime/trace.go
index c818462..af3bf61 100644
--- a/src/runtime/trace.go
+++ b/src/runtime/trace.go
@@ -828,7 +828,7 @@ func traceGoUnpark(gp *g, skip int) {
 }

 func traceGoSysCall() {
-       traceEvent(traceEvGoSysCall, 4)
+       traceEvent(traceEvGoSysCall, 1)
 }

 func traceGoSysExit(seq uint64, ts int64) {

Of the two options, the latter seems preferable since we'll be recording less information which we're going to throw away anyway. However, this is not my area of expertise and I would appreciate advice as to the best solution.

With either of my proposed solutions, the test passes on Solaris.

@gopherbot
Copy link

CL https://golang.org/cl/13861 mentions this issue.

@golang golang locked and limited conversation to collaborators Sep 22, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants