Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace: provide aggregate 'go tool trace' goroutine analysis view #29103

Open
robaho opened this issue Dec 5, 2018 · 19 comments
Open

cmd/trace: provide aggregate 'go tool trace' goroutine analysis view #29103

robaho opened this issue Dec 5, 2018 · 19 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@robaho
Copy link

robaho commented Dec 5, 2018

The current 'go tool trace' has a 'goroutine analysis' page, that shows data similar to this:

image

and after selecting a go routine, a detail page is displayed similar to this:

image

This should be changed, so that either 1) the main summary page shows the 'detail' for each in a list, or 2) add an option to the summary page labelled 'details' that provides this information.

If it far more useful than having to view the detail on each go routine individually.

@robaho
Copy link
Author

robaho commented Dec 5, 2018

A slight wrinkle is that the detail view shows an entry for each routine of the "type", similar to:

image

for ease of use the summary page probably should have a "total/avg entry" with possibly a collapsable "detail" listing each routine.

@ianlancetaylor ianlancetaylor changed the title provide aggregate 'go tool trace' goroutine analysis view cmd/trace: provide aggregate 'go tool trace' goroutine analysis view Dec 5, 2018
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 5, 2018
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Dec 5, 2018
@ianlancetaylor
Copy link
Contributor

CC @hyangah @pjweinbgo

@robaho
Copy link
Author

robaho commented Dec 5, 2018

Btw, I am working on a PR that fixes this - its fairly straightforward.

@robaho
Copy link
Author

robaho commented Dec 5, 2018

Additionally, accumulator buckets like 'sync time' are not very useful. The system should report min/avg/max, or even percentiles - the total does not provide enough information.

@robaho
Copy link
Author

robaho commented Dec 11, 2018

I have created a project that has the completed code changes here

Here is a screenshot of the new summary page:

new

The PR is going to be some work, because they were made externally, and involve the internal package, so maybe before I perform the PR, someone might review the changes in the project, and we can discuss there ?

@hyangah
Copy link
Contributor

hyangah commented Dec 11, 2018

Thanks for the change. I agree that having some sort of summary in the goroutines view will be helpful.
Can you explain how you compute each column? Is it a sum?

Based on the screen shot, I guess the Total is a sum, which is not very meaningful. The color bar seems useful to tell the characteristics of each goroutine.

The goroutine pc:0 are the goroutines that pre-existed before the tracing started so we couldn't the origin of the goroutines. So, they are mix of various goroutines, which I am not sure the data other than count is meaningful.

@robaho
Copy link
Author

robaho commented Dec 11, 2018

It is a sum, but it is the same sum that was used in the 'detail' view to provide the 'exec time %' - it was just duplicated into the summary. Similarly for the other aspects. I think all of the sums are useful to a certain extent, as it gives a profile of Go routines (which is what the color bar represents). Again, this is the same representation as the original detail view.

The min/avg/max fields are most important, since as you state, in many cases the sum isn't helpful when doing latency or performance analysis.

I am suspect that your pc:0 assessment is correct, only because in the sample I test with it is taken after ALL of the go routines are created (so they should all be PC:0). Furthermore, the 1:1 between the pc:0 routines and the number of actual routines seems too much of a coincidence.

@robaho
Copy link
Author

robaho commented Dec 11, 2018

@hyangah also, for the PC:0 routines, the detail is essentially the same for all of them, which seems strange.

@robaho
Copy link
Author

robaho commented Dec 11, 2018

@hyangah OK, so the count of the PC:0 routines is not related to the "real" go routine count. But, I just did a trace, and it shows the following Go routines as PC:0

image

But if I perform a full stack dump of the application, I get:

stack dump
goroutine 71 [running]:
runtime/pprof.writeGoroutineStacks(0x1b4c360, 0xc000098000, 0x100d82f, 0xc0001464e0)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/pprof/pprof.go:678 +0xa7
runtime/pprof.writeGoroutine(0x1b4c360, 0xc000098000, 0x2, 0xc000082400, 0x0)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/pprof/pprof.go:667 +0x44
runtime/pprof.(*Profile).WriteTo(0x25ea580, 0x1b4c360, 0xc000098000, 0x2, 0xc000098000, 0x1ab3212)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/pprof/pprof.go:328 +0x3e4
net/http/pprof.handler.ServeHTTP(0xc0002ae011, 0x9, 0x1b61760, 0xc000098000, 0xc0001a2000)
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/pprof/pprof.go:245 +0x210
net/http/pprof.Index(0x1b61760, 0xc000098000, 0xc0001a2000)
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/pprof/pprof.go:268 +0x723
net/http.HandlerFunc.ServeHTTP(0x1ad37b8, 0x1b61760, 0xc000098000, 0xc0001a2000)
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0x25f9ea0, 0x1b61760, 0xc000098000, 0xc0001a2000)
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2361 +0x127
net/http.serverHandler.ServeHTTP(0xc000238000, 0x1b61760, 0xc000098000, 0xc0001a2000)
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc0001fc0a0, 0x1b62760, 0xc000152000)
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
	/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2851 +0x2f5

goroutine 1 [syscall, 3 minutes]:
syscall.Syscall(0x3, 0x0, 0xc000230000, 0x1000, 0xc000080080, 0x1, 0xc000046140)
/usr/local/Cellar/go/1.11.2/libexec/src/syscall/asm_darwin_amd64.s:16 +0x5
syscall.read(0x0, 0xc000230000, 0x1000, 0x1000, 0xc000080001, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/syscall/zsyscall_darwin_amd64.go:947 +0x5a
syscall.Read(0x0, 0xc000230000, 0x1000, 0x1000, 0xc0000b4057, 0xc000223aa8, 0x100d82f)
/usr/local/Cellar/go/1.11.2/libexec/src/syscall/syscall_unix.go:172 +0x49
internal/poll.(*FD).Read(0xc0000b4000, 0xc000230000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:165 +0x117
os.(*File).read(0xc00000e010, 0xc000230000, 0x1000, 0x1000, 0xc000223af8, 0x106548a, 0xc000223ad8)
/usr/local/Cellar/go/1.11.2/libexec/src/os/file_unix.go:249 +0x4e
os.(*File).Read(0xc00000e010, 0xc000230000, 0x1000, 0x1000, 0x1000, 0x1000, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/os/file.go:108 +0x69
bufio.(*Scanner).Scan(0xc000223f08, 0x1)
/usr/local/Cellar/go/1.11.2/libexec/src/bufio/scan.go:213 +0xa6
main.main()
/Users/robertengels/go/src/github.com/robaho/go-trader/cmd/exchange/main.go:109 +0x955

goroutine 34 [chan receive, 3 minutes]:
github.com/robaho/go-trader/internal/exchange.publish()
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/marketdata.go:119 +0x371
github.com/robaho/go-trader/internal/exchange.startMarketData.func1()
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/marketdata.go:286 +0x20
created by github.com/robaho/go-trader/internal/exchange.startMarketData
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/marketdata.go:285 +0x385

goroutine 35 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x2f90b60, 0x72, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00011c118, 0x72, 0xc000080200, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00011c118, 0xffffffffffffff00, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc00011c100, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc00011c100, 0xc00017ea80, 0xc0000b6030, 0x25fa480)
/usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00000e058, 0x2, 0xc0000407b0, 0x2b)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).Accept(0xc00000e058, 0x2, 0x2, 0xc, 0x1b614e0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock.go:260 +0x47
github.com/robaho/go-trader/internal/exchange.startMarketData.func2(0xc00015e7bc, 0x4)
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/marketdata.go:297 +0x131
created by github.com/robaho/go-trader/internal/exchange.startMarketData
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/marketdata.go:289 +0x3b1

goroutine 36 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194000, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 37 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194200)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194200, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 38 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194400)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194400, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 39 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194600)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194600, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 40 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194800)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194800, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 41 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194a00)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194a00, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 42 [select]:
github.com/quickfixgo/quickfix.(*session).run(0xc000194c00)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/session.go:739 +0x2b9
github.com/quickfixgo/quickfix.(*Acceptor).Start.func1(0xc000194c00, 0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:65 +0x2b
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:64 +0x357

goroutine 43 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x2f90d00, 0x72, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000192d98, 0x72, 0xc0001d6100, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000192d98, 0xffffffffffffff00, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000192d80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000192d80, 0x8, 0xc0001aa720, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc000154128, 0x102c6e2, 0x8, 0xc0001d6180)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).Accept(0xc000154128, 0x1ad3dd0, 0xc000192070, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock.go:260 +0x47
github.com/quickfixgo/quickfix.(*Acceptor).listenForConnections(0xc000192000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:123 +0x92
created by github.com/quickfixgo/quickfix.(*Acceptor).Start
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/acceptor.go:71 +0x3bc

goroutine 20 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000146270)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 6 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000138510)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 51 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001d2000)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 21 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001462a0)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 7 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000138540)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 52 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001d2030)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 8 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000138570)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 9 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001385a0)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 10 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001385d0)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 11 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000138600)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 12 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000138630)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 13 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc000138660)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 53 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001d2060)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 54 [select, 3 minutes]:
github.com/quickfixgo/quickfix/internal.NewEventTimer.func1(0xc0001d2090)
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:27 +0xe8
created by github.com/quickfixgo/quickfix/internal.NewEventTimer
/Users/robertengels/go/src/github.com/quickfixgo/quickfix/internal/event_timer.go:23 +0xc7

goroutine 44 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x2f90c30, 0x72, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000192e98, 0x72, 0xc000152100, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000192e98, 0xffffffffffffff00, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000192e80, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000192e80, 0x1000000000000010, 0xc0001aae00, 0x1081d9fecee095d8)
/usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc000154130, 0x102c6e2, 0xc000152170, 0xc0001aae90)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).Accept(0xc000154130, 0x1ad3080, 0xc000177200, 0xc0001480a0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock.go:260 +0x47
google.golang.org/grpc.(*Server).Serve(0xc000177200, 0x1b614e0, 0xc000154130, 0x0, 0x0)
/Users/robertengels/go/src/google.golang.org/grpc/server.go:557 +0x210
main.main.func1(0xc000177200, 0x1b614e0, 0xc000154130)
/Users/robertengels/go/src/github.com/robaho/go-trader/cmd/exchange/main.go:90 +0x43
created by main.main
/Users/robertengels/go/src/github.com/robaho/go-trader/cmd/exchange/main.go:89 +0x6b1

goroutine 45 [syscall]:
syscall.Syscall6(0x16b, 0x9, 0x0, 0x0, 0xc0001ab688, 0xa, 0x26180a0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/syscall/asm_darwin_amd64.s:41 +0x5
golang.org/x/sys/unix.kevent(0x9, 0x0, 0x0, 0xc0001ab688, 0xa, 0x26180a0, 0x0, 0x0, 0x0)
/Users/robertengels/go/src/golang.org/x/sys/unix/zsyscall_darwin_amd64.go:207 +0x9a
golang.org/x/sys/unix.Kevent(0x9, 0x0, 0x0, 0x0, 0xc0001ab688, 0xa, 0xa, 0x26180a0, 0x0, 0x0, ...)
/Users/robertengels/go/src/golang.org/x/sys/unix/syscall_bsd.go:419 +0x71
gopkg.in/fsnotify%2ev1.read(0x9, 0xc0001ab688, 0xa, 0xa, 0x26180a0, 0xc0001ab688, 0x0, 0xa, 0x0, 0x0)
/Users/robertengels/go/src/gopkg.in/fsnotify.v1/kqueue.go:511 +0x6e
gopkg.in/fsnotify%2ev1.(*Watcher).readEvents(0xc00013e660)
/Users/robertengels/go/src/gopkg.in/fsnotify.v1/kqueue.go:274 +0x6a6
created by gopkg.in/fsnotify%2ev1.NewWatcher
/Users/robertengels/go/src/gopkg.in/fsnotify.v1/kqueue.go:62 +0x185

goroutine 46 [select, 3 minutes]:
github.com/gernest/hot.(*Template).Init.func1(0xc0001416b0, 0xc00013e660)
/Users/robertengels/go/src/github.com/gernest/hot/hot.go:75 +0x1ab
created by github.com/gernest/hot.(*Template).Init
/Users/robertengels/go/src/github.com/gernest/hot/hot.go:73 +0x2a8

goroutine 16 [IO wait]:
internal/poll.runtime_pollWait(0x2f909c0, 0x72, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000242018, 0x72, 0xc00015a400, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000242018, 0xffffffffffffff00, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000242000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000242000, 0x104c12a, 0x105e690, 0xc0001deda0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc0001be018, 0x1910980, 0x1ad3b38, 0xc0001dedf0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc0001be018, 0xc0001dee18, 0x1098146, 0x5c104107)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc0001be018, 0xc0001dee68, 0x18, 0xc0001d4d80, 0x132b915)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:3232 +0x2f
net/http.(*Server).Serve(0xc000238000, 0x1b62520, 0xc0001be018, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2826 +0x22f
net/http.(*Server).ListenAndServe(0xc000238000, 0xc000238000, 0x1ad2ee0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2764 +0xb6
net/http.ListenAndServe(0xc00015e9d8, 0x5, 0x0, 0x0, 0x1b4d120, 0xc0001d2240)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:3004 +0x74
github.com/robaho/go-trader/internal/exchange.StartWebServer.func1(0xc00015e9d8, 0x5)
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/webserver.go:50 +0x1c9
created by github.com/robaho/go-trader/internal/exchange.StartWebServer
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/webserver.go:40 +0x13b

goroutine 66 [IO wait, 3 minutes]:
internal/poll.runtime_pollWait(0x2f90a90, 0x72, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000193098, 0x72, 0xc00015a600, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000193098, 0xffffffffffffff00, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Accept(0xc000193080, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:384 +0x1a0
net.(*netFD).accept(0xc000193080, 0x25fb040, 0x30, 0x30)
/usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc0001541d8, 0x50, 0x2d00d80, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock_posix.go:139 +0x2e
net.(*TCPListener).AcceptTCP(0xc0001541d8, 0x100e0b8, 0x30, 0x19d55a0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/tcpsock.go:247 +0x47
net/http.tcpKeepAliveListener.Accept(0xc0001541d8, 0x19d55a0, 0xc0002246c0, 0x1954400, 0x25e7a90)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:3232 +0x2f
net/http.(*Server).Serve(0xc0001635f0, 0x1b62520, 0xc0001541d8, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2826 +0x22f
net/http.(*Server).ListenAndServe(0xc0001635f0, 0xc0001635f0, 0xc000224540)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:2764 +0xb6
net/http.ListenAndServe(0x1aad6a6, 0x5, 0x1b4c120, 0xc000224540, 0x1ad2e80, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:3004 +0x74
github.com/robaho/go-trader/internal/exchange.StartWebServer.func2()
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/webserver.go:56 +0xa0
created by github.com/robaho/go-trader/internal/exchange.StartWebServer
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/webserver.go:53 +0x153

goroutine 67 [sleep]:
time.Sleep(0x3b9aca00)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/time.go:105 +0x14f
github.com/robaho/go-trader/internal/exchange.websocketPublisher()
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/webserver.go:183 +0x13c
created by github.com/robaho/go-trader/internal/exchange.StartWebServer
/Users/robertengels/go/src/github.com/robaho/go-trader/internal/exchange/webserver.go:62 +0x16b

goroutine 72 [IO wait]:
internal/poll.runtime_pollWait(0x2f908f0, 0x72, 0xc0001ad658)
/usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc000160118, 0x72, 0xffffffffffffff00, 0x1b4d5e0, 0x25b0738)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc000160118, 0xc000146300, 0x1, 0x1)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc000160100, 0xc0001463a1, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:169 +0x1d6
net.(*netFD).Read(0xc000160100, 0xc0001463a1, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00014c008, 0xc0001463a1, 0x1, 0x1, 0x0, 0x0, 0x0)
/usr/local/Cellar/go/1.11.2/libexec/src/net/net.go:177 +0x68
net/http.(*connReader).backgroundRead(0xc000146390)
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:676 +0x5a
created by net/http.(*connReader).startBackgroundRead
/usr/local/Cellar/go/1.11.2/libexec/src/net/http/server.go:672 +0xd2

so it does seem that these routines are there, but the problem is they had no state transitions during the trace capture time.

So I think that either 1) they should be omitted entirely, or 2) the trace capture should also perform a 'full stack dump' at trace start, and then create events like "in syscall", not just "start syscall", "stop syscall", or some other way of tracking where/what these routines are.

Btw, I captured the trace while the application was idle - which leads to lots of PC:0 routines.

*** Notice though that some of the routines in the trace capture do not appear at all in the 'stack dump', like 2 or 3.... which is very strange.

@hyangah
Copy link
Contributor

hyangah commented Dec 12, 2018

Those goroutines don't affect any operation latency which the tracing tool aims to analyze. I'd go with omitting them. Still the count is meaningful since it give us an idea about how many goroutines are in the system during system.

What performance problems do you want the new page to help diagnosing?
Btw the individual goroutine list is useful especially when analyzing tail latency.

@robaho
Copy link
Author

robaho commented Dec 12, 2018

@hyangah I think the next enhancement would be to record the 'event' where the 'max' occurred, so then you could click on the 'max' on open the detailed trace window to that exact moment in time.

I believe the purpose of the tool is to easily observe "anti-patterns" - code doing little work before a context switch, scheduler inefficiencies (scheduler delays). The min/avg/max gives you a simplified histogram of the performance of the routine, and for a request/response, or event/reaction type system this is very important.

The individual list by routine is still there - you click on the routine name from the summary - which then reports by routine the various timings, including min/avg/max.

To summarize, the old summary page and pprof are not suitable for diagnosing performance problems in a concurrent environment.

I will say that the 'user events/tasks' is the way to Go! This is ideal for monitoring this type of work (multi-stage, multi-event) - I think it is a shame that it is not promoted and documented more.

@hyangah
Copy link
Contributor

hyangah commented Dec 12, 2018

You can select goroutine with min/max of each metric from the goroutine page (The table is sortable).
The reason that I presented only the sum of the execution time in the goroutine page is I couldn't figure out how to interpret the sum of other metrics (e.g. why should I care about the sum of sync block time from thousands of goroutines?)

On the other hand, I found the pprof outputs are more helpful in analyzing inefficiencies involved in context switch, or scheduling while debugging production issues. If some of the information you present in your prototype is useful, how about presenting them in each goroutine view?

I agree that this tool needs more love and the UI needs improvement. I think the discussion will be more fruitful if we have the list of specific questions along with examples, the new visualization can help to answer.

@robaho
Copy link
Author

robaho commented Dec 12, 2018

@hyangah it is not the sorting - that is min/max by total - I added the min/avg/max which is per event.

For example you may have 1000 ms of sync block, but if you see that the min/avg/max is all under 100 nanos, you don't really have any contention - it is just acquiring the mutex a lot...

Similarly, you might have 1000 ms of execution time, but if the min/max/avg is 10 us, that means that the code if doing very little before it is descheduled, but it is running a lot.

Using "totals" is difficult for performance analysis.

Similarly in pprof, it doesn't break things apart by go routine - this is the biggest problem with pprof. It s the aggregate only - so it if very hard to see what is happening - if you look at the pprof output in issue #29104 you will see that it is not very useful.

@hyangah
Copy link
Contributor

hyangah commented Dec 12, 2018

@robaho I am not sure if I understand all correctly but it seems like I was questioning the value of the new goroutine overview page in #29103 (comment)
and you are talking about the detail page (maybe?)

I think the distribution of the each observed event duration is a good addition and I want it.
Is it possible to use traditional histograms instead of simple min/avg/max (e.g. what about sample size, variance, and median?), and present them in the detail page?

For the parsing api (those in internal/), the api is not stable and no where near to be public - at some point, we were thinking about changing it dramatically for optimization. Good news is that it's part of Go release and doesn't change often. So what I usually do is just have a shell script that copies and replaces the import paths with sed and run it twice a year.

@robaho
Copy link
Author

robaho commented Dec 12, 2018

@hyangah I think I understand your concerns. The overview page is essential IMO, otherwise you need to jump into each go routine to find the cpu usage (for example), and the remember it to perform comparisons - the summary allows easy determination of which go routines are consuming the cpu (pprof doesn't give this). It also allows easy access to the analysis graphs without jumping into each routine.

I did change the detail page to show the min/avg/max - I just didn't post the screen shot - which is really important as it makes it easy to determine when you have multiple routines of the same type if they are distributing the load equally, and where the max latencies are occurring - totals don't allow this.

I think using a histogram would be great, the only problem is that the UI might need to be more advanced, as it is already crowded just showing min/avg/max in addition to the total.

In some ways the totals could be removed (in both the summary and the detail), since they are reflected in the bar, and then there would be more room to show histogram details.

@robaho
Copy link
Author

robaho commented Dec 12, 2018

@hyangah to clarify, the summary page shows the min/avg/max across all go routines of that type, so it is very easy to find high max and avg then drill deeper.

@robaho
Copy link
Author

robaho commented Dec 12, 2018

@hyangah I just posted another commit that allows you to perform a detailed trace on the selected go routine group by clicking on the 'count' of that group in the summary page. This is very useful as well. I think it may be helpful if you clone the repo and run it live, as the screen shots don't represent the usage too well.

@aktau
Copy link
Contributor

aktau commented Feb 17, 2022

I agree with @robaho that this sort of summary view would be nice. What's the current blocker?

@robaho
Copy link
Author

robaho commented Feb 18, 2022

You can use the goanalyzer in my github. I think it is simply a lower priority item to address. Plus I need to probably create a patch set which I’ve been reluctant to do since it didn’t seem it was going to be accepted.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

5 participants