|
|
Descriptionruntime: collect profiles even while on g0 stack
Fixes issue 6417
Patch Set 1 #Patch Set 2 : diff -r f14c05d8145f https://go.googlecode.com/hg/ #Patch Set 3 : diff -r 0408b08e6330 https://go.googlecode.com/hg/ #Patch Set 4 : diff -r 5a1eb92621bf https://go.googlecode.com/hg/ #MessagesTotal messages: 7
Hello golang-dev@googlegroups.com, I'd like you to review this change to https://go.googlecode.com/hg/
Sign in to reply to this message.
*** Submitted as https://code.google.com/p/go/source/detail?r=2f4c2dde2756 *** runtime: collect profiles even while on g0 stack Fixes issue 6417 R=golang-dev, rsc CC=golang-dev https://codereview.appspot.com/14231047
Sign in to reply to this message.
Message was sent while issue was closed.
Russ, I don't have any problems with my computers here, but this has broken windows-amd64 builder: --- FAIL: TestCPUProfileMultithreaded (1.85 seconds) pprof_test.go:129: crc32.ChecksumIEEE: 22 pprof_test.go:129: crc32.Update: 2 pprof_test.go:139: crc32.Update has 2 samples out of 24, want at least 4, ideally 12 FAIL FAIL runtime/pprof 3.265s only 2 samples of crc32.Update. Since this change, profiler calls runtime·stdcall(runtime·GetThreadContext, 2, mp->thread, r) more often: https://code.google.com/p/go/source/browse/src/pkg/runtime/os_windows.c?l1#409 I think it is relatively expensive in this context. And it runs before anything else, because profiler thread has highest priority. crc32.ChecksumIEEE and crc32.Update are CPU intensive too and running on different goroutines. I suspect, our scheduler struggles in this situation. We can run test for longer: c:\go\src\pkg\runtime\pprof>hg diff -g diff --git a/src/pkg/runtime/pprof/pprof_test.go b/src/pkg/runtime/pprof/pprof_ est.go --- a/src/pkg/runtime/pprof/pprof_test.go +++ b/src/pkg/runtime/pprof/pprof_test.go @@ -37,7 +37,7 @@ testCPUProfile(t, []string{"crc32.ChecksumIEEE", "crc32.Update"}, func( { c := make(chan int) go func() { - for i := 0; i < 2000; i++ { + for i := 0; i < 6000; i++ { crc32.Update(0, crc32.IEEETable, buf) } c <- 1 @@ -45,7 +45,7 @@ // This loop takes about a quarter second on a 2 GHz laptop. // We only need to get one 100 Hz clock tick, so we've got // a 25x safety buffer. - for i := 0; i < 2000; i++ { + for i := 0; i < 6000; i++ { crc32.ChecksumIEEE(buf) } <-c c:\go\src\pkg\runtime\pprof>go test -v -i && go test -short -v === RUN TestCPUProfile --- PASS: TestCPUProfile (0.46 seconds) pprof_test.go:129: crc32.ChecksumIEEE: 2 === RUN TestCPUProfileMultithreaded --- PASS: TestCPUProfileMultithreaded (5.53 seconds) pprof_test.go:129: crc32.ChecksumIEEE: 26 pprof_test.go:129: crc32.Update: 8 === RUN TestCPUProfileWithFork --- PASS: TestCPUProfileWithFork (0.47 seconds) === RUN TestGoroutineSwitch --- PASS: TestGoroutineSwitch (0.20 seconds) === RUN TestBlockProfile --- PASS: TestBlockProfile (0.16 seconds) PASS ok runtime/pprof 6.956s but I am not convinced os_windows.c change is good, because it slows things down. What do you think? Alex
Sign in to reply to this message.
Message was sent while issue was closed.
I measured GetThreadContext execution time: diff --git a/src/pkg/runtime/os_windows.c b/src/pkg/runtime/os_windows.c --- a/src/pkg/runtime/os_windows.c +++ b/src/pkg/runtime/os_windows.c @@ -396,6 +396,9 @@ Context *r; void *tls; G *gp; + int64 ns; + extern int64 runtime·nanotime(void); + void *ret; tls = mp->tls; if(mp == &runtime·m0) @@ -406,7 +409,9 @@ // align Context to 16 bytes r = (Context*)((uintptr)(&rbuf[15]) & ~15); r->ContextFlags = CONTEXT_CONTROL; - runtime·stdcall(runtime·GetThreadContext, 2, mp->thread, r); + ns = runtime·nanotime(); + ret = runtime·stdcall(runtime·GetThreadContext, 2, mp->thread, r); + runtime·printf("runtime: gp=%p ret=%p time=%D\n", gp, ret, runtime·nanotime() - ns); runtime·dosigprof(r, gp); } } and I get this: C:\>u:\test -test.v -test.run=Multi === RUN TestCPUProfileMultithreaded runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=62500000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=93750000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=93750000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=93750000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=93750000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=93750000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=93750000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=15625000 runtime: gp=0x620740 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=0 runtime: gp=0x620740 ret=0x1 time=0 --- PASS: TestCPUProfileMultithreaded (0.66 seconds) pprof_test.go:129: crc32.ChecksumIEEE: 7 pprof_test.go:129: crc32.Update: 9 PASS As you can see, for gp=0x10dae280 thread it takes too long. Perhaps it is not good to suspend and analize threads that are in syscall (http://blogs.msdn.com/b/oldnewthing/archive/2003/12/09/55988.aspx). I think we should revert this change :-(. But I have no fix for original problem. Alex
Sign in to reply to this message.
If you do gp != nil && gp->status != Gsyscall, does that help?
Sign in to reply to this message.
Message was sent while issue was closed.
On 2013/10/07 15:26:19, rsc wrote: > If you do gp != nil && gp->status != Gsyscall, does that help? It does not help. It makes no difference: diff --git a/src/pkg/runtime/os_windows.c b/src/pkg/runtime/os_windows.c --- a/src/pkg/runtime/os_windows.c +++ b/src/pkg/runtime/os_windows.c @@ -396,17 +396,22 @@ Context *r; void *tls; G *gp; + int64 ns; + extern int64 runtime·nanotime(void); + void *ret; tls = mp->tls; if(mp == &runtime·m0) tls = runtime·tls0; gp = *(G**)tls; - if(gp != nil) { + if(gp != nil && gp->status != Gsyscall) { // align Context to 16 bytes r = (Context*)((uintptr)(&rbuf[15]) & ~15); r->ContextFlags = CONTEXT_CONTROL; - runtime·stdcall(runtime·GetThreadContext, 2, mp->thread, r); + ns = runtime·nanotime(); + ret = runtime·stdcall(runtime·GetThreadContext, 2, mp->thread, r); + runtime·printf("runtime: gp=%p ret=%p time=%D\n", gp, ret, runtime·nanotime() - ns); runtime·dosigprof(r, gp); } } C:\>u:\test.exe -test.v -test.run=Multi === RUN TestCPUProfileMultithreaded runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=15625000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=46875000 runtime: gp=0x10dae460 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=31250000 runtime: gp=0x620740 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=0 runtime: gp=0x620740 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=0 runtime: gp=0x620740 ret=0x1 time=0 runtime: gp=0x10dae5a0 ret=0x1 time=0 runtime: gp=0x10dae280 ret=0x1 time=0 runtime: gp=0x620740 ret=0x1 time=0 --- PASS: TestCPUProfileMultithreaded (0.69 seconds) pprof_test.go:129: crc32.ChecksumIEEE: 13 pprof_test.go:129: crc32.Update: 17 PASS Alex
Sign in to reply to this message.
|