Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: give more useful profiling output for time.Now (and other vDSO functions) on Linux #24142

Closed
cespare opened this issue Feb 27, 2018 · 5 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@cespare
Copy link
Contributor

cespare commented Feb 27, 2018

(Moving this over from a discussion at https://groups.google.com/d/msg/golang-nuts/83xaAgzL2cU/OcLIOdi8CwAJ.)

These results are on Go 1.10 / Linux / amd64, but they're unchanged from previous versions.

On Linux, time.Now (and anything else which calls runtime.walltime and runtime.nanotime) are implemented using a vDSO. When a profiling signal arrives during the vDSO call, pprof attributes that time to the catch-all runtime._ExternalCode, unrelated to any Go function.

Here's a demonstration:

go test -run xxx -bench '^BenchmarkNow$' -benchtime 5s -cpuprofile time.prof time
(pprof) top10
Showing nodes accounting for 5320ms, 100% of 5320ms total
      flat  flat%   sum%        cum   cum%
    3910ms 73.50% 73.50%     3910ms 73.50%  runtime._ExternalCode
     400ms  7.52% 81.02%      660ms 12.41%  time.now
     330ms  6.20% 87.22%      330ms  6.20%  runtime.nanotime
     310ms  5.83% 93.05%      970ms 18.23%  time.Now
     240ms  4.51% 97.56%      240ms  4.51%  runtime.walltime
     130ms  2.44%   100%     1100ms 20.68%  time_test.BenchmarkNow
         0     0%   100%     4220ms 79.32%  runtime._System
         0     0%   100%     1100ms 20.68%  testing.(*B).launch
         0     0%   100%     1100ms 20.68%  testing.(*B).runN

screen_20180226140615

This is unhelpful for the user in several ways:

  1. This time is counted in the same bucket as any other external code (such as cgo). Most users will not expect that runtime._ExternalCode is due to a core standard library function like time.Now.
  2. It doesn't actually tell the user what part of the Go code is the source of the time.Now calls.

@ianlancetaylor kindly cooked up https://golang.org/cl/97315 to address (1), so that at least this will be marked as _VDSO instead of _ExternalCode.

Even with this change, it's still hard for users to know that the VDSO time is (likely, but not necessarily) due to time.Now calls, or where those calls originate.

By contrast, if I have a program which is slow because it's making many os.Stat calls, it will be clear from the profile graph that the cause is os.Stat (and what function is calling os.Stat), even though the big red box with all the self time is labeled syscall.Syscall.

I don't know how this would be implemented or even if it's feasible, but I hope that something better can be done for these vDSO calls as well.

This is not a hypothetical or benchmark-only problem. Here is a small part of a pprof graph from an important production service that I'm trying to debug:

screen_20180226142451

(This is a CGO_ENABLED=0 build.)

@cespare cespare added this to the Go1.11 milestone Feb 27, 2018
@ianlancetaylor
Copy link
Contributor

It is certainly possible to get the complete call stack for the call to time.Now, but the only way I can think to do it involves adding at least three memory stores to runtime.walltime, effectively treating it as a cgo call (which is what happens with the stat system call). I'm hesitant to do that since the function is so hot, and this has to be something of an unusual case.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 27, 2018
@gopherbot
Copy link

Change https://golang.org/cl/97315 mentions this issue: runtime: report profiling samples in a VDSO as _VDSO.

@cespare
Copy link
Contributor Author

cespare commented Feb 27, 2018

OK, I discovered a slight wrinkle here: in production, we're running on virtual instances in EC2 and the default Xen clocksource doesn't support vDSO, so time.Now is very slow (~720ns, vs ~50ns on my local machine). This is why I'm seeing such a large _ExternalCode cost in my profiles.

If I understand this correctly, we're going through the __vdso_gettimeofday_sym fallback which does a syscall under the hood.

One option would be to do the cgo-style bookkeeping only in the fallback case where we know we're going to incur the syscall overhead anyway.

@ianlancetaylor
Copy link
Contributor

It's more painful than I thought, as it turns out that the time functions can be called while we are nominally in system call, so simply reusing the fields breaks traceback also.

@gopherbot
Copy link

Change https://golang.org/cl/99617 mentions this issue: runtime: set VDSO values for Solaris nanotime1

gopherbot pushed a commit that referenced this issue Mar 9, 2018
This lets SIGPROF signals get a useful traceback.
Without it we just see sysvicallN calling asmcgocall.

Updates #24142

Change-Id: I5dfe3add51f0c3a4cb1c98acb7738be6396214bc
Reviewed-on: https://go-review.googlesource.com/99617
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
@golang golang locked and limited conversation to collaborators Mar 8, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

3 participants