Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/metrics: add GOMAXPROCS, NumCgoCall, GC CPU time #47216

Closed
GilAddaCyberark opened this issue Jul 15, 2021 · 17 comments
Closed

runtime/metrics: add GOMAXPROCS, NumCgoCall, GC CPU time #47216

GilAddaCyberark opened this issue Jul 15, 2021 · 17 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Proposal Proposal-Accepted Proposal-FinalCommentPeriod
Milestone

Comments

@GilAddaCyberark
Copy link

Hi
I was testing and using the runtime/metrics package introduced in 1.16
I'm using go version 1.16.5.
I didn't find the following metrics that exists in runtime package in the runtime/metrics package :

  • func NumCgoCall() int64
  • func (r *MemProfileRecord) InUseBytes() int64
  • func (r *MemProfileRecord) InUseObjects() int64

Does it make sense to add them in runtime/metrics pacake?

Does it makes sense to add those values (which are static) as well?

  • func Version() string
  • func GOMAXPROCS(n int) int
  • func NumCPU() int
    Thanks a lot, G.
@seankhliao seankhliao changed the title proposal : runtime/metrics package - additional metrics proposal : runtime/metrics: - additional metrics Jul 15, 2021
@seankhliao seankhliao changed the title proposal : runtime/metrics: - additional metrics proposal : runtime/metrics: additional metrics Jul 15, 2021
@seankhliao
Copy link
Member

cc @mknyszek

@mknyszek
Copy link
Contributor

mknyszek commented Jul 15, 2021

Hi
I was testing and using the runtime/metrics package introduced in 1.16
I'm using go version 1.16.5.
I didn't find the following metrics that exists in runtime package in the runtime/metrics package :

  • func NumCgoCall() int64
  • func (r *MemProfileRecord) InUseBytes() int64
  • func (r *MemProfileRecord) InUseObjects() int64

NumCgoCall could be a good candidate. The other two should already be accounted for as /gc/heap/objects:objects and /memory/classes/heap/objects:bytes respectively.

Does it make sense to add them in runtime/metrics pacake?

Does it makes sense to add those values (which are static) as well?

  • func Version() string
  • func GOMAXPROCS(n int) int
  • func NumCPU() int

Version and NumCPU aren't quite what the package was made for. It doesn't really make sense to track these values over time, since they're static. Though, simultaneously, I understand how analyzing something like this across a fleet is nice, and it's also nice to have one API to export everything. I'll think about it.

GOMAXPROCS on the other hand is probably a good candidate. It can change over time (usually it doesn't, but it can) and it seems like it might be useful to correlate against other metrics. It would cost us very little to support this. Same goes for runtime/debug.SetGCPercent.

Thanks a lot, G.

Thanks for the suggestions!

@mknyszek mknyszek changed the title proposal : runtime/metrics: additional metrics proposal: runtime/metrics: additional metrics Jul 15, 2021
@gopherbot gopherbot added this to the Proposal milestone Jul 15, 2021
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 15, 2021
@mknyszek mknyszek modified the milestones: Proposal, Backlog Jul 15, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jul 28, 2021
@rsc rsc moved this from Incoming to Active in Proposals (old) Aug 4, 2021
@rsc
Copy link
Contributor

rsc commented Aug 4, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Aug 11, 2021

ping @mknyszek. any new thoughts?

@mpx
Copy link
Contributor

mpx commented Aug 19, 2021

Please consider exporting the total time spent perfoming GC as well (work.totaltime or similar). It would be more useful than the processed GCCPUFraction currently provided by runtime.Memstats

@mknyszek
Copy link
Contributor

mknyszek commented Sep 2, 2021

@mpx I'm already ahead of you on that one. GCCPUFraction is intentionally missing from runtime/metrics because of issues with it. Instead, I'm hoping to export a distribution of GC CPU utilizations by GC cycle. It's still preprocessed, but given that the API is sample-based rather than event-based, I think it's the best we can do for the moment.

Total work time also seems fine to me, though it doesn't seem quite as useful without a "total CPU time" counter also (sample them together, take a diff to determine GC CPU utilization since the last sample). We could expose something like that, but measuring exact CPU time is unfortunately kind of tough. In the runtime we just make some assumptions that luckily hold true in 99% (maybe more) of cases. It might be useful to export those assumptions anyway, but I think the aforementioned distribution of GC CPU time is more directly useful for monitoring (and includes those assumptions, necessarily, anyway).

Do you have another use-case for total time that I'm not considering?

@GilAddaCyberark
Copy link
Author

Hi all, thanks for the efforts you invest in that issue. As a telemetry consumer telemetry, we use external libraries to measure the total cpu usage of the process. I'm not sure that this is the right place to discuss it. What are the limitations of the solution you present?
Thanks Gil

@mknyszek
Copy link
Contributor

mknyszek commented Sep 8, 2021

Alrighty, from this issue, I think we should add the following metrics:

  1. GOMAXPROCS
  2. Total GC CPU time.
  3. NumCgoCall.

I don't think there's any point to doing Version or NumCPU because they're static. Those should just be exported yourself (to whatever system) if you want them. It seems unnecessary to "resample" them regularly via the runtime/metrics API, though if you have a good use-case, I'm open to hearing it.

Any objections?

@rsc rsc changed the title proposal: runtime/metrics: additional metrics proposal: runtime/metrics: add GOMAXPROCS, NumCgoCall, GC CPU time Sep 15, 2021
@rsc rsc moved this from Active to Likely Accept in Proposals (old) Sep 15, 2021
@rsc
Copy link
Contributor

rsc commented Sep 15, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@mpx
Copy link
Contributor

mpx commented Sep 17, 2021

@mknyszek For total GC time I'd like to compare it against rusage to understand whether a task is GC heavy without needing all the profiling machinery (which is inaccurate/more involved in other ways). I understand we can't sample both simultaneously, but any error should be small, and will become relatively smaller over the lifetime of the process.

Recording a histogram of GC times would be interesting, but I don't have a specific use for it atm.

@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Sep 22, 2021
@rsc
Copy link
Contributor

rsc commented Sep 22, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime/metrics: add GOMAXPROCS, NumCgoCall, GC CPU time runtime/metrics: add GOMAXPROCS, NumCgoCall, GC CPU time Sep 22, 2021
@gopherbot
Copy link

Change https://go.dev/cl/404307 mentions this issue: runtime/metrics: add CPU stats

@gopherbot
Copy link

Change https://go.dev/cl/404306 mentions this issue: runtime/metrics: add the number of cgo calls

@gopherbot
Copy link

Change https://go.dev/cl/404305 mentions this issue: runtime/metrics: add gomaxprocs metric

gopherbot pushed a commit that referenced this issue May 13, 2022
For #47216.

Change-Id: Ib2d48c4583570a2dae9510a52d4c6ffc20161b31
Reviewed-on: https://go-review.googlesource.com/c/go/+/404305
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue May 13, 2022
For #47216.

Change-Id: I1c2cd518e6ff510cc3ac8d8f72fd52eadcabc16c
Reviewed-on: https://go-review.googlesource.com/c/go/+/404306
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
@mknyszek
Copy link
Contributor

I got to everything except the GC CPU stats. To get that right it needs a little more thought and a little more implementation work that I just couldn't get to before the freeze. :( My apologies. It's about 80% done, so my hope is that I can land it early in the 1.20 cycle.

@mknyszek mknyszek modified the milestones: Backlog, Go1.20 May 13, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@jmacd
Copy link

jmacd commented Aug 5, 2022

OpenTelemetry-Go would like runtime/metrics access to GCCPUFraction -- however it would be better expressed as a total number of CPU seconds spent in GC for reasons explained here: open-telemetry/opentelemetry-go-contrib#316 and above

I.e., it should have Cumulative=True, ValueKind=Float64, name=/gc/cpu/time:seconds

However, to make that useful, we should also have the rusage number we're comparing to in the same snapshot, so metrics like /cpu/user/time:seconds and /cpu/sys/time:seconds containing the cumulative count of CPU seconds used. Then, the former GCCPUFraction would equal to the metric derived from (/gc/cpu:seconds / (/cpu/user:seconds + /cpu/sys:seconds)) or something like that.

@mknyszek
Copy link
Contributor

mknyszek commented Sep 1, 2022

OpenTelemetry-Go would like runtime/metrics access to GCCPUFraction

We intentionally omitted GCCPUFraction from runtime/metrics initially because it's a somewhat misleading metric. It's averaged over the entire lifetime application, so you'll never be able to identify the GC overheads over a critical window you actually care about. Idle periods, or even just startup time can significantly skew the value, and its changes over time are less useful. Occasionally it still is useful, but as you suggest, it's probably better to just export what it's derived from in the first place.

I.e., it should have Cumulative=True, ValueKind=Float64, name=/gc/cpu/time:seconds

I'm already ahead of you on that. :) Please see https://go/dev/cl/404307 which I hope to land for Go 1.20.

However, to make that useful, we should also have the rusage number we're comparing to in the same snapshot

Comparing against rusage isn't going to work because measuring true CPU time efficiently across platforms is not something the Go runtime currently supports internally (and would likely require a substantial amount of work to do so). But, as you suggest, the aforementioned CL does export a metric for the Go runtime's estimate of total user CPU time that has elapsed (the estimate will be accurate provided that GOMAXPROCS is set such that the environment's resources are not oversubscribed). The other metrics can be compared against that total. The metric descriptions will also explain this caveat.

The metrics may be updated in the future to reflect true CPU time, and I suspect it's better to have an estimate than to not have anything at all.

@mknyszek mknyszek self-assigned this Sep 12, 2022
@golang golang locked and limited conversation to collaborators Sep 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Proposal Proposal-Accepted Proposal-FinalCommentPeriod
Projects
No open projects
Development

No branches or pull requests

7 participants