Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/metrics: add /gc/heap/live:bytes #56857

Closed
felixge opened this issue Nov 20, 2022 · 22 comments
Closed

runtime/metrics: add /gc/heap/live:bytes #56857

felixge opened this issue Nov 20, 2022 · 22 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Milestone

Comments

@felixge
Copy link
Contributor

felixge commented Nov 20, 2022

There appears to be no way to monitor what is called the Live Heap in the GC Guide.

Screen Shot 2022-11-20 at 04 25 30

The closest runtime/metric seems to be /memory/classes/heap/objects:bytes which is Live Heap + New Heap (i.e. includes potentially dead but unswept allocations).

Monitoring Live Heap seems to be useful when running with GOGC=off and a GOMEMLIMIT to understand if the live heap of a program is approaching the GOMEMLIMIT which could lead to high CPU usage or OOM kills.

Therefor I'm proposing to add /memory/classes/heap/live_objects:bytes and /gc/heap/live_objects:objects to the runtime/metrics package. They should return similar values as what is currently returned by the InUse* values of the runtime.MemProfile() profile. It's probably worth to discuss whether the term InUse is preferable over Live.

I created a very hacky prototype of this here, but there is probably a much better way to do this.

@gopherbot gopherbot added this to the Proposal milestone Nov 20, 2022
@seankhliao
Copy link
Member

cc @mknyszek

@mknyszek
Copy link
Contributor

A similar proposal was rejected previously (#51966). Part of the problem was inclusion in MemStats, but runtime/metrics also existed and I made some other arguments against it. Mainly, I think relying on it solely as an indicator can be somewhat fragile if you put it in context of the Go 1.18 GOGC changes. I wasn't ever fundamentally opposed to this, but at the time for the use-case presented, it just didn't seem worth it. I also wanted to think more carefully about what implications it could have (and why we didn't do it already).

However, I do think your argument that it's useful for identifying issues with a memory limit is "new information" (now that the memory limit has landed) as far as the proposal processes goes. I think that might be enough to push this over the edge into more serious consideration for me personally.

I think to start with I'd be inclined at this point to call this /gc/heap/live:bytes. This way it lives next to /gc/heap/goal:bytes. /memory/classes to me is more about a breakdown of memory at a higher level and all the metrics inside there are supposed to be orthogonal. We'd have to break up some of the other metrics and I don't think it's worth it for this.

If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.

I don't really see enough utility from exporting the count of live objects though. I think a memory profile is better for that sort of thing anyway. (Also, we don't currently track that (mod memory profiling) and it's more work.)

@mknyszek mknyszek added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 21, 2022
@mknyszek
Copy link
Contributor

I created a very hacky prototype of this here, but there is probably a much better way to do this.

Yeah, there's a simpler way for bytes. :) You want gcController.heapMarked. You can just read that directly in the metrics callback without any synchronization since it only gets updated during a STW.

@felixge
Copy link
Contributor Author

felixge commented Nov 21, 2022

Sounds great. I'll update my patch based on your suggestions.

I also wanted to think more carefully about what implications it could have (and why we didn't do it already).

Yeah, that makes sense. Seems like this is missing the code freeze anyway.

@felixge
Copy link
Contributor Author

felixge commented Nov 21, 2022

I updated the CL to use gcController.heapMarked, but found one odd thing.

If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.

I can prepare CLs for those as well. Should I submit them separately?

@mknyszek
Copy link
Contributor

I updated the CL to use gcController.heapMarked, but found one odd thing.

It's an accounting artifact currently required for good performance. It would be nice to fix, but I'm not yet sure how without either (1) impacting the allocation fast path or (2) slowing down metrics.Read to ReadMemStats levels (this path would avoid the global latency impact that ReadMemStats has because it's not necessary to actually, fully stop the world, but the worst-case latency of metrics.Read becomes effectively the same as stopping the world).

For users that really need this consistency (like the testing package), ReadMemStats provides it, so there is a workaround. Most of the time though, the skew doesn't make a very meaningful difference, and the memory stats (in /memory/classes) are all at least guaranteed to be consistent with each other. It might be worth mentioning in the docs that this stat specifically isn't necessarily consistent with the /memory/classes stats. We can lift this restriction in the future if we can figure out a way to flush cached efficiently (I'll keep thinking about it).

If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.

I can prepare CLs for those as well. Should I submit them separately?

Either or. Your current CL is not very big.

@rsc
Copy link
Contributor

rsc commented Nov 30, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime/metrics: add live_objects metrics proposal: runtime/metrics: add /gc/heap/live:bytes Dec 7, 2022
@rsc
Copy link
Contributor

rsc commented Dec 7, 2022

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented Dec 14, 2022

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: runtime/metrics: add /gc/heap/live:bytes runtime/metrics: add /gc/heap/live:bytes Dec 14, 2022
@rsc rsc modified the milestones: Proposal, Backlog Dec 14, 2022
@prattmic prattmic modified the milestones: Backlog, Go1.21 Dec 14, 2022
@prattmic
Copy link
Member

@felixge were you planning to implement this?

@felixge
Copy link
Contributor Author

felixge commented Dec 14, 2022

@felixge were you planning to implement this?

Yes, I'll try to submit a good patch for /gc/heap/live:byte 👍.

If we do this, we should probably also export scannable stack bytes and scannable globals bytes, at least as far as they go into the GOGC calculation. We should also really export GOGC and the memory limit as well, since they can change during application execution.

If time allows I could also try to look into these, but I'm not sure if they'd be covered by the accepted proposal or would need separate proposals?

@mknyszek
Copy link
Contributor

That's a good question. I personally feel we should just include those in the proposal.

@gopherbot
Copy link

Change https://go.dev/cl/497315 mentions this issue: runtime/metrics: add /gc/heap/live:bytes

@gopherbot
Copy link

Change https://go.dev/cl/497317 mentions this issue: runtime/metrics: add /gc/heap/goal:percent

@gopherbot
Copy link

Change https://go.dev/cl/497316 mentions this issue: runtime/metrics: add /gc/heap/limit:bytes

@gopherbot
Copy link

Change https://go.dev/cl/497320 mentions this issue: runtime/metrics: add /gc/global/scannable:bytes

@gopherbot
Copy link

Change https://go.dev/cl/497319 mentions this issue: runtime/metrics: add /gc/stack/scannable:bytes

@gopherbot
Copy link

Change https://go.dev/cl/497575 mentions this issue: runtime/metrics: add /gc/scan/heap:bytes

@gopherbot
Copy link

Change https://go.dev/cl/497576 mentions this issue: runtime/metrics: add /gc/scan/total:bytes

gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: I0622af974783ab435e91b9fb3c1ba43f256ee4ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/497315
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: I184d752cc615874ada3d0dbc6ed1bf72c8debd0f
Reviewed-on: https://go-review.googlesource.com/c/go/+/497316
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: I7e7d2ea3e6ab59291a4cd867c680605ad75bd21f
Reviewed-on: https://go-review.googlesource.com/c/go/+/497317
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: I58187d7c4112b35951014ab14f2969bed7f4c8e1
Reviewed-on: https://go-review.googlesource.com/c/go/+/497319
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: I748fd2a33ee76d9a83ea42f2ebf6d9edda243301
Reviewed-on: https://go-review.googlesource.com/c/go/+/497320
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: If3b962f575c33b2cc29f89e33c7aafb476d98ce9
Reviewed-on: https://go-review.googlesource.com/c/go/+/497575
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
gopherbot pushed a commit that referenced this issue May 23, 2023
For #56857

Change-Id: I10dbc5db506c95b7578c2b6baf051a351f68bb2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/497576
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
@felixge
Copy link
Contributor Author

felixge commented May 23, 2023

All of the metrics discussed in this proposal have landed, see patches above. Closing this 🥳

Thanks @mknyszek et al. for reviews and discussion 🙇

@felixge felixge closed this as completed May 23, 2023
@gopherbot
Copy link

Change https://go.dev/cl/497880 mentions this issue: runtime: ensure consistency of /gc/scan/*

gopherbot pushed a commit that referenced this issue May 24, 2023
Currently /gc/scan/total:bytes is computed as a separate sum. Compute it
using the same inputs so it's always consistent with the sum of
everything else in /gc/scan/*.

For #56857.

Change-Id: I43d9148a23b1d2eb948ae990193dca1da85df8a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/497880
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@gopherbot
Copy link

Change https://go.dev/cl/498075 mentions this issue: doc: add release note for runtime/metrics additions

gopherbot pushed a commit that referenced this issue May 24, 2023
For #56857.

Change-Id: I03bdba906d271d97ce29874c50d5aba55dc285b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/498075
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. Proposal Proposal-Accepted
Projects
Status: Accepted
Development

No branches or pull requests

6 participants