runtime: Additional Allocator Metrics in `runtime.MemStats` #11890

matttproud · 2015-07-27T19:55:33Z

Hi, I am wondering whether it would be acceptable to include several additional core metrics in runtime.MemStats, namely the following measures:

1. No. of spans pending release, by size class.

   This helps server authors understand the discrepancy between reported
   heap reservation/allocation and process RSS.
2. No. of live spans (with active allocations contained therein), by size class.

   Essential corollary for no. 1.
3. Measure of sum of span occupancy, by size class.

   This helps server and runtime authors understand the level of span reuse
   and divine potential problems with heap fragmentation vis-à-vis no. 2.
4. Sum of span allocations, by size class (not of inner object allocations).

   This helps server authors understand the aggregate throughput of
   memory flow in real time, a measure of efficiency that is useful to
   capture when comparing releases and in automated canary tests.
5. Sum of span releases, by size class (not of inner object allocations).

   Useful corollary for no. 4.
6. Summary statistics about span age, by size class: min, median, average, max.

   Throughput in no. 4 and no. 5 is useful, but this takes the detail a
   level deeper.
7. Cumulative sums of individual allocations made for each span size class.

   Useful throughput measure for individual allocations.

These would be inordinately beneficial in gross measurement of fleet memory allocator performance as well as offer server authors deeper telemetric insights into the lifecycles of their allocations. pprof is great for one-off diagnosis but not real-time analytics of the fleet.

I would be happy to compromise on these, especially to refine the language and requirements, and possibly to volunteer time for implementing these representations should we come to agreement.


bradfitz · 2015-07-28T06:36:11Z

/cc @RLH @aclements

RLH · 2015-07-28T12:20:08Z

Go seems to have exposed the internal implementation concept of a span in MemStats, but not in any meaningful way such as indicating the size of a span. This is good, since it leaves future implementations free to change the size of a span. Go should preserve this freedom. This proposal seems to imply that one will need to know the size of a span for it to be useful. Is that true?

matttproud · 2015-07-28T14:01:45Z

Yes, that is true. The core reason is that the size class is a useful hint about the size of the individual objects involved in the measurement. For instance, it helps differentiate a low volume of large-object churn from a high volume of small-object churn.

As for hiding the implementation details of the runtime, let me offer a few remarks:

1. Metrics that abstract/hide what the runtime is really doing under the hood are of specious value,
   because they force the consumers of said metrics to study the runtime in much greater detail than
   would be the case if the raw internals were simply exposed. A quick analogy from the world of
   Java: MBean and MXBeans (example package) are the
   training wheels of internal runtime metrics, whereas anyone worth his or her salt uses HSPerfData
   (example) because of the level of actionable detail it provides. At
   the very worst, the Bean metric approach shoehorns implementation-specific details into
   standardized types that either do not make sense or lose needed detail. Most of this pain occurs
   at runtime, when one gets a runtime exception or zero values from a bean that exists as a
   placeholder but has no relevance for the runtime in use.

   The question I have is this: what happens the moment there are alternative runtimes available for
   Go? It seems doubtful that each runtime would be able to meaningfully shoehorn its allocator
   metrics into runtime.MemStats unless they all use the same paradigms.
2. The Go 1.X Stability Grant makes a lot of sense in general, but
   it may not be wise to apply it dogmatically to the runtime package, since of all the
   packages it is the one most likely to undergo churn behind the scenes.
3. The domain of runtime is ultimately implementation-specific. Treating it as less than that is a
   disservice to the user.

I think there is a middle ground; I am just not sure what it is.

aclements · 2015-07-28T14:13:23Z

Details aside, there's been some discussion of the philosophy of MemStats over on #10323. I agree that we have to expose implementation details in order for it to be truly useful, but we've already made the mistake of exposing such details in MemStats that are no longer really relevant, but are covered under Go 1 compatibility.

matttproud · 2015-07-28T14:19:34Z

Yep, that bug had been the impetus for filing this one (just get the need
out there on public record).

Is it possible to, for instance, keep the legacy fields in the struct and
mark them as "do not use" in the field comments or let go fix assist in
auto-migrations where it makes sense? Alternatively, could we create a
proposal to weaken the stability grant around select aspects of the
runtime? Each of these seems tractable, admittedly with differing levels
of satisfaction, smell, and disgust.

dvyukov · 2015-12-03T11:52:17Z

The current HeapSys/HeapAlloc/HeapInuse/HeapIdle/HeapReleased fields already answer questions 1, 2, and 3 (minus the by-size-class part, which is more of an implementation detail, and when you think about RSS, size classes are more-or-less irrelevant).
I don't understand what exactly you want to do with numbers 4, 5, 6, and 7. Please elaborate.

cristaloleg · 2021-07-08T16:00:05Z

Kindly ping @matttproud 😉

matttproud · 2021-07-14T15:47:26Z

The original motivation was spelled out in the top-level filing, but in the interim canary analysis (CAS) has been described publicly (https://research.google/pubs/pub46908/). The motivation is to determine whether a release regresses in resource efficiency at runtime. Information about size class activity and liveness has been useful in determining whether design assumptions about memory lifetime are correct, which matters for scalable, low-latency server design. I materially needed this when building Prometheus, which had multi-modal memory lifetimes. In particular: is my server using memory in a way that will promote heap fragmentation in a containerized environment, where an out-of-memory (OOM) killer will terminate it unceremoniously?

…

mknyszek · 2022-09-29T15:19:29Z

A lot of these are now doable with runtime/metrics, since we have a path out for implementation-defined metrics.

ianlancetaylor changed the title ~~Additional Allocator Metrics in runtime.MemStats~~ runtime: Additional Allocator Metrics in runtime.MemStats Jul 27, 2015

ianlancetaylor added this to the Unplanned milestone Jul 27, 2015

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022

matttproud mentioned this issue Sep 29, 2022

proposal: runtime/pprof: add “heaptime” bytes*GCs memory profile #55900

Open
