runtime/metrics: /memory/classes/heap/unused:bytes spikes #67019
Labels: compiler/runtime, NeedsInvestigation
Go version
go1.22.1
Output of go env in your module/workspace:
What did you do?
We're trying to build nice dashboards to expose runtime/metrics to all Datadog users. For this reason we started rolling out a package that collects these metrics across our fleet.
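A collector of this kind might read every KindUint64 runtime metric in a single call. The sketch below is hypothetical: it is not Datadog's package, and readUint64Metrics is an illustrative name. It uses only the standard runtime/metrics API: metrics.All to enumerate metric descriptions, metrics.Read to fill a slice of metrics.Sample, and Value.Kind / Value.Uint64 to extract the results.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readUint64Metrics (hypothetical helper) reads every runtime/metrics metric
// of kind KindUint64 in one metrics.Read call and returns the values keyed
// by metric name.
func readUint64Metrics() map[string]uint64 {
	descs := metrics.All()
	samples := make([]metrics.Sample, 0, len(descs))
	for _, d := range descs {
		if d.Kind == metrics.KindUint64 {
			samples = append(samples, metrics.Sample{Name: d.Name})
		}
	}

	// A single Read populates the Value field of every requested sample.
	metrics.Read(samples)

	out := make(map[string]uint64, len(samples))
	for _, s := range samples {
		// Defensive check: skip any sample whose value did not come back
		// as KindUint64.
		if s.Value.Kind() == metrics.KindUint64 {
			out[s.Name] = s.Value.Uint64()
		}
	}
	return out
}

func main() {
	vals := readUint64Metrics()
	fmt.Println("/memory/classes/heap/unused:bytes =", vals["/memory/classes/heap/unused:bytes"])
}
```

Reading all the samples in one call, rather than one metric per call, keeps related values such as the /memory/classes/... breakdown from the same Read, which makes them easier to compare when a single value looks wrong.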
What did you see happen?
While graphing the data, we noticed occasional spikes in the memory metrics. On closer inspection, we found that all of these spikes were caused by the /memory/classes/heap/unused:bytes metric. The spikes are rare (e.g. 18 in the last 24 hours across a very large fleet), but frequent enough to get in the way of building clean dashboards. The issue occurs across architectures (arm64, amd64), instance types, and hyperscalers without any clear pattern.
We suspect the large values are the result of an underflow in the runtime/metrics code:
To investigate further, we started logging the values (internally we store them as float64). Below are a few of the values we logged, along with their distance from math.MaxUint64 (assuming this is indeed an underflow).
We also logged the values of all KindUint64 runtime metrics collected in the same metrics.Read() call. I've dumped the results into this sheet (apologies for the formatting).
What did you expect to see?
No spikes.
cc @mknyszek