New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
x/telemetry/internal/counter: allow StackCounter to skip custom call depth #61234
Comments
We will see if the increased max len and compression mechanism to be added for #61181 alleviate the issue. |
Change https://go.dev/cl/509435 mentions this issue: |
Stack counter names are multiline, where each line is <import-path>.<function-name>:<line-number>. The import paths tend to be long and repeated, making the names stored in count files significantly longer than necessary. This change replaces a repeated import path in the sequence of lines with a single character " (for 'Ditto'). This is a detail of the internals of the count files. Parse() reconstructs the uncompressed names. For: golang/go#61234 Change-Id: I1f1a4160cf7b16a1e2937b56d8c929736bb6ea5c Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/509435 Run-TryBot: Peter Weinberger <pjw@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Jamal Carvalho <jamal@golang.org>
With a longer limit on stack counter names, this is probably not urgent, and may not even be worth the complexity. For example, there are many redundant frames for bug.Errorf here, but on the other hand it doesn't affect the viability of the data, and is somewhat useful to know that these are all bug reports. |
Per discussion today, if we want to have any automation (such as @adonovan's new x/tools/gopls/internal/telemetry/cmd/stacks) that detects e.g. the most significant frame in the call stack, it could be useful to make that frame the top of the call stack. That's a newly compelling reason to do this. |
The new compression of stack counter names replaces the "path" part of a symbol with a ditto mark (
I propose we revert it. |
Add it to the agenda.
…On Mon, Jan 29, 2024 at 5:19 PM Alan Donovan ***@***.***> wrote:
The new compression of stack counter names replaces the "path" part of a
symbol with a ditto mark (") if it is the same as the previous line. I
dislike it for several reasons.
1. The ditto mark (an unbalanced quote) is surprising and confusing.
While it's common in penmanship, it is not a software convention. The first
time I saw it I started debugging my program to find out why it was
emitting garbage.
2. The compression makes the stack harder to use. Grep doesn't work.
My tests in CL 558256 don't work. It requires a stateful decoder to read it.
3. The split function is incorrect for generic symbols, which may
contain periods, making the result even more confusing.
I propose we revert it.
—
Reply to this email directly, view it on GitHub
<#61234 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABJIAI3I5PKILX2XVA42L23YRAN63AVCNFSM6AAAAAA2CL7UGWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJVGY3TKMRSHA>
.
You are receiving this because you are on a team that was mentioned.Message
ID: ***@***.***>
|
@adonovan it looks like that compression was added for a reason. We shouldn't revert without addressing that reason in a different way. Increasing the max name length seems reasonable, as does @hyangah's proposal in this issue to allow skips (which we want to do anyway: #61234 (comment)). More generally, it feels like the real problem here is the loose coupling between x/telemetry and the new functionality added in https://go.dev/cl/558256. The counter file format is not intended as an externally facing API, and so it is not surprising that it doesn't make for a good API. Since we're going eventually move the watchdog functionality into x/telemetry, that concern is only temporary. So I don't find (1) or (2) particularly compelling reasons to revert. But the bugginess of (3) is a real problem. Since fixing that feels like it will involve significant complexity, reverting and instead increasing the max name length sounds like a reasonable path forward. Keep it simple. |
Why is (3) a problem if the raw counter name is not meant to be interpreted directly and running decoder can recover it to the original stack trace? |
Of course; increasing the limit seems like a good solution. I can't imagine the extra size of these messages has a significant dollar cost compared the their immense value for debugging.
I agree that we should move watchdog into x/telemetry at some point, and that would justify writing a more elaborate decoder for the watchdog test, but I don't think that changes the thrust of my argument. The counter names are not an API, but they are an interface, and they are exposed via json files such as https://storage.googleapis.com/prod-telemetry-merged/2023-12-05.json to tools that operate on them such as stacks. The stacks command parses the stacks to extract the likely culprit symbol, and (as you know) I would like it to additionally mark up each frame to the correct file@version:line in CodeSearch. Perhaps you're saying the intent was that counter names are merely unstructured strings, but that seems like a lost opportunity to give stacks the special treatment that they--the primary data structure for post-mortem analysis--surely deserve. We should either specify the grammar of these strings, or provide a package that parses them. I think the existing uncompressed format is already pretty good. |
@adonovan sorry, I thought the uploads already parsed the counter file, and therefore included the expanded name. Is that not the case? |
Ah, I didn't realize that was the case (if it is indeed the case). So it sounds like the compressed form is supposed to be an internal (but stable and precisely specified) detail and you already have routines for compressing and decompressing it that my test could take advantage of, especially when it is moved into x/telemetry. |
You're right, it's not a big problem, since the compression/expansion is deterministic. Just a bit sloppy. @adonovan yes, that is the point I was trying to make: when in x/telemetry you wouldn't only interact with parsed counter files. I confirmed that this is the case, and the uploaded payloads are expanded. |
Currently
StackCounter.Inc
captures theInc
caller's stack usingruntime.Callers(2, ...)
.One use case we want to explore is to have a StackCounter to track calls to
gopls/internal/bug.report
.If we call
StackCounter.Inc
from thisreport
function (wrapper ofStackCounter.Inc
), the first couple of entries are always the functions in thisgopls/internal/bug
package, which isn't very interesting while taking up a good fraction of the counter name which already has pretty small max limit (#61181).Can we allow the api to configure custom skip depth like
runtime.Callers
orlog.Logger.Output
?Or alternatively, can we have
StackCounter.Inc
takepcs
?cc @golang/tools-team @rsc
The text was updated successfully, but these errors were encountered: