Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/telemetry: consolidate initialization into telemetry.Start #65500

Open
findleyr opened this issue Feb 3, 2024 · 4 comments
Open

x/telemetry: consolidate initialization into telemetry.Start #65500

findleyr opened this issue Feb 3, 2024 · 4 comments
Assignees
Labels
telemetry x/telemetry issues
Milestone

Comments

@findleyr
Copy link
Contributor

findleyr commented Feb 3, 2024

This issue summarizes a team discussion about integrating telemetry with cmd/go, or generally any tool that may be short-lived. Currently the only tool with telemetry integration is gopls, which is a long-running server and therefore doesn't need to worry about rate limiting or overhead when calling upload.Run once when it starts serving.

The plan was for cmd/go to occasionally start a subprocess to run upload.Run, but as we discussed the details of this, it became clear that this is something x/telemetry should manage. cmd/go should be able to start telemetry without worrying about overhead.

Here's an outline of how this could work:

  • Add a new telemetry.Start function to encapsulate the telemetry entrypoint:
package telemetry

// Config controls the behavior of [Start].
type Config struct {
	Upload       bool
	WatchCrashes bool
	Logger       io.Writer
}

// Start initializes telemetry, and may start a subprocess to monitor crashes
// and process expired counter files.
//
// Specifically, start opens a counter file if local collection is enabled,
// and starts a subprocess if [Config.WatchCrashes] is set or if the local
// telemetry directory is due for processing. If [Config.Upload] is set, and
// the user has opted in to telemetry uploading, this process may attempt to
// upload approved counters to telemetry.go.dev.
//
// Start may re-execute the current executable as a child process, in a special
// mode. In that mode, the call to Start will never return. Therefore, Start should
// be called near the top of the main function of the application, and the
// application should avoid doing expensive work in init functions as they will
// be executed twice.
func Start(cfg Config)

(some phrasing borrowed from https://go.dev/cl/559503, which added the crashmonitor.Start function that telemetry.Start would supersede).

  • As documented, Start would be in charge of calling counter.Open, crashmonitor.Start, and upload.Run, per the provided configuration. By encapsulating, Start can guarantee that the order of these calls is appropriate, and can share the subprocess for crash monitoring and uploading. Without encapsulation, we'd have to start two subprocesses: one for crash monitoring and one for uploading, and they may interact poorly.
  • Start should daemonize itself, so that the upload may outlast a short-running process. (we can borrow from the gopls daemon)
  • Start should implement rate limiting, to amortize its cost over many invocations. E.g. use lockfile acquisition to rate limit upload.Run to once a day. We do something like this for the gopls telemetry prompt.
  • If necessary, Start can wait a short amount of time in a goroutine (e.g. 1s) before doing anything expensive. This minimizes overhead extremely short-lived processes such as go version. This may or may not be necessary.

CC @matloob @adonovan @hyangah @pjweinb

@gopherbot gopherbot added the telemetry x/telemetry issues label Feb 3, 2024
@gopherbot gopherbot added this to the Unreleased milestone Feb 3, 2024
@adonovan
Copy link
Member

adonovan commented Feb 3, 2024

Sounds good. The current crashmonitor never lives much longer than the parent (application) process because the termination of the parent--for any reason--closes the pipe, causing the ReadAll in the child to unblock and the crashmonitor to finish up quickly thereafter and exit. If you want to consolidate the two subprocesses, you'll need to change the crashmonitor logic to return (not Fatal) and decrement a semaphore; the upload logic would also decrement a semaphore when it is finished; and the Start function (in the child process) would need to wait for both events before exiting.

@gopherbot
Copy link

Change https://go.dev/cl/564615 mentions this issue: telemetry: add support for uploading

gopherbot pushed a commit to golang/telemetry that referenced this issue Feb 16, 2024
No support for rate limiting yet but want to keep things small here.
This change adds a dependency on x/sync for errgroup.

I've also had to run go mod tidy in the godev module because it has a
replacement instead of depending on a version of this module.

For golang/go#65500

Change-Id: Icbcd8d3d58fb443c14cf21e339baa3edb76fbca1
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/564615
Reviewed-by: Robert Findley <rfindley@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@gopherbot
Copy link

Change https://go.dev/cl/564597 mentions this issue: telemetry: daemonize the child started by telemetry.Start

@gopherbot
Copy link

Change https://go.dev/cl/567075 mentions this issue: telemetry: in Start, run upload.Run at most once a day

gopherbot pushed a commit to golang/telemetry that referenced this issue Feb 28, 2024
This follows the implementation of the daemonize function in
x/tools/gopls/internal/lsprpc. On Windows, we set the DETACHED_PROCESS
creation flag.

For golang/go#65500

Change-Id: Ib7f052e88999444c4166bc7711346d26801b8f0f
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/564597
Reviewed-by: Robert Findley <rfindley@google.com>
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit to golang/telemetry that referenced this issue Feb 28, 2024
This change will create a token file to acquire a token to limit
uploading to at most once a day: before upload is run, we'll check to
see if an upload token file already exists. If it doesn't, or it's
been a day since the existing file has been created, it's been a day
since the last upload.Run was attempted so we can run it again. If it
doesn't, we'll skip running upload.Run. We don't delete the token
files when we're done with the upload, but instead only delete them
when we want to acquire a new lock so that the existence of the file
and its modification time is a signal of the last time uploading was
run.

The token acquiring code is based on acquireLockFile in
x/tools/gopls/internal/server/prompt.go.

For golang/go#65500

Change-Id: Ie31c1f351e2d31a016fd2ad79b29784e6631c564
Reviewed-on: https://go-review.googlesource.com/c/telemetry/+/567075
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Robert Findley <rfindley@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
telemetry x/telemetry issues
Projects
None yet
Development

No branches or pull requests

4 participants