x/build: collect key metrics from the build infrastructure #47325

Open
25 tasks
cagedmantis opened this issue Jul 21, 2021 · 2 comments

Labels: Builders (x/build issues: builders, bots, dashboards), NeedsFix (the path to resolution is known, but the work has not been done)

cagedmantis commented Jul 21, 2021

This is a tracking issue for the collection of key operational metrics from the build infrastructure. These metrics are being collected for the following reasons:

  • Speeding up root cause analysis when an issue arises.
  • Understanding how changes correlate with changes in performance.
  • Identifying key areas for possible optimization.
  • Enabling monitoring and alerting on metrics.

The task list below will be extended once a detailed list of key metrics has been identified.

  • Collect Metrics
  • Create Dashboards
  • GCP Aggregate Service API metrics
  • AWS Aggregate Service API
  • GitHub Aggregate Service API
  • Gerrit Aggregate Service API
  • TLS certificate lifetime
  • General OS/Application/Container specific metrics

Coordinator

  • Amount of time waiting for VM quota
  • Buildlet creation latency by stage and type
  • Total buildlet creation latency
  • VM instance creation failures
  • Instance creation queue depth
  • Instance creation queue latency
  • Active Trybot count, latency, failures by type
  • Buildlet count by pool
  • Pending build count by type
  • Pending build latency by type
  • Uptime
  • Build rate
  • General API instrumentation (like ochttp)

Gomote

  • Sessions created
  • Sessions destroyed
  • Session duration
  • Command usage (SSH, put, etc.)

@golang/release

cagedmantis added the Builders and NeedsFix labels Jul 21, 2021
cagedmantis added this to the Unreleased milestone Jul 21, 2021
cagedmantis self-assigned this Jul 21, 2021
cagedmantis added this to Planned in Go Release Team Jul 27, 2021
gopherbot commented

Change https://go.dev/cl/410016 mentions this issue: cmd/coordinator: only expose /metrics in dev mode

gopherbot commented

Change https://go.dev/cl/410015 mentions this issue: internal/coordinator: measure GetBuildlet latency
