x/build: observability using distributed tracing and metrics #26779

Open
odeke-em opened this issue Aug 2, 2018 · 3 comments
Labels
Builders x/build issues (builders, bots, dashboards) FeatureRequest

odeke-em commented Aug 2, 2018

I am coming here from https://groups.google.com/forum/#!msg/golang-dev/MdwFiAx5-PU/UiUvY-8_DwAJ

The OpenCensus project (https://opencensus.io/) provides observability into distributed systems (monoliths and microservices alike) through mechanisms for recording traces and metrics. Those signals give insight into the state of a distributed system.

I gave a talk about OpenCensus at GoSF on 18th July 2018 (about 3 weeks ago). The accompanying slides are at https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#1 or, better, https://github.com/orijtech/talks/blob/master/2018/07/18/gosf/gosf.slide for the Go present version.

The value of it

Traces give play-by-play visibility into sampled requests, e.g. we can see that invoking os/exec took this long while fetching metadata from Google Cloud Storage took that long: https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#14

The collected metrics are useful for actively checking the health of the system, e.g. sending alerts to the x/build authors when a trybot run takes, say, 8 minutes or when the overall p99 latency hits 10 minutes.
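To make the alerting idea concrete, here is a minimal standard-library sketch of the two checks mentioned above: flagging individual runs that reach a per-run limit and flagging a p99 that reaches a second limit. This is not OpenCensus code and not how x/build computes anything. The helper names (`mins`, `percentile`, `slowRuns`), the sample durations and the nearest-rank percentile method are all assumptions made for illustration. In a real setup the monitoring backend would typically evaluate such thresholds.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// mins converts whole minutes into durations.
// Hypothetical helper used only to build sample data.
func mins(ns ...int) []time.Duration {
	out := make([]time.Duration, len(ns))
	for i, n := range ns {
		out[i] = time.Duration(n) * time.Minute
	}
	return out
}

// percentile returns the p-th percentile (1 <= p <= 100) of runs using the
// nearest-rank method. It returns 0 when runs is empty.
func percentile(runs []time.Duration, p int) time.Duration {
	if len(runs) == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]time.Duration(nil), runs...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	// Integer form of ceil(p/100 * n), clamped to a valid 1-based rank.
	rank := (p*len(sorted) + 99) / 100
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// slowRuns returns the runs that took at least limit.
func slowRuns(runs []time.Duration, limit time.Duration) []time.Duration {
	var slow []time.Duration
	for _, d := range runs {
		if d >= limit {
			slow = append(slow, d)
		}
	}
	return slow
}

func main() {
	// Made-up trybot run durations, in minutes.
	runs := mins(3, 4, 2, 9, 5, 4, 3, 6, 4, 11)

	// Thresholds taken from the example in the text above.
	const perRunLimit = 8 * time.Minute
	const p99Limit = 10 * time.Minute

	if slow := slowRuns(runs, perRunLimit); len(slow) > 0 {
		fmt.Printf("ALERT: %d run(s) took %v or longer: %v\n", len(slow), perRunLimit, slow)
	}
	if p99 := percentile(runs, 99); p99 >= p99Limit {
		fmt.Printf("ALERT: p99 latency %v reached %v\n", p99, p99Limit)
	}
}
```

Nearest-rank is only one of several percentile definitions. Backends that aggregate into histogram buckets will report an estimate rather than an exact sample value.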

Maintenance and technical debt

As for maintenance, the OpenCensus Go implementation https://github.com/census-instrumentation/opencensus-go implements the tracer and the metrics, and we just use its packages to instrument our code, e.g. this excerpt from my slides https://cdn.rawgit.com/orijtech/talks/master/2018/07/18/gosf/gosf.htm#13

func search(w http.ResponseWriter, r *http.Request) {
    ctx, span := trace.StartSpan(r.Context(), "Search")
    defer span.End()

    // Use the context and the rest of the code goes below
    _ = ctx
}

To extract the data, we just need to add an "exporter" (a liaison to our backend of choice) in a main function, for example to send traces to Stackdriver:

package main

import (
    "log"

    "contrib.go.opencensus.io/exporter/stackdriver"
    "go.opencensus.io/trace"
)

func main() {
    sd, err := stackdriver.NewExporter(stackdriver.Options{ProjectID: "census-demos"})
    if err != nil {
        log.Fatalf("Failed to register Stackdriver Trace exporter: %v", err)
    }
    trace.RegisterExporter(sd)
}

Maintenance work is detached from the Go project, since the OpenCensus project is already staffed with collaborators from a wide range of companies. The Go project only needs to import the respective libraries, start and stop traces, record metrics, and finally create exporters for the desired backends, e.g. Prometheus, Zipkin, AWS X-Ray, Jaeger, Stackdriver Tracing and Monitoring, SignalFx etc.

Next steps

I finally got some dev cycles this quarter to help improve our build system, but I would also be delighted to delegate to or work with people in the community, which is why I am filing this now.

/cc @basvanbeek @Ramonza @bogdandrutu @rakyll @kevinburke

@gopherbot gopherbot added this to the Unreleased milestone Aug 2, 2018
@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Aug 2, 2018
@gopherbot

Change https://golang.org/cl/138522 mentions this issue: cmd/coordinator: use OpenCensus for Stackdriver metrics

@gopherbot

Change https://golang.org/cl/138523 mentions this issue: cmd/coordinator: initial tracing and metrics using OpenCensus

@gopherbot

Change https://golang.org/cl/303669 mentions this issue: cmd/coordinator: migrate to OpenCensus for metrics

gopherbot pushed a commit to golang/build that referenced this issue Mar 23, 2021
Replace low-level Stackdriver monitoring API usage for OpenCensus
with a Stackdriver exporter. To benefit local development, expose
metrics at an /metrics endpoint (to be picked up with Prometheus).

This makes it much easier to add new metrics, to test them locally,
and brings our metrics solution in sync with what's currently in
use in x/playground (see CL 302769). It's expected to be preferable
to migrate to OpenTelemetry in the future when a good migration path
becomes available, and both x/build and x/playground can be updated
at that time.

This CL is based on work in CL 229679 and CL 138522.

For golang/go#26779.
For golang/go#44406.
For golang/go#17104.

Co-authored-by: Alexander Rakoczy <alex@golang.org>
Co-authored-by: Emmanuel T Odeke <emmanuel@orijtech.com>
Change-Id: Iad45730feace471db1668e828b7c9775377be8a9
Reviewed-on: https://go-review.googlesource.com/c/build/+/303669
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
3 participants