Skip to content

cmd/trace, x/exp/trace: nondeterministic output, seems related to timestamp tie-breaking #68277

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
rhysh opened this issue Jul 2, 2024 · 10 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@rhysh
Copy link
Contributor

rhysh commented Jul 2, 2024

$ ~/sdk/go1.23/bin/go version
go version go1.23rc1 darwin/arm64

(This also affects go1.22.4.)

After preparing an execution trace like so ...

$ ~/sdk/go1.23/bin/go test net/http -run='^$' -bench='BenchmarkClientServerParallel/64/h2' -benchtime=100ms -trace=/tmp/trace123_h2

... I read it back with go tool trace -d=1 (and with other tools based on golang.org/x/exp/trace). The results vary each time due to differences in the reported timestamp for some events. The events in question have sequential timestamps, which makes me suspect there's a tie which is getting resolved when reading back the trace, exposing some nondeterminism from map iteration order.

I expected the results of go tool trace -d=1 to be identical each time for a given execution trace. I expected that reading an execution trace with a golang.org/x/exp/trace.Reader and passing each Event to fmt.Print would give identical results on each run. I expected that loading an execution trace twice with golang.org/x/exp/trace.Reader and advancing to a particular timestamp would result in seeing the "same" Event.

CC @golang/runtime @mknyszek

Note that the contents of Events 2 and 3 are swapped between the first and second run, with the timestamps remaining monotonic.

$ ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 | grep 8653440991110
M=8191703744 P=4 G=-1 StateTransition Time=865344099111040 Resource=Goroutine(20062) Reason="" GoID=20062 Runnable->Running
M=6163394560 P=1 G=21401 StateTransition Time=865344099111041 Resource=Goroutine(28134) Reason="" GoID=28134 Waiting->Runnable
M=6165688320 P=5 G=28541 StateTransition Time=865344099111042 Resource=Goroutine(28541) Reason="select" GoID=28541 Running->Waiting

$ ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 | grep 8653440991110
M=8191703744 P=4 G=-1 StateTransition Time=865344099111040 Resource=Goroutine(20062) Reason="" GoID=20062 Runnable->Running
M=6165688320 P=5 G=28541 StateTransition Time=865344099111041 Resource=Goroutine(28541) Reason="select" GoID=28541 Running->Waiting
M=6163394560 P=1 G=21401 StateTransition Time=865344099111042 Resource=Goroutine(28134) Reason="" GoID=28134 Waiting->Runnable

With 930k events, several thousand will end up as additional/non-unique lines between two runs of go tool trace -d=1.

$ ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 | grep '^M=' | wc -l
  930838
$ (~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 ; ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort -u | wc -l
  934396
$ (~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 ; ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort -u | wc -l
  933605
$ (~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 ; ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort -u | wc -l
  938976
$ (~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 ; ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
5618 1
928029 2
$ (~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 ; ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
4276 1
928700 2
$ (~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2 ; ~/sdk/go1.23/bin/go tool trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
5630 1
928023 2
@gopherbot gopherbot added this to the Unreleased milestone Jul 2, 2024
@rhysh rhysh changed the title x/exp/trace: nondeterministic output, seems related to timestamp tie-breaking cmd/trace, x/exp/trace: nondeterministic output, seems related to timestamp tie-breaking Jul 2, 2024
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 2, 2024
@rhysh
Copy link
Contributor Author

rhysh commented Jul 2, 2024

Yes, this appears due to map iteration in https://github.com/golang/go/blob/go1.23rc1/src/internal/trace/reader.go#L145

Forcing a sort (such as with slices.Sorted(maps.Keys(r.gen.batches)), for newer toolchains) resolves it. I can prepare a patch.

$ git -C /usr/local/go diff --stat
 src/internal/trace/reader.go | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

$ (go run cmd/trace -d=1 /tmp/trace123_h2 ; go run cmd/trace -d=1 /tmp/trace123_h2) | grep '^M=' | sort | uniq -c | awk '{print $1}' | sort | uniq -c
930838 2

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/596355 mentions this issue: internal/trace: make Reader output deterministic

@thanm thanm added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 3, 2024
@thanm thanm moved this to In Progress in Go Compiler / Runtime Jul 3, 2024
@thanm thanm modified the milestones: Unreleased, Go1.23 Jul 3, 2024
@thanm
Copy link
Contributor

thanm commented Jul 3, 2024

Moving this to the 1.23 milestone, please comment if any issues with that.

@rhysh
Copy link
Contributor Author

rhysh commented Jul 4, 2024

Thanks. Yes, the problem starts in std's internal/trace (not the generated copy in x/exp/trace) so a release milestone is right. Go 1.23 would be great if the door is still open for small fixes like this. Otherwise 1.24.

I plan to send a CL for x/exp/trace (with a "Fixes" line) once internal/trace has https://go.dev/cl/596355, independent of the main Go release cycle.

@mknyszek
Copy link
Contributor

mknyszek commented Jul 8, 2024

Thanks for sending a fix!

I think whether this fix should go into Go 1.23 depends on whether we would backport it anyway in the Go 1.24 dev cycle. I'm not totally sure that we would, because the issue doesn't actually impact most cmd/trace functionality (?) except for the debug output, so the severity isn't very high.

That being said, I do think this is an important property of the trace parser to maintain, and I don't want any blockers for the public parser API sitting around too long. FWIW, I think golang.org/x/exp/trace could definitely get updated with this patch before it lands on tip.

Here's what I propose:

  • Set https://go.dev/cl/596355 for wait-release, give it Auto-Submit+1 and Code-Review+2 (it LGTM), so it lands in the Go 1.24 dev cycle.
  • Apply the patch locally and point the script in golang.org/x/exp/trace at a GOROOT with the patch applied. Send the CL and submit it.

Thoughts?

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/597177 mentions this issue: trace: regenerate to include CL 596355

@rhysh
Copy link
Contributor Author

rhysh commented Jul 9, 2024

That sounds like a fine plan, yes. My own usage depends on x/exp getting the fix, and not at all on it landing in the core repo before the tree opens for Go 1.24.

The severity in cmd/trace is pretty low. Some day I'd like to be able to deep-link into the go tool trace UI, with the ability to target as closely as individual events .. but I'm not doing that yet, so I'm not affected.

I wonder though if all trace parsers should be able to agree on the full details of all events (including timestamp), and thus if there should be a more principled tie-breaker, rather than forcing the other maintainers ( @dominikh , @felixge , maybe others?) to be bug-compatible with this one.

@mknyszek
Copy link
Contributor

mknyszek commented Jul 9, 2024

and thus if there should be a more principled tie-breaker

I agree, I think that's something that should be documented in a wire format doc (once that exists).

@rhysh rhysh modified the milestones: Go1.23, Go1.24 Jul 12, 2024
gopherbot pushed a commit to golang/exp that referenced this issue Jul 16, 2024
Generated from CL 596355 at PS 1, "make Reader output deterministic".

Updates golang/go#68277

[git-generate]
cd ./trace
./gen.bash $GOROOT

Change-Id: I8cb802a85933271cc7c5df975f6e772a42ea5f7b
Reviewed-on: https://go-review.googlesource.com/c/exp/+/597177
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@rhysh rhysh added the FixPending Issues that have a fix which has not yet been reviewed or submitted. label Jul 18, 2024
@dmitshur dmitshur moved this from In Progress to All-But-Submitted in Go Compiler / Runtime Jul 22, 2024
@dmitshur dmitshur moved this from All-But-Submitted to In Progress in Go Compiler / Runtime Jul 22, 2024
gopherbot pushed a commit that referenced this issue Jul 22, 2024
Multiple Ms can offer Events with identical timestamps. The Reader
edits those so the timestamps are strictly increasing, but it needs a
way to break the tie. Use something deterministic (such as the order of
the batches), rather than map iteration order.

Updates #68277

Change-Id: I4a1f70c1669ce1c9b52d09e2bc99acbc831ef9a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/596355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Rhys Hiltner <rhys.hiltner@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
@rhysh
Copy link
Contributor Author

rhysh commented Jul 22, 2024

Fixed in x/exp, fixed in internal/trace.

@rhysh rhysh closed this as completed Jul 22, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Go Compiler / Runtime Jul 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FixPending Issues that have a fix which has not yet been reviewed or submitted. NeedsFix The path to resolution is known, but the work has not been done.
Projects
Development

No branches or pull requests

5 participants