Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/trace: "failed to parse trace: no consistent ordering of events possible" #29707

Closed
256dpi opened this issue Jan 12, 2019 · 28 comments
Closed
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@256dpi
Copy link

256dpi commented Jan 12, 2019

What version of Go are you using (go version)?

$ go version
go version go1.11.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/256dpi/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/256dpi/Development/Go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.4/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/mj/lp3r462x6tqfd4v_93bvh65h0000gn/T/go-build000614795=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I ran a network intensive go program (closed source) with a fairly light workload.

What did you expect to see?

I expect to be able to obtain a trace profile and explore it with the viewer program.

What did you see instead?

After obtaining the profile with wget -O trace.out "http://localhost:6060/debug/pprof/trace?seconds=10". When running go tool trace trace.out I received:

2019/01/12 12:37:26 Parsing trace...
failed to parse trace: no consistent ordering of events possible
@256dpi
Copy link
Author

256dpi commented Jan 15, 2019

I have to add, I was testing a binary that was linked with C code (CGO). Is it possible that tracing does not work properly in CGO builds?

@bcmills bcmills changed the title failed to parse trace: no consistent ordering of events possible net/http/pprof: "failed to parse trace: no consistent ordering of events possible" Jan 30, 2019
@bcmills
Copy link
Contributor

bcmills commented Jan 30, 2019

CC @rsc @hyangah @matloob @dvyukov for pprof and trace

@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jan 30, 2019
@bcmills bcmills added this to the Go1.13 milestone Jan 30, 2019
@bcmills bcmills changed the title net/http/pprof: "failed to parse trace: no consistent ordering of events possible" cmd/trace: "failed to parse trace: no consistent ordering of events possible" Jan 30, 2019
@hyangah
Copy link
Contributor

hyangah commented Jan 30, 2019

#16755 was the first one that came to my mind, but I've never seen the issue with darwin/amd64, so I am not sure. I am not aware of cmd/trace issues involving cgo.

@256dpi is it possible to share the captured trace with me?

@AlexRouSg
Copy link
Contributor

@hyangah I just ran into this and made a repo.

Basically it happens when you export a Go function to C and then call that function inside a C created thread.

package test

/*
	#include <pthread.h>

	extern void* callback(void*);
	typedef void* (*cb)(void*);

	static void testCallback(cb cb) {
		pthread_t thread_id;
		pthread_create(&thread_id, NULL, cb, NULL);
    	        pthread_join(thread_id, NULL);
	}
*/
import "C"
import (
	"context"
	"runtime/trace"
	"time"
	"unsafe"
)

var traceCtx, traceTask = trace.NewTask(context.Background(), "Debug")

func test() {
	C.testCallback(C.cb(C.callback))
}

//export callback
func callback(unsafe.Pointer) unsafe.Pointer {
	defer trace.StartRegion(traceCtx, "callback").End()
	time.Sleep(time.Millisecond)

	return nil
}

@bcmills
Copy link
Contributor

bcmills commented Jun 26, 2019

Here's a related flake in runtime/trace.TestTraceStressStartStop on the freebsd-amd64-12_0 builder:
https://build.golang.org/log/65ddd079bfd87715555a97b296b41dfa64ff12b6

--- FAIL: TestTraceStressStartStop (1.14s)
    trace_test.go:147: failed to parse trace: no consistent ordering of events possible
FAIL
FAIL	runtime/trace	1.574s

@let4be
Copy link

let4be commented Feb 17, 2020

Any update? I cannot parse trace created on arm64 from x86 machine

@gopherbot
Copy link

Change https://golang.org/cl/234617 mentions this issue: runtime: synchronize StartTrace and StopTrace with sysmon

gopherbot pushed a commit that referenced this issue May 21, 2020
Currently sysmon is not stopped when the world is stopped, which is
in general a difficult thing to do. The result of this is that when
tracing starts and the value of trace.enabled changes, it's possible
for sysmon to fail to emit an event when it really should. This leads to
traces which the execution trace parser deems inconsistent.

Fix this by putting all of sysmon's work behind a new lock sysmonlock.
StartTrace and StopTrace both acquire this lock after stopping the world
but before performing any work in order to ensure sysmon sees the
required state change in tracing. This change is expected to slow down
StartTrace and StopTrace, but will help ensure consistent traces are
generated.

Updates #29707.
Fixes #38794.

Change-Id: I64c58e7c3fd173cd5281ffc208d6db24ff6c0284
Reviewed-on: https://go-review.googlesource.com/c/go/+/234617
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
@davies
Copy link
Contributor

davies commented Jun 22, 2020

I tried with latest master branch, it still has this problem

$ go version
go version devel +60f7876 Sat Jun 20 08:40:13 2020 +0000 linux/amd64

The go code is built as a so, and called by Java. We can reproduce it every time under normal workload, but OK when it's idle.

@entombedvirus
Copy link
Contributor

I can repro this issue on our production services consistently:

go version
go version go1.14.4 linux/amd64

I've also tried running gotip tool trace to parse trace output generated from go1.14.4 binary and that gives the same result. I haven't tried generating trace output from gotip because I want to avoid running gotip in production. Please let me know if having access to the trace output is helpful to debug.

@choleraehyq
Copy link
Contributor

Can repro this issue on our production services every time, with Go 1.15. Our code involves cgo, don't know whether it's related.

@hyangah
Copy link
Contributor

hyangah commented Sep 9, 2020

For anyone who's interested in investigating this, I am attaching a trace from the program @AlexRouSg posted.
trace.out in the attached repro.zip file was generated with go test -trace=trace.out.

This trace parsing failed because of an unexpected GoSysExit event at offset 48 (almost at the beginning of the trace) This is unexpected - there was no preceding event with the goroutine 17 before this event. We usually expect a GoInSyscall if the goroutine was in syscall. See

go/src/runtime/trace.go

Lines 207 to 213 in c847589

// Can't set trace.enabled yet. While the world is stopped, exitsyscall could
// already emit a delayed event (see exitTicks in exitsyscall) if we set trace.enabled here.
// That would lead to an inconsistent trace:
// - either GoSysExit appears before EvGoInSyscall,
// - or GoSysExit appears for a goroutine for which we don't emit EvGoInSyscall below.
// To instruct traceEvent that it must not ignore events below, we set startingtrace.
// trace.enabled is set afterwards once we have emitted all preliminary events.
.

I suspect the code path that should trigger GoInSyscall events for all goroutines in syscall was broken.

traceEvent(traceEvGoInSyscall, -1, uint64(gp.goid))

I couldn't find the corresponding stack info (stack id: 0) in the trace, so don't know what goroutine this is either. :-(

cc runtime people: @dvyukov @aclements @mknyszek @ianlancetaylor

@michaelfig
Copy link

Is there any progress on this? Even a suggested workaround would be welcome.

I'm using the same basic pattern as #29707 (comment)
to invoke a Go shared library from a Node.js driver program.

@AlexRouSg
Copy link
Contributor

@michaelfig the only workaround I can find is to never call Go code from C created threads

@michaelfig
Copy link

@michaelfig the only workaround I can find is to never call Go code from C created threads

Thanks, I think I'll be able to manage that for our application.

@wangfakang
Copy link

same issue,have any update?

@nejisama
Copy link

I use curl http://addr:port/debug/pprof/trace to get the trace file, and get the same error.
by my test, if my program is in a high concurrency state, the trace file will be parsed failed with the error, if my program is idle, the trace file will be ok.

@aclements
Copy link
Member

@wangfakang, @nejisama, do your applications also involve call-backs from C to Go on threads created in C?

@wenxuwan
Copy link

same issue, waiting for update, program no cgo, go version 1.14.15

@leonsal
Copy link

leonsal commented Sep 20, 2021

same problem here using go1.17.1.
My app uses a binding to rtaudio C library which callbacks the Go program periodically to get more samples to play.
If I don't use rtaudio, discarding the generated samples, trace works.

@wangfakang
Copy link

@aclements yeah.

@bcmills
Copy link
Contributor

bcmills commented Feb 16, 2022

The 2022 failure is also not one of the tests that frequently triggered this in the past. 🤔

(I've filed it separately as #51224.)

@xiaoyao
Copy link

xiaoyao commented May 27, 2022

I meet the same problem!

@gopherbot
Copy link

Change https://go.dev/cl/411034 mentions this issue: runtime/trace: missing events for the locked g in extra M.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 7, 2022
@yalon
Copy link

yalon commented Jul 11, 2022

Have the same issue on Go 1.18.3 linux/amd64, in my case Python's the main, calling Go via CFFI (C basically). While Python/C creates its own threads, the problem happens when there's high concurrency. Unfortunately in my case I can't control the Python side so any idea on how to work around this would be great.

@prattmic
Copy link
Member

Temporarily reverted in https://go.dev/cl/423437.

@gopherbot
Copy link

Change https://go.dev/cl/429858 mentions this issue: runtime/trace: add missing events for the locked g in extra M.

@golang golang locked and limited conversation to collaborators Sep 27, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet