runtime: TestCgoPprofCallback hang on linux-arm #54778

Closed
heschi opened this issue Aug 30, 2022 · 16 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker

Comments

@heschi
Contributor

heschi commented Aug 30, 2022

#!watchflakes
post <- pkg == "runtime" && test == "TestCgoPprofCallback"

2022-08-26T18:09:56-62125c9/linux-arm-aws

The failure is giant so no summary here.

cc @golang/runtime

@heschi heschi added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 30, 2022
@heschi heschi added this to the Backlog milestone Aug 30, 2022
@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Aug 30, 2022
@mknyszek
Contributor

Looks like reproducing this with GOTRACEBACK=crash (or do we already do that, and it just didn't propagate to the test somehow?) will help; there are a bunch of "goroutine running on other thread; stack unavailable" goroutines that probably have some necessary information.

@mknyszek mknyszek self-assigned this Aug 31, 2022
@mvdan
Member

mvdan commented Sep 6, 2022

Perhaps related: the test sometimes fails on linux-amd64-longtest, e.g. https://build.golang.org/log/a86dc70291e4f1c534e8798731dcec91160e5e99:

##### maymorestack=mayMoreStackPreempt
--- FAIL: TestCgoPprofCallback (0.67s)
    testenv.go:468: [/workdir/tmp/go-build1237047364/testprogcgo.exe CgoPprofCallback] exit status: signal: segmentation fault (core dumped)
    crash_cgo_test.go:228: expected "OK\n" got 
FAIL
FAIL	runtime	72.817s

@aclements
Member

This is not the same as #54885, since that was introduced on 2022-09-02 and the original failure here happened well before that.

@mknyszek
Contributor

That one failure on linux/arm has been the only one since. Aside from a stream of persistent failures on longtest builders (that seem to be fixed), these are the latest 3:

2022-11-09T22:56:44-89332e0/openbsd-amd64-68
2022-11-01T14:20:31-ad5d2f6/linux-amd64-longtest
2022-11-01T14:11:54-2730170/openbsd-386-70

@aclements
Member

This is marked "release-blocker" but is also in "Backlog." I'm not sure what that means, so I'm moving it to the 1.20 milestone.

@aclements aclements modified the milestones: Backlog, Go1.20 Nov 23, 2022
@aclements
Member

2022-11-09T22:56:44-89332e0/openbsd-amd64-68

This one appears to be unrelated.

@aclements aclements added the okay-after-rc1 Used by release team to mark a release-blocker issue as okay to resolve either before or after rc1 label Nov 29, 2022
@prattmic prattmic self-assigned this Dec 7, 2022
@gopherbot gopherbot removed the okay-after-rc1 Used by release team to mark a release-blocker issue as okay to resolve either before or after rc1 label Dec 7, 2022
@aclements
Member

One more failing TestCgoPprofCallback:
2022-12-09T21:38:33-e8f78cb/openbsd-amd64-68

This looks like a different bug, though.

@thanm
Contributor

thanm commented Dec 14, 2022

This was discussed in the weekly release meeting. It would be great to find out more about this bug so that we can re-evaluate it as a release blocker. Thanks.

@bcmills
Contributor

bcmills commented Dec 14, 2022

The openbsd failures in particular could be the general corruption bugs tracked in #55161 / #34988.

@heschi
Contributor Author

heschi commented Jan 4, 2023

Ping -- we're getting close to the release.

@gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- pkg == "runtime" && test == "TestCgoPprofCallback"
2022-11-18 23:57 openbsd-386-71 go@04d6aa65 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.00s)
    crash_test.go:58: /tmp/workdir/tmp/go-build3123997662/testprogcgo.exe CgoPprofCallback failed to start: context deadline exceeded
2022-11-21 17:16 openbsd-amd64-71 go@998949c0 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.00s)
    crash_test.go:58: /tmp/workdir/tmp/go-build1527382388/testprogcgo.exe CgoPprofCallback failed to start: context deadline exceeded
2022-12-06 19:52 openbsd-386-71 go@03bf6f49 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.00s)
    crash_test.go:58: /tmp/workdir/tmp/go-build719402250/testprogcgo.exe CgoPprofCallback failed to start: context deadline exceeded
2022-12-09 21:38 openbsd-amd64-68 go@e8f78cb6 runtime.TestCgoPprofCallback (log)
--- FAIL: TestCgoPprofCallback (0.03s)
    crash_test.go:55: building testprogcgo []: exit status 2
        fatal error: runtime: unblock on closing polldesc

        goroutine 335 [running]:
        runtime.throw({0xa52358?, 0x0?})
        	/tmp/workdir/go/src/runtime/panic.go:1047 +0x5d fp=0xc0005fbe38 sp=0xc0005fbe08 pc=0x43559d
        internal/poll.runtime_pollUnblock(0x227405bc8)
        	/tmp/workdir/go/src/runtime/netpoll.go:412 +0x22c fp=0xc0005fbe78 sp=0xc0005fbe38 pc=0x4622ec
        internal/poll.(*pollDesc).evict(...)
        	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:61
        internal/poll.(*FD).Close(0xc0000f25a0)
        	/tmp/workdir/go/src/internal/poll/fd_unix.go:103 +0x45 fp=0xc0005fbea0 sp=0xc0005fbe78 pc=0x4d8c25
        os.(*file).close(0xc0000f25a0)
        	/tmp/workdir/go/src/os/file_unix.go:262 +0xad fp=0xc0005fbef8 sp=0xc0005fbea0 pc=0x4e73ed
        os.(*File).Close(...)
        	/tmp/workdir/go/src/os/file_posix.go:25
        os/exec.(*Cmd).writerDescriptor.func1()
        	/tmp/workdir/go/src/os/exec/exec.go:561 +0x57 fp=0xc0005fbf58 sp=0xc0005fbef8 pc=0x5143d7
        os/exec.(*Cmd).Start.func2(0x0?)
        	/tmp/workdir/go/src/os/exec/exec.go:717 +0x32 fp=0xc0005fbfc8 sp=0xc0005fbf58 pc=0x515112
        os/exec.(*Cmd).Start.func3()
        	/tmp/workdir/go/src/os/exec/exec.go:729 +0x2a fp=0xc0005fbfe0 sp=0xc0005fbfc8 pc=0x5150aa
        runtime.goexit()
        	/tmp/workdir/go/src/runtime/asm_amd64.s:1598 +0x1 fp=0xc0005fbfe8 sp=0xc0005fbfe0 pc=0x467661
        created by os/exec.(*Cmd).Start
        	/tmp/workdir/go/src/os/exec/exec.go:716 +0xa33

watchflakes

@ianlancetaylor
Contributor

All the linux-amd64-longtest and linux-386-longtest failures are built with -dmaymorestack=mayMoreStackPreempt. They also haven't happened for some time.

All the other failures appear to be on slow machines. This is a CPU intensive test: it calls runtime.GOMAXPROCS(16) and then starts 64 goroutines. Each goroutine does an endless loop of sleeping for 50 microseconds and calling a Go function. The program keeps this up for a full second. And, the test is run in parallel with other tests.

So I think this is a flaky test. I think we should not run this CPU-intensive test in parallel with other tests, and since it takes a full second I think we should not run it in short mode.

CC @prattmic for other opinions since he wrote the test.

@gopherbot

Change https://go.dev/cl/460461 mentions this issue: runtime: skip TestCgoPprofCallback in short mode, don't run in parallel

@cherrymui
Member

The mayMoreStackPreempt ones are #54885, which is fixed.

@bcmills
Contributor

bcmills commented Jan 5, 2023

The openbsd-.*-71 failures are probably #57585.

@golang golang locked and limited conversation to collaborators Jan 9, 2024
Projects
Status: Done