Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os/exec: test timeouts on openbsd #19547

Closed
josharian opened this issue Mar 14, 2017 · 10 comments
Closed

os/exec: test timeouts on openbsd #19547

josharian opened this issue Mar 14, 2017 · 10 comments
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-OpenBSD Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@josharian
Copy link
Contributor

Build dashboard:

panic: test timed out after 3m0s

goroutine 22 [running]:
testing.startAlarm.func1()
	/tmp/workdir/go/src/testing/testing.go:1029 +0xf9
created by time.goFunc
	/tmp/workdir/go/src/time/sleep.go:170 +0x44

goroutine 1 [chan receive, 2 minutes]:
testing.(*T).Run(0xc42006ed00, 0x736e60, 0x1c, 0x741ee0, 0x474f01)
	/tmp/workdir/go/src/testing/testing.go:702 +0x304
testing.runTests.func1(0xc42006ed00)
	/tmp/workdir/go/src/testing/testing.go:888 +0x67
testing.tRunner(0xc42006ed00, 0xc420047e10)
	/tmp/workdir/go/src/testing/testing.go:659 +0x98
testing.runTests(0xc4200b6440, 0x8d6fa0, 0x1b, 0x1b, 0x42)
	/tmp/workdir/go/src/testing/testing.go:886 +0x2b5
testing.(*M).Run(0xc420d20f20, 0xc420047f78)
	/tmp/workdir/go/src/testing/testing.go:828 +0xfc
main.main()
	os/exec/_test/_testmain.go:98 +0xdb

goroutine 17 [syscall, 2 minutes, locked to thread]:
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_amd64.s:2142 +0x1

goroutine 11 [chan receive, 2 minutes]:
testing.(*T).Parallel(0xc42006f520)
	/tmp/workdir/go/src/testing/testing.go:591 +0x145
os/exec_test.TestWaitid(0xc42006f520)
	/tmp/workdir/go/src/os/exec/exec_posix_test.go:51 +0x40
testing.tRunner(0xc42006f520, 0x741f40)
	/tmp/workdir/go/src/testing/testing.go:659 +0x98
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:701 +0x2da

goroutine 102 [chan receive, 2 minutes]:
os/exec.(*Cmd).Wait(0xc4203098c0, 0x0, 0x0)
	/tmp/workdir/go/src/os/exec/exec.go:447 +0x118
os/exec.(*Cmd).Run(0xc4203098c0, 0xc42030c770, 0xa00000)
	/tmp/workdir/go/src/os/exec/exec.go:284 +0x5c
os/exec_test.TestIgnorePipeErrorOnSuccess(0xc42030eea0)
	/tmp/workdir/go/src/os/exec/exec_test.go:893 +0x18a
testing.tRunner(0xc42030eea0, 0x741ee0)
	/tmp/workdir/go/src/testing/testing.go:659 +0x98
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:701 +0x2da

goroutine 64 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x275d85d08, 0x77, 0x465ea9)
	/tmp/workdir/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc42012a068, 0x77, 0x8a9ba0, 0x8a5520)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitWrite(0xc42012a068, 0xc421734000, 0x9f0000)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:94 +0x34
internal/poll.(*FD).Write(0xc42012a050, 0xc421724000, 0xa00000, 0xa00000, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/internal/poll/fd_unix.go:221 +0x219
os.(*File).write(0xc42000e028, 0xc421724000, 0xa00000, 0xa00000, 0xc420037d60, 0x445c13, 0xc421724000)
	/tmp/workdir/go/src/os/file_unix.go:220 +0x4e
os.(*File).Write(0xc42000e028, 0xc421724000, 0xa00000, 0xa00000, 0xa00000, 0xa00000, 0x40fa9d)
	/tmp/workdir/go/src/os/file.go:144 +0x7d
os.(*File).WriteString(0xc42000e028, 0xc420d24000, 0xa00000, 0x275d86140, 0xc42000e028, 0x1)
	/tmp/workdir/go/src/os/file.go:202 +0x6f
io.WriteString(0x8a83a0, 0xc42000e028, 0xc420d24000, 0xa00000, 0x6daaa0, 0x70c820, 0x1)
	/tmp/workdir/go/src/io/io.go:289 +0x76
strings.(*Reader).WriteTo(0xc42000c0e0, 0x8a83a0, 0xc42000e028, 0x275d86118, 0xc42000c0e0, 0x1)
	/tmp/workdir/go/src/strings/reader.go:133 +0xa3
io.copyBuffer(0x8a83a0, 0xc42000e028, 0x8a8620, 0xc42000c0e0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/io/io.go:380 +0x31a
io.Copy(0x8a83a0, 0xc42000e028, 0x8a8620, 0xc42000c0e0, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/io/io.go:360 +0x68
os/exec.(*Cmd).stdin.func1(0x0, 0x0)
	/tmp/workdir/go/src/os/exec/exec.go:213 +0x55
os/exec.(*Cmd).Start.func1(0xc4203098c0, 0xc42000c120)
	/tmp/workdir/go/src/os/exec/exec.go:375 +0x27
created by os/exec.(*Cmd).Start
	/tmp/workdir/go/src/os/exec/exec.go:374 +0x4f4
FAIL	os/exec	180.093s

https://build.golang.org/log/7fc2593c7c90818805f1d623fbe9fa9f3d1d4aec

trybot run:

panic: test timed out after 3m0s

goroutine 106 [running]:
testing.startAlarm.func1()
	/tmp/workdir/go/src/testing/testing.go:1029 +0xf9
created by time.goFunc
	/tmp/workdir/go/src/time/sleep.go:170 +0x44

goroutine 1 [chan receive, 2 minutes]:
testing.(*T).Run(0xc420064ea0, 0x736e60, 0x1c, 0x741ee0, 0x474f01)
	/tmp/workdir/go/src/testing/testing.go:702 +0x304
testing.runTests.func1(0xc420064ea0)
	/tmp/workdir/go/src/testing/testing.go:888 +0x67
testing.tRunner(0xc420064ea0, 0xc420047e10)
	/tmp/workdir/go/src/testing/testing.go:659 +0x98
testing.runTests(0xc4200b6460, 0x8d6fa0, 0x1b, 0x1b, 0x42)
	/tmp/workdir/go/src/testing/testing.go:886 +0x2b5
testing.(*M).Run(0xc420037f20, 0xc420047f78)
	/tmp/workdir/go/src/testing/testing.go:828 +0xfc
main.main()
	os/exec/_test/_testmain.go:98 +0xdb

goroutine 17 [syscall, 2 minutes, locked to thread]:
runtime.goexit()
	/tmp/workdir/go/src/runtime/asm_amd64.s:2142 +0x1

goroutine 11 [chan receive, 2 minutes]:
testing.(*T).Parallel(0xc4200656c0)
	/tmp/workdir/go/src/testing/testing.go:591 +0x145
os/exec_test.TestWaitid(0xc4200656c0)
	/tmp/workdir/go/src/os/exec/exec_posix_test.go:51 +0x40
testing.tRunner(0xc4200656c0, 0x741f40)
	/tmp/workdir/go/src/testing/testing.go:659 +0x98
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:701 +0x2da

goroutine 120 [chan receive, 2 minutes]:
os/exec.(*Cmd).Wait(0xc42010cc60, 0x0, 0x0)
	/tmp/workdir/go/src/os/exec/exec.go:447 +0x118
os/exec.(*Cmd).Run(0xc42010cc60, 0xc420110460, 0xa00000)
	/tmp/workdir/go/src/os/exec/exec.go:284 +0x5c
os/exec_test.TestIgnorePipeErrorOnSuccess(0xc42010a9c0)
	/tmp/workdir/go/src/os/exec/exec_test.go:893 +0x18a
testing.tRunner(0xc42010a9c0, 0x741ee0)
	/tmp/workdir/go/src/testing/testing.go:659 +0x98
created by testing.(*T).Run
	/tmp/workdir/go/src/testing/testing.go:701 +0x2da

goroutine 104 [IO wait, 2 minutes]:
internal/poll.runtime_pollWait(0x281069d08, 0x77, 0x465ea9)
	/tmp/workdir/go/src/runtime/netpoll.go:173 +0x59
internal/poll.(*pollDesc).wait(0xc42017c068, 0x77, 0x8a9ba0, 0x8a5520)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:85 +0xae
internal/poll.(*pollDesc).waitWrite(0xc42017c068, 0xc42172e000, 0x9f0000)
	/tmp/workdir/go/src/internal/poll/fd_poll_runtime.go:94 +0x34
internal/poll.(*FD).Write(0xc42017c050, 0xc42171e000, 0xa00000, 0xa00000, 0x0, 0x0, 0x0)
	/tmp/workdir/go/src/internal/poll/fd_unix.go:221 +0x219
os.(*File).write(0xc42000e028, 0xc42171e000, 0xa00000, 0xa00000, 0xc420038d60, 0x445c13, 0xc42171e000)
	/tmp/workdir/go/src/os/file_unix.go:220 +0x4e
os.(*File).Write(0xc42000e028, 0xc42171e000, 0xa00000, 0xa00000, 0xa00000, 0xa00000, 0x40fa9d)
	/tmp/workdir/go/src/os/file.go:144 +0x7d
os.(*File).WriteString(0xc42000e028, 0xc420d1e000, 0xa00000, 0x265940078, 0xc42000e028, 0x1)
	/tmp/workdir/go/src/os/file.go:202 +0x6f
io.WriteString(0x8a83a0, 0xc42000e028, 0xc420d1e000, 0xa00000, 0x6daaa0, 0x70c820, 0xc420020e01)
	/tmp/workdir/go/src/io/io.go:289 +0x76
strings.(*Reader).WriteTo(0xc420160000, 0x8a83a0, 0xc42000e028, 0x265940050, 0xc420160000, 0xc420001b01)
	/tmp/workdir/go/src/strings/reader.go:133 +0xa3
io.copyBuffer(0x8a83a0, 0xc42000e028, 0x8a8620, 0xc420160000, 0x0, 0x0, 0x0, 0xc4202f3c70, 0x0, 0x0)
	/tmp/workdir/go/src/io/io.go:380 +0x31a
io.Copy(0x8a83a0, 0xc42000e028, 0x8a8620, 0xc420160000, 0x6bb8e0, 0xc4203125a0, 0xc420020fb0)
	/tmp/workdir/go/src/io/io.go:360 +0x68
os/exec.(*Cmd).stdin.func1(0x6bb8e0, 0xc4203125a0)
	/tmp/workdir/go/src/os/exec/exec.go:213 +0x55
os/exec.(*Cmd).Start.func1(0xc42010cc60, 0xc420160020)
	/tmp/workdir/go/src/os/exec/exec.go:375 +0x27
created by os/exec.(*Cmd).Start
	/tmp/workdir/go/src/os/exec/exec.go:374 +0x4f4
FAIL	os/exec	180.085s

https://storage.googleapis.com/go-build-log/581bc46b/openbsd-amd64-60_3bfc9d56.log

@josharian josharian added Builders x/build issues (builders, bots, dashboards) Testing An issue that has been verified to require only test changes, not just a test failure. labels Mar 14, 2017
@josharian josharian added this to the Go1.9 milestone Mar 14, 2017
@bradfitz
Copy link
Contributor

/cc @mdempsky for OpenBSD

@bradfitz bradfitz added NeedsFix The path to resolution is known, but the work has not been done. help wanted labels Mar 14, 2017
@alexbrainman
Copy link
Member

Found this during CL 38761 trybot run https://storage.googleapis.com/go-build-log/99b40820/openbsd-amd64-60_178381c2.log

Alex

@martisch
Copy link
Contributor

martisch commented Apr 8, 2017

@enj
Copy link
Contributor

enj commented Apr 13, 2017

@mdempsky
Copy link
Member

It looks like the timeouts always happen in TestIgnorePipeErrorOnSuccess. Also, TestWaitid is always blocked waiting in t.Parallel. (The latter seems expected: TestIgnorePipeErrorOnSuccess is a serial test, so TestWaitid logically has to wait for it to finish, but I don't really understand the inner working of package testing.)

Casually reading through the OpenBSD-specific code, I don't see anything obviously wrong. I'll have to try running the code on an actual OpenBSD box and see if I can reproduce a failure under ktrace.

From the trace, it seems like one of the io.Copy calls (either copying the StringReader to pipe, or pipe to bytes.Buffer) is hanging. That would make me suspect a kqueue issue, but then I'd expect to see the issue on 386 too.

One minor observation: OpenBSD 5.7 added functions like pipe2 and dup3 so it's now possible to create FDs with the CLOEXEC flag already set, which would allow us to get rid of a lot of uses of ForkLock. I don't think that's the issue though.

@bradfitz
Copy link
Contributor

I created an OpenBSD VM yesterday with the same builder image & ran the test in a loop tens of thousands of times. I did finally get it to fail once, but it wasn't easy. It seems to happen more easily under load, so then I started running go test -short std in a loop, went away, and later saw my dead ssh window showing that it had indeed hung, but I wasn't around to investigate.

tl;dr: reproducible at least.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9 Jul 14, 2017
@bradfitz
Copy link
Contributor

bradfitz commented Dec 5, 2017

@mdempsky, can you look into this?

Our openbsd-386 column is 2/3rds red. Currently our 386 builder is only OpenBSD 6.0. I'm brining up an OpenBSD/386 6.2 builder now. That is #22835. But our amd64 OpenBSD columns look happy for both 6.0 and 6.2

@bradfitz
Copy link
Contributor

bradfitz commented Dec 8, 2017

We now have both OpenBSD 6.0 and 6.2 builders for GOARCH=386, and both are super flaky in the same way. But both OpenBSD 6.0 and 6.2 are fine for GOARCH=amd64.

The amd64 vs 386 builders are the same VM sizes and the VM templates were both auto-generated from the same scripts. There should be no difference between them except the architecture.

So what is wrong with our 386 code? (or the OpenBSD 386 kernel perhaps)

Current build.golang.org output:

screen shot 2017-12-08 at 11 28 36 am

(mouseover the architecture heads to see 6.0 vs 6.2)

@gopherbot
Copy link

Change https://golang.org/cl/84035 mentions this issue: dashboard: give openbsd/386 more time, and skip pprof vendor tests on arm5

gopherbot pushed a commit to golang/build that referenced this issue Dec 14, 2017
… arm5

Updates golang/go#19547
Updates golang/go#22594

Change-Id: I0200eebf39c1fada6c6ce146afacb9d140142c5b
Reviewed-on: https://go-review.googlesource.com/84035
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@bradfitz
Copy link
Contributor

Nevermind, it was just cmd/go tests getting too slow and test timeouts. Column happy now. Not sure how I missed that before. :-(

@golang golang locked and limited conversation to collaborators Dec 14, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-OpenBSD Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

7 participants