Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime/pprof: flaky tests (TestMathBigDivide, TestStackBarrierProfiling) #18332

Closed
broady opened this issue Dec 15, 2016 · 7 comments
Closed
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.
Milestone

Comments

@broady
Copy link
Member

broady commented Dec 15, 2016

Failed twice in a row while performing the go1.8beta2 release for linux-amd64, linux-386. freebsd-amd64 failed once, too.

--- FAIL: TestMathBigDivide (6.39s)
        pprof_test.go:196: total 1 CPU profile samples collected
        pprof_test.go:208: too few samples; got 1, want at least 5, ideally 20
        pprof_test.go:153: retrying with 400ms duration
        pprof_test.go:196: total 1 CPU profile samples collected
        pprof_test.go:208: too few samples; got 1, want at least 10, ideally 40
        pprof_test.go:153: retrying with 800ms duration
        pprof_test.go:196: total 11 CPU profile samples collected
        pprof_test.go:208: too few samples; got 11, want at least 20, ideally 80
        pprof_test.go:153: retrying with 1.6s duration
        pprof_test.go:196: total 21 CPU profile samples collected
        pprof_test.go:208: too few samples; got 21, want at least 40, ideally 160
        pprof_test.go:153: retrying with 3.2s duration
        pprof_test.go:196: total 45 CPU profile samples collected
        pprof_test.go:208: too few samples; got 45, want at least 80, ideally 320

linux-386 also had this one once:

--- FAIL: TestStackBarrierProfiling (6.61s)
        pprof_test.go:424: subprocess failed with exit status 1:
                --- FAIL: TestStackBarrierProfiling (6.49s)
                        pprof_test.go:196: total 0 CPU profile samples collected
                        pprof_test.go:208: too few samples; got 0, want at least 5, ideally 20
                        pprof_test.go:153: retrying with 400ms duration
                        pprof_test.go:196: total 1 CPU profile samples collected
                        pprof_test.go:208: too few samples; got 1, want at least 10, ideally 40
                        pprof_test.go:153: retrying with 800ms duration
                        pprof_test.go:196: total 11 CPU profile samples collected
                        pprof_test.go:208: too few samples; got 11, want at least 20, ideally 80
                        pprof_test.go:153: retrying with 1.6s duration
                        pprof_test.go:196: total 22 CPU profile samples collected
                        pprof_test.go:208: too few samples; got 22, want at least 40, ideally 160
                        pprof_test.go:153: retrying with 3.2s duration
                        pprof_test.go:196: total 64 CPU profile samples collected
                        pprof_test.go:208: too few samples; got 64, want at least 80, ideally 320
@bradfitz
Copy link
Contributor

Odd that we have't seen these on build.golang.org or trybots. But the x/build/cmd/release tool that @broady is using runs all.bash, which the trybot/build.golang.org build system doesn't use. Could that be related? SIGPROF routing differently?

@bradfitz bradfitz added this to the Go1.8Maybe milestone Dec 15, 2016
@bradfitz bradfitz added the Testing An issue that has been verified to require only test changes, not just a test failure. label Dec 15, 2016
@bcmills
Copy link
Contributor

bcmills commented Dec 16, 2016

TestCPUProfileWithFork is also flaky for me using all.bash.

$ go version
go version devel +0cd2bf4 Thu Dec 15 22:43:28 2016 +0000 linux/amd64
$ $(go env CC) --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
SIGQUIT: quit
PC=0x466343 m=0 sigcode=0

goroutine 16 [running]:
syscall.RawSyscall6(0x38, 0x11, 0x0, 0x0, 0x0, 0x0, 0x0, 0x125, 0xc4204f8a50, 0x80000)
	/home/bcmills/src/go/src/syscall/asm_linux_amd64.s:100 +0x23 fp=0xc4204f89a0 sp=0xc4204f8998
syscall.forkAndExecInChild(0xc420016840, 0xc4204d8560, 0x3, 0x3, 0xc4202c21e0, 0x3c, 0x3c, 0x0, 0x0, 0xc4204f8cc0, ...)
	/home/bcmills/src/go/src/syscall/exec_linux.go:101 +0x210 fp=0xc4204f8ab0 sp=0xc4204f89a0
syscall.forkExec(0x7ffc81cdb688, 0x35, 0xc4204d8000, 0x2, 0x2, 0xc4204f8cc0, 0x0, 0x0, 0x0)
	/home/bcmills/src/go/src/syscall/exec_unix.go:193 +0x368 fp=0xc4204f8bd8 sp=0xc4204f8ab0
syscall.StartProcess(0x7ffc81cdb688, 0x35, 0xc4204d8000, 0x2, 0x2, 0xc4204f8cc0, 0x2, 0x4, 0xc4204e20f0, 0xc4204f8c90)
	/home/bcmills/src/go/src/syscall/exec_unix.go:240 +0x64 fp=0xc4204f8c30 sp=0xc4204f8bd8
os.startProcess(0x7ffc81cdb688, 0x35, 0xc4204d8000, 0x2, 0x2, 0xc4204f8e68, 0xc4204d6800, 0x3b, 0x3b)
	/home/bcmills/src/go/src/os/exec_posix.go:45 +0x1a3 fp=0xc4204f8d18 sp=0xc4204f8c30
os.StartProcess(0x7ffc81cdb688, 0x35, 0xc4204d8000, 0x2, 0x2, 0xc4204f8e68, 0x0, 0x0, 0x4512d0)
	/home/bcmills/src/go/src/os/exec.go:94 +0x64 fp=0xc4204f8d70 sp=0xc4204f8d18
os/exec.(*Cmd).Start(0xc4200b8000, 0xc4200fd701, 0xc4200ba1c0)
	/home/bcmills/src/go/src/os/exec/exec.go:359 +0x3d2 fp=0xc4204f8ec0 sp=0xc4204f8d70
os/exec.(*Cmd).Run(0xc4200b8000, 0xc4200ba1c0, 0xc4200b8000)
	/home/bcmills/src/go/src/os/exec/exec.go:277 +0x2b fp=0xc4204f8ee8 sp=0xc4204f8ec0
os/exec.(*Cmd).CombinedOutput(0xc4200b8000, 0x35, 0xc4200fd778, 0x1, 0x1, 0xc4200b8000)
	/home/bcmills/src/go/src/os/exec/exec.go:495 +0x127 fp=0xc4204f8f20 sp=0xc4204f8ee8
runtime/pprof_test.TestCPUProfileWithFork(0xc420070270)
	/home/bcmills/src/go/src/runtime/pprof/pprof_test.go:271 +0x228 fp=0xc4204f8fa8 sp=0xc4204f8f20
testing.tRunner(0xc420070270, 0x5bf5b8)
	/home/bcmills/src/go/src/testing/testing.go:656 +0x93 fp=0xc4204f8fd0 sp=0xc4204f8fa8
runtime.goexit()
	/home/bcmills/src/go/src/runtime/asm_amd64.s:2184 +0x1 fp=0xc4204f8fd8 sp=0xc4204f8fd0
created by testing.(*T).Run
	/home/bcmills/src/go/src/testing/testing.go:693 +0x2c4

goroutine 1 [chan receive, 3 minutes]:
testing.(*T).Run(0xc4200704e0, 0x5b7328, 0x16, 0x5bf5b8, 0xc420043d01)
	/home/bcmills/src/go/src/testing/testing.go:694 +0x2ee
testing.runTests.func1(0xc4200704e0)
	/home/bcmills/src/go/src/testing/testing.go:877 +0x67
testing.tRunner(0xc4200704e0, 0xc420043de0)
	/home/bcmills/src/go/src/testing/testing.go:656 +0x93
testing.runTests(0xc42000ab60, 0x69b100, 0xa, 0xa, 0x5b3376)
	/home/bcmills/src/go/src/testing/testing.go:883 +0x29d
testing.(*M).Run(0xc420043f20, 0xc420043f20)
	/home/bcmills/src/go/src/testing/testing.go:818 +0xfc
main.main()
runtime/pprof/_test/_testmain.go:60 +0xf7

goroutine 36 [syscall]:
runtime.CPUProfile(0x59a540, 0xc4200a8000, 0x69d620)
	/home/bcmills/src/go/src/runtime/cpuprof.go:448 +0x2d
runtime/pprof.profileWriter(0x6843a0, 0xc4200ba0e0)
	/home/bcmills/src/go/src/runtime/pprof/pprof.go:701 +0x63
created by runtime/pprof.StartCPUProfile
	/home/bcmills/src/go/src/runtime/pprof/pprof.go:688 +0x122

rax    0x38
rbx    0x69ddc0
rcx    0xffffffffffffffff
rdx    0x0
rdi    0x11
rsi    0x0
rbp    0xc4204f8aa0
rsp    0xc4204f8998
r8     0x0
r9     0x0
r10    0x0
r11    0x206
r12    0x0
r13    0xffffffee
r14    0x740
r15    0x100
rip    0x466343
rflags 0x206
cs     0x33
fs     0x0
gs     0x0
*** Test killed with quit: ran too long (4m0s).
FAIL    runtime/pprof   240.059s

@ianlancetaylor
Copy link
Contributor

CC @matloob @rauls5382

@ianlancetaylor
Copy link
Contributor

TestCPUProfileWithFork is probably a different problem and should probably be filed as a separate issue.

The TestMathBigDivide and TestStackBarrierProfiling tests are getting profiling samples, they just aren't getting as many as the tests expect. The tests expect 100 profiling samples per second, and will accept as few as 25 samples per second. They are getting more like 14 or 15 per second. This could be due to a heavily overloaded system.

I can't see why using all.bash would make a difference.

It's easy for me to make the tests fail by running 20 of the test programs simultaneously.

Not sure what we should do about this.

@bradfitz
Copy link
Contributor

We should disable the tests if they can't be made reliable.

@ianlancetaylor
Copy link
Contributor

I'll change them to pass if we get at least 10 samples. That is enough to show that the profiler is doing something.

@gopherbot
Copy link

CL https://golang.org/cl/34507 mentions this issue.

wizeman added a commit to wizeman/nixpkgs that referenced this issue Jul 3, 2017
wizeman added a commit to wizeman/nixpkgs that referenced this issue Jul 3, 2017
wizeman added a commit to wizeman/nixpkgs that referenced this issue Jul 3, 2017
@golang golang locked and limited conversation to collaborators Dec 16, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

5 participants