cmd/compile/internal/ssa: test TestNexting failing #37050

Closed
cagedmantis opened this issue Feb 5, 2020 · 15 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@cagedmantis
Contributor

What version of Go are you using (go version)?

$ go version
go version release-1_14-work.mailed linux/amd64

What operating system and processor architecture are you using?

Debian GNU/Linux rodete
amd64

What did you do?

Ran test from the src directory.
$ go/src> go test cmd

What did you expect to see?

Passing tests

What did you see instead?

--- FAIL: TestNexting (7.71s)
    --- FAIL: TestNexting/dlv-dbg-hist (7.71s)
panic: There was an error writing 'b main.test
', write |1: broken pipe
 [recovered]
        panic: There was an error writing 'b main.test
', write |1: broken pipe

goroutine 249 [running]:
testing.tRunner.func1.1(0xc84b60, 0xc0004aa3b0)
        /home/chuck/Code/go/src/testing/testing.go:942 +0x3d0
testing.tRunner.func1(0xc00051d680)
        /home/chuck/Code/go/src/testing/testing.go:945 +0x3f9
panic(0xc84b60, 0xc0004aa3b0)
        /home/chuck/Code/go/src/runtime/panic.go:967 +0x15d
cmd/compile/internal/ssa_test.(*ioState).writeReadExpect(0xc0002f7030, 0xcf606d, 0xc, 0xcf42fc, 0xa, 0x0, 0x0, 0xc000318000, 0x87)
        /home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:861 +0x299
cmd/compile/internal/ssa_test.(*delveState).start(0xc00002a1e0)
        /home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:562 +0x216
cmd/compile/internal/ssa_test.runDbgr(0xed5180, 0xc00002a1e0, 0x3e8, 0x2a)
        /home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:279 +0x35
cmd/compile/internal/ssa_test.testNexting(0xc00051d680, 0xcec462, 0x4, 0xc0000974d0, 0x7, 0xcec5f0, 0x5, 0x3e8, 0x1423490, 0x0, ...)
        /home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:246 +0x5dd
cmd/compile/internal/ssa_test.subTest.func1(0xc00051d680)
        /home/chuck/Code/go/src/cmd/compile/internal/ssa/debug_test.go:180 +0xae
testing.tRunner(0xc00051d680, 0xc000076690)
        /home/chuck/Code/go/src/testing/testing.go:993 +0xdc
created by testing.(*T).Run
        /home/chuck/Code/go/src/testing/testing.go:1044 +0x357
FAIL    cmd/compile/internal/ssa        133.862s
@randall77
Contributor

@dr2chase

@dr2chase
Contributor

dr2chase commented Feb 5, 2020

Works for me.
I removed my build copy of dlv, discovered "/usr/bin/dlv", that worked okay.
I updated dlv to their tip, that worked okay.
Tried it on Mac with dlv completely removed, got expected skip.

I'd assume a flake unless you can repeat this (this test can flake; I worked pretty hard to deflake it, but the flake rate is not zero). Maybe dlv has some extra behavior that I don't know about, depending on environment; hard to say.

@bcmills
Contributor

bcmills commented Feb 6, 2020

How many (if any) of the builders have dlv installed?

Looking at the test body, it appears that we skip the test entirely otherwise (rather than, say, falling back to GDB). Is that intentional?

Given that Delve is a Go program, perhaps the test should instead download and build (a specific version of) the dlv binary locally if testenv.Builder() is non-empty and testing.Short() is false.


} else { // Delve
	debugger = "dlv"
	_, err := exec.LookPath("dlv")
	if err != nil {
		skipReasons += "not run because dlv not on path; "
	}
}
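
The policy suggested above could be sketched roughly as follows. This is only an illustration, not code from the Go tree: `delveAction`, `chooseDelve`, and the three action names are hypothetical, and the builder name and short flag are passed in as plain parameters (standing in for `testenv.Builder()` and `testing.Short()`) so that the snippet compiles on its own. It also assumes that a builder in long mode should prefer the pinned build even when some dlv is already on the path, which is one possible reading of the suggestion.

```go
package main

import (
	"fmt"
	"os/exec"
)

// delveAction is a hypothetical type describing what the test should do.
type delveAction int

const (
	useInstalled delveAction = iota // run whatever dlv is on PATH
	buildPinned                     // download and build a specific dlv version
	skipTest                        // no usable dlv: skip, as the test does today
)

func (a delveAction) String() string {
	return [...]string{"use installed dlv", "build pinned dlv", "skip"}[a]
}

// chooseDelve sketches the proposed policy. onPath reports whether dlv was
// found on PATH, builder stands in for testenv.Builder(), and short stands
// in for testing.Short().
func chooseDelve(onPath bool, builder string, short bool) delveAction {
	switch {
	case builder != "" && !short:
		// Assumption: on a builder in long mode, always use the pinned build.
		return buildPinned
	case onPath:
		return useInstalled
	default:
		return skipTest
	}
}

func main() {
	_, err := exec.LookPath("dlv")
	// An empty builder name models a developer workstation.
	fmt.Println(chooseDelve(err == nil, "", false))
}
```

Keeping the decision in a pure function like this would make the skip logic testable without a debugger installed; actually fetching and building dlv is left out of the sketch.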

@bcmills added the NeedsInvestigation label Feb 6, 2020
@bcmills added this to the Backlog milestone Feb 6, 2020
@cagedmantis
Contributor Author

We were running this on our workstations as part of the testing in the release process.

@dr2chase
Contributor

dr2chase commented Feb 6, 2020

@bcmills I added dlv to the containers that run longtests, so it should be there.

Using gdb is a recipe for flaky awfulness: different versions behave differently, it can be sensitive to your Python installation, and it's not an option on Macs anymore (as in, I have been unable to build and install it correctly despite multiple tries and many searches for how-to recipes; I follow them, and I still do not end up debugging code as a non-root user).

@dmitshur
Contributor

dmitshur commented May 27, 2020

This seems to be happening more often on the linux-386-longtest builder specifically. It has happened on the post-submit builder on release-branch.go1.14 here, and in the SlowBot run here.

I'll do some more investigation to confirm how reproducible it is. It's worth keeping in mind that the linux-386-longtest builder was misconfigured to test linux/amd64 until it was resolved in CL 234520 recently, so past data may be misleading.

@dmitshur
Contributor

dmitshur commented May 27, 2020

It is also happening on the release-branch.go1.13 branch. See the post-submit failure here, and the SlowBot run here.

From looking at the recent results (post-CL 234520) in the linux-386-longtest column at https://build.golang.org/?branch=release-branch.go1.14 and https://build.golang.org/?branch=release-branch.go1.13, it is very reproducible on linux/386.

@dr2chase
Contributor

I've been playing with gdb-on-Darwin, now that I figured out the truly secret handshake for code signing (use certificate name for email address; only Trust-Always for codesigning, leave the rest alone) and it works by hand, fails by program, no idea why. The new version of Delve continues to work just fine.

@dmitshur
Contributor

dmitshur commented May 27, 2020

The version of dlv installed on the linux-386-longtest builder is:

root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.2.0
Build: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $

The latest stable release at https://github.com/go-delve/delve/releases seems to be v1.4.1.

> The new version of Delve continues to work just fine.

@dr2chase Which exact version did you try? Do you think a good next step would be to update dlv on the linux-386-longtest builder?

@dmitshur
Contributor

dmitshur commented May 27, 2020

I just remembered @bcmills's #37050 (comment) from earlier, that seems like another good option to consider, as it would make the test depend on the environment configuration less, and able to run in more environments.

I can reproduce the TestNexting/dlv-dbg-hist failure on linux/386 (not linux/amd64) consistently.


root@buildlet-linux-stretch-morecpu-rn5eea3af:/workdir/go/src# go version
go version devel gomote.XXXXX linux/386
root@buildlet-linux-stretch-morecpu-rn5eea3af:/workdir/go/src# go test -count=1 -run='TestNexting/dlv-dbg-hist' -v cmd/compile/internal/ssa 
=== RUN   TestNexting
=== RUN   TestNexting/dlv-dbg-hist
--- FAIL: TestNexting (5.30s)
    --- FAIL: TestNexting/dlv-dbg-hist (5.30s)
panic: There was an error writing 'b main.test
', write |1: broken pipe
 [recovered]
	panic: There was an error writing 'b main.test
', write |1: broken pipe


goroutine 7 [running]:
testing.tRunner.func1.1(0x883aa40, 0xb216010)
	/workdir/go/src/testing/testing.go:940 +0x27c
testing.tRunner.func1(0xb136140)
	/workdir/go/src/testing/testing.go:943 +0x349
panic(0x883aa40, 0xb216010)
	/workdir/go/src/runtime/panic.go:969 +0x122
cmd/compile/internal/ssa_test.(*ioState).writeReadExpect(0xb18e900, 0x8884b7d, 0xc, 0x8882dea, 0xa, 0x0, 0x0, 0xb1b6000, 0x63)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:861 +0x206
cmd/compile/internal/ssa_test.(*delveState).start(0xb194000)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:562 +0x1cc
cmd/compile/internal/ssa_test.runDbgr(0x8a6ada0, 0xb194000, 0x3e8, 0x2a)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:279 +0x29
cmd/compile/internal/ssa_test.testNexting(0xb136140, 0x887af68, 0x4, 0xb0182b0, 0x7, 0x887b0f1, 0x5, 0x3e8, 0x8f4f148, 0x0, ...)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:246 +0x54e
cmd/compile/internal/ssa_test.subTest.func1(0xb136140)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:180 +0x93
testing.tRunner(0xb136140, 0xb0104e0)
	/workdir/go/src/testing/testing.go:991 +0xb4
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:1042 +0x2ad
FAIL	cmd/compile/internal/ssa	5.303s
FAIL

Updating to dlv v1.4.1 makes the test pass. Edit: See #37050 (comment).


root@buildlet-linux-stretch-morecpu-rn5eea3af:~# go version
go version devel gomote.XXXXX linux/386
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.2.0
Build: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# go test -count=1 -run='TestNexting/dlv-dbg-hist' -v cmd/compile/internal/ssa 
=== RUN   TestNexting
=== RUN   TestNexting/dlv-dbg-hist
--- FAIL: TestNexting (5.32s)
    --- FAIL: TestNexting/dlv-dbg-hist (5.32s)
panic: There was an error writing 'b main.test
', write |1: broken pipe
 [recovered]
	panic: There was an error writing 'b main.test
', write |1: broken pipe


goroutine 7 [running]:
testing.tRunner.func1.1(0x883aa40, 0x998c0c0)
	/workdir/go/src/testing/testing.go:940 +0x27c
testing.tRunner.func1(0x9936140)
	/workdir/go/src/testing/testing.go:943 +0x349
panic(0x883aa40, 0x998c0c0)
	/workdir/go/src/runtime/panic.go:969 +0x122
cmd/compile/internal/ssa_test.(*ioState).writeReadExpect(0x9990900, 0x8884b7d, 0xc, 0x8882dea, 0xa, 0x0, 0x0, 0x99b6000, 0x63)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:861 +0x206
cmd/compile/internal/ssa_test.(*delveState).start(0x9996000)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:562 +0x1cc
cmd/compile/internal/ssa_test.runDbgr(0x8a6ada0, 0x9996000, 0x3e8, 0x2a)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:279 +0x29
cmd/compile/internal/ssa_test.testNexting(0x9936140, 0x887af68, 0x4, 0x9816270, 0x7, 0x887b0f1, 0x5, 0x3e8, 0x8f4f148, 0x0, ...)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:246 +0x54e
cmd/compile/internal/ssa_test.subTest.func1(0x9936140)
	/workdir/go/src/cmd/compile/internal/ssa/debug_test.go:180 +0x93
testing.tRunner(0x9936140, 0x98104e0)
	/workdir/go/src/testing/testing.go:991 +0xb4
created by testing.(*T).Run
	/workdir/go/src/testing/testing.go:1042 +0x2ad
FAIL	cmd/compile/internal/ssa	5.321s
FAIL
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.2.0
Build: $Id: 068e2451004e95d0b042e5257e34f0f08ce01466 $
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# export PATH="/workdir/gopath/bin:$PATH"
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# dlv version
Delve Debugger
Version: 1.4.1
Build: $Id: bda606147ff48b58bde39e20b9e11378eaa4db46 $
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# go test -count=1 -run='TestNexting/dlv-dbg-hist' -v cmd/compile/internal/ssa
=== RUN   TestNexting
=== RUN   TestNexting/dlv-dbg-hist
--- PASS: TestNexting (1.44s)
    --- PASS: TestNexting/dlv-dbg-hist (1.44s)
PASS
ok  	cmd/compile/internal/ssa	1.442s
root@buildlet-linux-stretch-morecpu-rn5eea3af:~# 

@dr2chase
Contributor

I tried Delve 1.4.1.

I just rebooted after a Catalina security or dot upgrade, and one of my copies of gdb now hangs from the keyboard, the other (same binary, different signing) does not. All of them still hang running under the control of TestNexting.

@dmitshur
Contributor

dmitshur commented May 27, 2020

Upon further testing, it seems the problem was not the version of dlv, but the fact that the installed dlv binary was built with GOOS=linux GOARCH=amd64, and the same binary is being used for GOHOSTARCH=386 testing.

I can get the test to fail the same way (in the linux/386 configuration) if I install and use dlv binary with GO111MODULE=on GOARCH=amd64 go get github.com/go-delve/delve/cmd/dlv@v1.4.1 instead of GO111MODULE=on GOARCH=386 go get github.com/go-delve/delve/cmd/dlv@v1.4.1.

@dr2chase
Contributor

So, using an amd64 delve to debug a 386 Go binary leads to flakiness?
When this fails with gdb, is gdb also built to be amd64 instead of 386?

@dr2chase
Contributor

Delve says it doesn't support 386:

drchase@drchase1:~/work/go/src/cmd/compile/internal/ssa$ TERM=dumb dlv exec testdata/test-hist.dlv-dbg
unsupported architecture - only linux/amd64 and linux/arm64 are supported

and then it panics all over the place.

Still haven't gotten gdb to work consistently-properly on either Darwin or Linux; I have a fear that success depends not just on gdb version, but also on the version of Python that is compiled into it when gdb is built.
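
Given the `unsupported architecture` message quoted above, one option would be for a test harness to look for that text in the debugger's startup output and skip instead of panicking on the later write. The sketch below assumes the message wording is stable across Delve versions, which this thread does not confirm; `debuggerUnsupported` is a hypothetical helper, not something in debug_test.go.

```go
package main

import (
	"fmt"
	"strings"
)

// unsupportedArchMsg is the start of the message Delve printed in the
// session quoted above. Treating it as a stable marker is an assumption.
const unsupportedArchMsg = "unsupported architecture"

// debuggerUnsupported reports whether the debugger's startup output says it
// cannot handle the target architecture.
func debuggerUnsupported(startupOutput string) bool {
	return strings.Contains(startupOutput, unsupportedArchMsg)
}

func main() {
	out := "unsupported architecture - only linux/amd64 and linux/arm64 are supported\n"
	if debuggerUnsupported(out) {
		fmt.Println("skip: dlv does not support this architecture")
	}
}
```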

@dmitshur
Contributor

I've looked into how the linux-386-longtest builder is passing on tip, and found a similar (or duplicate) issue #37404. It's passing on tip because CL 227587 has added a skip for TestNexting.

> So, using an amd64 delve to debug a 386 Go binary leads to flakiness?

Yes. It's not just flakiness, it's a reproducible failure:

image

I think there are two distinct issues between this issue and #37404.

  1. The linux-386-longtest builder is failing reproducibly now. It's only visible after May 19 because before CL 234520 the builder was testing linux/amd64, not linux/386.

    I've opened a new issue, #39309 (cmd/compile/internal/ssa: TestNexting/dlv-dbg-hist failing on linux-386-longtest builder because it tries to use an older version of dlv which only supports linux/amd64), for this problem specifically.

  2. The linux-amd64-longtest builder has apparent deadlocks that happen infrequently.

    I think #37404 (cmd/compile/internal/ssa: apparent deadlock in TestNexting) is better suited to track problem 2, so let's use that issue, and close this as a duplicate.

@golang locked and limited conversation to collaborators May 28, 2021