Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: mysterious CL 314229 failure on 32-bit only #45877

Closed
kels-ng opened this issue Apr 30, 2021 · 6 comments
Closed

runtime: mysterious CL 314229 failure on 32-bit only #45877

kels-ng opened this issue Apr 30, 2021 · 6 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@kels-ng
Copy link
Contributor

kels-ng commented Apr 30, 2021

What version of Go are you using (go version)?

$ go version
go version devel go1.17-df9d274e40 Thu Apr 29 11:47:52 2021 +0000 linux/amd64

Does this issue reproduce with the latest release?

1.16.3 does not reproduce this issue, only master branch is affected.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="auto"
GOARCH="amd64"
GOBIN="/home/randreev/gopath//bin/"
GOCACHE="/home/randreev/.cache/go-build"
GOENV="/home/randreev/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/randreev/gopath/pkg/mod"
GONOPROXY=""
GONOSUMDB="*"
GOOS="linux"
GOPATH="/home/randreev/gopath/"
GOPRIVATE=""
GOROOT="/home/randreev/goroot"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/randreev/goroot/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.17-df9d274e40 Thu Apr 29 11:47:52 2021 +0000"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/randreev/goroot/src/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2894805972=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I prepared patch 314229 for sync.Pool that randomize object stealing. After this CL 32-bit platforms are failed:

If this patch apply to 1.16 revision, it will works without problems. So, I used bisect to find commit that conflict with my patch.

It's CL 271537 that introduced pool usage for splices. It uses setFinalizer to close read/write descriptors after object deletion. Also, there is a test that create 64 pipes using pipe2 syscall, put them to Pool, and drop variable which hold reference to this objects. So, the result should be following: after two GCs each object should be not marked by GC (no references to them), called finalizer, and removed. Seems good and according to filtered strace it works well: https://pastebin.com/TB6CvMKY

After my CL this test starts to fail on 32-bit platforms:

go test -timeout 5s -run ^TestSplicePipePool$ internal/poll
--- FAIL: TestSplicePipePool (4.51s)
    splice_linux_test.go:69: at least one pipe is still open
FAIL
FAIL    internal/poll   4.510s
FAIL

This is filtered strace for this case: https://pastebin.com/7syNW3A9

As you can see there is no close syscalls for descriptors 7 and 8. It reveals that finalizer is not executed for first splice. And it's stable reproduced with the same descriptors' numbers. Even if I delete whole logic of my CL except pOrder declaration and its reset (structure was copied from runtime/proc.go without any changes), the issue still reproduces.

But if I just add -v flag to test it's become fixed without any code changes:

go test -v -timeout 5s -run ^TestSplicePipePool$ internal/poll
=== RUN   TestSplicePipePool
--- PASS: TestSplicePipePool (0.02s)
PASS
ok      internal/poll   0.023s

Another method to enable test is adding explicit array initialization in reset:

func (ord *randomOrder) reset(count uint32) {
	ord.count = count
	// ISSUE: 32-bit platforms are failed without array init
	ord.coprimes = []uint32{0} // This line is not presented in original randomOrder
	ord.coprimes = ord.coprimes[:0]
	for i := uint32(1); i <= count; i++ {
		if gcd(i, count) == 1 {
			ord.coprimes = append(ord.coprimes, i)
		}
	}
}

But after rebasing to the last revision this method stops to work. Also, coprime store only integers, no pointers. It can't prevent GC to delete the objects.

The issue could be inside GC and finalizer logic for 32-bit platforms. Maybe the problem is depended on memory layout, even small changes in flow could fix this bug.

@dr2chase also was looking at this problem and advised me to create an issue here. Also he think that it's probably not a differing memory model issue, because it is okay on arm64, fails on 386 (strong memory model).

It would be good to get any information that can help to find the root cause.

What did you expect to see?

TestSplicePipePool is passed and all descriptors are closed: https://pastebin.com/TB6CvMKY

What did you see instead?

TestSplicePipePool is failed due to not closed descriptors by finalizer (not executed for first splice): https://pastebin.com/7syNW3A9

/cc @dr2chase @ianlancetaylor @mknyszek @aclements

@ALTree ALTree added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 30, 2021
@cherrymui cherrymui added this to the Unplanned milestone Apr 30, 2021
@networkimprov
Copy link

Should be milestone 1.17? It's OK on 1.16.

@ALTree
Copy link
Member

ALTree commented May 16, 2021

It seems that OP can reproduce this reliably only on top of a CL they're working on; so while it's possible that the issue is caused by a latent problem that also affects current tip, I'm not sure it needs to be investigated for 1.17. After all there's no test failure on tip.

@bcmills
Copy link
Contributor

bcmills commented Aug 30, 2021

Duplicate of #45782

@bcmills bcmills marked this as a duplicate of #45782 Aug 30, 2021
@bcmills bcmills closed this as completed Aug 30, 2021
@vleonen
Copy link

vleonen commented Aug 30, 2021

@bcmills This issue was created as a result of investigation of failure described in #45782 and CL313630 was just a trigger for the issue.
The current issue describes a case to reproduce the issue on master branch without CL313630 changes applied on it. Please check comments in https://go-review.googlesource.com/c/go/+/314229 (reopened CL for sync.Pool changes).

@networkimprov
Copy link

Vasily, is this still unresolved?

@kels-ng
Copy link
Contributor Author

kels-ng commented Dec 11, 2021

@networkimprov it's fixed by this commit: 60c3069

@golang golang locked and limited conversation to collaborators Dec 11, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

7 participants