runtime: mysterious CL 314229 failure on 32-bit only #45877
Labels
FrozenDueToAge
NeedsInvestigation
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
No. Go 1.16.3 does not reproduce this issue; only the master branch is affected.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I prepared CL 314229 for sync.Pool, which randomizes object stealing. With this CL applied, tests fail on 32-bit platforms:
If the patch is applied to the 1.16 revision, it works without problems, so I used bisect to find the commit that conflicts with my patch.
The conflicting commit is CL 271537, which introduced pool usage for splice pipes. It uses runtime.SetFinalizer to close the read and write descriptors once the object is collected. There is also a test that creates 64 pipes with the pipe2 syscall, puts them into the Pool, and drops the variable that holds the references to these objects. The expected result is that after two GCs each object is left unmarked by the GC (nothing references it), its finalizer is called, and it is removed. This looks right, and according to a filtered strace it works well: https://pastebin.com/TB6CvMKY
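The pattern the test relies on can be sketched in isolation. This is a minimal sketch, not the code from CL 271537: `fakePipe`, `pipePool`, `fillPool`, and `finalizedAfterGC` are hypothetical names, no real pipes are created, and a counter stands in for the close syscalls. It assumes only the documented behavior of `sync.Pool` (pooled items may be dropped by the GC) and `runtime.SetFinalizer`.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// fakePipe is a hypothetical stand-in for a pooled pipe pair.
// No real descriptors are opened; rfd and wfd are just labels.
type fakePipe struct {
	rfd, wfd int
	// Padding keeps the object well above the size range where the
	// runtime may batch small pointer-free objects together, a case
	// the runtime.SetFinalizer documentation warns about.
	_ [64]byte
}

// closed counts finalizer runs, standing in for close(2) calls.
var closed int32

// pipePool has no New function: it is only used as a holding area.
var pipePool sync.Pool

// fillPool puts n finalizable objects into the pool and keeps no
// other reference to them.
func fillPool(n int) {
	for i := 0; i < n; i++ {
		p := &fakePipe{rfd: 2 * i, wfd: 2*i + 1}
		runtime.SetFinalizer(p, func(*fakePipe) {
			atomic.AddInt32(&closed, 1)
		})
		pipePool.Put(p)
	}
}

// finalizedAfterGC fills the pool from a separate goroutine, so that no
// stale stack slot of the caller keeps an object alive, then forces GC
// cycles until every finalizer has run or a 5 second deadline passes.
// It returns the number of finalizers that ran.
func finalizedAfterGC(n int) int {
	atomic.StoreInt32(&closed, 0)
	done := make(chan struct{})
	go func() {
		fillPool(n)
		close(done)
	}()
	<-done

	deadline := time.Now().Add(5 * time.Second)
	for time.Now().Before(deadline) {
		runtime.GC()
		if int(atomic.LoadInt32(&closed)) == n {
			break
		}
		time.Sleep(10 * time.Millisecond)
	}
	return int(atomic.LoadInt32(&closed))
}

func main() {
	const n = 64
	fmt.Printf("finalizers run: %d of %d\n", finalizedAfterGC(n), n)
}
```

The loop polls instead of calling `runtime.GC()` exactly twice because finalizers run on a separate goroutine some time after the collection that finds the object unreachable, so a fixed number of cycles is not a guarantee.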
With my CL applied, this test starts to fail on 32-bit platforms:
This is the filtered strace for this case: https://pastebin.com/7syNW3A9
As you can see, there are no close syscalls for descriptors 7 and 8, which shows that the finalizer is not executed for the first splice. The failure reproduces reliably, with the same descriptor numbers every time. Even if I delete all the logic of my CL except the pOrder declaration and its reset (the structure was copied from runtime/proc.go without any changes), the issue still reproduces.
But if I just add the -v flag to the test, it passes without any code changes:
Another way to make the test pass is to add explicit array initialization in reset:
But after rebasing onto the latest revision, this method stopped working. Also, coprime stores only integers, no pointers, so it cannot prevent the GC from collecting the objects.
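For readers unfamiliar with the structure mentioned above, here is a rough sketch of a coprime-based random-order enumeration, modeled on my understanding of the one in runtime/proc.go. It is not the code from CL 314229; the `order` helper is hypothetical and added only for illustration. The point it shows is that the state consists of plain integers and never refers to pooled objects.

```go
package main

import "fmt"

// randomOrder precomputes the step sizes that are coprime to count.
// Its state is integers only; it holds no references to pooled objects.
type randomOrder struct {
	count    uint32
	coprimes []uint32
}

// randomEnum walks the positions 0..count-1 with a fixed stride.
type randomEnum struct {
	i, count, pos, inc uint32
}

func gcd(a, b uint32) uint32 {
	for b != 0 {
		a, b = b, a%b
	}
	return a
}

// reset recomputes the coprime strides for count positions (count > 0).
func (ord *randomOrder) reset(count uint32) {
	ord.count = count
	ord.coprimes = ord.coprimes[:0]
	for i := uint32(1); i <= count; i++ {
		if gcd(i, count) == 1 {
			ord.coprimes = append(ord.coprimes, i)
		}
	}
}

// start derives a starting position and a stride from seed.
func (ord *randomOrder) start(seed uint32) randomEnum {
	return randomEnum{
		count: ord.count,
		pos:   seed % ord.count,
		inc:   ord.coprimes[seed/ord.count%uint32(len(ord.coprimes))],
	}
}

func (e *randomEnum) done() bool { return e.i == e.count }

func (e *randomEnum) next() {
	e.i++
	e.pos = (e.pos + e.inc) % e.count
}

// order is a hypothetical helper that collects the visiting order
// produced for a given seed.
func order(count, seed uint32) []uint32 {
	var ord randomOrder
	ord.reset(count)
	var out []uint32
	for e := ord.start(seed); !e.done(); e.next() {
		out = append(out, e.pos)
	}
	return out
}

func main() {
	fmt.Println(order(8, 11)) // prints [3 6 1 4 7 2 5 0]
}
```

Because the stride is coprime to the number of positions, every position is visited exactly once regardless of the seed.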
The issue could be in the GC and finalizer logic for 32-bit platforms. The problem may depend on memory layout, since even small changes in the flow can make the bug disappear.
@dr2chase was also looking at this problem and advised me to create an issue here. He also thinks that it is probably not a memory model issue, because the test passes on arm64 but fails on 386 (strong memory model).
It would be good to get any information that can help to find the root cause.
What did you expect to see?
TestSplicePipePool passes and all descriptors are closed: https://pastebin.com/TB6CvMKY
What did you see instead?
TestSplicePipePool fails because some descriptors are not closed by the finalizer (it is not executed for the first splice): https://pastebin.com/7syNW3A9
/cc @dr2chase @ianlancetaylor @mknyszek @aclements