runtime: "sweep increased allocation count" on linux-amd64-staticlockranking builder #38702

Closed
bcmills opened this issue Apr 27, 2020 · 10 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker

@bcmills
Contributor

bcmills commented Apr 27, 2020

2020-04-27T15:53:46-9b9556f/linux-amd64-staticlockranking

CC @danscales @mknyszek @aclements

Tentatively marking as release-blocker because this seems to indicate memory corruption in the runtime.

@bcmills bcmills added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker labels Apr 27, 2020
@bcmills bcmills added this to the Go1.15 milestone Apr 27, 2020
@aclements
Member

aclements commented May 14, 2020

These are not uncommon. Here are the failures from this year. They go back to 2016 (which could well be when we introduced this panic), but there was a clear uptick around 2020-03.

(Edited: @mknyszek found that all but the first two in this list were #37881)

$ greplogs -dashboard -E "error: sweep increased allocation count" -l -md

2020-04-27T15:53:46-9b9556f/linux-amd64-staticlockranking
2020-04-08T18:35:49-f7e6ab4/solaris-amd64-oraclerel
2020-03-24T19:05:50-f975485/linux-386-clang
2020-03-24T17:24:24-9dcd6b3/darwin-386-10_14
2020-03-24T14:21:50-ade9886/darwin-386-10_14
2020-03-24T10:33:13-9ef61d5/openbsd-386-64
2020-03-23T19:14:29-6aded25/freebsd-386-12_0
2020-03-23T17:56:24-5c9bd49/linux-386-clang
2020-03-23T17:23:03-5d47f87/freebsd-386-12_0
2020-03-23T17:07:22-67c2dcb/linux-386-387
2020-03-23T03:56:18-bb929b7/linux-386-sid
2020-03-22T08:42:38-787e7b0/linux-386-sid
2020-03-22T00:10:27-36b815e/linux-386-387
2020-03-21T02:46:16-287d67e/linux-386-387
2020-03-20T16:05:35-d965bb6/freebsd-386-11_2
2020-03-20T16:05:33-ab5a40c/linux-386
2020-03-20T08:42:30-9d468f4/openbsd-386-62
2020-03-20T00:27:02-a0917eb/freebsd-386-11_2
2020-03-20T00:27:02-a0917eb/linux-386-clang
2020-03-19T00:08:40-b3b174f/freebsd-386-11_2
2020-03-18T19:44:13-0205790/linux-386-387
2020-03-18T19:13:50-f1f947a/linux-386-sid
2020-03-18T18:59:32-e39de05/linux-386-387
2020-03-18T16:00:44-0c0e8f2/linux-386
2020-03-18T01:03:36-6412750/linux-386
2020-03-17T20:48:23-0eeec4f/freebsd-386-11_2
2020-03-17T17:10:51-14d20dc/freebsd-386-12_0
2020-03-17T01:24:30-0e44c69/linux-386
2020-03-16T20:59:27-ff1eb42/linux-386-387
2020-03-15T08:13:55-32dbccd/linux-386-clang
2020-03-14T07:03:15-d774d97/linux-386-sid
2020-03-13T20:43:12-e2a9ea0/openbsd-386-62

@mknyszek
Contributor

There was an uptick in March, but then it slowed down considerably. Either something is masking the bug now, or it got fixed. Interestingly, all of those failures are on 386. The last one before that block was plan9-arm in November.

@aclements
Member

I'm going to put together a CL to at least improve the debugging output from this.

@gopherbot

Change https://golang.org/cl/234100 mentions this issue: runtime: detect and report zombie slots during sweeping

gopherbot pushed a commit that referenced this issue May 21, 2020
A zombie slot is a slot that is marked, but isn't allocated. This can
indicate a bug in the GC, or a bad use of unsafe.Pointer. Currently,
the sweeper has best-effort detection for zombie slots: if there are
more marked slots than allocated slots, then there must have been a
zombie slot. However, this is imprecise since it only compares totals
and it reports almost no information that may be helpful to debug the
issue.

Add a precise check that compares the mark and allocation bitmaps and
reports detailed information if it detects a zombie slot.

No appreciable effect on performance as measured by the sweet
benchmarks:

name                                old time/op  new time/op  delta
BiogoIgor                            15.8s ± 2%   15.8s ± 2%    ~     (p=0.421 n=24+25)
BiogoKrishna                         15.6s ± 2%   15.8s ± 5%    ~     (p=0.082 n=22+23)
BleveIndexBatch100                   4.90s ± 3%   4.88s ± 2%    ~     (p=0.627 n=25+24)
CompileTemplate                      204ms ± 1%   205ms ± 0%  +0.22%  (p=0.010 n=24+23)
CompileUnicode                      77.8ms ± 2%  78.0ms ± 1%    ~     (p=0.236 n=25+24)
CompileGoTypes                       729ms ± 0%   731ms ± 0%  +0.26%  (p=0.000 n=24+24)
CompileCompiler                      3.52s ± 0%   3.52s ± 1%    ~     (p=0.152 n=25+25)
CompileSSA                           8.06s ± 1%   8.05s ± 0%    ~     (p=0.192 n=25+24)
CompileFlate                         132ms ± 1%   132ms ± 1%    ~     (p=0.373 n=24+24)
CompileGoParser                      163ms ± 1%   164ms ± 1%  +0.32%  (p=0.003 n=24+25)
CompileReflect                       453ms ± 1%   455ms ± 1%  +0.39%  (p=0.000 n=22+22)
CompileTar                           181ms ± 1%   181ms ± 1%  +0.20%  (p=0.029 n=24+21)
CompileXML                           244ms ± 1%   244ms ± 1%    ~     (p=0.065 n=24+24)
CompileStdCmd                        15.8s ± 2%   15.7s ± 2%    ~     (p=0.059 n=23+24)
FoglemanFauxGLRenderRotateBoat       13.4s ±11%   12.8s ± 0%    ~     (p=0.377 n=25+24)
FoglemanPathTraceRenderGopherIter1   18.6s ± 0%   18.6s ± 0%    ~     (p=0.696 n=23+24)
GopherLuaKNucleotide                 28.7s ± 4%   28.6s ± 5%    ~     (p=0.700 n=25+25)
MarkdownRenderXHTML                  250ms ± 1%   248ms ± 1%  -1.01%  (p=0.000 n=24+24)
[Geo mean]                           1.60s        1.60s       -0.11%

(https://perf.golang.org/search?q=upload:20200517.6)

For #38702.

Change-Id: I8af1fefd5fbf7b9cb665b98f9c4b73d1d08eea81
Reviewed-on: https://go-review.googlesource.com/c/go/+/234100
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
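
The difference between the old totals-based check and the precise bitmap comparison described in the commit message above can be illustrated with a small, self-contained sketch. This is not the runtime's code: the byte-slice bitmaps and the names `isSet`, `countSet`, and `findZombies` are hypothetical, and the real sweeper tracks allocation state with more than a single bitmap. The example only shows why comparing totals can miss a zombie that a per-slot comparison catches.

```go
package main

import "fmt"

// isSet reports whether bit i is set in bm (one bit per slot).
// Hypothetical helper, not a runtime function.
func isSet(bm []uint8, i int) bool {
	return bm[i/8]&(uint8(1)<<(uint(i)%8)) != 0
}

// countSet counts the set bits among the first nelems slots.
func countSet(bm []uint8, nelems int) int {
	n := 0
	for i := 0; i < nelems; i++ {
		if isSet(bm, i) {
			n++
		}
	}
	return n
}

// findZombies returns every slot that is marked but not allocated.
func findZombies(mark, alloc []uint8, nelems int) []int {
	var zombies []int
	for i := 0; i < nelems; i++ {
		if isSet(mark, i) && !isSet(alloc, i) {
			zombies = append(zombies, i)
		}
	}
	return zombies
}

func main() {
	alloc := []uint8{0x05} // slots 0 and 2 are allocated
	mark := []uint8{0x06}  // slots 1 and 2 are marked

	// Totals-only check: 2 marked vs. 2 allocated, so nothing looks wrong.
	fmt.Println("allocated:", countSet(alloc, 8), "marked:", countSet(mark, 8))

	// Per-slot check: slot 1 is marked but was never allocated.
	fmt.Println("zombie slots:", findZombies(mark, alloc, 8))
}
```

In this input one allocated slot is unmarked (ordinary garbage) and one unallocated slot is marked (a zombie), so the totals cancel out and only the per-slot comparison reports slot 1.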
@toothrot
Contributor

Hello! This is one of the few remaining issues blocking the Beta release of Go 1.15. We'll need to make a decision on this in the next week in order to keep our release on schedule.

@mknyszek
Contributor

@toothrot I don't think we've seen this crash since the failure that caused Bryan to open this issue. @aclements' CL should give us a lot more information should we see this crash again, though. I think it's probably safe to not mark this as a beta-blocking issue, but we should keep an eye out for more such failures.

Also, an interesting data point: all the failures through March appear to involve the same size class (s.nelems=512 in every case I've looked at, which is size class 2; it's unclear from the crashes alone whether they're noscan, so this could be related to the tiny allocator). The two most recent failures aren't for that size class. I tried to trace this back to a particular CL, but nothing seems obviously wrong. The March failures are also all very consistently on 386. I think whatever was going on in March is distinct from the last two failures, which happened in April. Ah, digging through issues, I found that the March build failures are referenced by #37881, which is declared fixed.
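
As a quick check of the size-class arithmetic above: assuming the smallest size classes use a single 8192-byte span and that size class 2 holds 16-byte objects (both are assumptions about the runtime's size-class table, not something stated in the crash logs), the slot count works out to 512. The constant `spanBytes` and the function `nelems` below are illustrative names, not runtime identifiers.

```go
package main

import "fmt"

// spanBytes is the assumed span size for the smallest size classes.
const spanBytes = 8192

// nelems returns how many objects of elemSize bytes fit in one span.
func nelems(elemSize int) int {
	return spanBytes / elemSize
}

func main() {
	fmt.Println(nelems(16)) // 16-byte objects: 512 slots
	fmt.Println(nelems(8))  // 8-byte objects: 1024 slots
}
```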

I think the only two crashes relevant to this thread are:

2020-04-27T15:53:46-9b9556f/linux-amd64-staticlockranking
2020-04-08T18:35:49-f7e6ab4/solaris-amd64-oraclerel

@aclements
Member

Thanks for that sleuthing, @mknyszek! I agree that, given that this now seems rare, it shouldn't block the beta. There's also not much we can do until we get some more debugging information.

@aclements aclements added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label May 26, 2020
@aclements
Member

Updated query to catch failures with the new zombie detection:

$ greplogs -dashboard -E "error: sweep increased allocation count|runtime: marked free object" -l -md

There haven't been any new failures since the one on 2020-04-27.

I think we should continue to keep this bug open for now. If we don't see any more failures once we're close to the release, we should just close it.
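
The updated pattern in the query above is a plain regular-expression alternation, so it matches either the old totals-based panic or the message from the new zombie detection. A minimal sketch of the same match using Go's regexp package follows. The sample log lines are made up for illustration, and `sweepFailure` and `matches` are hypothetical names; this is not how greplogs itself is implemented.

```go
package main

import (
	"fmt"
	"regexp"
)

// sweepFailure is the pattern from the updated query.
var sweepFailure = regexp.MustCompile(
	`error: sweep increased allocation count|runtime: marked free object`)

// matches reports whether a log line contains either failure message.
func matches(line string) bool {
	return sweepFailure.MatchString(line)
}

func main() {
	// Hypothetical log lines, not taken from real builder logs.
	lines := []string{
		"fatal error: sweep increased allocation count",
		"runtime: marked free object in span",
		"ok  \truntime\t12.345s",
	}
	for _, l := range lines {
		fmt.Printf("%-5v %q\n", matches(l), l)
	}
}
```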

@toothrot toothrot removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Jun 10, 2020
@aclements
Member

Still no failures since 2020-04-27.

@andybons
Member

Closing for now. If this pops up again we can reopen.

@golang golang locked and limited conversation to collaborators Jun 18, 2021
6 participants