runtime: post-mortem analysis of Go 1.13 runtime/mgcsweep.go fatal error ("error: non in-use span in unswept list") #58681

Open
SuoXC opened this issue Feb 24, 2023 · 6 comments
Labels
Documentation NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Thinking

SuoXC commented Feb 24, 2023

What version of Go are you using (go version)?

$ go version
go version go1.13.3 linux/amd64


Does this issue reproduce with the latest release?

No

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/root/.cache/go-build"
GOENV="/root/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB="no"
GOOS="linux"
GOPATH="/root/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/lib/golang"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build603283197=/tmp/go-build -gno-record-gcc-switches"

What did you do?

We have a production machine running Docker containers. Docker Engine reported that its main process (dockerd) had exited, even though no operation had been performed on it. The error message is as follows:

[image: error]

From the error message, the failure appears to be related to the sweep phase of the runtime's garbage collector (runtime/mgcsweep.go). We looked at the relevant code, shown here:

[image: code]

After the error occurred, the entire dockerd service crashed and dumped core. We made some guesses based on the error log, but we have not yet reached a conclusion.

To summarize the conditions observed at the time of the crash:

The span taken out of heap.sweepSpans had state mSpanFree (s.state=3 in the log).
The span's sweepgen equaled heap.sweepgen - 1, which indicates that the span is currently being swept.
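
To make the second condition concrete, here is a minimal sketch of how a span's sweepgen is read relative to the heap's sweepgen. It follows the sweep-generation comment in runtime/mheap.go as it stands in Go 1.13, to the best of our reading. The function name `describeSweepgen` and the returned strings are hypothetical and exist only for illustration; they are not runtime APIs.

```go
package main

import "fmt"

// describeSweepgen is a hypothetical helper (not part of the runtime).
// It maps a span's sweepgen to its meaning relative to the heap's
// sweepgen, which the runtime advances by 2 after every GC cycle.
func describeSweepgen(spanGen, heapGen uint32) string {
	switch spanGen {
	case heapGen - 2:
		return "needs sweeping"
	case heapGen - 1:
		return "currently being swept"
	case heapGen:
		return "swept and ready to use"
	case heapGen + 1:
		return "cached before sweep began, still cached, needs sweeping"
	case heapGen + 3:
		return "swept, then cached, still cached"
	default:
		return "unexpected"
	}
}

func main() {
	// Values taken from the crash log in this issue.
	fmt.Println(describeSweepgen(1115595, 1115596)) // prints "currently being swept"
}
```

Under this reading, a span that is "currently being swept" should still be in use, which is why finding it in state mSpanFree is surprising.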

We have some guesses about this:

It is possible that, when sweepone runs concurrently, different threads take the same span from heap.sweepSpans.
It is possible that a span failed to be swept and was later pushed back onto heap.sweepSpans with an incorrect state.
It is possible that the span returned by alloc_m was modified by another thread after being added to heap.sweepSpans.

However, all of these guesses were ruled out later when we reviewed the code, and because we are unable to reproduce this bug, we have made no progress so far. We would be very grateful if anyone could help us with the analysis. If more information is needed, we can provide it at any time.

What did you expect to see?

No GC panic error in dockerd.

What did you see instead?

A GC panic error in dockerd, as described above. If possible, please help us analyze this issue. The Go version used is quite old. If this issue has been fixed in a later version, we would like to know how it was fixed and whether the relevant code or patches can be identified.

full-stack.log

sten4eg commented Feb 24, 2023

Is this like #28003?

SuoXC (Author) commented Feb 24, 2023

@sten4eg Thanks for your reply. The two issues look similar, but they are not the same. In this issue, span.sweepgen == heap.sweepgen - 1, which is not the case in the bug you mentioned. The fix for that bug relaxed the check so that span.sweepgen == heap.sweepgen + 3 is treated as normal. In the bug I reported, the sweepgen would still be considered abnormal under that relaxed check.
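
The difference can be sketched as follows. This is a simplified model of the sanity check described in this thread, not the runtime's actual code: the function `passesCheck`, its `allowCached` flag, and the constant names are hypothetical, and the state numbering (in-use = 1, free = 3) is assumed from the Go 1.13 sources and the s.state=3 value in the logs.

```go
package main

import "fmt"

// Span states as assumed for Go 1.13 (hypothetical local names).
const (
	spanInUse uint8 = 1
	spanFree  uint8 = 3
)

// passesCheck models the check applied to a span popped from the
// unswept list. An in-use span always passes. A non-in-use span passes
// only if its sweepgen is up to date; allowCached additionally accepts
// heapGen+3, the relaxation described for the other bug.
func passesCheck(state uint8, spanGen, heapGen uint32, allowCached bool) bool {
	if state == spanInUse {
		return true
	}
	if spanGen == heapGen {
		return true
	}
	return allowCached && spanGen == heapGen+3
}

func main() {
	// Log values from the other bug: s.sweepgen=7 sweepgen=4.
	fmt.Println(passesCheck(spanFree, 7, 4, false), passesCheck(spanFree, 7, 4, true))
	// prints "false true"

	// Log values from this issue: s.sweepgen=1115595 sweepgen=1115596.
	fmt.Println(passesCheck(spanFree, 1115595, 1115596, false),
		passesCheck(spanFree, 1115595, 1115596, true))
	// prints "false false"
}
```

In this model the relaxed check accepts the other bug's span but still rejects the span from this issue, which is why the two reports are treated as different problems.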

davidgao2021 commented

@sten4eg Thank you for your response. I am a friend of SuoXC, and we hit the problem described above.
From the log, I believe #28003 is not the same problem, because our log shows:

runtime: bad span s.state=3 s.sweepgen=1115595 sweepgen=1115596
mspan.sweepgen == mheap.sweepgen - 1

But #28003 shows:

runtime: bad span s.state=3 s.sweepgen=7 sweepgen=4
mspan.sweepgen == mheap.sweepgen + 3

For reference, the versions involved are:
dockerd: 18.09
golang: 1.13.3

thanm (Contributor) commented Feb 24, 2023

Hi @SuoXC, per Go's release policy we provide active support for the current release and two previous releases. You're working with Go 1.13, which is several years outside of the support window.

Have you tried using a more recent Go release, to see if that solves your problem? Thanks.

@thanm thanm added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Feb 24, 2023
SuoXC (Author) commented Feb 28, 2023

We understand that, under Go's current maintenance policy, older versions cannot be supported. We therefore built Docker with a newer version of Go to see whether that resolves the problem. However, because this bug occurs rarely, we have not been able to reproduce it so far. Our main purpose in raising this issue is to find out whether the community has experience with similar bugs. We will continue trying to trace the root cause and to determine whether it has been fixed in newer versions. If it has not, we can contribute to the Go community by sharing our findings.

thanm (Contributor) commented Feb 28, 2023

OK, SGTM, thanks. I will label the issue "Documentation" to make it clear that this is about understanding "what went wrong with Go 1.13" as opposed to bringing up an issue/problem with the current version of Go.

@thanm thanm added Documentation Thinking and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Feb 28, 2023
@thanm thanm changed the title runtime/mgcsweep.go throw a fatal error: non in-use span in unswept list post-mortem analysis of Go 1.13 runtime/mgcsweep.go fatal error ("error: non in-use span in unswept list") Feb 28, 2023
@dmitshur dmitshur added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2023
@dmitshur dmitshur modified the milestones: Unreleased, Unplanned Feb 28, 2023
@seankhliao seankhliao changed the title post-mortem analysis of Go 1.13 runtime/mgcsweep.go fatal error ("error: non in-use span in unswept list") runtime: post-mortem analysis of Go 1.13 runtime/mgcsweep.go fatal error ("error: non in-use span in unswept list") Mar 10, 2023