Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: marked free object in span #44614

Closed
leoleoasd opened this issue Feb 25, 2021 · 14 comments
Closed

runtime: marked free object in span #44614

leoleoasd opened this issue Feb 25, 2021 · 14 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@leoleoasd
Copy link

leoleoasd commented Feb 25, 2021

What version of Go are you using (go version)?

$ go version
go version go1.16 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/leo/.cache/go-build"
GOENV="/home/leo/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/leo/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/leo/go"
GOPRIVATE=""
GOPROXY="https://goproxy.io,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/leo/EduOJBackend/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3374132961=/tmp/go-build -gno-record-gcc-switches"

What did you do?

This issue is introduced in this commit, causing CI reports "runtime: marked free object in span" or "fatal error: sweep increased allocation count" randomly. Yes, this commit's GitHub action fails for another reason. But testing it locally will report a runtime error.

Testing locally with GODEBUG=cgocheck=2 GOGC=2 go test -gcflags=-d=checkptr ./... -race will add probilities to this error.

Are we using unsafe package?

Yes, we are. But since go's StringBuilder itself is also using it, and don't use it doesn't help either, I don't think the problem is caused by using of unsafe package.

Adding GODEBUG=cgocheck=2 and -gcflags=-d=checkptr (see this commit) doesn't help either.

Errors exist under multiple go versions (1.14, 1.15, 1.16) and os (ubuntu, macOS, ubuntu), but adding GOGC=2 in go 1.16 ubuntu will add probabilities to this error. (on my machine, 100%)

What did you expect to see?

No error because we use unsafe package the same as strings.Builder.String() did, and remove it doesn't help.

What did you see instead?

runtime: marked free object in span 0x7f4ce673e548, elemsize=64 freeindex=7 (bad use of unsafe.Pointer? try -d=checkptr)
0xc000456000 alloc marked
0xc000456040 alloc marked
0xc000456080 alloc marked
0xc0004560c0 alloc marked
0xc000456100 alloc marked
0xc000456140 alloc marked
0xc000456180 alloc marked
0xc0004561c0 alloc marked
0xc000456200 free  unmarked
0xc000456240 free  marked   zombie
000000c000456240:  7124303124613224  41623436676b7848
000000c000456250:  2f494e6774593238  6f45782e48332e6c
000000c000456260:  746f4f4f784d575a  6d47466f5a63527a
000000c000456270:  6e31796256584d6c  0000000053416a33
0xc000456280 free  unmarked
0xc0004562c0 free  unmarked
0xc000456300 free  unmarked
0xc000456340 free  unmarked

Full error reports can be found at our repo's GitHub Actions

@seankhliao seankhliao added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 25, 2021
@randall77
Copy link
Contributor

I can reproduce

git clone https://github.com/leoleoasd/EduOJBackend
cd EduOJBackend
git checkout 0764cbfd0f8eaea90414c1e833651e8c961b64e5
GODEBUG=cgocheck=2 GOGC=2 go test -gcflags=-d=checkptr ./... -race

The data in your zombie object looks like gibberish to me: $2a$10$qHxkg64bA82YtgNI/l.3H.xEoZWMxOOotzRcZoFGmlMXVby1n3jAS. But in my crash, it contains test_get_run_comparer_output_fail_submit_user_0. So it is almost certainly the backing store of a string.
That string look familiar?

@leoleoasd
Copy link
Author

leoleoasd commented Feb 25, 2021

@randall77 $2a$10$qHxkg64bA82YtgNI/l.3H.xEoZWMxOOotzRcZoFGmlMXVby1n3jAS is output of a password hashing algorithm (bcrypt), and should be at user struct's password field

User with username test_get_run_comparer_output_fail_submit_user_0 is constructed at app/controller/submission_test.go

@leoleoasd
Copy link
Author

leoleoasd commented Feb 25, 2021

Checked every password used in tests, test_get_run_compiler_output_fail_user_0_pwd has the hash mentioned above.
User with that password is constructed app/controller/submission_test.go#L714

@randall77
Copy link
Contributor

Some more info

Compile with

~/go1.16/bin/go test -c ./app/controller

Run with

GOGC=2 ./controller.test -test.run=GetRunCompilerOutput -test.count=10

Will fairly regularly crash with a bad pointer found in the stack frame of app/controller_test.TestGetRunCompilerOutput at offset 0x2f8.

It looks like the slot at 0x2f8 is initialized from the results of a Echo.Reverse call. But those calls only happen later in the function than where the failure occurs (Echo.Reverse calls are >= line 730, whereas the failure occurs when at line 722). Which makes me think the slot in question is not yet initialized but is somehow marked as live.

Still investigating.

@randall77
Copy link
Contributor

randall77 commented Feb 27, 2021

This is a nasty compiler liveness bug.

The bug in the OP's code is in app/controller/submission_test.go:createSubmissionForTest. The user argument is never walked during GC, but is returned inside the result struct. Subsequent uses of user by the caller reference already-freed values.

I think a simple workaround would be to use unnamed return values. Use submission := ... and return submission. I think the bug can only happen for named return values.

Here's a reproducer.

package main

import (
	"runtime"
	"strings"
)

type T struct {
	str string
}

func main() {
	// Allocate something on the heap.
	str := strings.Repeat("x", 1024)

	// Allocate a stack object that points to str.
	t := &T{str: str}

	// Pass a pointer to the stack object to g.
	s := g(t)

	// At this point, neither str or t are statically live.
	// They should only be held onto by the reference in s.

	if s.t.str[0] != 'x' {
		panic("bad")
	}
}

type S struct {
	t *T
}

//go:noinline
func g(t *T) (s S) {
	s = S{t: t}

	// s escapes, so it is moved to the heap.
	escape(&s)

	// t is stored in the heap copy of s, but it is a stack pointer.
	// We need that t pointer to keep str alive, and we have to find it during
	// stack scanning. Having it live in the heap means the stack scan of this
	// goroutine will not find the stack object at *t, and hence not keep
	// str alive.
	runtime.GC() // This collects str!

	// At this return the heap copy of s is copied back to the stack, so main can
	// read it (and find t, even though we never scanned *t).
	return
}

//go:noinline
func escape(x interface{}) {
	sink = x
	sink = nil
}

var sink interface{}
$ GODEBUG=clobberfree=1 go run ~/gowork/issue44614.go
panic: bad

goroutine 1 [running]:
main.main()
	/Users/khr/gowork/issue44614.go:26 +0xb7
exit status 2

The problem here is that when we move a return value to the heap, it can still contain pointers into the stack. That's a big no-no, as we won't find any stack objects that those stack pointers point to.
I think this is just an escape analysis bug. Making t escape should fix it.
@mdempsky

@randall77 randall77 self-assigned this Feb 27, 2021
@randall77 randall77 added this to the Go1.17 milestone Feb 27, 2021
@randall77
Copy link
Contributor

This bug appears from 1.14 onwards. 1.13 and earlier are unaffected.

@randall77
Copy link
Contributor

@gopherbot please open backport issues.

@gopherbot
Copy link

Backport issue(s) opened: #44658 (for 1.15), #44659 (for 1.16).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

leoleoasd added a commit to EduOJ/backend that referenced this issue Feb 27, 2021
leoleoasd added a commit to EduOJ/backend that referenced this issue Feb 27, 2021
@leoleoasd
Copy link
Author

Thanks a lot! @randall77

@mdempsky
Copy link
Member

@randall77 Thanks for minimizing the reproducer. I agree this looks like an escape analysis issue.

I think the problem is in location.leakTo, that if a parameter flows to a result parameter that itself needs to be heap allocated, we need to record that as a heap leak rather than a result leak. We probably just need to include !sink.escapes in that check.

@suntt2019
Copy link

Thank you so much @randall77

leoleoasd added a commit to EduOJ/backend that referenced this issue Feb 27, 2021
@mdempsky
Copy link
Member

Further minimized repro:

$ cat test.go
package p

func f(p *int) (r *int) {
	sink = &r
	return p
}

var sink interface{}

$ go tool compile -m -l test.go
test.go:3:8: leaking param: p to result r level=0
test.go:3:17: moved to heap: r

It should be simply leaking param: p without to result r.

@mdempsky
Copy link
Member

I suspect the culprit CL that caused the regression in Go 1.14 is 2620467.

@gopherbot
Copy link

Change https://golang.org/cl/297289 mentions this issue: cmd/compile: fix escape analysis of heap-allocated results

@golang golang locked and limited conversation to collaborators Feb 27, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

6 participants