Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: Stack trace of bound method of embedded interface inside slice causes an infinite loop #51401

Closed
abhinav opened this issue Feb 28, 2022 · 10 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@abhinav
Copy link
Contributor

abhinav commented Feb 28, 2022

(Apologies for the overly long title, but I've only been able to isolate this to that specific case.)

What version of Go are you using (go version)?

Tested on 1.18rc1 and master.

% go version
go version go1.18rc1 darwin/arm64

% ../bin/go version
go version devel go1.19-936c7fbc1c 2022-02-28 22:24:58 +0000 linux/amd64

Does this issue reproduce with the latest release?

Not with the latest stable release (Go 1.17.7). Yes with the latest unstable release.

What operating system and processor architecture are you using (go env)?

Tested on macOS with Go 1.18rc1, and on Linux with 1.18rc1 and master.

go env Output

macOS

% go env
GO111MODULE=""
GOARCH="arm64"
GOBIN=""
GOCACHE="/Users/abg/Library/Caches/go-build"
GOENV="/Users/abg/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/abg/go/pkg/mod"
GONOPROXY="*.uber.internal"
GONOSUMDB="*.uber.internal"
GOOS="darwin"
GOPATH="/Users/abg/go"
GOPRIVATE="*.uber.internal"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.18rc1"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/abg/Uber/bound-method-loop-repro/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/fk/y8ktgnhs5pnd6kh4tsck5wqr0000gn/T/go-build2122832388=/tmp/go-build -gno-record-gcc-switches -fno-common"

Linux

% ../bin/go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/user/.cache/go-build"
GOENV="/home/user/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/user/go/pkg/mod"
GONOPROXY="*.uber.internal"
GONOSUMDB="*.uber.internal"
GOOS="linux"
GOPATH="/home/user/go"
GOPRIVATE="*.uber.internal"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/user/src/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/user/src/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.19-936c7fbc1c 2022-02-28 22:24:58 +0000"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/user/src/go/repro/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3005549744=/tmp/go-build -gno-record-gcc-switches"

What did you do?

In the course of testing Go 1.18rc1 on our internal codebase, I ran into a test that timed out. Upon investigation, I was able to isolate the issue to code with roughly the following structure:
There are two interfaces Outer and Inner where Outer only embeds Inner. Given a reference to Outer, a function builds a slice with a bound reference to the method defined on Inner. A different function takes this slice and calls that bound method. The method attempts to capture its stack trace.

The following code reproduces this issue on Go 1.18rc1 and master.

package repro

import (
	"errors"
	"runtime"
	"testing"
)

type Outer interface{ Inner }

type impl struct{}

func New() Outer { return &impl{} }

type Inner interface {
	DoStuff() error
}

func (a *impl) DoStuff() error {
	return newError()
}

func newError() error {
	// Pretend this is an error library that captures a stack trace.
	stack := make([]uintptr, 50)
	length := runtime.Callers(2, stack[:])
	stack = stack[length:]
	_ = stack

	return errors.New("great sadness")
}

func TestRepro(t *testing.T) {
	funcs := listFuncs(New())
	for _, f := range funcs {
		_ = f()
	}
}

// Inlining this function into TestRepro resolves the issue.
func listFuncs(outer Outer) []func() error {
	return []func() error{outer.DoStuff}
}

https://go.dev/play/p/l7rdkzfwX3I?v=gotip

What did you expect to see?

The code to run and return instantly.

What did you see instead?

It loops forever.

If I terminate it with SIGQUIT (Ctrl-\) the core dump includes the following:

Core dump
=== RUN   TestRepro
SIGQUIT: quit
PC=0x10219acec m=0 sigcode=0

goroutine 0 [idle]:
runtime.pcdatavalue({0x10231d1d0, 0x1023270a0}, 0x2000, 0x16dcb29c8, 0x129113fff)
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/symtab.go:1069 +0xc
runtime.gentraceback(0x16dcb2cc8?, 0x10233e7a0?, 0x0?, 0x14000113380, 0x2, 0x14000044d58, 0x32, 0x0, 0xe9320001021adb08?, 0x0)
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/traceback.go:360 +0x1194
runtime.callers.func1()
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/traceback.go:903 +0x60
runtime.systemstack()
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/asm_arm64.s:237 +0x6c

goroutine 18 [running]:
runtime.systemstack_switch()
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/asm_arm64.s:194 +0x8 fp=0x14000044cc0 sp=0x14000044cb0 pc=0x1021aa9d8
runtime.callers(0x0?, {0x14000044d58?, 0x0?, 0x1021f9e60?})
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/traceback.go:902 +0x7c fp=0x14000044d30 sp=0x14000044cc0 pc=0x1021a487c
runtime.Callers(...)
	/Users/abg/.gimme/versions/go1.18rc1.darwin.arm64/src/runtime/extern.go:235
github.com/abhinav/bound-method-loop-repro.newError()
	/Users/abg/bound-method-loop-repro/getter_test.go:26 +0x50 fp=0x14000044ef0 sp=0x14000044d30 pc=0x1022300c0
github.com/abhinav/bound-method-loop-repro.(*impl).DoStuff(0x606522a492a?)
	/Users/abg/bound-method-loop-repro/getter_test.go:20 +0x20 fp=0x14000044f00 sp=0x14000044ef0 pc=0x102230050
github.com/abhinav/bound-method-loop-repro.listFuncs(...)
	/Users/abg/bound-method-loop-repro/getter_test.go:42
github.com/abhinav/bound-method-loop-repro.listFuncs(...)
	/Users/abg/bound-method-loop-repro/getter_test.go:42
github.com/abhinav/bound-method-loop-repro.listFuncs(...)
	/Users/abg/bound-method-loop-repro/getter_test.go:42
github.com/abhinav/bound-method-loop-repro.listFuncs(...)
[... snip ...]

Worth noting that the core dump also loops infinitely—or at least for 5 seconds (I didn't give it longer)—on reporting this line of code:

	return []func() error{outer.DoStuff}

Additional information

Worth noting that if I run the same with delve, it passes:

% dlv version
Delve Debugger
Version: 1.8.0
Build: $Id: 6a6c9c332d5354ddf1f8a2da3cc477bd18d2be53 $

% dlv test
Type 'help' for list of commands.
(dlv) c
PASS
Process 31721 has exited with status 0
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 28, 2022
@ianlancetaylor ianlancetaylor added this to the Go1.19 milestone Feb 28, 2022
@ianlancetaylor
Copy link
Contributor

Passes with 1.17 and gccgo, fails with 1.18.

CC @aclements @prattmic @mknyszek

@cherrymui
Copy link
Member

The method wrapper Outer.DoStuff-fm has an inline tree attached to it saying the whole function body is inlined from listFuncs, which is clearly incorrect.

The infinite loop comes from that in that malformed inline tree, the "parent PC" of the entry of Outer.DoStuff-fm points to itself.

@cherrymui
Copy link
Member

The method wrapper is generated at https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/walk/closure.go;l=238 , which sets Pos conditionally. For this case the condition seems false, which leaves base.Pos whatever it is before.

  • Why we don't set the Pos to AutogeneratedPos, like most compiler-generated functions do?
  • While the comment says "Set line number equal to the line number where the method is declared", the code seems to set it to where it is referenced (the selection expression)? Why?
  • (If we do want to use the Pos of that selection expression) why that selection expression has unknown Pos?

cc @mdempsky

@mdempsky
Copy link
Member

mdempsky commented Mar 1, 2022

Why we don't set the Pos to AutogeneratedPos, like most compiler-generated functions do?

I think just historical. If changing it to AutogeneratedPos works and fixes this issue, I think doing that instead sounds good.

Edit: I think maybe it does affect where some diagnostic messages get printed, which then affects test/run.go test expectations? I feel like that's why I always punt on trying to fix it. I've definitely looked at this code several times in the past and thought the position handling was sketchy.

While the comment says "Set line number equal to the line number where the method is declared", the code seems to set it to where it is referenced (the selection expression)? Why?

dot.Selection points to the selected method itself, not the AST identifier's position. (Were you thinking of go/ast's SelectorExpr.Sel?)

@cherrymui
Copy link
Member

Thanks. Do you know a case where using the Pos of the declaration produces better diagnoses? all.bash passes if I change it to use AutogeneratedPos. Or perhaps we could take a hybrid approach, where we use the declaration Pos if it is known, otherwise AutogeneratedPos? In any case, it should not be a Pos that has inline information.

dot.Selection points to the selected method itself, not the AST identifier's position. (Were you thinking of go/ast's SelectorExpr.Sel?)

Thanks for the clarification. Yeah, I was thinking something like that.

@gopherbot
Copy link

Change https://go.dev/cl/388794 mentions this issue: cmd/compile: use AutogeneratedPos for method value wrapper

@cherrymui
Copy link
Member

The CL above tries to use AutogeneratedPos. It seems to work. Not sure if this is the best solution, though.

@mdempsky
Copy link
Member

mdempsky commented Mar 1, 2022

If the trybots are happy with AutogeneratedPos, then I'm happy. :)

@abhinav
Copy link
Contributor Author

abhinav commented Mar 2, 2022

@cherrymui @mdempsky, thanks for the quick diagnosis and fix!

This issue appears to be tagged for the Go 1.19 milestone.
If I think this fix should be included in 1.18, should I create a separate backport request?
Thanks!

@gopherbot
Copy link

Change https://go.dev/cl/388918 mentions this issue: [release-branch.go1.18] cmd/compile: use AutogeneratedPos for method value wrapper

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Mar 4, 2022
@dmitshur dmitshur modified the milestones: Go1.19, Go1.18 Mar 4, 2022
gopherbot pushed a commit that referenced this issue Mar 4, 2022
…value wrapper

We use AutogeneratedPos for most compiler-generated functions. But
for method value wrappers we currently don't. Instead, we use the
Pos for their (direct) declaration if there is one, otherwise
not set it in methodValueWrapper, which will probably cause it to
inherit from the caller, i.e. the Pos of that method value
expression. If that Pos has inline information, it will cause the
method wrapper to have bogus inline information, which could lead
to infinite loop when printing a stack trace.

Change it to use AutogeneratedPos instead.

Fixes #51401.

Change-Id: I398dfe85f9f875e1fd82dc2f489dab63ada6570d
Reviewed-on: https://go-review.googlesource.com/c/go/+/388794
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
(cherry picked from commit b0db2f0)
Reviewed-on: https://go-review.googlesource.com/c/go/+/388918
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
@golang golang locked and limited conversation to collaborators Mar 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants