Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: pcdata is -2 and 12 locals stack map entries error on nil pointer #40629

Closed
42wim opened this issue Aug 6, 2020 · 15 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@42wim
Copy link

42wim commented Aug 6, 2020

What version of Go are you using (go version)?

$ go1.14.7 version
go version go1.14.7 linux/amd64

Does this issue reproduce with the latest release?

Not with 1.15rc1 (or 1.13.15)

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/wim/.cache/go-build"
GOENV="/home/wim/.config/go/env"
GOEXE=""
GOFLAGS="-mod=mod"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/wim/go"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/wim/sdk/go1.14.7"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/wim/sdk/go1.14.7/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="0"
GOMOD="/home/wim/lnx/git/other/matterircd/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build535344979=/tmp/go-build -gno-record-gcc-switches"

What did you do?

$ git clone https://github.com/42wim/matterircd.git
$ git checkout 685c4a3ed67d78cee4027d2e735586f824dd70c7
$ go1.14.7 build -mod vendor -trimpath

Run $ ./matterircd -debug
Matterircd will spawn an irc-server on 127.0.0.1:6667 by default, connect with an irc client, do /msg slack login 1 2 3

It crashes because i'm not checking for a nil pointer on u.credentials here: https://github.com/42wim/matterircd/blob/685c4a3ed67d78cee4027d2e735586f824dd70c7/mm-go-irckit/mmservice.go#L51

I'm reporting it because on go1.15rc1 and go1.13.15 it panics "the normal way" :)

What did you expect to see?

Normal panic

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x79c6e7]

goroutine 4 [running]:
github.com/42wim/matterircd/mm-go-irckit.login(0xc0001f2680, 0xc0001f2780, 0xc00012a050, 0x3, 0x3, 0xc0001a1068, 0x5)
        github.com/42wim/matterircd/mm-go-irckit/mmservice.go:51 +0xc7
github.com/42wim/matterircd/mm-go-irckit.(*User).handleServiceBot(0xc0001f2680, 0xc0001a1068, 0x5, 0xc0001f2780, 0xc000022180, 0xb)
        github.com/42wim/matterircd/mm-go-irckit/mmservice.go:367 +0x36c
created by github.com/42wim/matterircd/mm-go-irckit.CmdPrivMsg
        github.com/42wim/matterircd/mm-go-irckit/server_commands.go:373 +0x56b

What did you see instead?

runtime: pcdata is -2 and 12 locals stack map entries for github.com/42wim/matterircd/mm-go-irckit.login (targetpc=0x9240d0)
fatal error: bad symbol table

runtime stack:
runtime.throw(0xa8f4de, 0x10)
        runtime/panic.go:1116 +0x72
runtime.getStackMap(0x7ffc639818a8, 0x7ffc639819b0, 0x923801, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        runtime/stack.go:1247 +0x741
runtime.adjustframe(0x7ffc639818a8, 0x7ffc63981998, 0xf45e00)
        runtime/stack.go:637 +0x84
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000102480, 0x0, 0x0, 0x7fffffff, 0xac3030, 0x7ffc63981998, 0x0, ...)
        runtime/traceback.go:334 +0x110e
runtime.copystack(0xc000102480, 0x1000)
        runtime/stack.go:889 +0x291
runtime.newstack()
        runtime/stack.go:1044 +0x20b
runtime.morestack()
        runtime/asm_amd64.s:449 +0x8f

goroutine 34 [copystack]:
runtime.rawstring(0x40, 0xfa4300, 0xd0, 0x1c, 0xc000070b68, 0xc000070bd8)
        runtime/string.go:258 +0x81 fp=0xc000122b70 sp=0xc000122b68 pc=0x44df61
runtime.rawstringtmp(0x0, 0x40, 0x7fa325916a20, 0xd0010000203000, 0x7fa325916a20, 0xc000070c20, 0x4174dc)
        runtime/string.go:123 +0x6f fp=0xc000122bb0 sp=0xc000122b70 pc=0x44d8ef
runtime.concatstrings(0x0, 0xc000122c90, 0x2, 0x2, 0xb78780, 0x38)
        runtime/string.go:49 +0xae fp=0xc000122c48 sp=0xc000122bb0 pc=0x44d37e
runtime.concatstring2(0x0, 0xa8e729, 0xf, 0xab0e37, 0x31, 0xac3801, 0xb78780)
        runtime/string.go:58 +0x47 fp=0xc000122c88 sp=0xc000122c48 pc=0x44d5f7
runtime.errorString.Error(...)
        runtime/error.go:77
runtime.(*errorString).Error(0xf7b670, 0x9d8e40, 0xf7b670)
        <autogenerated>:1 +0x63 fp=0xc000122cd0 sp=0xc000122c88 pc=0x465ab3
runtime.preprintpanics(0xc000122da0)
        runtime/panic.go:671 +0xa8 fp=0xc000122d30 sp=0xc000122cd0 pc=0x431eb8
panic(0x9d8e40, 0xf7b670)
        runtime/panic.go:1062 +0x45b fp=0xc000122de8 sp=0xc000122d30 pc=0x432b6b
runtime.panicmem(...)
        runtime/panic.go:212
runtime.sigpanic()
        runtime/signal_unix.go:695 +0x3da fp=0xc000122e18 sp=0xc000122de8 pc=0x448a3a
github.com/42wim/matterircd/mm-go-irckit.login(0xc0000fc680, 0xc0000fc780, 0xc00013c050, 0x3, 0x3, 0xc0002a01e8, 0x5)
        github.com/42wim/matterircd/mm-go-irckit/mmservice.go:51 +0xc2 fp=0xc000122e80 sp=0xc000122e18 pc=0x923752                                         [41/9573]github.com/42wim/matterircd/mm-go-irckit.(*User).handleServiceBot(0xc0000fc680, 0xc0002a01e8, 0x5, 0xc0000fc780, 0xc0001280c0, 0xb)
        github.com/42wim/matterircd/mm-go-irckit/mmservice.go:367 +0x367 fp=0xc000122fb0 sp=0xc000122e80 pc=0x926137
runtime.goexit()
        runtime/asm_amd64.s:1373 +0x1 fp=0xc000122fb8 sp=0xc000122fb0 pc=0x461f81
created by github.com/42wim/matterircd/mm-go-irckit.CmdPrivMsg
        github.com/42wim/matterircd/mm-go-irckit/server_commands.go:373 +0x56d

goroutine 1 [select (no cases)]:
main.main()
        github.com/42wim/matterircd/main.go:111 +0xe88

goroutine 6 [select]:
github.com/desertbit/timer.timerRoutine()
        github.com/desertbit/timer@v0.0.0-20180107155436-c41aec40b27f/timers.go:119 +0x304
created by github.com/desertbit/timer.init.0
        github.com/desertbit/timer@v0.0.0-20180107155436-c41aec40b27f/timers.go:15 +0x35

goroutine 7 [IO wait]:
internal/poll.runtime_pollWait(0x7fa3258daf70, 0x72, 0x0)
        runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0000fc598, 0x72, 0x0, 0x0, 0xa82fdf)
        internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Accept(0xc0000fc580, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
        internal/poll/fd_unix.go:384 +0x1d4
net.(*netFD).accept(0xc0000fc580, 0xf8ce20, 0x7fa34c5a7108, 0x0)
        net/fd_unix.go:238 +0x42
net.(*TCPListener).accept(0xc00000ce40, 0x40ca88, 0x10, 0x9e02c0)
        net/tcpsock_posix.go:139 +0x32
net.(*TCPListener).Accept(0xc00000ce40, 0xc00007dd80, 0xb866a0, 0xc00000e140, 0xc00007dd70)
        net/tcpsock.go:261 +0x64
main.start(0xb80da0, 0xc00000ce40)
        github.com/42wim/matterircd/main.go:132 +0x78
main.main.func2()
        github.com/42wim/matterircd/main.go:108 +0x18b
created by main.main
        github.com/42wim/matterircd/main.go:94 +0xa86
goroutine 9 [IO wait]:
internal/poll.runtime_pollWait(0x7fa3258dae90, 0x72, 0xffffffffffffffff)
        runtime/netpoll.go:203 +0x55
internal/poll.(*pollDesc).wait(0xc0000fc618, 0x72, 0x1000, 0x1000, 0xffffffffffffffff)
        internal/poll/fd_poll_runtime.go:87 +0x45
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:92
internal/poll.(*FD).Read(0xc0000fc600, 0xc000276000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        internal/poll/fd_unix.go:169 +0x19b
net.(*netFD).Read(0xc0000fc600, 0xc000276000, 0x1000, 0x1000, 0xc0001bf4d0, 0xc0000ef100, 0x2)
        net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc00000e140, 0xc000276000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        net/net.go:184 +0x8e
bufio.(*Reader).fill(0xc00007efc0)
        bufio/bufio.go:100 +0x103
bufio.(*Reader).ReadSlice(0xc00007efc0, 0x435a0a, 0xc00010fda8, 0x45edd0, 0xc000001b00, 0x4, 0xc00010fdf0)
        bufio/bufio.go:359 +0x3d
bufio.(*Reader).ReadBytes(0xc00007efc0, 0x45c50a, 0xc00009a420, 0xc00010fea8, 0x4050c5, 0xc00009a420, 0xc00010ff58)
        bufio/bufio.go:438 +0x7a
bufio.(*Reader).ReadString(...)
        bufio/bufio.go:475
github.com/sorcix/irc.(*Decoder).Decode(0xc00000cec0, 0x0, 0x0, 0x0)
        github.com/sorcix/irc@v1.1.4/stream.go:80 +0x57
github.com/42wim/matterircd/mm-go-irckit.(*User).Decode(0xc0000fc680)
        github.com/42wim/matterircd/mm-go-irckit/user.go:204 +0x1e7
created by github.com/42wim/matterircd/mm-go-irckit.(*server).handshake
        github.com/42wim/matterircd/mm-go-irckit/server.go:369 +0x93

goroutine 10 [select]:
github.com/42wim/matterircd/mm-go-irckit.(*User).Decode.func1(0xc0000fc680, 0xc00027a360, 0x64, 0xc00009a420, 0xc00009a480)
        github.com/42wim/matterircd/mm-go-irckit/user.go:172 +0xfd
created by github.com/42wim/matterircd/mm-go-irckit.(*User).Decode
        github.com/42wim/matterircd/mm-go-irckit/user.go:170 +0x1c2

goroutine 11 [chan receive]:
github.com/42wim/matterircd/mm-go-irckit.(*server).handle(0xc000254340, 0xc0000fc680)
        github.com/42wim/matterircd/mm-go-irckit/server.go:332 +0x94
created by github.com/42wim/matterircd/mm-go-irckit.(*server).Connect
        github.com/42wim/matterircd/mm-go-irckit/server.go:247 +0xaa
@ianlancetaylor ianlancetaylor changed the title got runtime: pcdata is -2 and 12 locals stack map entries error on nil pointer runtime: pcdata is -2 and 12 locals stack map entries error on nil pointer Aug 6, 2020
@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Aug 6, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Aug 6, 2020
@ianlancetaylor
Copy link
Contributor

CC @aclements @mknyszek @prattmic

@randall77
Copy link
Contributor

I can reproduce. Here's what's happening:

  1. github.com/42wim/matterircd/mm-go-irckit.login is panicking due to a nil pointer exception.
  2. runtime.sigpanic is next on the stack. It then calls lots of other functions (runtime.panicmem, runtime.panic, runtime.preprintpanics, ...).
  3. All those calls run out of stack, causing a stack copy.
  4. When walking the stack to copy it, the copier tries to get a stack map for github.com/42wim/matterircd/mm-go-irckit.login, and can't find one for the faulting instruction.

I'm not really sure why it can't find a good stack map at that instruction. I've tried to hack up a reproducer for this, but I can't get step 4 to fail. I will investigate more.

@randall77 randall77 self-assigned this Aug 7, 2020
@randall77
Copy link
Contributor

This appears to be a bug in open-coded defers. When I turn them off, the problem goes away.

The underlying problem is that the call to runtime.deferreturn in the panicking function is marked as being a non-safepoint (that's the -2). It's marked that way because we never gave it any liveness information. With standard defers, the deferreturn call appears in the SSA representation, so it gets liveness information computed for it ~automatically. With open-coded defers the deferreturn call does not appear in the SSA representation, so it doesn't. There's some special case code to maintain the liveness information for the deferreturn call (grabbed from the first open-coded defer call?), but it appears to not be working in this case.

Looks like the tracking of LastDeferExit is broken. We keep track of the last defer exit call during SSA construction. Unfortunately, lots of stuff happens to the SSA graph in the meantime. What I think is happening in this particular case is that last defer exit call is found to be dead, it gets dead-code eliminated, its Value is reused for some other unrelated computation, and when we go to look up the liveness information for the last defer exit, we get some unrelated liveness info.

I think we need to compute the LastDeferExit field much later, after SSA is done.

@danscales

@randall77
Copy link
Contributor

This patch:

+++ b/src/cmd/compile/internal/ssa/func.go
@@ -274,6 +274,9 @@ func (f *Func) freeValue(v *Value) {
        if len(v.Args) != 0 {
                f.Fatalf("value %s still has %d args", v, len(v.Args))
        }
+       if v == f.LastDeferExit {
+               println("FREEING THE LASTDEFEREXIT")
+       }
        // Clear everything but ID (which we reuse).
        id := v.ID

triggers a bunch of times during make.bash. Both 1.14.2 and tip. Looks like we need to fix this for the release - I think we're just getting lucky that we don't stack copy or gc trace such cases normally (stack copy is only likely to happen with an unrecovered panic?), or that the random other instruction's liveness map is correct (or good enough).

@gopherbot please open a backport issue for 1.14.

1.13 is ok, as open-coded defers were released for 1.14.

@gopherbot
Copy link

Backport issue(s) opened: #40646 (for 1.13), #40647 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@randall77
Copy link
Contributor

Here's a simple-ish reproducer:

package main

import "fmt"

const N = 40

func main() {
	var x [N]int // stack-allocated memory
	for i := range x {
		x[i] = 0x999
	}

	// This defer checks to see if x is uncorrupted.
	defer func(p *[N]int) {
		recover()
		for i := range p {
			if p[i] != 0x999 {
				for j := range p {
					fmt.Printf("p[%d]=0x%x\n", j, p[j])
				}
				panic("corrupted stack variable")
			}
		}
	}(&x)

	// This defer starts a new goroutine, which will (hopefully)
	// overwrite x on the garbage stack.
	defer func() {
		c := make(chan bool)
		go func() {
			useStack(1000)
			c <- true
		}()
		<-c

	}()

	// This defer causes a stack copy.
	// The old stack is now garbage.
	defer func() {
		useStack(1000)
	}()

	// Trigger a segfault.
	*g = 0

	// Make the return statement unreachable.
	// That makes the stack map at the deferreturn call empty.
	// In particular, the argument to the first defer is not
	// marked as a pointer, so it doesn't get adjusted
	// during the stack copy.
	for {
	}
}

var g *int64

func useStack(n int) {
	if n == 0 {
		return
	}
	useStack(n - 1)
}

@randall77
Copy link
Contributor

I think there's something fundamentally kinda broken with using the last defer as the point to grab the live variables for the deferreturn. With code like:

defer func(p *int){}(p)
if ... { return }
defer func(q *int){}(q)
*g = 0
for {}

There's only one return in this code, and it only has p in its live args, not q. But we need q live in case *g=0 panics.

I was thinking that we could look harder for the last return sequence, but I don't think that actually works in all cases. I think we need to mark the defer args slots live across the whole function (and zero them at entry).
At least, that seems the simplest fix at this stage of 1.15. Maybe there's something cleverer for 1.16.

@danscales
Copy link
Contributor

Yes, the setting of the stack map for the deferreturn stub is tricky, because it is not otherwise connected to the rest of the function. And, as you show, it is especially tricky when we have parts of the function that can never return (have an infinite loop or panic). Note there is already some extra code to deal with a related case - making sure that the defer args are kept alive in code that never returns but could have a panic: https://go-review.googlesource.com/c/go/+/204802 . This code does some forcing of liveness of defer arg slots in the blocks that never return. It could be useful to adapt for the method of fixing that you suggest. Note that we already zero any defer args in the entry block that have pointers in them.

It would be nice if we could just not do open-coded defers if a function has any code that cannot exit because of an infinite loop. Note the change above also determines which blocks cannot reach a return or an exit (panic) (hence must be because of an infinite loop). But that code is only in SSA after the genssa pass that creates the open-coded defers. I don't think it would be easy to write anything equivalent before generating SSA, would it?

@danscales
Copy link
Contributor

Since I am just heading out for vacation starting tomorrow for the next two weeks, and will have only partial connectivity (and mostly in the evenings), Keith was nice enough to say that he would work on the fix for this. I will join in as I can, and definitely plan to review the code for the fix.

@randall77
Copy link
Contributor

I think the OP's example did not use infinite loops. It just had a return that ended up being removed by deadcode (I haven't checked, but probably prove decided that some branch was impossible). If that happened to be the last return in program order, then the live map for the deferreturn is wrong. It's easier to repro with an infinite loop, but probably not necessary.
Of course, we could do some sort of post-optimization scan for still-present return sites, and pick one of those. Together with punting on code with infinite loops, that might work.
But detecting infinite loops at the time we need to make that decision is hard, you're right. An infinite loop can always manifest itself after some optimization.

@gopherbot
Copy link

Change https://golang.org/cl/247522 mentions this issue: cmd/compile: fix live variable computation for deferreturn

@randall77
Copy link
Contributor

@gopherbot, please open a backport issue for 1.15.

@randall77
Copy link
Contributor

@gopherbot is lazy. Did it myself.

@gopherbot
Copy link

Change https://golang.org/cl/248621 mentions this issue: cmd/compile: fix live variable computation for deferreturn

@gopherbot
Copy link

Change https://golang.org/cl/248622 mentions this issue: cmd/compile: fix live variable computation for deferreturn

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Sep 2, 2020
@dmitshur dmitshur modified the milestones: Backlog, Go1.16 Sep 2, 2020
gopherbot pushed a commit that referenced this issue Oct 5, 2020
…r deferreturn

Taking the live variable set from the last return point is problematic.
See #40629 for details, but there may not be a return point, or it may
be before the final defer.

Additionally, keeping track of the last call as a *Value doesn't quite
work. If it is dead-code eliminated, the storage for the Value is reused
for some other random instruction. Its live variable information,
if it is available at all, is wrong.

Instead, just mark all the open-defer argument slots as live
throughout the function. (They are already zero-initialized.)

Fixes #40742

Change-Id: Ie456c7db3082d0de57eaa5234a0f32525a1cce13
Reviewed-on: https://go-review.googlesource.com/c/go/+/247522
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
(cherry picked from commit 32a84c9)
Reviewed-on: https://go-review.googlesource.com/c/go/+/248621
Trust: Dmitri Shuralyov <dmitshur@golang.org>
gopherbot pushed a commit that referenced this issue Oct 5, 2020
…r deferreturn

Taking the live variable set from the last return point is problematic.
See #40629 for details, but there may not be a return point, or it may
be before the final defer.

Additionally, keeping track of the last call as a *Value doesn't quite
work. If it is dead-code eliminated, the storage for the Value is reused
for some other random instruction. Its live variable information,
if it is available at all, is wrong.

Instead, just mark all the open-defer argument slots as live
throughout the function. (They are already zero-initialized.)

Fixes #40647

Change-Id: Ie456c7db3082d0de57eaa5234a0f32525a1cce13
Reviewed-on: https://go-review.googlesource.com/c/go/+/247522
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
(cherry picked from commit 32a84c9)
Reviewed-on: https://go-review.googlesource.com/c/go/+/248622
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
claucece pushed a commit to claucece/go that referenced this issue Oct 22, 2020
…r deferreturn

Taking the live variable set from the last return point is problematic.
See golang#40629 for details, but there may not be a return point, or it may
be before the final defer.

Additionally, keeping track of the last call as a *Value doesn't quite
work. If it is dead-code eliminated, the storage for the Value is reused
for some other random instruction. Its live variable information,
if it is available at all, is wrong.

Instead, just mark all the open-defer argument slots as live
throughout the function. (They are already zero-initialized.)

Fixes golang#40742

Change-Id: Ie456c7db3082d0de57eaa5234a0f32525a1cce13
Reviewed-on: https://go-review.googlesource.com/c/go/+/247522
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
(cherry picked from commit 32a84c9)
Reviewed-on: https://go-review.googlesource.com/c/go/+/248621
Trust: Dmitri Shuralyov <dmitshur@golang.org>
@golang golang locked and limited conversation to collaborators Sep 2, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants