cmd/compile: plan9 fatal error: invalid pointer found on stack #62507

Open
bradfitz opened this issue Sep 7, 2023 · 12 comments
Labels: compiler/runtime, help wanted, NeedsFix

@bradfitz
Contributor

bradfitz commented Sep 7, 2023

What version of Go are you using (go version)?

Go 1.21.0

Does this issue reproduce with the latest release?

Haven't yet tried Go 1.21.1.

What operating system and processor architecture are you using (go env)?

plan9/amd64

What did you do?

A Plan 9 user (@9nut) sent me this crash trying to run Tailscale (go install tailscale.com/cmd/tailscaled at tailscale/tailscale@6fd1961). I don't have enough Plan 9 knowledge to debug or repro:

term% ./tailscaled --statedir $home/lib/tailscale
logtail started
Program starting: v1.49.0-dev20230830-tae747a2e4-dirty, Go 1.21.0: []string{"./tailscaled", "--statedir", "/usr/glenda/lib/tailscale"}
LogID: c07ea905679a33eba472be24f4634bb25d3f96ef1d7bb6fe2600cf86dd17c90a
logpolicy: using UserCacheDir, "/usr/glenda/lib/cache/Tailscale"
runtime: bad pointer in frame tailscale.com/logtail.(*Logger).Write at 0x480f59c0: 0x42
fatal error: invalid pointer found on stack

runtime stack:
runtime.throw({0xac1578?, 0x105cda?})
	/Users/fst/go/src/runtime/panic.go:1077 +0x65 fp=0x7fffffffe930 sp=0x7fffffffe900 pc=0x237885
runtime.adjustpointers(0x7fffffffebb8?, 0x7fffffffea00, 0x0?, {0x0?, 0x0?})
	/Users/fst/go/src/runtime/stack.go:627 +0x205 fp=0x7fffffffe9a0 sp=0x7fffffffe930 pc=0x24d6a5
runtime.adjustframe(0x7fffffffebb8, 0x7fffffffea98)
	/Users/fst/go/src/runtime/stack.go:684 +0xdb fp=0x7fffffffea30 sp=0x7fffffffe9a0 pc=0x24d7db
runtime.copystack(0x4817e680, 0x800000002?)
	/Users/fst/go/src/runtime/stack.go:935 +0x2ae fp=0x7fffffffed28 sp=0x7fffffffea30 pc=0x24df6e
runtime.newstack()
	/Users/fst/go/src/runtime/stack.go:1116 +0x7e7 fp=0x7fffffffeed0 sp=0x7fffffffed28 pc=0x24e887
runtime.morestack()
	/Users/fst/go/src/runtime/asm_amd64.s:593 +0x93 fp=0x7fffffffeed8 sp=0x7fffffffeed0 pc=0x2669d3

goroutine 9 [copystack]:
fmt.(*pp).printArg(0x4807e820?, {0x955f20?, 0x4801a498?}, 0x73?)
	/Users/fst/go/src/fmt/print.go:681 +0x6f7 fp=0x480f5540 sp=0x480f5538 pc=0x300db7
fmt.(*pp).doPrintf(0x4807e820, {0xaa9d03, 0x2}, {0x480f58d8?, 0x1, 0x1})
	/Users/fst/go/src/fmt/print.go:1077 +0x39e fp=0x480f5638 sp=0x480f5540 pc=0x3037fe
fmt.Appendf({0x481ae000, 0x0, 0x90}, {0xaa9d03, 0x2}, {0x4806f8d8, 0x1, 0x1})
	/Users/fst/go/src/fmt/print.go:249 +0x7a fp=0x480f5698 sp=0x480f5638 pc=0x2fdaba
tailscale.com/logpolicy.logWriter.Write.(*Logger).Printf.func1({0x481ae000?, 0x4801a3c0?, 0x0?})
	/Users/fst/go/src/log/log.go:269 +0x2c fp=0x480f56e8 sp=0x480f5698 pc=0x6de14c
log.(*Logger).output(0x48118750, 0x0, 0x90?, 0x4806f8e8)
	/Users/fst/go/src/log/log.go:238 +0x36a fp=0x480f58b0 sp=0x480f56e8 pc=0x424cea
log.(*Logger).Printf(...)
	/Users/fst/go/src/log/log.go:268
tailscale.com/logpolicy.logWriter.Write({0x481ae090?}, {0x481ae090?, 0x3c, 0x480469b0?})
	/Users/fst/src/tailscale/logpolicy/logpolicy.go:195 +0xba fp=0x480f5928 sp=0x480f58b0 pc=0x6de0da
tailscale.com/logtail.(*Logger).Write(0x480bafc0, {0x481ae090?, 0x0?, 0x0?})
	/Users/fst/src/tailscale/logtail/logtail.go:741 +0x16b fp=0x480f59d8 sp=0x480f5928 pc=0x5e49ab
log.(*Logger).output(0x4803e3f0, 0x0, 0x4e3485?, 0x4806fbc0)
	/Users/fst/go/src/log/log.go:245 +0x483 fp=0x480f5ba0 sp=0x480f59d8 pc=0x424e03
log.Printf({0xacb8cc?, 0x48119080?}, {0x480469b0?, 0x1264420?, 0x10?})
	/Users/fst/go/src/log/log.go:397 +0x6d fp=0x480f5c00 sp=0x480f5ba0 pc=0x42504d
tailscale.com/types/logger.RateLimitedFnWithClock.func1({0xacb8cc, 0x27}, {0x480469b0, 0x1, 0x1})
	/Users/fst/src/tailscale/types/logger/logger.go:220 +0x7db fp=0x480f5cd8 sp=0x480f5c00 pc=0x4e24bb
main.createEngine(0x481239a0, 0x4807da40?)
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:571 +0x17e fp=0x480f5d90 sp=0x480f5cd8 pc=0x8f641e
main.getLocalBackend({0x0?, 0x0?}, 0x481239a0, {0xc0, 0x7e, 0xa9, 0x5, 0x67, 0x9a, 0x33, ...}, ...)
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:481 +0x11a fp=0x480f5ed0 sp=0x480f5d90 pc=0x8f4afa
main.startIPNServer.func2()
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:447 +0x28b fp=0x480f5fe0 sp=0x480f5ed0 pc=0x8f476b
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x480f5fe8 sp=0x480f5fe0 pc=0x268701
created by main.startIPNServer in goroutine 1
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:435 +0x4c5

goroutine 1 [runnable]:
context.WithValue({0xbe5ad8?, 0x481239f0?}, {0x9856a0?, 0x1250ca0?}, {0xa84a60?, 0x481ba000?})
	/Users/fst/go/src/context/context.go:708 +0x192 fp=0x480f3a88 sp=0x480f3a80 pc=0x28ca72
net/http.(*Server).Serve(0x481ba000, {0xbe49a0, 0x48048400})
	/Users/fst/go/src/net/http/server.go:3054 +0x30b fp=0x480f3bb8 sp=0x480f3a88 pc=0x4a1feb
tailscale.com/ipn/ipnserver.(*Server).Run(0x4804e900, {0xbe5ad8?, 0x481239f0}, {0xbe49a0?, 0x48048400})
	/Users/fst/src/tailscale/ipn/ipnserver/server.go:521 +0x3bf fp=0x480f3c80 sp=0x480f3bb8 pc=0x8694df
main.startIPNServer({0xbe5a30, 0x12956a0}, 0x481239a0, {0xc0, 0x7e, 0xa9, 0x5, 0x67, 0x9a, 0x33, ...}, ...)
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:457 +0x4e6 fp=0x480f3d78 sp=0x480f3c80 pc=0x8f4286
main.run()
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:396 +0x49f fp=0x480f3e90 sp=0x480f3d78 pc=0x8f393f
main.main()
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:239 +0x8dd fp=0x480f3f38 sp=0x480f3e90 pc=0x8f31bd
runtime.main()
	/Users/fst/go/src/runtime/proc.go:267 +0x2fb fp=0x480f3fe0 sp=0x480f3f38 pc=0x239e7b
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x480f3fe8 sp=0x480f3fe0 pc=0x268701

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x48071fa0 sp=0x48071f80 pc=0x23a2ee
runtime.goparkunlock(...)
	/Users/fst/go/src/runtime/proc.go:404
runtime.forcegchelper()
	/Users/fst/go/src/runtime/proc.go:322 +0xa7 fp=0x48071fe0 sp=0x48071fa0 pc=0x23a127
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x48071fe8 sp=0x48071fe0 pc=0x268701
created by runtime.init.6 in goroutine 1
	/Users/fst/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x48072f78 sp=0x48072f58 pc=0x23a2ee
runtime.goparkunlock(...)
	/Users/fst/go/src/runtime/proc.go:404
runtime.bgsweep(0x48052150?)
	/Users/fst/go/src/runtime/mgcsweep.go:280 +0x94 fp=0x48072fc8 sp=0x48072f78 pc=0x2250b4
runtime.gcenable.func1()
	/Users/fst/go/src/runtime/mgc.go:200 +0x25 fp=0x48072fe0 sp=0x48072fc8 pc=0x219fc5
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x48072fe8 sp=0x48072fe0 pc=0x268701
created by runtime.gcenable in goroutine 1
	/Users/fst/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0x48052150?, 0xbd70b0?, 0x1?, 0x0?, 0x48006d00?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x48073f70 sp=0x48073f50 pc=0x23a2ee
runtime.goparkunlock(...)
	/Users/fst/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x12646c0)
	/Users/fst/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0x48073fa0 sp=0x48073f70 pc=0x222909
runtime.bgscavenge(0x0?)
	/Users/fst/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0x48073fc8 sp=0x48073fa0 pc=0x222efc
runtime.gcenable.func2()
	/Users/fst/go/src/runtime/mgc.go:201 +0x25 fp=0x48073fe0 sp=0x48073fc8 pc=0x219f65
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x48073fe8 sp=0x48073fe0 pc=0x268701
created by runtime.gcenable in goroutine 1
	/Users/fst/go/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x4806ce28 sp=0x4806ce08 pc=0x23a2ee
runtime.runfinq()
	/Users/fst/go/src/runtime/mfinal.go:193 +0x107 fp=0x4806cfe0 sp=0x4806ce28 pc=0x219027
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x4806cfe8 sp=0x4806cfe0 pc=0x268701
created by runtime.createfing in goroutine 1
	/Users/fst/go/src/runtime/mfinal.go:163 +0x3d

goroutine 6 [select]:
runtime.gopark(0x48070e08?, 0x2?, 0x2?, 0x0?, 0x48070dd4?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x48070c40 sp=0x48070c20 pc=0x23a2ee
runtime.selectgo(0x48070e08, 0x48070dd0, 0xd4?, 0x0, 0x1201ec0?, 0x1)
	/Users/fst/go/src/runtime/select.go:327 +0x725 fp=0x48070d60 sp=0x48070c40 pc=0x249ce5
tailscale.com/logtail.(*Logger).drainBlock(...)
	/Users/fst/src/tailscale/logtail/logtail.go:287
tailscale.com/logtail.(*Logger).drainPending(0x480bafc0, {0x48041000?, 0x1000?, 0x1296ac0?})
	/Users/fst/src/tailscale/logtail/logtail.go:317 +0x1c5 fp=0x48070e60 sp=0x48070d60 pc=0x5e1c85
tailscale.com/logtail.(*Logger).uploading(0x480bafc0, {0xbe5ad8, 0x48123900})
	/Users/fst/src/tailscale/logtail/logtail.go:361 +0xcf fp=0x48070fb8 sp=0x48070e60 pc=0x5e20af
tailscale.com/logtail.NewLogger.func3()
	/Users/fst/src/tailscale/logtail/logtail.go:163 +0x28 fp=0x48070fe0 sp=0x48070fb8 pc=0x5e1768
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x48070fe8 sp=0x48070fe0 pc=0x268701
created by tailscale.com/logtail.NewLogger in goroutine 1
	/Users/fst/src/tailscale/logtail/logtail.go:163 +0xa0c

goroutine 7 [syscall]:
runtime.notetsleepg(0x129b8e0?, 0x0?)
	/Users/fst/go/src/runtime/lock_sema.go:294 +0x29 fp=0x4806dfa0 sp=0x4806df58 pc=0x20b789
os/signal.signal_recv()
	/Users/fst/go/src/runtime/sigqueue_plan9.go:110 +0x52 fp=0x4806dfc0 sp=0x4806dfa0 pc=0x264d92
os/signal.loop()
	/Users/fst/go/src/os/signal/signal_plan9.go:27 +0x13 fp=0x4806dfe0 sp=0x4806dfc0 pc=0x852193
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x4806dfe8 sp=0x4806dfe0 pc=0x268701
created by os/signal.Notify.func1.1 in goroutine 1
	/Users/fst/go/src/os/signal/signal.go:151 +0x1f

goroutine 8 [select]:
runtime.gopark(0x4806efb0?, 0x2?, 0x0?, 0x0?, 0x4806ef7c?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x4806ee20 sp=0x4806ee00 pc=0x23a2ee
runtime.selectgo(0x4806efb0, 0x4806ef78, 0x0?, 0x0, 0x0?, 0x1)
	/Users/fst/go/src/runtime/select.go:327 +0x725 fp=0x4806ef40 sp=0x4806ee20 pc=0x249ce5
main.startIPNServer.func1()
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:420 +0xbc fp=0x4806efe0 sp=0x4806ef40 pc=0x8f493c
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x4806efe8 sp=0x4806efe0 pc=0x268701
created by main.startIPNServer in goroutine 1
	/Users/fst/src/tailscale/cmd/tailscaled/tailscaled.go:419 +0x2a9

goroutine 10 [select]:
runtime.gopark(0x481b6fb0?, 0x2?, 0x0?, 0x0?, 0x481b6f9c?)
	/Users/fst/go/src/runtime/proc.go:398 +0xce fp=0x481b6e40 sp=0x481b6e20 pc=0x23a2ee
runtime.selectgo(0x481b6fb0, 0x481b6f98, 0x0?, 0x0, 0x0?, 0x1)
	/Users/fst/go/src/runtime/select.go:327 +0x725 fp=0x481b6f60 sp=0x481b6e40 pc=0x249ce5
tailscale.com/ipn/ipnserver.(*Server).Run.func2()
	/Users/fst/src/tailscale/ipn/ipnserver/server.go:491 +0x8c fp=0x481b6fe0 sp=0x481b6f60 pc=0x8698ac
runtime.goexit()
	/Users/fst/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0x481b6fe8 sp=0x481b6fe0 pc=0x268701
created by tailscale.com/ipn/ipnserver.(*Server).Run in goroutine 1
	/Users/fst/src/tailscale/ipn/ipnserver/server.go:490 +0x1be
term% 
term% 

What did you expect to see?

Not a crash in runtime.copystack.

What did you see instead?

A crash.

@gopherbot added the compiler/runtime label Sep 7, 2023
@randall77
Contributor

The offending stack slot is SP+0x98 in tailscale.com/logtail.(*Logger).Write(SB).
It is a pointer slot, and is initialized here:

  logtail.go:734        0x5e76a2                66440fd6bc2498000000            MOVQ X15, 0x98(SP)                                              

If X15 wasn't properly initialized to 0, that might be the issue. I recall plan9 has some strange interactions with floating-point (can't be used in signal handlers?). This could also be the result of some plan9 assembly that doesn't properly zero X15 on transitions from abi0 to abiInternal (which would be in assembly somewhere).

The other write to SP+0x98 can't be the problem, as it is of a known stack pointer.

  logtail.go:751        0x5e7888                4c8d842488000000                LEAQ 0x88(SP), R8                                               
  logtail.go:751        0x5e7890                4c89842498000000                MOVQ R8, 0x98(SP)                                               
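
For readers unfamiliar with this failure mode, the check that produced the "bad pointer in frame" message can be modeled roughly as below. This is a simplified sketch, not the runtime's code: `minLegalPointer` mirrors the runtime constant of the same name (4096), while `looksLikeJunkPointer` is a hypothetical helper name, and the real check in `adjustpointers` has further conditions.

```go
package main

import "fmt"

// minLegalPointer mirrors the runtime constant of the same name:
// nonzero values below it cannot be valid Go pointers.
const minLegalPointer uintptr = 4096

// looksLikeJunkPointer is a hypothetical helper modeling the core of the
// check: a nonzero value in a pointer slot that is below minLegalPointer
// is treated as junk.
func looksLikeJunkPointer(p uintptr) bool {
	return 0 < p && p < minLegalPointer
}

func main() {
	fmt.Println(looksLikeJunkPointer(0x42))       // the value reported in the crash
	fmt.Println(looksLikeJunkPointer(0))          // a nil slot
	fmt.Println(looksLikeJunkPointer(0x480f59b0)) // a plausible stack address
}
```

Only the first call reports junk, which is consistent with a pointer slot that was "zeroed" from a register that did not actually hold zero.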

@cherrymui
Member

We should already not be using X15 for zeroing on Plan 9. The rewriting rules for MOVOstoreconst and DUFFZERO are guarded with useSSE and !noDuffDevice, which are false on Plan 9.
Hmmm, https://cs.opensource.google/go/go/+/master:src/cmd/compile/internal/amd64/ggen.go;l=66 this one is not guarded. Maybe this is the problem.
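
The gap described here can be illustrated with a small model. This is a sketch under stated assumptions, not the code in ggen.go: `zeroSlot`, its `guarded` parameter, and the instruction strings are all hypothetical and exist only to show what a missing Plan 9 guard changes for an 8-byte slot.

```go
package main

import "fmt"

// zeroSlot is a hypothetical model of choosing how to zero one 8-byte
// stack slot. With guarded=false it mimics a code path that stores from
// X15 regardless of the target; with guarded=true it avoids X15 on Plan 9.
// The returned strings are illustrative, not actual compiler output.
func zeroSlot(isPlan9, guarded bool) string {
	if guarded && isPlan9 {
		return "MOVQ $0, off(SP)"
	}
	return "MOVQ X15, off(SP)"
}

func main() {
	fmt.Println("plan9, unguarded:", zeroSlot(true, false))
	fmt.Println("plan9, guarded:  ", zeroSlot(true, true))
	fmt.Println("other, guarded:  ", zeroSlot(false, true))
}
```

In the unguarded Plan 9 case the model emits the same shape of store as the `MOVQ X15, 0x98(SP)` instruction quoted earlier in the thread.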

@cherrymui
Member

Looks like CL https://go.dev/cl/453536 is the culprit. It added a new use of X15 without the guard.

@cherrymui added the NeedsFix and release-blocker labels Sep 7, 2023
@cherrymui changed the title from "runtime: plan9 fatal error: invalid pointer found on stack" to "cmd/compile: plan9 fatal error: invalid pointer found on stack" Sep 7, 2023
@cherrymui added this to the Go1.22 milestone Sep 7, 2023
@bradfitz
Contributor Author

bradfitz commented Sep 7, 2023

@cherrymui, nice find! I look forward to what a test for this will look like 😅

@cherrymui
Member

For a test, I guess we could have an assembly function that clobbers X15, then call a function with a certain frame layout which would trigger the zeroing code, then check it is actually zeroed. The ABI wrapper between assembly and Go would zero X15, but I think we don't do that on Plan 9, so we can keep X15 clobbered.

Maybe we could have the compiler just error out if X15 (or any SSE register) is used on Plan 9 in compiled code? (We would probably still want to allow it in assembly code.)
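
The reasoning behind the proposed test can be walked through with a toy model in pure Go. A real test would need an assembly file to clobber X15; everything below (`cpu`, `clobberX15`, `abiWrapper`, `zeroedSlot`, `slotAfterCall`) is a hypothetical stand-in, and the value 0x42 is chosen only to echo the crash report.

```go
package main

import "fmt"

// cpu is a toy model of the one register that matters here.
type cpu struct{ x15 uint64 }

// clobberX15 stands in for the assembly function that dirties X15.
func clobberX15(c *cpu) { c.x15 = 0x42 }

// abiWrapper models the ABI0-to-ABIInternal transition: as described
// above, it re-zeroes X15 on other ports but is assumed not to on Plan 9.
func abiWrapper(c *cpu, isPlan9 bool) {
	if !isPlan9 {
		c.x15 = 0
	}
}

// zeroedSlot models compiled code that "zeroes" a pointer slot by
// storing X15 into it.
func zeroedSlot(c *cpu) uint64 { return c.x15 }

// slotAfterCall runs the whole sequence and returns what lands in the slot.
func slotAfterCall(isPlan9 bool) uint64 {
	c := &cpu{}
	clobberX15(c)
	abiWrapper(c, isPlan9)
	return zeroedSlot(c)
}

func main() {
	fmt.Printf("other ports: %#x\n", slotAfterCall(false))
	fmt.Printf("plan9:       %#x\n", slotAfterCall(true))
}
```

In this model the slot ends up nonzero only in the Plan 9 case, which is the condition such a test would assert against.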

@rminnich
Contributor

I am wondering, since you are using SSE: should you be testing for buildcfg.GOAMD64 > 1, not isPlan9?
I got burned at Google by SSE when GOAMD64 got defaulted to v2 and I ran on an amd64 machine that did not have SSE enabled. We fixed the glitch, but added GOAMD64=v1 to the build environment. Is this a similar case? Using X15 without checking GOAMD64 seems dicey to me.

@randall77
Contributor

@rminnich I'm not sure what you are referring to - maybe 386? Because our amd64 architecture port requires SSE, and has since Go was released.

@mdempsky
Member

Removing the release-blocker label since Plan 9 isn't a first-class port.

@9nut

9nut commented Oct 10, 2023

@bradfitz @cherrymui the change to cmd/compile/internal/amd64/ggen.go:65 didn't fix the crash.

I think I found a second contributor, or possibly the actual cause; it is in runtime/stack.go:72. Both diffs are in go1-21-2-diffs.txt. Does it look reasonable?

Empirically, one or both fixes resolve the issue.

@randall77
Contributor

I don't understand what the stack fix is doing. It's just increasing the reserved stack for plan9 from 512 bytes to 4096 bytes. Is there some reason plan9 needs that extra stack? It doesn't sound like it is directly related to the X15 problem.

@orangecms

Looking at ggen.go's history:
326df69

This commit switches from using R13 to using X15, saying:

Use XORL and X15 for zeroing in ggen's zerorange on AMD64

Prefer a SSE store from X15 over XORL REG REG -> MOVQ.

Use XORL REG, REG to setup 0 for REP STOS.

@bradfitz
Contributor Author

@cherrymui, @randall77, any thoughts on the mentioned ggen.go change?
