Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: nil-dereference panic refers to addr=0x8 #56568

Closed
dennwc opened this issue Nov 4, 2022 · 16 comments
Closed

runtime: nil-dereference panic refers to addr=0x8 #56568

dennwc opened this issue Nov 4, 2022 · 16 comments
Assignees
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Thinking
Milestone

Comments

@dennwc
Copy link
Contributor

dennwc commented Nov 4, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.7 linux/amd64

Does this issue reproduce with the latest release?

Yes, confirmed via playground.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/snap/go/9981"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/snap/go/9981/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.7"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build662566094=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Run the following program:

package main

type A struct {
	F int
}

func (a *A) Foo() {
	if a != nil {
		a.F++
	}
}

type B struct {
	A
}

func main() {
	var a *A
	a.Foo()
	var b *B
	b.Foo()
}

Link to a playground.

What did you expect to see?

No panic for both method calls.

What did you see instead?

Panic on the b.Foo() method call:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x454ea2]

goroutine 1 [running]:
main.main()
	/tmp/sandbox1079324543/prog.go:21 +0x2

Discussion

Technically, I would expect b.Foo() to be equivalent to (*A).Foo(nil) in this case, at least when the struct offset of embedded A field is zero.

It is clear that the following would case panics in any case (struct offset is not zero, equivalent to (*A).Foo(0x8)):

type B struct {
	F2 int
	A
}

As well as this code (always requires pointer de-reference):

type B struct {
	*A
}

If I understand correctly, as of now the behavior of b.Foo() is similar to (*A).Foo(&b.A), where first b.A causes an operation analogous to pointer de-reference. I think this is a surprising behavior that makes it hard to embed structs (promoting methods) and make the new type nil-safe.

The only non-error-prone way to fix it currently, is to make a named type instead of embedding and manually expose all the methods:

type B A

func (b *B) Foo() {
	(*A)(b).Foo()
}

As I mentioned previously, I see no technical reason why the compiler cannot call (*A).Foo(nil) instead of requiring an additional boilerplate. Otherwise it defeats the purpose of struct composition, when nil-safety is required for promoted methods.

I believe this change doesn't break the backward compatibility promise, only makes more programs safe(er).

Open questions are:

  • Can this be implemented without slowing down existing programs?
  • Should the case of (*A).Foo(0x8) be translated to (*A).Foo(nil) as well, so that the struct offset doesn't matter? This complicates the implementation, so likely is unnecessary.

I admit this is not a bug per-se and works "as intended", however it's surprising enough to be worth fixing. Also, when using unsafe/cgo, I was able to somehow confuse the compiler to emit case similar to (*A).Foo(0x8) (thus skipping the initial nil check). I will try to provide a reproducer for it as well.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Nov 4, 2022
@cherrymui
Copy link
Member

I would expect b.Foo() to be equivalent to (*A).Foo(nil)

I'd expect b.Foo() be equivalent to b.A.Foo() (the embedding makes the .A implicit). And per spec, for a nil b, b.A panics.

@ianlancetaylor
Copy link
Contributor

As you say, this is how the language works. It would be quite subtle for the language to behave differently depending on whether the embedded field is the first field in a struct. In any case, the way to propose a language change is the proposal process described at https://go.dev/s/proposal. However, iIn this case I think it is unlikely that the proposal would be accepted.

@ianlancetaylor ianlancetaylor closed this as not planned Won't fix, can't repro, duplicate, stale Nov 4, 2022
@dennwc
Copy link
Contributor Author

dennwc commented Nov 5, 2022

And per spec, for a nil b, b.A panics.

@cherrymui Thank you, indeed I missed that it's actually guaranteed by the spec.

Having said that, I didn't find references in the spec about how exactly the promoted methods must be implemented. The spec currently specifies:

If S contains an embedded field T, the method sets of S and *S both include promoted methods with receiver T. The method set of *S also includes promoted methods with receiver *T.

There's no explicit mention that b.Foo() must be implemented as b.A.Foo(), causing evaluation of b.A instead of directly calling (*A).Foo(<addr of A>).

The reason it caused a confusion for me is closer to this example:

package main

import "unsafe"

type A struct {
	F    int
	next *A
}

func (a *A) NextA() *A {
	return (*A)(unsafe.Pointer(a.next))
}

func (a *A) Foo() {
	if a != nil {
		a.F++
	}
}

type B struct {
	A
}

func (b *B) NextB() *B {
	return (*B)(unsafe.Pointer(b.NextA()))
}

func main() {
	var a *A
	a.Foo()
	var b *B
	c := b.NextB()
	c.Foo()
}

This might be a different bug, or consequences of compiler optimization, but this is what can be observed here:

  • Assuming the spec implies that b.NextA() is the same as b.A.NextA(), the panic must be triggered at line 25 where b.A is first evaluated. But it doesn't. The actual panic is at line 11. Which leads us to...
  • The fact that the panic at line 11 says addr=0x8, meaning the compiler successfully evaluated &b.A without de-referencing b first. This is what my original message implies: there's no need to evaluate b.A when calling a promoted method.

To be clear, it still panics, but for a different reason: de-reference of (*A)(0x8) instead of (*B)(0x0).

Thus, going back to this example, either the spec should allow evaluation of &b.A during the promoted method call without de-referencing b (in which case my original proposal/bug report is valid), or it should explicitly mention that promoted method call must evaluate b.A (which is not how the compiler works currently, according to the example).

So @ianlancetaylor , I'm still not exactly sure how to proceed with this report. Should I then open a separate bug report for the second example? Should I open a proposal to clarify the spec in a way that allows promoting nil-protected methods with this example (e.g. propose an offset-independent implementation)? Both?

In fact, in the real program was able to pass the 0x8 pointer around, which caused a similar panic in unrelated part of the program. I was unable to reproduce it in the playground yet, but maybe I will be able eventually. This is why I think it may still be a bug of some kind.

@ianlancetaylor
Copy link
Contributor

That second case does seem like a potential quality of implementation issue. gccgo reports

panic: runtime error: invalid memory address or nil pointer dereference

goroutine 1 [running]:
main.B.NextB
	foo.go:25
main.main
	foo.go:32

That is, I think we agree that the program should crash, but the crash report from the gc compiler could be clearer as to where the bad nil dereference occurs.

Go 1.4 reports

[signal 0xb code=0x1 addr=0x0 pc=0x400c45]

goroutine 1 [running]:
main.(*B).NextB(0x0, 0x0)
	/home/iant/foo11.go:25 +0x15
main.main()
	/home/iant/foo11.go:32 +0x37

but after that it gets more confusing due to the fact that older gc releases were not as reliable at reporting failures in inlined functions. At least I think that's the reason, I didn't look too deeply. The reference to addr=0x8 starts in Go 1.10.

Reopening in case this is something we want to fix in the gc compiler.

@ianlancetaylor ianlancetaylor reopened this Nov 5, 2022
@ianlancetaylor ianlancetaylor changed the title runtime: Panic calling promoted nil-protected methods on nil embedding struct pointers runtime: nil-dereference panic refers to addr=0x8 Nov 5, 2022
@mknyszek mknyszek added this to the Backlog milestone Nov 7, 2022
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 7, 2022
@cherrymui
Copy link
Member

In the example above, both (*B).NextB and (*A).NextA are inlined into main. In the AST there is a nil check at line 25 (from NextB) before the load at line 11. But at a later stage the compiler somehow schedules the load before the nil check. I think we have logic to schedule nil checks before loads. This may be a compiler bug. I'll take a look. Thanks.

@randall77
Copy link
Contributor

Possibly related to #42673

@cherrymui
Copy link
Member

Yeah, the fix for #42673 fixes this one as well.

@randall77 randall77 modified the milestones: Backlog, Go1.21 Dec 5, 2022
@randall77
Copy link
Contributor

Ok, this (the part that Ian reopened the issue about) should get fixed in 1.21 then.

@gopherbot
Copy link

Change https://go.dev/cl/270940 mentions this issue: cmd/compile: improve scheduling pass

gopherbot pushed a commit that referenced this issue Jan 20, 2023
Convert the scheduling pass from scheduling backwards to scheduling forwards.

Forward scheduling makes it easier to prioritize scheduling values as
soon as they are ready, which is important for things like nil checks,
select ops, etc.

Forward scheduling is also quite a bit clearer. It was originally
backwards because computing uses is tricky, but I found a way to do it
simply and with n lg n complexity. The new scheme also makes it easy
to add new scheduling edges if needed.

Fixes #42673
Update #56568

Change-Id: Ibbb38c52d191f50ce7a94f8c1cbd3cd9b614ea8b
Reviewed-on: https://go-review.googlesource.com/c/go/+/270940
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
@cherrymui
Copy link
Member

Well, with the CL above it still panic with addr=0x8. The SSA dump before scheduling is

v1 (?) = InitMem <mem>
v17 (?) = MOVQconst <*B> [0] (b[*B], b[*B])
v19 (+32) = InlMark <void> [2] v1
v20 (+25) = LoweredNilCheck <void> v17 v1
v22 (25) = InlMark <void> [4] v1
v24 (11) = MOVQconst <**A> [8]
v25 (+11) = MOVQload <*A> v24 v1 (~R0[*A], ~R0[*B], c[*B])
v33 (+33) = LoweredNilCheck <void> v25 v1

Basically, after inlining and constant folding, the compiler is smart enough to know that B is at constant address 0 and A is at constant address 8. We issue a nil check for B, but there is no scheduling edge between that nil check and the constant 8.

Usually if a pointer is derived from another pointer, that derivation (pointer addition, or folded into a load with offset) is scheduled later than the nil check. But in this case the pointer is really a constant, which can be scheduled arbitrarily (and happened to be before the nil check in this case).

I don't see an easy way to enforce scheduling order for nil check vs. constants. If we know statically a pointer is constant nil, maybe we can make derived pointers all constant nil, i.e. (OffPtr [n] (ConstNil)) -> (ConstNil)? Not sure if that is too much...

@dennwc
Copy link
Contributor Author

dennwc commented Jun 5, 2023

Ideally, if the compiler/runtime can enforce <nil ptr> + <any off> -> nil for method calls, it will make the original use case I described much safer (where methods of embedded struct at off=0 can test for a nil pointer, which is propagated from a parent struct being nil).

I hope that these cases with constants can start a larger discussion for possible spec change/clarification.

@cherrymui cherrymui modified the milestones: Go1.21, Go1.22 Jun 27, 2023
@cherrymui
Copy link
Member

Change like (OffPtr [n] (ConstNil)) -> (ConstNil) still sounds a bit scary to me. It's too late to do that at this point for Go 1.22. Move to backlog, as this involves only statically known offset from nil pointers, and it panics either way. Thanks.

@cherrymui cherrymui modified the milestones: Go1.22, Backlog Nov 27, 2023
@randall77
Copy link
Contributor

I think CL 537775 may help with the ordering here? It enforces ordering between a nil check and subsequent dependent loads.

I'm not entirely sure what the bug is anymore. We certainly allow addr=0x8 in perfectly normal situations, like

package main

type A struct {
	x, y int
}

var a *A

func main() {
	println(a.y)
}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x8 pc=0x104c43a10]

goroutine 1 [running]:
main.main()
	/Users/khr/gowork/issue56568b.go:10 +0x20

@dennwc
Copy link
Contributor Author

dennwc commented Nov 28, 2023

@randall77 If I recall correctly, it was not about offset 0x8 specifically.

It's that the nil check from the method was removed/reordered, allowing passing 0x8 as a receiver to embedded struct's method. I expected it to panic as soon as it tried to offset a nil pointer.

From the description of https://go.dev/cl/537775 that you mentioned, it may indeed fix the root cause. I will check it a bit later.

@cherrymui
Copy link
Member

The example mentioned in #56568 (comment) panics with the stack trace below.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x457166]

goroutine 1 [running]:
main.(*A).NextA(...)
	/tmp/sandbox1097189260/prog.go:11
main.(*B).NextB(...)
	/tmp/sandbox1097189260/prog.go:25
main.main()
	/tmp/sandbox1097189260/prog.go:32 +0x6

Specifically, it panics in NextA.

On tip, it panics with

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x45bc83]

goroutine 1 [running]:
main.(*B).NextB(...)
	/tmp/sandbox2394944921/prog.go:25
main.main()
	/tmp/sandbox2394944921/prog.go:32 +0x3

I.e. panics in NextB, before invoking the NextA method.

Is this the expected panic? If so, we can call it fixed. Thanks.

@dennwc
Copy link
Contributor Author

dennwc commented Nov 30, 2023

Yes, this was exactly the issue I was having. Thank you! Closing it then.

@dennwc dennwc closed this as completed Nov 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Thinking
Projects
None yet
Development

No branches or pull requests

6 participants