runtime: "fatal error: unexpected signal" 0xC0000005 on Windows for a small program with a large allocation #37470

Closed
ulikunitz opened this issue Feb 26, 2020 · 25 comments
Labels: FrozenDueToAge, NeedsInvestigation, OS-Windows

@ulikunitz
Contributor

ulikunitz commented Feb 26, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14 windows/amd64

Does this issue reproduce with the latest release?

The tests were run with Go 1.14 on a fully patched Windows 10 Home Version 1909 (Build 18363.657).

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\Uli\AppData\Local\go-build
set GOENV=C:\Users\Uli\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\Uli\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\Users\Uli\src\lz\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\Uli\AppData\Local\Temp\go-build893962726=/tmp/go-build -gno-record-gcc-switches
GOROOT/bin/go version: go version go1.14 windows/amd64
GOROOT/bin/go tool compile -V: compile version go1.14

What did you do?

I'm developing a package that creates Lempel-Ziv sequences for byte streams. In my usual development environment, Ubuntu 18.04, I sometimes observe crashes of the test runs (fatal error: bad g->status in ready) for go1.13.8 and go1.14 on multiple kernels, including 4.15.0 and 5.3.18. Neither of those should be affected by the AVX register corruption.

To exclude Linux as a factor I tested the package on Windows and got a fatal error on every run. I was able to reduce it to a small program. The program runs without any errors on Linux. Whether the Windows issue is related to the Linux problems I cannot tell. I'm aware that initializing a structure with a huge array this way is not a good idea, but that is what I wrote initially and what appears to trigger the stack extension that runs into an invalid address access.

I started the program with go run.

> go run main.go
package main

import "fmt"

type Sequencer struct {
        htable [1 << 17]uint32
        buf    []byte
}

func (s *Sequencer) Init(windowSize int) *Sequencer {
        if !(0 <= windowSize) {
                panic(fmt.Errorf("windowSize out of range [%d,%d]", 0, 0))
        }
        *s = Sequencer{
                buf: []byte{0xff},
        }

        return s
}

func main() {
        var s Sequencer
        s.Init(0)
}

https://play.golang.org/p/VRavJw-WPie

What did you expect to see?

No output and no fatal error.

What did you see instead?

fatal error: unexpected signal during runtime execution
[signal 0xc0000005 code=0x1 addr=0xc000134000 pc=0x4143a9]

runtime stack:
runtime.throw(0x4d5ec2, 0x2a)
        c:/go/src/runtime/panic.go:1112 +0x79
runtime.sigpanic()
        c:/go/src/runtime/signal_windows.go:240 +0x25a
runtime.runGCProg(0x4a294c, 0x0, 0xc000132000, 0x1, 0x579680)
        c:/go/src/runtime/mbitmap.go:1901 +0xd9
runtime.materializeGCProg(0x80008, 0x4a2948, 0x7bfc20)
        c:/go/src/runtime/mbitmap.go:1925 +0x93
runtime.adjustframe(0x7bfb30, 0x7bfc20, 0x579680)
        c:/go/src/runtime/stack.go:696 +0x272
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc00004a000, 0x0, 0x0, 0x7fffffff, 0x4d7720, 0x7bfc20, 0x0, ...)
        c:/go/src/runtime/traceback.go:334 +0x111c
runtime.copystack(0xc00004a000, 0x200000)
        c:/go/src/runtime/stack.go:888 +0x298
runtime.newstack()
        c:/go/src/runtime/stack.go:1043 +0x219
runtime.morestack()
        c:/go/src/runtime/asm_amd64.s:449 +0x97

goroutine 1 [copystack]:
main.(*Sequencer).Init(0xc0004dff60, 0x0, 0x0)
        C:/Users/Uli/src/lz/main.go:10 +0x1af fp=0xc0004dff48 sp=0xc0004dff40 pc=0x49f93f
main.main()
        C:/Users/Uli/src/lz/main.go:23 +0x76 fp=0xc00055ff88 sp=0xc0004dff48 pc=0x49f9c6
runtime.main()
        c:/go/src/runtime/proc.go:203 +0x212 fp=0xc00055ffe0 sp=0xc00055ff88 pc=0x434952
runtime.goexit()
        c:/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc00055ffe8 sp=0xc00055ffe0 pc=0x45cd61
exit status 2
@bcmills
Contributor

bcmills commented Feb 26, 2020

The [1 << 17]uint32 field is 512 KiB (524,288 bytes). If that ends up being allocated on the stack, I could imagine that it triggers some bug in the interaction between the runtime and the OS, causing the stack growth to appear to be a wild memory fault.
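The size estimate can be checked directly. The following is a minimal sketch that reuses the Sequencer type from the report; the helper name htableBytes is made up for this illustration, and the size of the whole struct depends on the platform's pointer size.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Sequencer is the type from the original report.
type Sequencer struct {
	htable [1 << 17]uint32
	buf    []byte
}

// htableBytes is a hypothetical helper for this sketch. It returns the
// size of the htable field in bytes. unsafe.Sizeof does not evaluate its
// operand, so no large value is actually created here.
func htableBytes() uintptr {
	return unsafe.Sizeof(Sequencer{}.htable)
}

func main() {
	fmt.Println("htable bytes:", htableBytes())
	fmt.Println("htable KiB:  ", htableBytes()/1024)
	// The total includes the slice header, whose size depends on the
	// pointer size of the platform.
	fmt.Println("Sequencer bytes:", unsafe.Sizeof(Sequencer{}))
}
```

The array alone is 131072 elements of 4 bytes each, that is 524288 bytes.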

CC @randall77 @aclements @mknyszek

@bcmills bcmills changed the title from "Windows: Fatal error unexpected signal 0xC0000005 for small program that runs on Linux" to "runtime: "fatal error: unexpected signal" 0xC0000005 on Windows for a small program with a large allocation" Feb 26, 2020
@bcmills bcmills added the NeedsInvestigation and OS-Windows labels Feb 26, 2020
@bcmills bcmills added this to the Backlog milestone Feb 26, 2020
@aclements
Member

The stack growth itself isn't causing the crash. It happens when we try to unroll a temporary GC program for scanning the large stack-allocated object. I'm not sure what's happening exactly, but that's just a bug. That should never crash.

@aclements aclements modified the milestones: Backlog, Go1.15 Feb 26, 2020
@aclements
Member

@ulikunitz, do you know if this is reproducible with Go 1.13?

@aclements
Member

Found it.

In this case, the GC bitmap for Sequencer will be 65536 zeros followed by 3 ones, or exactly 8 KiB of zero bytes, followed by a 0x7 byte. t.ptrdata is 524312 (the bytes of Sequencer up to and including the last pointer). The calculation for the scratch buffer size in materializeGCProg is wrong: (ptrdata/(8*sys.PtrSize)+pageSize-1)/pageSize. It needs to allocate 8193 bytes, but the ptrdata/(8*sys.PtrSize) rounds down and it allocates only 8192 bytes. That just happens to land on a page boundary, and I guess the next page happens to be unmapped, so runGCProg faults when it tries to write the last byte of the GC bitmap.
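The arithmetic can be replayed outside the runtime. This is only a sketch: ptrSize and pageSize are assumed to be 8 and 8192 (a 64-bit platform and the 8 KiB page size implied above), and buggyPages and bitmapBytes are names invented for this illustration, not runtime functions.

```go
package main

import "fmt"

const (
	ptrSize  = 8    // assumed: bytes per pointer on a 64-bit platform
	pageSize = 8192 // assumed: runtime page size, matching the 8 KiB figure
)

// buggyPages mirrors the expression quoted in the comment above. The
// inner division truncates, so a partial final bitmap byte is dropped.
func buggyPages(ptrdata uintptr) uintptr {
	return (ptrdata/(8*ptrSize) + pageSize - 1) / pageSize
}

// bitmapBytes computes how many bytes the unrolled bitmap really needs:
// one bit per pointer-sized word, rounded up to whole bytes.
func bitmapBytes(ptrdata uintptr) uintptr {
	words := (ptrdata + ptrSize - 1) / ptrSize
	return (words + 7) / 8
}

func main() {
	const ptrdata = 524312 // value of t.ptrdata reported for Sequencer
	need := bitmapBytes(ptrdata)
	got := buggyPages(ptrdata) * pageSize
	fmt.Printf("bitmap needs %d bytes, buggy calculation allocates %d bytes\n", need, got)
}
```

With these assumptions the bitmap needs 8193 bytes while the quoted expression yields a single 8192-byte page, which is the one-byte shortfall described above.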

@aclements
Member

I'm not sure how this could lead to bad g->status in ready specifically, but I bet the same bug is affecting you on Linux, and the out-of-bounds write is showing up as memory corruption instead of a segfault.

@gopherbot
Contributor

Change https://golang.org/cl/221197 mentions this issue: runtime: fix rounding in materializeGCProg

@aclements
Member

@gopherbot, please open a backport to Go 1.14 and Go 1.15.

@gopherbot
Contributor

Backport issue(s) opened: #37480 (for 1.14), #37481 (for 1.15).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@aclements
Member

Oops. @gopherbot, please open a backport to Go 1.13.

@networkimprov

Also needs 1.12 backport? There's a queue of fixes for the final 1.12.x release:
https://github.com/golang/go/milestone/135

gopherbot can't backport more than once per issue.

cc @dmitshur

@bcmills
Contributor

bcmills commented Feb 26, 2020

@networkimprov, now that 1.14 has been released I would not expect any further 1.12.x releases.

@ulikunitz
Contributor Author

I checked that 1.13 and 1.13.2 work, and the first broken version appears to be 1.13.3. So 1.12 is probably not affected.

@ulikunitz
Contributor Author

No, go1.12.9 is also broken. It probably shares a CL with the go1.13 release branch.

@dmitshur
Contributor

> Oops. @gopherbot, please open a backport to Go 1.13.

Manually opened #37483 for Go 1.13.

@ulikunitz
Contributor Author

Here is the report of what I tested under Windows.

go1.12     crash
go1.12.9   crash
go1.13     no crash
go1.13.2   no crash
go1.13.3   crash
go1.13.4   crash
go1.13.8   crash
go1.14     crash

@aclements
Member

aclements commented Feb 26, 2020

Yes, this bug was introduced in https://go-review.googlesource.com/c/134155 as part of adding support for stack objects, which was released in Go 1.12. Though, as @bcmills pointed out, we're probably not going to do any more 1.12 releases.

The crash is very sensitive to the behavior of the memory allocator, so the fact that it didn't reproduce on 1.13 or 1.13.2 doesn't indicate much. The problem has been there since that CL was committed.

@ulikunitz
Contributor Author

ulikunitz commented Feb 26, 2020

Many thanks for the explanation and the fix. Meanwhile I tested https://golang.org/cl/221197 on Linux and Windows and can happily report that I was not able to reproduce any of the issues with the package on either Linux or Windows.

@dmitshur
Contributor

@aclements You've asked for this issue to be backported to Go 1.14 and 1.13. This seems like a serious issue, do you think there is a workaround that can be used?

@aclements
Member

@dmitshur, technically a workaround could be to pad all types of size [N*524288+1, N*524288+63] for any integer N so they're not that size any more. That's really awful (and can apply to code you depend on but don't control), and you have to know that you've encountered this issue to even think about doing something like that. The issue is most likely to just show up as memory corruption, which you might not even notice.

So, practically speaking, I'd say there isn't really a workaround.
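The affected range can be derived by brute force from the rounding error. The sketch below assumes an 8-byte pointer and an 8192-byte page, and it scans ptrdata values rather than type sizes, which is a simplification. All function names are invented for this illustration.

```go
package main

import "fmt"

const (
	ptrSize  = 8                      // assumed: 64-bit platform
	pageSize = 8192                   // assumed: runtime page size
	period   = pageSize * 8 * ptrSize // 524288 bytes of ptrdata per bitmap page
)

// buggyPages uses the truncating division from the faulty expression.
func buggyPages(ptrdata uintptr) uintptr {
	return (ptrdata/(8*ptrSize) + pageSize - 1) / pageSize
}

// neededPages rounds the bitmap byte count up before rounding up to pages.
func neededPages(ptrdata uintptr) uintptr {
	bytes := (ptrdata + 8*ptrSize - 1) / (8 * ptrSize)
	return (bytes + pageSize - 1) / pageSize
}

// affectedRange scans every offset k in [0, period) and reports the first
// and last k for which ptrdata = n*period + k gets too few pages, plus the
// number of such offsets.
func affectedRange(n uintptr) (lo, hi, count uintptr) {
	base := n * period
	for k := uintptr(0); k < period; k++ {
		if buggyPages(base+k) < neededPages(base+k) {
			if count == 0 {
				lo = k
			}
			hi = k
			count++
		}
	}
	return lo, hi, count
}

func main() {
	for n := uintptr(1); n <= 3; n++ {
		lo, hi, count := affectedRange(n)
		fmt.Printf("N=%d: offsets %d..%d (%d values)\n", n, lo, hi, count)
	}
}
```

Under these assumptions the scan finds exactly the offsets 1 through 63 past each multiple of 524288, which matches the interval given above.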

@ulikunitz
Contributor Author

ulikunitz commented Mar 20, 2020

Hi, apparently the CL https://golang.org/cl/221197 didn't make it into 1.14.1. #37480 has been moved to milestone go1.14.2, I guess because the CL review has not been completed.

Is there anything I can do to move the CL forward?

@dmitshur
Contributor

@ulikunitz Yes, the review needs to be completed. The CL will then get submitted, backported, and be a part of the next minor release.

I've left a ping for @aclements on the CL in case the notification fell through.

@aclements
Member

Oof, sorry this missed 1.14.1. Things have been crazy. I've updated the CL.

@dmitshur
Contributor

No problem Austin, it'll make its way into the next minor release. Thank you!

@gopherbot
Contributor

Change https://golang.org/cl/224417 mentions this issue: [release-branch.go1.14] runtime: fix rounding in materializeGCProg

@gopherbot
Contributor

Change https://golang.org/cl/224418 mentions this issue: [release-branch.go1.13] runtime: fix rounding in materializeGCProg

gopherbot pushed a commit that referenced this issue Apr 7, 2020
materializeGCProg allocates a temporary buffer for unrolling a GC
program. Unfortunately, when computing the size of the buffer, it
rounds *down* the number of bytes needed to store bitmap before
rounding up the number of pages needed to store those bytes. The fact
that it rounds up to pages usually mitigates the rounding down, but
the type from #37470 exists right on the boundary where this doesn't
work:

type Sequencer struct {
	htable [1 << 17]uint32
	buf    []byte
}

On 64-bit, this GC bitmap is exactly 8 KiB of zeros, followed by three
one bits. Hence, this needs 8193 bytes of storage, but the current
math in materializeGCProg rounds *down* the three one bits to 8192
bytes. Since this is exactly pageSize, the next step of rounding up to
the page size doesn't mitigate this error, and materializeGCProg
allocates a buffer that is one byte too small. runGCProg then writes
one byte past the end of this buffer, causing either a segfault (if
you're lucky!) or memory corruption.

Updates #37470.
Fixes #37480.

Change-Id: Iad24c463c501cd9b1dc1924bc2ad007991a094a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/224417
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
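
One way to express the corrected rounding is to round up at both steps. This is a sketch of the idea in the commit message, not a copy of the CL's diff: divRoundUp and fixedPages are helpers defined locally for this illustration, and ptrSize and pageSize are assumed to be 8 and 8192.

```go
package main

import "fmt"

const (
	ptrSize  = 8    // assumed: 64-bit platform
	pageSize = 8192 // assumed: runtime page size
)

// divRoundUp returns ceil(n / a) for a > 0. It is a local helper for this
// sketch.
func divRoundUp(n, a uintptr) uintptr {
	return (n + a - 1) / a
}

// fixedPages rounds up twice: first to whole bitmap bytes, then to whole
// pages, so a partial final byte is never dropped.
func fixedPages(ptrdata uintptr) uintptr {
	bitmapBytes := divRoundUp(ptrdata, 8*ptrSize)
	return divRoundUp(bitmapBytes, pageSize)
}

func main() {
	// 524312 is the ptrdata value reported for Sequencer in this issue.
	fmt.Println("pages for ptrdata 524312:", fixedPages(524312))
	// A type whose bitmap fits exactly into one page still gets one page.
	fmt.Println("pages for ptrdata 524288:", fixedPages(524288))
}
```

For the Sequencer case this gives two pages, enough for the 8193 bitmap bytes, while a bitmap of exactly 8192 bytes still fits in one page.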
gopherbot pushed a commit that referenced this issue Apr 7, 2020
materializeGCProg allocates a temporary buffer for unrolling a GC
program. Unfortunately, when computing the size of the buffer, it
rounds *down* the number of bytes needed to store bitmap before
rounding up the number of pages needed to store those bytes. The fact
that it rounds up to pages usually mitigates the rounding down, but
the type from #37470 exists right on the boundary where this doesn't
work:

type Sequencer struct {
	htable [1 << 17]uint32
	buf    []byte
}

On 64-bit, this GC bitmap is exactly 8 KiB of zeros, followed by three
one bits. Hence, this needs 8193 bytes of storage, but the current
math in materializeGCProg rounds *down* the three one bits to 8192
bytes. Since this is exactly pageSize, the next step of rounding up to
the page size doesn't mitigate this error, and materializeGCProg
allocates a buffer that is one byte too small. runGCProg then writes
one byte past the end of this buffer, causing either a segfault (if
you're lucky!) or memory corruption.

Updates #37470.
Fixes #37483.

Change-Id: Iad24c463c501cd9b1dc1924bc2ad007991a094a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/224418
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@golang golang locked and limited conversation to collaborators Mar 20, 2021