gccgo: runtime segfaults in package initialization code (only on i386 musl) #63935

Open
nmeum opened this issue Nov 3, 2023 · 5 comments
nmeum commented Nov 3, 2023

What version of Go are you using (go version)?

$ go version
go version go1.18 gccgo (Alpine 13.2.1_git20231014) 13.2.1 20231014 linux/386

Does this issue reproduce with the latest release?

I can reproduce it with GCC 13.2 which is the latest GCC stable release.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="386"
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="386"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH='/root/go'
GOPRIVATE=""
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=""
GOTOOLDIR="/usr/libexec/gcc/i586-alpine-linux-musl/13.2.1"
GOVCS=""
GOVERSION="go1.18 gccgo (Alpine 13.2.1_git20231014) 13.2.1 20231014"
GCCGO="/usr/bin/gccgo"
GO386='sse2'
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build2176079711=/tmp/go-build -gno-record-gcc-switches -funwind-tables"

What did you do?

I maintain the Go package for Alpine Linux, where we presently use gcc-go to bootstrap Go. As you may know, Alpine Linux uses musl libc. We recently upstreamed all of our downstream patches for gccgo musl support (#51280) and have not had any issues with gcc-go since. However, since musl 1.2.4 and gcc 13.X, the gcc-go runtime segfaults on every Go program, but only on i386; on all other architectures (x86_64, s390x, riscv64, aarch64, armhf, armv7) it works as intended:

$ cat hello.go
package main

import "fmt"

func main() {
        fmt.Println("hello")
}
$ gccgo -o hello hello.go
$ ./hello
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0xf75791b8]

goroutine 1 [running, locked to thread]:
runtime.dopanic__m
        /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/runtime/panic.go:1204
runtime.fatalthrow
        /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/runtime/panic.go:1073
runtime.throw
        /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/runtime/panic.go:1044
runtime.sigpanic
        /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/runtime/signal_unix.go:714

The segfault seems to occur in a compiler-generated package initialization function for the unicode package:

$ gdb ./hello
(gdb) run
Thread 1 "hello" received signal SIGSEGV, Segmentation fault.
unicode..import () at /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/unicode/tables.go:65
65              R32: []Range32{
(gdb) bt
#0  unicode..import () at /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/unicode/tables.go:65
#1  0x56557e3e in __go_init_main () at hello.go:1
#2  0xf74b141f in runtime.main (p.0=0x0) at /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/runtime/proc.go:263
#3  0xf74a87f7 in runtime.kickoff () at /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/runtime/proc.go:1316
#4  0xf7f50250 in ?? () from /lib/libucontext.so.1
#5  0x00000000 in ?? ()
(gdb) disassemble
…
   0xf757914f <+271>:   call   0xf6e796a0 <runtime.newobject@plt>
   0xf7579154 <+276>:   movdqa -0x77df9c(%ebx),%xmm0
   0xf757915c <+284>:   movdqa -0x77dfdc(%ebx),%xmm3
   0xf7579164 <+292>:   movdqa -0x77dfcc(%ebx),%xmm4
   0xf757916c <+300>:   movdqa -0x77dfbc(%ebx),%xmm5
   0xf7579174 <+308>:   movdqa -0x77dfac(%ebx),%xmm6
   0xf757917c <+316>:   movups %xmm0,0x40(%eax)
   0xf7579180 <+320>:   movdqa -0x77df8c(%ebx),%xmm0
   0xf7579188 <+328>:   movups %xmm3,(%eax)
   0xf757918b <+331>:   movq   -0x77bf14(%ebx),%xmm7
   0xf7579193 <+339>:   movups %xmm4,0x10(%eax)
   0xf7579197 <+343>:   mov    %eax,0x46c(%esp)
   0xf757919e <+350>:   movups %xmm5,0x20(%eax)
   0xf75791a2 <+354>:   movups %xmm6,0x30(%eax)
   0xf75791a6 <+358>:   movups %xmm0,0x50(%eax)
   0xf75791aa <+362>:   mov    0x0(%ebp),%eax
   0xf75791ad <+365>:   movl   $0x2,0x478(%esp)
=> 0xf75791b8 <+376>:   movaps %xmm3,0x20(%esp)
   0xf75791bd <+381>:   movaps %xmm4,0x30(%esp)
   0xf75791c2 <+386>:   movaps %xmm5,0x40(%esp)
   0xf75791c7 <+391>:   movaps %xmm6,0x50(%esp)
   0xf75791cc <+396>:   movq   %xmm7,0xc0(%esp)
   0xf75791d5 <+405>:   movq   %xmm7,0x470(%esp)
   0xf75791de <+414>:   add    $0x10,%esp
   0xf75791e1 <+417>:   test   %eax,%eax
   0xf75791e3 <+419>:   jne    0xf758f170 <unicode..import+90416>

I am not yet sure exactly when this regression was introduced. Our gccgo is largely unpatched; however, because musl 1.2.4 removed the LFS64 large file interfaces (e.g. stat64, lseek64, open64, …), we carry a minor libgo patch that we haven't upstreamed yet.[1] I would be surprised if the segfault were related to this patch, since the gccgo runtime works on all other architectures, where we are also able to bootstrap Go using gccgo.

Any idea what might be causing this or any guidance on how to debug this further? (CC: @ianlancetaylor)

One can easily reproduce this on an x86_64 machine using Docker:

$ docker run --rm -it i386/alpine:edge
/ # apk update
/ # apk add gcc-go build-base
/ # cat hello.go
package main

import "fmt"

func main() {
        fmt.Println("hello")
}
/ # gccgo -o hello hello.go
/ # ./hello

Please let me know if I can provide any additional information.

What did you expect to see?

I would have expected ./hello to output "hello".

What did you see instead?

The runtime fault (SIGSEGV) shown above.

Footnotes

  1. I would also welcome suggestions on how to revise the patch so that it can be upstreamed. I suppose the issue is that the presence of the large file interface is assumed on Linux (e.g. in libcall_posix_largefile.go). So I guess a configure check is needed to determine whether the large file interface is supported? If it isn't, a different version of the various libcall_*.go files would have to be used?

@gopherbot added this to the Gccgo milestone Nov 3, 2023
@cherrymui
Member

unexpected fault address 0x0

The fault address is 0, but the faulting instruction is
=> 0xf75791b8 <+376>: movaps %xmm3,0x20(%esp)

which is writing to 0x20(%esp), which cannot be 0 (unless %esp is -0x20, which is very unlikely and should have caused a fault earlier). I don't see why writing to a stack address would fault. Maybe it is not aligned? What is the value of %esp at fault? Thanks.

@cherrymui added the NeedsInvestigation label (someone must examine and confirm this is a valid issue and not a duplicate of an existing one) Nov 3, 2023
@nmeum
Author

nmeum commented Nov 3, 2023

What is the value of %esp at fault?

(gdb) run
Thread 1 "hello" received signal SIGSEGV, Segmentation fault.
unicode..import () at /home/buildozer/aports/main/gcc/src/gcc-13-20231014/libgo/go/unicode/tables.go:65
(gdb) info registers
eax            0x0                 0
ecx            0x1                 1
edx            0xf7ffcc20          -134231008
ebx            0xf7d5f52c          -136973012
esp            0x56a431ec          0x56a431ec
ebp            0xf7dac794          0xf7dac794 <runtime[writeBarrier]>
esi            0x56a50000          1453654016
edi            0x4                 4
eip            0xf75791b8          0xf75791b8 <unicode..import+376>
eflags         0x10216             [ PF AF IF RF ]
cs             0x23                35
ss             0x2b                43
ds             0x2b                43
es             0x2b                43
fs             0x0                 0
gs             0x63                99

@cherrymui
Member

Thanks

esp 0x56a431ec 0x56a431ec

Yeah, the SP is not 16-byte aligned, and therefore neither is 0x20(%esp). The movaps instruction requires its memory operand to be 16-byte aligned and faults otherwise. This is probably the reason.
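
The alignment arithmetic can be checked by hand or with a few lines of Go. This is only a minimal sketch: the %esp value is copied from the register dump earlier in the thread, and the helper name misalignment is made up for illustration.

```go
package main

import "fmt"

// misalignment is a hypothetical helper: it returns how many bytes addr
// lies past the previous multiple of align. align must be a power of two.
func misalignment(addr, align uint32) uint32 {
	return addr & (align - 1)
}

func main() {
	// %esp at the time of the fault, taken from the `info registers` output.
	const esp uint32 = 0x56a431ec

	// Effective address of the faulting store: movaps %xmm3,0x20(%esp).
	target := esp + 0x20

	fmt.Printf("esp         = %#x, mod 16 = %d\n", esp, misalignment(esp, 16))
	fmt.Printf("0x20(%%esp) = %#x, mod 16 = %d\n", target, misalignment(target, 16))
}
```

Both values come out as 12 modulo 16. Adding 0x20 (a multiple of 16) cannot change the remainder, so the store target is misaligned exactly when %esp is.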

Does i386 musl not align the SP to 16 bytes when starting a thread? Does this cause any problems for other code? If not, maybe you could configure gcc to not emit aligned XMM instructions?

@nmeum
Author

nmeum commented Nov 3, 2023

musl itself does seem to align the stack pointer on a 16-byte boundary in its startup code. However, you are probably referring to the code implementing makecontext(3)? This is not provided by musl but instead by an external library called libucontext. libucontext aligns the SP in its i386 makecontext implementation as follows:

https://github.com/kaniini/libucontext/blob/be80075e957c4a61a6415c280802fea9001201a2/arch/x86/makecontext.c#L35-L37

So it seems that libucontext's makecontext(3) implementation also aligns the stack pointer correctly?
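
For reference, here is a rough sketch of the arithmetic a makecontext-style routine has to get right on i386. It is not libucontext's code: the helper entrySP, the constants, and the example stack address are all made up for illustration. It assumes the current i386 System V ABI rule that (%esp + 4) is a multiple of 16 at function entry, i.e. the argument area above the return address is 16-byte aligned.

```go
package main

import "fmt"

const (
	wordSize   = 4  // size of one stack slot on i386
	stackAlign = 16 // alignment assumed for the argument area at function entry
)

// entrySP is a hypothetical helper, not taken from libucontext. Given the
// top of a freshly allocated stack and the number of word-sized arguments,
// it returns the %esp value to use when entering the context's function.
func entrySP(stackTop uintptr, argc int) uintptr {
	// Reserve the argument slots, then round down so that the first
	// argument sits on a 16-byte boundary.
	args := (stackTop - uintptr(argc)*wordSize) &^ (stackAlign - 1)
	// The return address slot sits directly below the arguments.
	return args - wordSize
}

func main() {
	// Arbitrary example stack top, chosen only for illustration.
	sp := entrySP(0x56a50000, 2)
	fmt.Printf("entry esp = %#x, (esp+4) mod 16 = %d\n", sp, (sp+4)%stackAlign)
}
```

With this convention %esp itself is 12 modulo 16 at function entry, and the callee's prologue is expected to restore 16-byte alignment before it emits aligned stores such as movaps.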

@nmeum
Author

nmeum commented Nov 19, 2023

One more thing to take into account is that musl does not support -fsplit-stack. There are various #ifdefs regarding stack pointer handling in libgo/runtime/proc.c; maybe there is a bug in one of the #ifndef USING_SPLIT_STACK cases? That could explain why this doesn't happen on i386 glibc.
