Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin: SIGSEGV or unknown caller pc on ppc64le with -buildmode pie #43228

Closed
dtrudg opened this issue Dec 16, 2020 · 21 comments
Closed

plugin: SIGSEGV or unknown caller pc on ppc64le with -buildmode pie #43228

dtrudg opened this issue Dec 16, 2020 · 21 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@dtrudg
Copy link

dtrudg commented Dec 16, 2020

What version of Go are you using (go version)?

go version go1.15.6 linux/ppc64le

(also seen on now unsupported go version go1.13.12 linux/ppc64le)

Does this issue reproduce with the latest release?

Yes, using 1.15.6 as above.

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="ppc64le"
GOBIN=""
GOCACHE="/home/dtrudg/.cache/go-build"
GOENV="/home/dtrudg/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="ppc64le"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/dtrudg/go/pkg/mod"
GONOPROXY="github.com/sylabs"
GONOSUMDB="github.com/sylabs"
GOOS="linux"
GOPATH="/home/dtrudg/go"
GOPRIVATE="github.com/sylabs"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_ppc64le"
GCCGO="gccgo"
GOPPC64="power8"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build790835145=/tmp/go-build -gno-record-gcc-switches"
Distro Information
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.8 (Maipo)
$ uname -a
Linux sylabs01 3.10.0-1062.4.1.el7.ppc64le #1 SMP Wed Sep 25 13:57:38 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux
$ lscpu
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Model:                 2.2 (pvr 004e 0202)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     pHyp
Virtualization type:   para
L1d cache:             32K
L1i cache:             32K
NUMA node0 CPU(s):     0-7

What did you do?

Compiled and used a go plugin, where the main executable is compiled using -buildmode=pie on ppc64le.

Plugin code is, unfortunately, closed source. Main executable is built from https://github.com/hpcng/singularity/releases/tag/v3.7.0 - I will try to get a simpler / open-source reproducer when time permits.

What did you expect to see?

No errors. The same configuration works without issue on amd64 and arm64.

On pp64le using -buildmode=default there is no error. Only -buildmode=pie causes issues.

Also, we see no issues with -buildmode=pie on ppc64le when we are not using plugins.

What did you see instead?

SIGSEGV or unknown caller pc are seen in consecutive runs, with failures occuring at slightly different points in execution (of code within the plugin). The traces/errors seem similar, superficially, to those in #30283 (from containerd/containerd#3005). Those issues occurred on ppc64le with -buildmode=pie (but not plugins afaik) and were fixed in go 1.12.

3 backtraces from consecutive runs:

Backtrace 1
unexpected fault address 0x3fff6cd6da10
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x3fff6cd6da10 pc=0x1297dbdd8]

goroutine 23 [running]:
runtime.throw(0x12a1f6081, 0x5)
runtime/panic.go:1116 +0x68 fp=0xc000449710 sp=0xc0004496d0 pc=0x1297d14b8
runtime.sigpanic()
runtime/signal_unix.go:749 +0x3f8 fp=0xc000449750 sp=0xc000449710 pc=0x1297e9428
runtime.goexit1()
runtime/proc.go:2946 +0x38 fp=0xc0004497b8 sp=0xc000449770 pc=0x1297dbdd8
runtime.goexit()
runtime/asm_ppc64x.s:885 +0x10 fp=0xc0004497b8 sp=0xc0004497b8 pc=0x129809440
created by text/template/parse.lex
text/template/parse/lex.go:223 +0x1b4

goroutine 1 [runnable]:
reflect.funcLayout(0x12a46e0e0, 0x0, 0x1, 0xc00023e160, 0xe, 0x1, 0x194)
reflect/type.go:3052 +0x464
reflect.Value.call(0x12a46e0e0, 0x12a5e2fe8, 0x13, 0x3fff6cacee2e, 0x4, 0xc0000991a0, 0x2, 0x2, 0x2, 0x2, ...)
reflect/value.go:434 +0x528
reflect.Value.Call(0x12a46e0e0, 0x12a5e2fe8, 0x13, 0xc00023e160, 0xe, 0xe, 0x12abc7f00, 0x0, 0x0)
reflect/value.go:337 +0xb4
text/template.safeCall(0x12a46e0e0, 0x12a5e2fe8, 0x13, 0xc00023e160, 0xe, 0xe, 0x0, 0x0, 0x0, 0x0, ...)
text/template/funcs.go:365 +0xb4
text/template.(*state).evalCall(0xc00035f618, 0x3fff6cb945a0, 0xc000119040, 0x99, 0x12a46e0e0, 0x12a5e2fe8, 0x13, 0x12a62d460, 0xc000098d20, 0xc0004f80f3, ...)
text/template/exec.go:720 +0x5c8
text/template.(*state).evalFunction(0xc00035f618, 0x3fff6cb945a0, 0xc000119040, 0x99, 0xc000098de0, 0x12a62d460, 0xc000098d20, 0xc000226700, 0xf, 0x10, ...)
text/template/exec.go:575 +0x130
text/template.(*state).evalCommand(0xc00035f618, 0x3fff6cb945a0, 0xc000119040, 0x99, 0xc000098d20, 0x12a498f00, 0x12ac8bfc8, 0x99, 0x0, 0x3fff6c8927f4, ...)
text/template/exec.go:462 +0x100
text/template.(*state).evalPipeline(0xc00035f618, 0x3fff6cb945a0, 0xc000119040, 0x99, 0xc0007842a0, 0x3fff6ccd2d00, 0xc000099110, 0x30)
text/template/exec.go:431 +0x360
text/template.(*state).walk(0xc00035f618, 0x3fff6cb945a0, 0xc000119040, 0x99, 0x12a62d2e0, 0xc0000990e0)
text/template/exec.go:255 +0x31c
text/template.(*state).walk(0xc00035f618, 0x3fff6cb945a0, 0xc000119040, 0x99, 0x12a62d640, 0xc000098cc0)
text/template/exec.go:263 +0x12c
text/template.(*Template).execute(0xc0000a6200, 0x3fff6cba84a0, 0xc000784300, 0x3fff6cb945a0, 0xc000119040, 0x0, 0x0)
text/template/exec.go:220 +0x198
text/template.(*Template).Execute(...)
text/template/exec.go:203
github.com/sylabs/singularity/pro-log-plugin.SyslogOutput.Log(0xc00078a1e0, 0xd, 0xc00078a200, 0x6, 0xc00078a220, 0xb, 0xc00078e040, 0x3b, 0xc0000a6200, 0x3ec, ...)
github.com/sylabs/singularity/pro-log-plugin/syslog.go:61 +0x17c
github.com/sylabs/singularity/pro-log-plugin.(*pluginImplementation).callOutputs(0xc0000ec120)
github.com/sylabs/singularity/pro-log-plugin/main.go:79 +0x3f8
github.com/sylabs/singularity/pro-log-plugin.callbackLog.func1(0x12abedd80, 0x12ac8bfc8, 0x0, 0x0, 0x0, 0x0)
github.com/sylabs/singularity/pro-log-plugin/main.go:60 +0x164
github.com/spf13/cobra.(*Command).execute(0x12abedd80, 0xc00000e070, 0x0, 0x0, 0x12abedd80, 0xc00000e070)
github.com/spf13/cobra@v1.1.1/command.go:829 +0x4ac
github.com/spf13/cobra.(*Command).ExecuteC(0x12abedae0, 0x12a5e41f8, 0xc000096240, 0xc0002a1620)
github.com/spf13/cobra@v1.1.1/command.go:958 +0x2b4
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.1.1/command.go:895
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.1.1/command.go:888
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity()
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:431 +0x1b0
main.main()
github.com/sylabs/singularity@v0.0.0/cmd/singularity/cli.go:19 +0x64

goroutine 21 [syscall]:
os/signal.signal_recv(0x0)
runtime/sigqueue.go:147 +0x10c
os/signal.loop()
os/signal/signal_unix.go:23 +0x38
created by os/signal.Notify.func1.1
os/signal/signal.go:150 +0x60

goroutine 22 [select]:
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity.func2(0xc000096240, 0xc0002a1620, 0x12a625ae0, 0xc000407cc0)
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:423 +0xc0
created by github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:422 +0x178

Backtrace 2
runtime: unexpected return pc for runtime.sigpanic called from 0x3fff6f7ee5b8
stack: frame={sp:0x3fff6f7ee5b8, fp:0x3fff6f7ee5f8} stack=[0xc000060000,0xc000060800)

fatal error: unknown caller pc

runtime stack:
runtime.throw(0x131866ca2, 0x11)
runtime/panic.go:1116 +0x68
runtime.gentraceback(0x130e49434, 0x3fff6f7ee5b8, 0x0, 0xc000602900, 0x0, 0x0, 0x7fffffff, 0x3fff6f7ee608, 0x0, 0x0, ...)
runtime/traceback.go:273 +0x15dc
runtime.addOneOpenDeferFrame.func1()
runtime/panic.go:721 +0xa0
runtime.systemstack(0x0)
runtime/asm_ppc64x.s:269 +0xa0
runtime.mstart()
runtime/proc.go:1116

goroutine 14 [running]:
runtime: unexpected return pc for runtime.systemstack_switch called from 0x0
stack: frame={sp:0x3fff6f7ee468, fp:0x3fff6f7ee488} stack=[0xc000060000,0xc000060800)

runtime.systemstack_switch()
runtime/asm_ppc64x.s:211 +0x10 fp=0x3fff6f7ee488 sp=0x3fff6f7ee468 pc=0x130e66b60
created by text/template/parse.lex
text/template/parse/lex.go:223 +0x1b4

goroutine 1 [runnable]:
time.loadLocation(0x13185c26a, 0x9, 0xc00035ecb0, 0x1, 0x1, 0x18, 0x3fff6d752d00, 0x3fff6d752d00)
time/zoneinfo_read.go:519 +0x24
time.initLocal()
time/zoneinfo_unix.go:37 +0x1e4
sync.(*Once).doSlow(0x1322ec1e8, 0x131c46610)
sync/once.go:66 +0x16c
sync.(*Once).Do(...)
sync/once.go:57
time.(*Location).get(0x1322b52e0, 0x131c992e0)
time/zoneinfo.go:92 +0xac
time.Time.locabs(0xbfeeb4d922185643, 0x11a9e84, 0x1322b52e0, 0x10, 0x98, 0x97, 0x131a9da20)
time/time.go:455 +0x64
time.Time.AppendFormat(0xbfeeb4d922185643, 0x11a9e84, 0x1322b52e0, 0xc00035f020, 0x0, 0x40, 0x3fff6d55179d, 0xf, 0x3fff6d752d00, 0x208b884f587858, ...)
time/format.go:513 +0x60
time.Time.Format(0xbfeeb4d922185643, 0x11a9e84, 0x1322b52e0, 0x3fff6d55179d, 0xf, 0xc0001b0180, 0xc00060d100)
time/format.go:505 +0xb4
log/syslog.(*netConn).writeString(0xc0004fd2c0, 0x5, 0xc0007a66f0, 0x8, 0x3fffce3bf5f6, 0xb, 0xc00060d200, 0xf4, 0x3fff6d54eac2, 0x1, ...)
log/syslog/syslog.go:288 +0x74
log/syslog.(*Writer).write(0xc0004a2180, 0x5, 0xc00060d200, 0xf4, 0x3fff6d549874, 0x131a9da20, 0xc000507268)
log/syslog/syslog.go:273 +0xd8
log/syslog.(*Writer).writeAndRetry(0xc0004a2180, 0x5, 0xc00060d200, 0xf4, 0x0, 0x0, 0x0)
log/syslog/syslog.go:254 +0x1a4
log/syslog.(*Writer).Write(0xc0004a2180, 0xc0001b0180, 0xf4, 0x180, 0x131c99201, 0x131a9da20, 0xc000507268)
log/syslog/syslog.go:175 +0x8c
fmt.Fprint(0x3fff6d6284a0, 0xc0004a2180, 0xc00035f428, 0x1, 0x1, 0x1, 0xc0004fd2c0, 0x3fff6d5495b8)
fmt/print.go:233 +0xa4
text/template.(*state).printValue(0xc00035f618, 0x131c8d2e0, 0xc0002f94d0, 0x131a9da20, 0xc000507268, 0x98)
text/template/exec.go:960 +0xdc
text/template.(*state).walk(0xc00035f618, 0x3fff6d6145a0, 0xc0002b5520, 0x99, 0x131c8d2e0, 0xc0002f94d0)
text/template/exec.go:257 +0x368
text/template.(*state).walk(0xc00035f618, 0x3fff6d6145a0, 0xc0002b5520, 0x99, 0x131c8d640, 0xc0002f91a0)
text/template/exec.go:263 +0x12c
text/template.(*Template).execute(0xc000507100, 0x3fff6d6284a0, 0xc0004a2180, 0x3fff6d6145a0, 0xc0002b5520, 0x0, 0x0)
text/template/exec.go:220 +0x198
text/template.(*Template).Execute(...)
text/template/exec.go:203
github.com/sylabs/singularity/pro-log-plugin.SyslogOutput.Log(0xc0007a6650, 0xd, 0xc0007a6670, 0x6, 0xc0007a6690, 0xb, 0xc00015f680, 0x3b, 0xc000507100, 0x3ec, ...)
github.com/sylabs/singularity/pro-log-plugin/syslog.go:61 +0x17c
github.com/sylabs/singularity/pro-log-plugin.(*pluginImplementation).callOutputs(0xc000131b00)
github.com/sylabs/singularity/pro-log-plugin/main.go:79 +0x3f8
github.com/sylabs/singularity/pro-log-plugin.callbackLog.func1(0x13224dd80, 0x1322ebfc8, 0x0, 0x0, 0x0, 0x0)
github.com/sylabs/singularity/pro-log-plugin/main.go:60 +0x164
github.com/spf13/cobra.(*Command).execute(0x13224dd80, 0xc00000e070, 0x0, 0x0, 0x13224dd80, 0xc00000e070)
github.com/spf13/cobra@v1.1.1/command.go:829 +0x4ac
github.com/spf13/cobra.(*Command).ExecuteC(0x13224dae0, 0x131c441f8, 0xc000097da0, 0xc0002d9550)
github.com/spf13/cobra@v1.1.1/command.go:958 +0x2b4
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.1.1/command.go:895
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.1.1/command.go:888
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity()
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:431 +0x1b0
main.main()
github.com/sylabs/singularity@v0.0.0/cmd/singularity/cli.go:19 +0x64

goroutine 12 [syscall]:
os/signal.signal_recv(0x0)
runtime/sigqueue.go:147 +0x10c
os/signal.loop()
os/signal/signal_unix.go:23 +0x38
created by os/signal.Notify.func1.1
os/signal/signal.go:150 +0x60

goroutine 13 [select]:
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity.func2(0xc000097da0, 0xc0002d9550, 0x131c85ae0, 0xc000506f80)
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:423 +0xc0
created by github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:422 +0x178

Backtrace 3
runtime: unexpected return pc for runtime.sigpanic called from 0x3fff6180e5b8
stack: frame={sp:0x3fff6180e5b8, fp:0x3fff6180e5f8} stack=[0xc0004f5800,0xc0004f6000)

fatal error: unknown caller pc

runtime stack:
runtime.throw(0x137346ca2, 0x11)
runtime/panic.go:1116 +0x68
runtime.gentraceback(0x136929434, 0x3fff6180e5b8, 0x0, 0xc00008a780, 0x0, 0x0, 0x7fffffff, 0x3fff6180e608, 0x0, 0x0, ...)
runtime/traceback.go:273 +0x15dc
runtime.addOneOpenDeferFrame.func1()
runtime/panic.go:721 +0xa0
runtime.systemstack(0x0)
runtime/asm_ppc64x.s:269 +0xa0
runtime.mstart()
runtime/proc.go:1116

goroutine 52 [running]:
runtime: unexpected return pc for runtime.systemstack_switch called from 0x0
stack: frame={sp:0x3fff6180e468, fp:0x3fff6180e488} stack=[0xc0004f5800,0xc0004f6000)

runtime.systemstack_switch()
runtime/asm_ppc64x.s:211 +0x10 fp=0x3fff6180e488 sp=0x3fff6180e468 pc=0x136946b60
created by text/template/parse.lex
text/template/parse/lex.go:223 +0x1b4

goroutine 1 [runnable]:
fmt.Fprintf(0x3fff5bfc0008, 0xc000208108, 0x3fff604428f8, 0x13, 0xc00035f100, 0x6, 0x6, 0xf4, 0xe, 0x3fff602d802c)
fmt/print.go:202 +0x2c
log/syslog.(*netConn).writeString(0xc000204ae0, 0x5, 0xc0002708f0, 0x8, 0x3fffd579f5f6, 0xb, 0xc000094200, 0xf4, 0x3fff6043eac2, 0x1, ...)
log/syslog/syslog.go:289 +0x230
log/syslog.(*Writer).write(0xc0003622a0, 0x5, 0xc000094200, 0xf4, 0x3fff60439874, 0x13757da20, 0xc0000a0368)
log/syslog/syslog.go:273 +0xd8
log/syslog.(*Writer).writeAndRetry(0xc0003622a0, 0x5, 0xc000094200, 0xf4, 0x0, 0x0, 0x0)
log/syslog/syslog.go:254 +0x1a4
log/syslog.(*Writer).Write(0xc0003622a0, 0xc00053c000, 0xf4, 0x140, 0x137779201, 0x13757da20, 0xc0000a0368)
log/syslog/syslog.go:175 +0x8c
fmt.Fprint(0x3fff605184a0, 0xc0003622a0, 0xc00035f428, 0x1, 0x1, 0x1, 0xc000204ae0, 0x3fff604395b8)
fmt/print.go:233 +0xa4
text/template.(*state).printValue(0xc00035f618, 0x13776d2e0, 0xc0001956b0, 0x13757da20, 0xc0000a0368, 0x98)
text/template/exec.go:960 +0xdc
text/template.(*state).walk(0xc00035f618, 0x3fff605045a0, 0xc000522750, 0x99, 0x13776d2e0, 0xc0001956b0)
text/template/exec.go:257 +0x368
text/template.(*state).walk(0xc00035f618, 0x3fff605045a0, 0xc000522750, 0x99, 0x13776d640, 0xc0001951d0)
text/template/exec.go:263 +0x12c
text/template.(*Template).execute(0xc0000a0200, 0x3fff605184a0, 0xc0003622a0, 0x3fff605045a0, 0xc000522750, 0x0, 0x0)
text/template/exec.go:220 +0x198
text/template.(*Template).Execute(...)
text/template/exec.go:203
github.com/sylabs/singularity/pro-log-plugin.SyslogOutput.Log(0xc000270800, 0xd, 0xc000270840, 0x6, 0xc000270870, 0xb, 0xc000228000, 0x3b, 0xc0000a0200, 0x3ec, ...)
github.com/sylabs/singularity/pro-log-plugin/syslog.go:61 +0x17c
github.com/sylabs/singularity/pro-log-plugin.(*pluginImplementation).callOutputs(0xc000230480)
github.com/sylabs/singularity/pro-log-plugin/main.go:79 +0x3f8
github.com/sylabs/singularity/pro-log-plugin.callbackLog.func1(0x137d2dd80, 0x137dcbfc8, 0x0, 0x0, 0x0, 0x0)
github.com/sylabs/singularity/pro-log-plugin/main.go:60 +0x164
github.com/spf13/cobra.(*Command).execute(0x137d2dd80, 0xc00000e070, 0x0, 0x0, 0x137d2dd80, 0xc00000e070)
github.com/spf13/cobra@v1.1.1/command.go:829 +0x4ac
github.com/spf13/cobra.(*Command).ExecuteC(0x137d2dae0, 0x1377241f8, 0xc0004d60c0, 0xc0002dbee0)
github.com/spf13/cobra@v1.1.1/command.go:958 +0x2b4
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.1.1/command.go:895
github.com/spf13/cobra.(*Command).ExecuteContext(...)
github.com/spf13/cobra@v1.1.1/command.go:888
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity()
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:431 +0x1b0
main.main()
github.com/sylabs/singularity@v0.0.0/cmd/singularity/cli.go:19 +0x64

goroutine 50 [syscall]:
os/signal.signal_recv(0x0)
runtime/sigqueue.go:147 +0x10c
os/signal.loop()
os/signal/signal_unix.go:23 +0x38
created by os/signal.Notify.func1.1
os/signal/signal.go:150 +0x60

goroutine 51 [select]:
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity.func2(0xc0004d60c0, 0xc0002dbee0, 0x137765ae0, 0xc0001cbb00)
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:423 +0xc0
created by github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity
github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:422 +0x178

@laboger
Copy link
Contributor

laboger commented Dec 17, 2020

This definitely could be a similar problem to what happened before. If r2 is not restored correctly then you would see this type of failure.

Do you know if there is a Go version where it works with -buildmode=pie?

@dtrudg
Copy link
Author

dtrudg commented Dec 17, 2020

I'm afraid I've only tried 1.13.12 and 1.15.6 here so far. I can try more if there might be other useful versions for troubleshooting, with the caveat that the code requires at least 1.13

@laboger
Copy link
Contributor

laboger commented Dec 21, 2020

I think I am able to reproduce this problem with a test where the main program is built with pie and it invokes a plugin. At some point after it returns from the plugin the value of r2 is incorrect which can lead to various errors. I am still trying to debug it to find where it is happening. This failure won't occur if the main program is built with -buildmode=default, because in that case the main program doesn't depend on the r2 value.

@laboger
Copy link
Contributor

laboger commented Dec 21, 2020

In your example backtraces, are all 3 the same if you run with Go 1.13.12?

@dtrudg
Copy link
Author

dtrudg commented Dec 21, 2020

In your example backtraces, are all 3 the same if you run with Go 1.13.12?

I should be able to check tomorrow and will post here. I believe they are the same SIGSEGV or unknown caller pc - but I want to check.

@laboger
Copy link
Contributor

laboger commented Dec 21, 2020

@cherrymui @aclements I need some input here. In trying to reproduce the problem in this issue I found a bug in the case where a main program is built with -buildmode=pie and then invokes a function in a plugin. Seems likely to be this issue but haven't verified that yet.

I reproduced this by modifying the test in misc/cgo/testplugin so that when building TestIssue25756 it passes in -buildmode=pie. If I do that, I get a segv during the run and after debugging it found that r2 is incorrect after calling crosscall2 in runtime/cgo/asm_ppc64x.s.

When compiled with -buildmode=pie, crosscall2 has this generated code in the main program:

000000000018cf30 <crosscall2>:
  18cf30:       0f 00 4c 3c     addis   r2,r12,15
  18cf34:       d0 af 42 38     addi    r2,r2,-20528
  18cf38:       a6 02 08 7c     mflr    r0
  18cf3c:       10 00 01 f8     std     r0,16(r1)
  18cf40:       26 00 00 7c     mfcr    r0
  18cf44:       08 00 01 f8     std     r0,8(r1)
  18cf48:       18 00 41 f8     std     r2,24(r1) 

But without (-buildmode=default):

000000001011b6a0 <crosscall2>:
    1011b6a0:   a6 02 08 7c     mflr    r0
    1011b6a4:   10 00 01 f8     std     r0,16(r1)
    1011b6a8:   26 00 00 7c     mfcr    r0
    1011b6ac:   08 00 01 f8     std     r0,8(r1)
    1011b6b0:   18 00 41 f8     std     r2,24(r1)

So that means when compiled with pie, the r2 that is stored on the stack in crosscall2 is the value computed for the TOC in the current (main) program and that saved value is used by the caller after returning, which is incorrect and will result in a segv or other odd behavior. When compiled for buildmode=default, the value of r2 that is stored in crosscall2 is the r2 value from the caller, which is what it should be.

Not sure what the best way to fix it is. I was also surprised to see that there is a crosscall2 in the main program and one in the plugin object. When calling crosscall2 from the plugin, the one in the main program is called. If the one in the plugin was called instead, that would work because it would generate the correct r2 value before storing it.

@cherrymui
Copy link
Member

cherrymui commented Dec 21, 2020

Yeah, the combination of PIE+plugin is probably never tested and probably doesn't work (on some platforms).

So, you mean crosscall2 should store R2 on stack before computing it from R12? I agree the code doesn't seem to work with assembler-added addis+addi instructions. I guess the part I don't understand is
(runtime/cgo/asm_ppc64x.s)

TEXT crosscall2(SB),NOSPLIT|NOFRAME,$0
        // Start with standard C stack frame layout and linkage
        MOVD    LR, R0
        MOVD    R0, 16(R1)      // Save LR in caller's frame
        MOVW    CR, R0          // Save CR in caller's frame
        MOVD    R0, 8(R1)
        MOVD    R2, 24(R1)      // Save TOC in caller's frame

Why does it store things on the caller's frame? (whereas it doesn't seem to do that on other platforms.) The caller doesn't do that?

@laboger
Copy link
Contributor

laboger commented Dec 21, 2020

So, you mean crosscall2 should store R2 on stack before computing it from R12?

I think so. It should be the R2 that the caller was using before this call.

Why does it store things on the caller's frame? (whereas it doesn't seem to do that on other platforms.) The caller doesn't do that?

I don't know that either. I think it has been doing it that way since the beginning and maybe this was done as a hack to make things work and was just never changed. I believe the caller should be saving R2 at that location before the call. I tried removing the line that saved R2 and that caused some testcases to fail, some of the cgo crash and callback tests, which I will investigate further.

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 22, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Dec 22, 2020
@laboger
Copy link
Contributor

laboger commented Dec 22, 2020

I don't think crosscall2 should not be saving R2 on the caller's stack. if the caller needs R2 and is calling from code that has a different TOC, the caller should be saving and restoring it. However, if the caller of crosscall2 has the same TOC, so doesn't save and restore R2, but R2 gets clobbered by the call to cgocallback in crosscall2 then it needs to be saved and restored around that call, not on the caller's stack but its own. Still investigating this to see if this is correct.

@cherrymui
Copy link
Member

Yeah, I think it makes sense that crosscall2 preserves R2, just not on the caller's frame. This might work: if the caller needs to save/restore R2, it does it by itself and we don't clobber the slot on the caller's frame. If the caller doesn't need to save R2, it will jump past the inserted two instructions (addis+addi), and R2 will be preserved.

@laboger
Copy link
Contributor

laboger commented May 5, 2021

Does this still fail on Go 1.17?

@dtrudg
Copy link
Author

dtrudg commented May 5, 2021

On go version go1.16.3 linux/ppc64le an issue still occurs as below. Am seeing SIGILL or SIGSEGV when compiled with pie. When you asked about Go 1.17 do you want me to try from a master build? I don't think 1.17 is RC or released yet?

run 1 - SIGILL info
$ singularity version
unexpected fault address 0xf11e2dec4d9d7f9c
fatal error: fault
SIGILL: illegal instruction
PC=0x12569fdc8 m=9 sigcode=1
instruction bytes: 0xa 0x0 0x0 0x0 0x0 0x0 0x0 0x0 0x3 0x0 0x0 0x0 0x0 0x0 0x0 0x0

goroutine 10 [running]:
runtime: unknown pc 0x12569fdc8
stack: frame={sp:0x3fff57ffe5a0, fp:0x0} stack=[0xc000063800,0xc000064000)

runtime: unknown pc 0x12569fdc8
stack: frame={sp:0x3fff57ffe5a0, fp:0x0} stack=[0xc000063800,0xc000064000)

created by text/template/parse.lex
	text/template/parse/lex.go:223 +0x178

goroutine 1 [running]:
	goroutine running on other thread; stack unavailable

goroutine 8 [syscall]:
os/signal.signal_recv(0x0)
	runtime/sigqueue.go:168 +0x10c
os/signal.loop()
	os/signal/signal_unix.go:23 +0x38
created by os/signal.Notify.func1.1
	os/signal/signal.go:151 +0x60

goroutine 9 [select]:
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity.func2(0xc00000d2c0, 0xc0003ee180, 0x125a670a0, 0xc0000463c0)
	github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:423 +0xa0
created by github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity
	github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:422 +0x178

r0   0x0	r1   0x3fff57ffe5a0
r2   0x126047f00	r3   0xc000680000
r4   0x0	r5   0x0
r6   0x0	r7   0x0
r8   0x0	r9   0x0
r10  0x0	r11  0x0
r12  0x0	r13  0x3fff580068f0
r14  0x1260e30c0	r15  0x1
r16  0x1260e28b0	r17  0xc000048000
r18  0x1260e2990	r19  0x1260e2930
r20  0x3fff57ffe5a8	r21  0xc00000d5c0
r22  0x1	r23  0x0
r24  0x0	r25  0x3fff64fce340
r26  0x125615940	r27  0x3fff500008c0
r28  0x3fff64fce340	r29  0x3d0f00
r30  0xc000702a80	r31  0x12569fdc8
pc   0x12569fdc8	ctr  0x0
link 0x12569fdc8	xer  0x0
ccr  0x42442442	trap 0x700
run 2 - SIGSEGV info
$ singularity version
unexpected fault address 0x3fff809ddaf0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x1 addr=0x3fff809ddaf0 pc=0x1040282c8]

goroutine 15 [running]:
runtime.throw(0x104a9aa32, 0x5)
	runtime/panic.go:1117 +0x68 fp=0xc00005df08 sp=0xc00005dec8 pc=0x10401d1a8
runtime.sigpanic()
	runtime/signal_unix.go:741 +0x288 fp=0xc00005df50 sp=0xc00005df08 pc=0x104036128
runtime.goexit1()
	runtime/proc.go:3369 +0x38 fp=0xc00005dfb8 sp=0xc00005df70 pc=0x1040282c8
runtime.goexit()
	runtime/asm_ppc64x.s:870 +0x10 fp=0xc00005dfb8 sp=0xc00005dfb8 pc=0x1040564f0
created by text/template/parse.lex
	text/template/parse/lex.go:223 +0x178

goroutine 1 [runnable]:
text/template.(*state).evalCall(0xc00035f688, 0x3fff807f5360, 0xc0001968f0, 0x99, 0x104d27200, 0x104ea8f90, 0x13, 0x104eee488, 0xc000202120, 0xc0002a23c3, ...)
	text/template/exec.go:696 +0x170
text/template.(*state).evalFunction(0xc00035f688, 0x3fff807f5360, 0xc0001968f0, 0x99, 0xc000202390, 0x104eee488, 0xc000202120, 0xc00039cd00, 0xf, 0x10, ...)
	text/template/exec.go:580 +0x130
text/template.(*state).evalCommand(0xc00035f688, 0x3fff807f5360, 0xc0001968f0, 0x99, 0xc000202120, 0x104d52da0, 0x10559af50, 0x99, 0x3fff8072590c, 0x0, ...)
	text/template/exec.go:467 +0x100
text/template.(*state).evalPipeline(0xc00035f688, 0x3fff807f5360, 0xc0001968f0, 0x99, 0xc0002a64e0, 0x3fff804e2634, 0x4, 0x3fff80933e00)
	text/template/exec.go:436 +0x354
text/template.(*state).walk(0xc00035f688, 0x3fff807f5360, 0xc0001968f0, 0x99, 0x104eee368, 0xc000202b70)
	text/template/exec.go:255 +0x350
text/template.(*state).walk(0xc00035f688, 0x3fff807f5360, 0xc0001968f0, 0x99, 0x104eee638, 0xc000289fb0)
	text/template/exec.go:264 +0x138
text/template.(*Template).execute(0xc000455d40, 0x3fff80808b58, 0xc0002a6540, 0x3fff807f5360, 0xc0001968f0, 0x0, 0x0)
	text/template/exec.go:220 +0x198
text/template.(*Template).Execute(...)
	text/template/exec.go:203
github.com/sylabs/singularity/pro-log-plugin.SyslogOutput.Log(0xc0004fb6d0, 0xd, 0xc0004fb6f0, 0x6, 0xc0004fb710, 0xb, 0xc0004c8d40, 0x3b, 0xc000455d40, 0x0, ...)
	github.com/sylabs/singularity/pro-log-plugin/syslog.go:61 +0x17c
github.com/sylabs/singularity/pro-log-plugin.(*pluginImplementation).callOutputs(0xc0001ab9e0)
	github.com/sylabs/singularity/pro-log-plugin/main.go:79 +0x3a8
github.com/sylabs/singularity/pro-log-plugin.callbackLog.func1(0x1054edde0, 0x10559af50, 0x0, 0x0, 0x0, 0x0)
	github.com/sylabs/singularity/pro-log-plugin/main.go:60 +0x164
github.com/spf13/cobra.(*Command).execute(0x1054edde0, 0xc000032050, 0x0, 0x0, 0x1054edde0, 0xc000032050)
	github.com/spf13/cobra@v1.1.3/command.go:831 +0x4c8
github.com/spf13/cobra.(*Command).ExecuteC(0x1054edb60, 0x104eaa1d0, 0xc0002a61e0, 0xc000204e00)
	github.com/spf13/cobra@v1.1.3/command.go:960 +0x2b4
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.1.3/command.go:897
github.com/spf13/cobra.(*Command).ExecuteContext(...)
	github.com/spf13/cobra@v1.1.3/command.go:890
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity()
	github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:431 +0x1b0
main.main()
	github.com/sylabs/singularity@v0.0.0/cmd/singularity/cli.go:19 +0x64

goroutine 13 [syscall]:
os/signal.signal_recv(0x0)
	runtime/sigqueue.go:168 +0x10c
os/signal.loop()
	os/signal/signal_unix.go:23 +0x38
created by os/signal.Notify.func1.1
	os/signal/signal.go:151 +0x60

goroutine 14 [select]:
github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity.func2(0xc0002a61e0, 0xc000204e00, 0x104ee70a0, 0xc000455bc0)
	github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:423 +0xa0
created by github.com/sylabs/singularity/cmd/internal/cli.ExecuteSingularity
	github.com/sylabs/singularity@v0.0.0/cmd/internal/cli/singularity.go:422 +0x178

@laboger
Copy link
Contributor

laboger commented May 11, 2021

I've looked at this a bit more.

  1. When crosscall2 is called using a PLT then the caller's R2 will be saved and restored and no need to do it in crosscall2. That is what is happening in this plugin failure case.

  2. When crosscall2 is called using a bl, that is the case that expects crosscall2 to save and restore R2. Somehow it needs to recognize that crosscall2 can clobber R2 so the caller knows it has to save and restore it. And then crosscall2 wouldn't have to do it. (Examples found in misc/cgo/test.)

Interestingly, in the plugin case, crosscall2 in the main program is called with a PLT even though there is a copy of crosscall2 in the plugin binary. If it could call crosscall2 in the plugin that would work because they would have the same TOC and R2 value. It is only when crosscall2 is called in a different binary that there is an issue.

I haven't looked at this cgo code generation much before so I don't know how to make 2) happen.

@cherrymui
Copy link
Member

Maybe we can make crosscall2 not dynamically exported (i.e. hidden visibility) so it will always use the local copy.

I still think crosscall2 should never write to the caller's frame (as I commented above).

@laboger
Copy link
Contributor

laboger commented May 12, 2021

Ah I think the right solution is to create a stack frame in crosscall2 and save R2 there, along with the other values it saves now in the caller's frame. If it is called through a PLT then the caller will have saved its own R2 on its stack and will restore it after the call. If it is not called through a PLT then crosscall2 will save and restore it from its own stack frame. I think that covers the cases. I will give it a try.

@gopherbot
Copy link

Change https://golang.org/cl/319489 mentions this issue: runtime/cgo: fix crosscall2 on ppc64x

@laboger
Copy link
Contributor

laboger commented May 12, 2021

@dtrudg Can you test out this CL to see if it fixes your issue?

@dtrudg
Copy link
Author

dtrudg commented May 12, 2021

Thanks @laboger - I should have time today, or tomorrow morning at the latest.

@dtrudg
Copy link
Author

dtrudg commented May 12, 2021

This does fix the issue. Using the patched go, compiled application and plugin with -buildmode=pie on the ppc64le system and it works as expected. Many thanks for you efforts on this @laboger!

@laboger
Copy link
Contributor

laboger commented May 14, 2021

@dtrudg I made an update to the CL, could you try again to verify it still fixes the problem before I submit it? Thanks.

@dmitshur dmitshur modified the milestones: Backlog, Go1.17 May 19, 2021
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 19, 2021
@gopherbot
Copy link

Change https://golang.org/cl/321349 mentions this issue: misc/cgo/testplugin: skip TestIssue25756pie on darwin/arm64 builder

gopherbot pushed a commit that referenced this issue May 20, 2021
This test is known to be broken on the darwin/arm64 builder.
Skip it while it's being investigated so it doesn't mask other failures.

For #46239.
Updates #43228.

Change-Id: I8fe57a0636bba84c3100337146dcb96cc264e524
Reviewed-on: https://go-review.googlesource.com/c/go/+/321349
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@golang golang locked and limited conversation to collaborators May 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants