Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unable to backtrace program using Go shared object #52608

Closed
pdefrain opened this issue Apr 28, 2022 · 7 comments
Closed

runtime: unable to backtrace program using Go shared object #52608

pdefrain opened this issue Apr 28, 2022 · 7 comments
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.

Comments

@pdefrain
Copy link

There appears to be an issue with the Golang GC when CGo is used with a "C++" application that uses a Go shared object.

go version go1.18 linux/amd64

It reproduces with previous releases, the very latest release has not been tried (the problem being investigated was observed in 1.17, which was updated to 1.18 along the way). It is possible this issue is related to https://github.com/golang/go/issues/19928 and a question was posted to that older report. This is a follow-up to that post after the issue was investigated further.

The build environment is:

GO111MODULE=""
GOARCH="amd64"
GOBIN="/home/defrain/legit3/ams/mediaserver/Golang/go/bin"
GOCACHE="/home/defrain/.cache/go-build"
GOENV="/home/defrain/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/defrain/legit3/ams/mediaserver/Golang/go/pkg/mod"
GONOPROXY="
"
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/defrain/legit3/ams/mediaserver/Golang/go:/home/defrain/legit3/ams/mediaserver/Golang/src/libgo"
GOPRIVATE="
"
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/defrain/legit3/ams/mediaserver/Golang/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/defrain/legit3/ams/mediaserver/Golang/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/defrain/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1959834666=/tmp/go-build -gno-record-gcc-switches"

The code uses CGo, two functions, to pass serialized messages buffers between "C++" <-> Go. The C++ application is the "main" which loads a Golang shared object, libso.so. The C++ application is compiled using the intel compiler, libgo uses "go build -v -buildmode=c-shared -o Linux/release64/libgo.so *.go"

Here is the CGo interface part of the code:

`// -----------------------------------------------------------
//
// C++ API - "C" functions used to access Go
//
// -----------------------------------------------------------

//
// typedef struct {
// void* buf;
// unsigned long length;
// } CSlice;
//
// typedef void (*send_to_go)(CSlice cs);
// extern void SendToGo(CSlice cs);
//
// typedef CSlice (*read_from_go)();
// extern CSlice ReadFromGo();
//
import "C"

import (
"os"
"fmt"
"net"
"time"
"sync"
"errors"
"unsafe"
"strconv"
)

var IncomingChannel chan *GMessage = make(chan *GMessage, 1000)
var OutgoingChannel chan *GMessage = make(chan *GMessage, 1000)

//export SendToGo
func SendToGo(cs C.CSlice) {
// Called from "C" to send a message to Go
var cptr unsafe.Pointer = unsafe.Pointer(cs.buf)
bp := unsafe.Slice((*byte)(cptr), cs.length)
msg := deserializeGMessage(bp)
IncomingChannel <- msg
}

//export ReadFromGo
func ReadFromGo() C.CSlice {
// Called by "C" to get a message from Go
msg := <- OutgoingChannel
var cs C.CSlice
cs.length = C.ulong(msg.getTransmitSize())
cs.buf = C.malloc(cs.length)
gbuf := unsafe.Slice((*byte)(cs.buf), cs.length)
msg.serializeGMessage(gbuf, int(cs.length))
return cs
}
`

where GMessage is used to marshal/unmarshal an object that stores label/value pairs. The "C++" code has a thread that sits on the OutgoingChannel and has other threads that post to the IncomingChannel.

I expect the code to run forever - and it does with GOGC=off until memory is exhausted. When GOGC is enabled, it will eventually panic here:

runtime: unexpected return pc for main.ReadUDP called from 0x7fff4c0112b0
stack: frame={sp:0xc00006ad38, fp:0xc00006adf8} stack=[0xc00006a000,0xc00006b000)
0x000000c00006ac38: 0x00007fffe44b7879 <fmt.Sprintf+0x0000000000000059> 0x000000c00011e270
0x000000c00006ac48: 0x00007fffe498e874 0x0000000000000036
0x000000c00006ac58: 0x000000c00006adb8 0x0000000000000003
0x000000c00006ac68: 0x0000000000000003 0x0000000000000000
0x000000c00006ac78: 0x0000000000000000 0x000000c00011e270
0x000000c00006ac88: 0x000000c00006ad28 0x00007fffe4965d45 <main.DebugLogV+0x00000000000000c5>
0x000000c00006ac98: 0x00007fffe498e874 0x0000000000000036
0x000000c00006aca8: 0x000000c00006adb8 0x0000000000000003
0x000000c00006acb8: 0x0000000000000003 0x00007ffff7f9b228
0x000000c00006acc8: 0x0000000000000008 0x0000000000000016
0x000000c00006acd8: 0x0000000000000010 0x000000c000154150
0x000000c00006ace8: 0x0000000000203000 0x000000c0005a3040
0x000000c00006acf8: 0x000000c00006ad28 0x000000c00006ad28
0x000000c00006ad08: 0x00007fffe43da105 <runtime.convT64+0x0000000000000045> 0x0000000000000008
0x000000c00006ad18: 0x00007fffe4cd6e20 0x0000000000000000
0x000000c00006ad28: 0x000000c00006ade8 0x00007fffe496a3ab <main.ReadUDP+0x000000000000026b>
0x000000c00006ad38: <0x0000000000003eca 0x00007fffe498e874
0x000000c00006ad48: 0x0000000000000036 0x000000c00006adb8
0x000000c00006ad58: 0x0000000000000003 0x0000000000000003
0x000000c00006ad68: 0x0000000000000005 0x0000000000003eca
0x000000c00006ad78: 0x00007fff4c015100 0x0000000000000003
0x000000c00006ad88: 0x000000000000000a 0x000000c00003a7b0
0x000000c00006ad98: 0x000000c0003bd800 0x000000c00006add0
0x000000c00006ada8: 0x00007ffff78df80d 0x0000000000056c9f
0x000000c00006adb8: 0x00007fffe4dada80 0x00007ffff78df80d
0x000000c00006adc8: 0x00007fffe4cd6e60 0x000000c0000321a0
0x000000c00006add8: 0x00007fffe4cd63e0 0x000000c00003a7b8
0x000000c00006ade8: 0x000000c000000000 !0x00007fff4c0112b0
0x000000c00006adf8: >0x000000000003ce14 0x000000000e3b6fef
0x000000c00006ae08: 0x00007fff40592630 0x000000c00006ef68
0x000000c00006ae18: 0x0000000000000005 0x0000000000000005
0x000000c00006ae28: 0x000000000000000a 0x00007fff4c0112b0
0x000000c00006ae38: 0x0000000000000000 0x000000c00006ae70
0x000000c00006ae48: 0x00007ffff78c38d6 0x0000000000000000
0x000000c00006ae58: 0x0000000e0000000a 0x0000000000000005
0x000000c00006ae68: 0x00007fff4c0112e8 0x000000c00006bbf0
0x000000c00006ae78: 0x00007fffe3b0a630 0x0000000000000001
0x000000c00006ae88: 0x0000000000000000 0x000000c000080000
0x000000c00006ae98: 0x0000000000000000 0x0000000000008000
0x000000c00006aea8: 0x000000c00006bc20 0x0000000000000000
0x000000c00006aeb8: 0x000000c000723000 0x0000000000000020
0x000000c00006aec8: 0x000000c000722000 0x0000000000000000
0x000000c00006aed8: 0x000000c0001029c0 0xffffffffffffffff
0x000000c00006aee8: 0x000000c000723520 0x000000c0007225a0
fatal error: unknown caller pc

runtime stack:
runtime.throw({0x7fffe4976978?, 0x7fffe52a4200?})
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:992 +0x71
runtime.gentraceback(0x7fff3c03f8cb?, 0x200000004?, 0xc0000031e0?, 0x7fffb05f3d40?, 0x0, 0x0, 0x7fffffff, 0x7fffb05f3d28, 0xc00006ac78?, 0x0)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:254 +0x1a36
runtime.addOneOpenDeferFrame.func1()
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:599 +0x6b
runtime.systemstack()
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/asm_amd64.s:469 +0x46

goroutine 29 [running]:
runtime.systemstack_switch()
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/asm_amd64.s:436 fp=0xc00006a588 sp=0xc00006a580 pc=0x7fffe4430b20
runtime.addOneOpenDeferFrame(0x4?, 0x1770?, 0xe?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:598 +0x69 fp=0xc00006a5c8 sp=0xc00006a588 pc=0x7fffe44023e9
panic({0x7fffe4d10dc0, 0x7fffe4e1b1a0})
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:798 +0x106 fp=0xc00006a688 sp=0xc00006a5c8 pc=0x7fffe4402c06
runtime.panicmakeslicelen(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/slice.go:29
runtime.makeslice(0xc0005a2820?, 0x7fffe496cbc0?, 0xc00006a720?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/slice.go:98 +0x78 fp=0xc00006a6b0 sp=0xc00006a688 pc=0x7fffe4419798
net.hexString(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/ip.go:384
net.IP.String({0xe8458b4801fc4583, 0x8f0ffc453b28408b, 0x0?})
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/ip.go:341 +0xc5 fp=0xc00006a7a0 sp=0xc00006a6b0 pc=0x7fffe4580ec5
net.ipEmptyString(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/ip.go:397
net.(*UDPAddr).String(0x7ffff78df80d)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:51 +0x55 fp=0xc00006a858 sp=0xc00006a7a0 pc=0x7fffe4596175
fmt.(*pp).handleMethods(0xc00011e270, 0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/fmt/print.go:626 +0x30b fp=0xc00006aaa8 sp=0xc00006a858 pc=0x7fffe44b9fab
fmt.(*pp).printArg(0xc00011e270, {0x7fffe4dada80?, 0x7ffff78df80d}, 0x73)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/fmt/print.go:709 +0x693 fp=0xc00006ab48 sp=0xc00006aaa8 pc=0x7fffe44bab53
fmt.(*pp).doPrintf(0xc00011e270, {0x7fffe498e874, 0x36}, {0xc00006adb8?, 0x3, 0x3})
/home/defrain/legit3/ams/mediaserver/Golang/go/src/fmt/print.go:1026 +0x288 fp=0xc00006ac40 sp=0xc00006ab48 pc=0x7fffe44bd408
fmt.Sprintf({0x7fffe498e874, 0x36}, {0xc00006adb8, 0x3, 0x3})
/home/defrain/legit3/ams/mediaserver/Golang/go/src/fmt/print.go:219 +0x59 fp=0xc00006ac98 sp=0xc00006ac40 pc=0x7fffe44b7879
main.DebugLogV(0x3eca?, {0x7fffe498e874, 0x36}, {0xc00006adb8, 0x3, 0x3})
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:126 +0xc5 fp=0xc00006ad38 sp=0xc00006ac98 pc=0x7fffe4965d45
runtime: unexpected return pc for main.ReadUDP called from 0x7fff4c0112b0
stack: frame={sp:0xc00006ad38, fp:0xc00006adf8} stack=[0xc00006a000,0xc00006b000)
0x000000c00006ac38: 0x00007fffe44b7879 <fmt.Sprintf+0x0000000000000059> 0x000000c00011e270
0x000000c00006ac48: 0x00007fffe498e874 0x0000000000000036
0x000000c00006ac58: 0x000000c00006adb8 0x0000000000000003
0x000000c00006ac68: 0x0000000000000003 0x0000000000000000
0x000000c00006ac78: 0x0000000000000000 0x000000c00011e270
0x000000c00006ac88: 0x000000c00006ad28 0x00007fffe4965d45 <main.DebugLogV+0x00000000000000c5>
0x000000c00006ac98: 0x00007fffe498e874 0x0000000000000036
0x000000c00006aca8: 0x000000c00006adb8 0x0000000000000003
0x000000c00006acb8: 0x0000000000000003 0x00007ffff7f9b228
0x000000c00006acc8: 0x0000000000000008 0x0000000000000016
0x000000c00006acd8: 0x0000000000000010 0x000000c000154150
0x000000c00006ace8: 0x0000000000203000 0x000000c0005a3040
0x000000c00006acf8: 0x000000c00006ad28 0x000000c00006ad28
0x000000c00006ad08: 0x00007fffe43da105 <runtime.convT64+0x0000000000000045> 0x0000000000000008
0x000000c00006ad18: 0x00007fffe4cd6e20 0x0000000000000000
0x000000c00006ad28: 0x000000c00006ade8 0x00007fffe496a3ab <main.ReadUDP+0x000000000000026b>
0x000000c00006ad38: <0x0000000000003eca 0x00007fffe498e874
0x000000c00006ad48: 0x0000000000000036 0x000000c00006adb8
0x000000c00006ad58: 0x0000000000000003 0x0000000000000003
0x000000c00006ad68: 0x0000000000000005 0x0000000000003eca
0x000000c00006ad78: 0x00007fff4c015100 0x0000000000000003
0x000000c00006ad88: 0x000000000000000a 0x000000c00003a7b0
0x000000c00006ad98: 0x000000c0003bd800 0x000000c00006add0
0x000000c00006ada8: 0x00007ffff78df80d 0x0000000000056c9f
0x000000c00006adb8: 0x00007fffe4dada80 0x00007ffff78df80d
0x000000c00006adc8: 0x00007fffe4cd6e60 0x000000c0000321a0
0x000000c00006add8: 0x00007fffe4cd63e0 0x000000c00003a7b8
0x000000c00006ade8: 0x000000c000000000 !0x00007fff4c0112b0
0x000000c00006adf8: >0x000000000003ce14 0x000000000e3b6fef
0x000000c00006ae08: 0x00007fff40592630 0x000000c00006ef68
0x000000c00006ae18: 0x0000000000000005 0x0000000000000005
0x000000c00006ae28: 0x000000000000000a 0x00007fff4c0112b0
0x000000c00006ae38: 0x0000000000000000 0x000000c00006ae70
0x000000c00006ae48: 0x00007ffff78c38d6 0x0000000000000000
0x000000c00006ae58: 0x0000000e0000000a 0x0000000000000005
0x000000c00006ae68: 0x00007fff4c0112e8 0x000000c00006bbf0
0x000000c00006ae78: 0x00007fffe3b0a630 0x0000000000000001
0x000000c00006ae88: 0x0000000000000000 0x000000c000080000
0x000000c00006ae98: 0x0000000000000000 0x0000000000008000
0x000000c00006aea8: 0x000000c00006bc20 0x0000000000000000
0x000000c00006aeb8: 0x000000c000723000 0x0000000000000020
0x000000c00006aec8: 0x000000c000722000 0x0000000000000000
0x000000c00006aed8: 0x000000c0001029c0 0xffffffffffffffff
0x000000c00006aee8: 0x000000c000723520 0x000000c0007225a0
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:704 +0x26b fp=0xc00006adf8 sp=0xc00006ad38 pc=0x7fffe496a3ab
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

goroutine 19 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00020a080)
/home/defrain/legit3/ams/mediaserver/Golang/go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:276 +0xad
created by go.opencensus.io/stats/view.init.0
/home/defrain/legit3/ams/mediaserver/Golang/go/pkg/mod/go.opencensus.io@v0.23.0/stats/view/worker.go:34 +0x8d

goroutine 6 [sleep]:
time.Sleep(0x3b9aca00)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/time.go:194 +0x12e
main.libgoAudit()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/utility.go:111 +0x7d
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:640 +0x33e

goroutine 22 [semacquire]:
internal/poll.runtime_Semacquire(0xc00020fbb0?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/sema.go:61 +0x25
internal/poll.(*fdMutex).rwlock(0xc00020aa80, 0xea?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_mutex.go:154 +0xd2
internal/poll.(*FD).readLock(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_mutex.go:221
internal/poll.(*FD).ReadFromInet4(0xc00020aa80, {0xc0000d0000, 0x4e20, 0x4e20}, 0xa000?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_unix.go:235 +0x69
net.(*netFD).readFromInet4(0xc00020aa80, {0xc0000d0000?, 0x0?, 0xc00020fc40?}, 0x4?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/fd_posix.go:66 +0x29
net.(*UDPConn).readFrom(0xc00005ec00?, {0xc0000d0000?, 0x0?, 0xa?}, 0xc0001f4a50)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock_posix.go:52 +0x1b8
net.(*UDPConn).readFromUDP(0xc00013a3d0, {0xc0000d0000?, 0xc00020fde8?, 0x7fffe4965e95?}, 0xc00006bdb8?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:149 +0x31
net.(*UDPConn).ReadFromUDP(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:141
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:696 +0x8d
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:591 +0x9c

goroutine 23 [runnable]:
main.deserializeGMessage({0xc000218000, 0x4e20, 0xc000213de8?})
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/gmessage.go:285 +0x167
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:698 +0xa9
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

goroutine 24 [runnable]:
main.deserializeGMessage({0xc000b98000, 0x4e20, 0xc00020dde8?})
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/gmessage.go:285 +0x167
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:698 +0xa9
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

goroutine 25 [semacquire]:
internal/poll.runtime_Semacquire(0xc0003885a0?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/sema.go:61 +0x25
internal/poll.(*fdMutex).rwlock(0xc00020aa80, 0x34?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_mutex.go:154 +0xd2
internal/poll.(*FD).readLock(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_mutex.go:221
internal/poll.(*FD).ReadFromInet4(0xc00020aa80, {0xc000605000, 0x4e20, 0x4e20}, 0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_unix.go:235 +0x69
net.(*netFD).readFromInet4(0xc00020aa80, {0xc000605000?, 0x0?, 0xc00020dc40?}, 0x4?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/fd_posix.go:66 +0x29
net.(*UDPConn).readFrom(0xc00005e800?, {0xc000605000?, 0xc0005a29c0?, 0xa?}, 0xc0001f4630)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock_posix.go:52 +0x1b8
net.(*UDPConn).readFromUDP(0xc00013a3d0, {0xc000605000?, 0xc00020dde8?, 0x7fffe4965e95?}, 0xc000068db8?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:149 +0x31
net.(*UDPConn).ReadFromUDP(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:141
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:696 +0x8d
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

goroutine 26 [runnable]:
main.deserializeGMessage({0xc0000d5000, 0x4e20, 0xc00065fde8?})
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/gmessage.go:285 +0x167
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:698 +0xa9
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

goroutine 27 [runnable]:
main.deserializeGMessage({0xc000b9d000, 0x4e20, 0xc00065dde8?})
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/gmessage.go:285 +0x185
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:698 +0xa9
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

goroutine 28 [IO wait]:
internal/poll.runtime_pollWait(0x7ffff7e21090, 0x72)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/netpoll.go:302 +0x89
internal/poll.(*pollDesc).wait(0xc00020aa80?, 0xc000677000?, 0x0)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_poll_runtime.go:83 +0x32
internal/poll.(*pollDesc).waitRead(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_poll_runtime.go:88
internal/poll.(*FD).ReadFromInet4(0xc00020aa80, {0xc000677000, 0x4e20, 0x4e20}, 0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/internal/poll/fd_unix.go:250 +0x1e5
net.(*netFD).readFromInet4(0xc00020aa80, {0xc000677000?, 0x0?, 0x7fffe4e28f80?}, 0x1?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/fd_posix.go:66 +0x29
net.(*UDPConn).readFrom(0xc00005ec00?, {0xc000677000?, 0x0?, 0xa?}, 0xc00011d200)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock_posix.go:52 +0x1b8
net.(*UDPConn).readFromUDP(0xc00013a3d0, {0xc000677000?, 0xc000211de8?, 0x7fffe4965e95?}, 0xc000139db8?)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:149 +0x31
net.(*UDPConn).ReadFromUDP(...)
/home/defrain/legit3/ams/mediaserver/Golang/go/src/net/udpsock.go:141
main.ReadUDP()
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:696 +0x8d
main.incomingMessagesWorker(0x0?)
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:391 +0x7c
created by main.GoInitialize
/home/defrain/legit3/ams/mediaserver/Golang/src/libgo/libgo.go:613 +0x185

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffb05f4700 (LWP 30894)]
runtime.raise () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/sys_linux_amd64.s:168
168 /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/sys_linux_amd64.s: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-324.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 runtime.raise () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/sys_linux_amd64.s:168
#1 0x00007fffe44188e5 in runtime.dieFromSignal (sig=6) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/signal_unix.go:852
#2 0x00007fffe4418b18 in runtime.crash () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/signal_unix.go:944
#3 0x00007fffe44035b1 in runtime.fatalthrow.func1 () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:1051
#4 0x00007fffe4403530 in runtime.fatalthrow () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:1044
#5 0x00007fffe44032f1 in runtime.throw (s=...) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:992
#6 0x00007fffe4426d36 in runtime.gentraceback (pc0=140737022891928, sp0=824634156680, lr0=, gp=, skip=0, pcbuf=0x0, max=2147483647, callback=
{void (runtime.stkframe *, void *, bool *)} 0x7fffb05f3d18, v=0x0, flags=0, ~r0=) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:254
#7 0x00007fffe44024ab in runtime.addOneOpenDeferFrame.func1 () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:600
#8 0x00007fffe4430b86 in runtime.systemstack () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/asm_amd64.s:469
#9 0x0000000000000000 in ?? ()

An alternative implementation that does not use CGo, but, instead uses UDP to exchange the buffer between C++ <-> Go, using the ReadUDP and WriteUDP, runs forever w/o issue.

The observed panic occurs in

runtime/traceback.go:254

where it finds an address, which appears to always be in "C++" memory space, it does not expect. Here is the frame (6) from GDB from a panic:

(gdb) frame 6
#6 0x00007fffe4426d36 in runtime.gentraceback (pc0=140737022891928, sp0=824634156680, lr0=, gp=, skip=0, pcbuf=0x0, max=2147483647, callback=
{void (runtime.stkframe *, void *, bool *)} 0x7fffb05f3d18, v=0x0, flags=0, ~r0=) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:254
254 /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go: No such file or directory.
(gdb) i args
pc0 = 140737022891928
sp0 = 824634156680
lr0 =
gp =
skip = 0
pcbuf = 0x0
max = 2147483647
callback = {void (runtime.stkframe *, void *, bool *)} 0x7fffb05f3d18
v = 0x0
flags = 0
~r0 =
(gdb) i lo
doPrint = false
lrPtr = 824634158576
flag =
flr = {_func = 0x7fffe52764e0, datap = }
level = 1
ctxt = 0x0
nprint = 0
waspanic = false
printing = false
lastFuncID = 0 '\000'
n = 12
f = {_func = 0x7fffe52764e0, datap = }
cgoCtxt = <error reading variable cgoCtxt (access outside bounds of object referenced via synthetic pointer)>
frame = {fn = {_func = 0x7fffe52764e0, datap = 0x7fffe52a4200 <runtime.firstmoduledata>}, pc = 140737028465579, continpc = 140737028447557, lr = 140734468526768, sp = 824634158392, fp = 824634158584,
varp = 824634158376, argp = 824634158392, arglen = 48, argmap = 0x0}
cache = {entries = {{{targetpc = 140737028447557, off = 1312617, val = 152}, {targetpc = 140737023552339, off = 292530, val = 152}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {
targetpc = 0, off = 0, val = 0}, {targetpc = 140737024364229, off = 477966, val = 232}, {targetpc = 140737024450933, off = 498209, val = 176}, {targetpc = 0, off = 0, val = 0}}, {{targetpc = 0, off = 0,
val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 140737023549355, off = 292208, val = 584}, {targetpc = 140737023539321, off = 289663, val = 80}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0,
off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 49, val = 0}}}}

Many different approaches have been tried to avoid the panic. This includes alternative implementations where the Go channels are replaced with blocking queues on the "C++" side. Control of the Go GC has been tried using GOGC=off then by programmatically scheduling a GC (with "stoptheworld" either on and off and with any "C++" threads stopped too). Zeroing malloc memory has no impact. Never "free"ing the malloc memory has no impact. It seems that the GC eventually crashes if it finds anything "C++" related in a stack it is traversing. The the code (meaning the entire application) has no issue when CGo is removed and UDP is used to pass message buffers strongly suggests that there is a problem specific to the operation of the Go GC w.r.t. CGo. Valgrind used against the application code w/o libgo.so included, finds no issues.

@ianlancetaylor
Copy link
Contributor

Can you share a complete program that we can use to replicate the problem? Thanks.

I note a call to panicmakeslicelen which suggests that some Go code is trying to create a slice with a length that is much too large. But I'm not sure why the program is unable to do a backtrace at that point.

@ianlancetaylor
Copy link
Contributor

CC @golang/runtime

@ianlancetaylor ianlancetaylor added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Apr 28, 2022
@ianlancetaylor ianlancetaylor changed the title affected/package: cgo runtime: unable to backtrace program using Go shared object Apr 28, 2022
@pdefrain
Copy link
Author

We've learned something new. The software is stable when Go runs as a separate process using identical code. We can "connect" the C++ application and Go application using a socket, replacing CGo, and the system is stable. If we run Go within the C++ application in a shared object, that initializes using module func init(), using a socket to replace CGo, the system is unstable - it appears to crash (SIGSEV) randomly with invalid stack traces, always in Go. In the shared object configuration, CGO_ENABLED must be set during the Go compile despite CGo not technically being used (there is no import "C" and no //export tagged Go functions). Essentially, CGo is eliminated by exchanging all data using a socket, instead of CGo, using a simple coupling like this.

`
var udpConn *net.UDPConn

// Go listens on port 6001. "C" listens on port 6000
func CreateUDPServer() {
hostName := "127.0.0.1"
portNum := "6001"
service := hostName + ":" + portNum
addr, err := net.ResolveUDPAddr("udp4", service)
if err != nil {
DebugLog("CreateUDPServer : ERROR: %x", err)
}
// setup listener for incoming UDP connection
conn, errl := net.ListenUDP("udp", addr)
if errl != nil {
DebugLog("CreateUDPServer : ERROR: %x", errl)
} else {
udpConn = conn
}
}

func ReadUDP() *GMessage {
var msg *GMessage
if udpConn != nil {
// the MAX_MESSAGE_SIZE size here is different than size - so we pass in the actual size to deserializeGMessage
buffer := make([]byte, MAX_MESSAGE_SIZE)
size, addr, _ := udpConn.ReadFromUDP(buffer)
msg = deserializeGMessage(buffer, size)
buffer = ni
if ConfiguredLogLevel >= 3 {
mtype := msg.getValueAsString("type")
DebugLogV(3, "ReadUDP : received message from %s, type [%s], size %d", addr, mtype, size)
}
if msg == nil {
DebugLog("ReadUDP : ERROR: message failed")
}
}
return msg
}

func WriteUDP(msg *GMessage) {
hostName := "127.0.0.1"
portNum := "6000"
service := hostName + ":" + portNum
addr, err := net.ResolveUDPAddr("udp4", service)
if err != nil {
fmt.Println("WriteUDP : ERROR: bad address")
}
if udpConn != nil {
size := msg.getTransmitSize()
if size <= 0 {
fmt.Println("SendMessageFromGoToC: ERROR: message is empty")
return
}
buf := make([]byte, size)
msg.serializeGMessage(buf, size)
_, err := udpConn.WriteToUDP(buf, addr)
if err != nil {
fmt.Println("UDP ERROR: %x", err)
}
} else{
fmt.Println("writeUDP : ERROR: no address")
}
}
`

in Go, with a similar implementation in C++. Both runtimes block on the sockets waiting for events from the other. We want to use CGo, because of its simplicity and speed, instead of a contraption like this.

The larger application is too large to share - the Go part is smaller and dedicated to interfacing with Google gRPC services.

The common denominator here is the appearence that the C++ and Go runtimes are not compatible in the same process. For Go to call "C" code, it may be that it may be that it can do so only as a Go process. Having a "C" process calling Go, while also having a Go runtime, might be the issue. But, if this is the case, then why support Go as a shared object for a "C" application? I saw a comment on a board that said something like, and I'm paraphrasing, "the C and Go runtimes cannot be used together" but I've never seen this in official documentation. One interesting official documentation comment is here:
https://go.dev/doc/faq "Do Go programs link with C/C++ programs?"
which has many warnings that lack specifics. This appears to suggest that a C++ process cannot safely co-exist with the Go runtime in the same address space. In our case, C and Go are in the same process, when unstable, but they load into different memory regions and I would expect that, aside from CGo, they cannot trample one another.

Regarding the large slices suspected in the traceback. We went as far as logging slice sizes along the C/Go boundary and did not encounter anything unusual. I think any invalid slice range seen is a product of some sort of trampling - which also explains the corrupted stacks - a problem that does not occur, running the same code, as separate C and Go processes.

@ianlancetaylor
Copy link
Contributor

Calling a Go shared library from a C program is in general supported. I don't know what is happening here.

Have you tried using -buildmode=c-archive rather than -buildmode=c-shared? Does that make any difference in the behavior?

When Go and C code are linked together, it's definitely possible for C to trample into Go, and it's possible for Go code that uses unsafe to trample into C.

@pdefrain
Copy link
Author

pdefrain commented Apr 29, 2022

When linked using static libs, I notice a few things. First, the func init() is not reliable. Whether it breaks or not is a compile time issue. I've seen this before in general... sometimes an init() is not called. I work around that by calling what's in the init() for one go file from another and as long as it runs somewhere (like a factory registration) then it's ok. But there is something wrong with init() - that's only a nuisance.

With the Go code linked static everything is now in the C++ binary (which makes it large, but about the sum of its parts) and the same issue occurs:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffdc061700 (LWP 31416)]
0x0000000000b290fc in runtime.gentraceback (pc0=11572566, sp0=824634175120, lr0=, gp=, skip=0, pcbuf=0x0, max=100, callback={void (runtime.stkframe *, void *, bool *)} 0x7fffdc0602f0, v=0x0, flags=0, ~r0=)
at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:242
242 /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go: No such file or directory.
Missing separate debuginfos, use: debuginfo-install glibc-2.17-324.el7_9.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libcom_err-1.42.9-13.el7.x86_64 libgcc-4.8.5-44.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libstdc++-4.8.5-36.el7_6.2.x86_64 openssl-libs-1.0.2k-16.el7_6.1.x86_64 pcre-8.32-17.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0 0x0000000000b290fc in runtime.gentraceback (pc0=11572566, sp0=824634175120, lr0=, gp=, skip=0, pcbuf=0x0, max=100, callback={void (runtime.stkframe *, void *, bool *)} 0x7fffdc0602f0, v=0x0, flags=0, ~r0=)
at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:242
#1 0x0000000000b2b3d1 in runtime.traceback1 (pc=0, sp=0, lr=0, gp=0xc000003380, flags=0) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:835
#2 0x0000000000b2c2e5 in runtime.traceback (pc=, sp=, lr=, gp=0xc000003380) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:782
#3 runtime.tracebackothers.func1 (gp=0xc000003380) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:1051
#4 0x0000000000b0a08d in runtime.forEachGRace (fn={void (runtime.g *)} 0x7fffdc0604e0) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/proc.go:590
#5 0x0000000000b2c1db in runtime.tracebackothers (me=) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:1037
#6 0x0000000000b0735c in runtime.dopanic_m (gp=0x23fa6a0 <runtime.g0>, pc=11561585, sp=140736884770368, ~r0=) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:1192
#7 0x0000000000b06d28 in runtime.fatalthrow.func1 () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:1047
#8 0x0000000000b06cb0 in runtime.fatalthrow () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:1044
#9 0x0000000000b06a71 in runtime.throw (s=...) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/panic.go:992
#10 0x0000000000b2a3f6 in runtime.gentraceback (pc0=11572566, sp0=824634175120, lr0=, gp=, skip=0, pcbuf=0x0, max=2147483647, callback={void (runtime.stkframe *, void *, bool *)} 0x7fffdc060a10, v=0x0, flags=0, ~r0=)
at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:254
#11 0x0000000000af089a in runtime.scanstack (gp=0xc000003380, gcw=0xc00004b738, ~r0=) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/mgcmark.go:783
#12 0x0000000000aef7a5 in runtime.markroot.func1 () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/mgcmark.go:241
#13 0x0000000000aef425 in runtime.markroot (gcw=0xc00004b738, i=36, flushBgCredit=true, ~r0=) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/mgcmark.go:214
#14 0x0000000000af143f in runtime.gcDrain (gcw=0xc00004b738, flags=3) at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/mgcmark.go:1047
#15 0x0000000000aee2c5 in runtime.gcBgMarkWorker.func2 () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/mgc.go:1276
#16 0x0000000000b34246 in runtime.systemstack () at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/asm_amd64.s:469
#17 0x0000000000b38b87 in runtime.newproc (fn=0x0) at :1
#18 0x00000000023fa6a0 in crypto/tls.emptyConfig ()
#19 0x0000000000000000 in ?? ()

(gdb) frame 10
#10 0x0000000000b2a3f6 in runtime.gentraceback (pc0=11572566, sp0=824634175120, lr0=, gp=, skip=0, pcbuf=0x0, max=2147483647, callback={void (runtime.stkframe *, void *, bool *)} 0x7fffdc060a10, v=0x0, flags=0, ~r0=)
at /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go:254
254 in /home/defrain/legit3/ams/mediaserver/Golang/go/src/runtime/traceback.go
(gdb) i lo
doPrint = false
lrPtr = 824634175144
flag =
flr = {_func = 0x0, datap = }
level = 1
ctxt = 0x0
nprint = 0
waspanic = false
printing = false
lastFuncID = 0 '\000'
n = 14
f = {_func = 0x1f8f350, datap = }
cgoCtxt = <error reading variable cgoCtxt (access outside bounds of object referenced via synthetic pointer)>
frame = {fn = {_func = 0x1f8f350, datap = 0x22706c0 <runtime.firstmoduledata>}, pc = 11572566, continpc = 0, lr = 824638988288, sp = 824634175120, fp = 824634175152, varp = 0, argp = 0, arglen = 0, argmap = 0x0}
cache = {entries = {{{targetpc = 11572566, off = 85302, val = 24}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0,
val = 0}, {targetpc = 0, off = 0, val = 0}}, {{targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 0, val = 0}, {
targetpc = 0, off = 0, val = 0}, {targetpc = 0, off = 49, val = 0}}}}
(gdb) i args
pc0 = 11572566
sp0 = 824634175120
lr0 =
gp =
skip = 0
pcbuf = 0x0
max = 2147483647
callback = {void (runtime.stkframe *, void *, bool *)} 0x7fffdc060a10
v = 0x0
flags = 0
~r0 =

This is using a socket interface to exchange messages between C++ and Go so CGo is not used. The applications(s) run fine as separate processes (they ran the same test as such overnight in heavy traffic with no issues).

@pdefrain
Copy link
Author

pdefrain commented May 4, 2022

This issue does not reproduce with a minimal application that drives the Go library in a similar manner than the real one. The real application is too complex to be able to share for diagnostic purposes.

Here's some facts:

  1. The Go code passes race and efence testing and runs forever when stressed from a test C++ application compiled with icc or gcc.
  2. The Go code passes race and efence testing and runs forever when stressed from the real application, compiled with icc, when run as a separate Go process.
  3. The Go code crashes (SIGSEGV) or panics (SIGABRT), with efence enabled, which is generally observed to be cause by stack corruption, when run as a virtually separate process (the only interface is two UDP sockets, one in each direction, no CGo, no import "C", or SWIG) running the exact same implementation but linked static or loaded as a shared object.
  4. The Go code runs until memory is exhausted for condition (3) with GOGC=off.
  5. When the SIGSEGV and SIGABRT for (3) occurs, gdb generally finds only the failing thread running (all others blocked on signals/waits of one for or another) with all other threads' stacks apparently intact.

Something is coupling execution of the application and the Go code (as a library) causing corruption in thread stacks.

@seankhliao seankhliao added WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jun 12, 2022
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@golang golang locked and limited conversation to collaborators Jul 12, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Projects
None yet
Development

No branches or pull requests

4 participants