Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: race-detector crash when using cgo and threads #17190

Closed
sisoftrg opened this issue Sep 22, 2016 · 8 comments
Closed

runtime: race-detector crash when using cgo and threads #17190

sisoftrg opened this issue Sep 22, 2016 · 8 comments

Comments

@sisoftrg
Copy link

sisoftrg commented Sep 22, 2016

What version of Go are you using (go version)?

go1.7.1 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/go"
GORACE=""
GOROOT="/usr/lib/go"
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
CC="x86_64-pc-linux-gnu-gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build555128270=/tmp/go-build -gno-record-gcc-switches"
CXX="x86_64-pc-linux-gnu-g++"
CGO_ENABLED="1"

I'm use external library (pjsip, which using pthreads) with help of cgo. Without enabled race-detector all works well, but I want to debug some race-conditions in Go code.
When I turn on race-detector, app crashes after external library creates some threads.

Crash log:

runtime: invalid pc-encoded table f=runtime.needm pc=0x499718 targetpc=0x49971f tab=[0/0]0x0
        value=56 until pc=0x499718
fatal error: invalid runtime symbol table

runtime stack:
runtime.throw(0x9c37b9, 0x1c)
        /usr/lib/go/src/runtime/panic.go:566 +0x95
runtime.pcvalue(0xa86610, 0x3dec9, 0x49971f, 0x0, 0x1, 0xb9)
        /usr/lib/go/src/runtime/symtab.go:472 +0x4cd
runtime.funcline1(0xa86610, 0x49971f, 0x801, 0x4b23b1, 0x4995d0, 0xddd780)
        /usr/lib/go/src/runtime/symtab.go:496 +0x76
runtime.funcline(0xa86610, 0x49971f, 0x3, 0x48, 0x50)
        /usr/lib/go/src/runtime/symtab.go:507 +0x3a
runtime.raceSymbolizeCode(0x7fe42f7fdc50)
        /usr/lib/go/src/runtime/race.go:105 +0x84
runtime.racecallback(0x1, 0x7fe42f7fdc50)
        /usr/lib/go/src/runtime/race.go:85 +0x78
runtime.racecallbackthunk(0x49971f)
        /usr/lib/go/src/runtime/race_amd64.s:424 +0x66

goroutine 35 [running]:
fatal error: unexpected signal during runtime execution
panic during panic
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x4b7545]

runtime stack:
runtime.startpanic_m()
        /usr/lib/go/src/runtime/panic.go:623 +0xfc
runtime.systemstack(0xa01d78)
        /usr/lib/go/src/runtime/asm_amd64.s:314 +0xab
runtime.startpanic()
        /usr/lib/go/src/runtime/panic.go:544 +0x1e
runtime.throw(0x9c8d0a, 0x2a)
        /usr/lib/go/src/runtime/panic.go:565 +0x88
runtime.sigpanic()
        /usr/lib/go/src/runtime/sigpanic_unix.go:12 +0x2cc
runtime.gentraceback(0xffffffffffffffff, 0x0, 0x0, 0xc420091380, 0x0, 0x0, 0x64, 0x0, 0x0, 0x0, ...)
        /usr/lib/go/src/runtime/traceback.go:295 +0x14f5
runtime.traceback1(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc420091380, 0x0)
        /usr/lib/go/src/runtime/traceback.go:661 +0x121
runtime.traceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc420091380)
        /usr/lib/go/src/runtime/traceback.go:622 +0x52
runtime.tracebackothers(0xc420121520)
        /usr/lib/go/src/runtime/traceback.go:767 +0x292
runtime.dopanic_m(0xc420121520, 0x494995, 0x7fe42f7fd8f0)
        /usr/lib/go/src/runtime/panic.go:669 +0x182
runtime.dopanic.func1()
        /usr/lib/go/src/runtime/panic.go:553 +0x3c
runtime.systemstack(0x7fe42f7fd8c0)
        /usr/lib/go/src/runtime/asm_amd64.s:314 +0xab
runtime.dopanic(0x0)
        /usr/lib/go/src/runtime/panic.go:554 +0x70
runtime.throw(0x9c37b9, 0x1c)
        /usr/lib/go/src/runtime/panic.go:566 +0x95
runtime.pcvalue(0xa86610, 0x3dec9, 0x49971f, 0x0, 0x1, 0xb9)
        /usr/lib/go/src/runtime/symtab.go:472 +0x4cd
runtime.funcline1(0xa86610, 0x49971f, 0x801, 0x4b23b1, 0x4995d0, 0xddd780)
        /usr/lib/go/src/runtime/symtab.go:496 +0x76
runtime.funcline(0xa86610, 0x49971f, 0x3, 0x48, 0x50)
        /usr/lib/go/src/runtime/symtab.go:507 +0x3a
runtime.raceSymbolizeCode(0x7fe42f7fdc50)
        /usr/lib/go/src/runtime/race.go:105 +0x84
runtime.racecallback(0x1, 0x7fe42f7fdc50)
        /usr/lib/go/src/runtime/race.go:85 +0x78
runtime.racecallbackthunk(0x49971f)
        /usr/lib/go/src/runtime/race_amd64.s:424 +0x66
@sisoftrg sisoftrg changed the title Race detector crash when using cgo and threads runtime: race-detector crash when using cgo and threads Sep 22, 2016
@ianlancetaylor
Copy link
Contributor

How can we recreate the problem?

CC @dvyukov

@ianlancetaylor ianlancetaylor added this to the Go1.8 milestone Sep 22, 2016
@sisoftrg
Copy link
Author

Currently I don't know how to strip my complex networking code to minimal working example.
I'm tried to comment out throw() call at symtab.go:472 (func pcvalue) and got a bit more info in new trace:

runtime: invalid pc-encoded table f=runtime.needm pc=0x499898 targetpc=0x49989f tab=[0/0]0x0
        value=56 until pc=0x499898
runtime: invalid pc-encoded table f=runtime.needm pc=0x499898 targetpc=0x49989f tab=[0/0]0x0
        value=1342 until pc=0x49975e
        value=1343 until pc=0x499778
        value=1354 until pc=0x49978b
        value=1363 until pc=0x49979e
        value=1364 until pc=0x4997bc
        value=1372 until pc=0x4997ca
        value=1373 until pc=0x4997cf
        value=1380 until pc=0x4997e0
        value=1381 until pc=0x4997ee
        value=1342 until pc=0x4997f3
        value=1382 until pc=0x499810
        value=1342 until pc=0x499815
        value=1383 until pc=0x499832
        value=1384 until pc=0x49983c
        value=1387 until pc=0x499841
        value=1388 until pc=0x499846
        value=1389 until pc=0x499850
        value=1346 until pc=0x499880
        value=1347 until pc=0x49988c
        value=1354 until pc=0x499891
        value=1346 until pc=0x499898
==================
WARNING: DATA RACE
Read at 0x00c42001e158 by goroutine 28:
  main.(*Ctl).reportAll()
      /home/go/src/mophon/ctl.go:100 +0xf4
  main.(*Ctl).worker.func1()
      /home/go/src/mophon/ctl.go:154 +0x8e7

Previous write at 0x00c42001e158 by goroutine 2:
  main.onRegState()
      /home/go/src/mophon/pjsip.go:140 +0x554
  main._cgoexpwrap_a6764208b0c4_onRegState()
      ??:0 +0x40

Goroutine 28 (running) created at:
  main.(*Ctl).worker()
      /home/go/src/mophon/ctl.go:187 +0xa2a
  main.(*Program).Start.func1.2()
      /home/go/src/mophon/main.go:205 +0x170

Goroutine 2 (running) created at:
  runtime.needm()
      ?@ACDFILMNPSTUZ["\ [some binary garbage from memory skipped]

According to it, I have race-condition between Go code working in goroutine (ctl.go) and Go code called as callback from one of threads in C-library (pjsip.go), but race-detector can't get some info to show stacktraces. External library is not stripped.
After eliminating this race-condition app stop crashing.

@sisoftrg
Copy link
Author

Finally I got working example:

main.go:

package main

/*
#cgo CFLAGS: -Wall -O2 -Wno-unused-function
#cgo linux LDFLAGS: -lpthread
typedef struct cb {
        void (*on_test)();
} cb;
extern void start_test(cb *c);
extern void onTest();
static void test_cb(cb *c) {
        c->on_test = &onTest;
}
*/
import "C"

type T struct {
        t int
}
var t = &T{}

//export onTest
func onTest() {
        t.t++
}

func main() {
        var c C.cb
        C.test_cb(&c)
        C.start_test(&c)
        go func() {
                for {
                        t.t++
                }
        }()
        for {}
}

lib.go:

package main

/*
#cgo CFLAGS: -Wall -O2 -Wno-unused-function
#include <pthread.h>

typedef struct cb {
        void (*on_test)();
} cb;

void *call_test(void *p) {
        cb *c = (cb*)p;
        c->on_test();
        return NULL;
}

void start_test(cb *c) {
        pthread_t thr;
        pthread_create(&thr, NULL, call_test, c);
}
*/
import "C"

@dvyukov
Copy link
Member

dvyukov commented Sep 23, 2016

@sisoftrg thanks for the reproducer.
I can reproduce this on tip.

There are 2 bugs. First is that race detector incorrectly subtracts 1 (or maybe somebody forgot to add 1) to a PC, so it asks to symbolize off by one PC:

  438a5b:       e8 f0 6e 02 00          callq  45f950 <runtime.write>
  438a60:       c7 04 24 01 00 00 00    movl   $0x1,(%rsp)
  438a67:       e8 84 6e 02 00          callq  45f8f0 <runtime.exit>
  438a6c:       e9 e7 fe ff ff          jmpq   438958 <runtime.needm+0x28>
  438a71:       e8 6a 9a ff ff          callq  4324e0 <runtime.panicindex>
  438a76:       0f 0b                   ud2    
  438a78:       cc                      int3   
  438a79:       cc                      int3   
  438a7a:       cc                      int3   
  438a7b:       cc                      int3   
  438a7c:       cc                      int3   
  438a7d:       cc                      int3   
  438a7e:       cc                      int3   
  438a7f:       cc                      int3     /// <<<<<< THIS ONE (most likely must be the next byte)

0000000000438a80 <runtime.newextram>:
  438a80:       64 48 8b 0c 25 f8 ff    mov    %fs:0xfffffffffffffff8,%rcx
  438a87:       ff ff 
  438a89:       48 3b 61 10             cmp    0x10(%rcx),%rsp
  438a8d:       76 7c                   jbe    438b0b <runtime.newextram+0x8b>
  438a8f:       48 83 ec 28             sub    $0x28,%rsp
  438a93:       48 89 6c 24 20          mov    %rbp,0x20(%rsp)

Second is that gentraceback crashes on this (I guess user can trigger this crash by calling one of public runtime functions as well):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff6ff5700 (LWP 144105)]
0x0000000000450645 in runtime.gentraceback () at /usr/local/google/home/dvyukov/src/go/src/runtime/traceback.go:295
295                     frame.lr = uintptr(*(*sys.Uintreg)(unsafe.Pointer(lrPtr)))
(gdb) bt
#0  0x0000000000450645 in runtime.gentraceback () at /usr/local/google/home/dvyukov/src/go/src/runtime/traceback.go:295
#1  0x00000000004514c1 in runtime.traceback1 () at /usr/local/google/home/dvyukov/src/go/src/runtime/traceback.go:661
#2  0x0000000000451312 in runtime.traceback () at /usr/local/google/home/dvyukov/src/go/src/runtime/traceback.go:622
#3  0x0000000000451fe2 in runtime.tracebackothers () at /usr/local/google/home/dvyukov/src/go/src/runtime/traceback.go:767
#4  0x0000000000434633 in runtime.dopanic_m () at /usr/local/google/home/dvyukov/src/go/src/runtime/panic.go:659
#5  0x000000000045718c in runtime.dopanic.func1 () at /usr/local/google/home/dvyukov/src/go/src/runtime/panic.go:543
#6  0x000000000045c10b in runtime.systemstack () at /usr/local/google/home/dvyukov/src/go/src/runtime/asm_amd64.s:314
#7  0x0000000000434050 in runtime.dopanic () at /usr/local/google/home/dvyukov/src/go/src/runtime/panic.go:544
#8  0x0000000000434105 in runtime.throw () at /usr/local/google/home/dvyukov/src/go/src/runtime/panic.go:556
#9  0x000000000044bbfa in runtime.pcvalue () at /usr/local/google/home/dvyukov/src/go/src/runtime/symtab.go:494
#10 0x000000000044beb6 in runtime.funcline1 () at /usr/local/google/home/dvyukov/src/go/src/runtime/symtab.go:518
#11 0x000000000044c00a in runtime.funcline () at /usr/local/google/home/dvyukov/src/go/src/runtime/symtab.go:529
#12 0x0000000000440e81 in runtime.raceSymbolizeCode () at /usr/local/google/home/dvyukov/src/go/src/runtime/race.go:106
#13 0x0000000000440d88 in runtime.racecallback () at /usr/local/google/home/dvyukov/src/go/src/runtime/race.go:85
#14 0x000000000045f886 in runtime.racecallbackthunk () at /usr/local/google/home/dvyukov/src/go/src/runtime/race_amd64.s:424
#15 0x0000000000464194 in __tsan::SymbolizeCode(unsigned long) ()

@dvyukov
Copy link
Member

dvyukov commented Sep 24, 2016

This is broken on so many levels...

@gopherbot
Copy link

CL https://golang.org/cl/29713 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/29714 mentions this issue.

@gopherbot
Copy link

CL https://golang.org/cl/29712 mentions this issue.

gopherbot pushed a commit that referenced this issue Sep 25, 2016
PC passed to racegostart is expected to be a return PC
of the go statement. Race runtime will subtract 1 from the PC
before symbolization. Passing start PC of a function is wrong.
Add sys.PCQuantum to the function start PC.

Update #17190

Change-Id: Ia504c49e79af84ed4ea360c2aea472b370ea8bf5
Reviewed-on: https://go-review.googlesource.com/29712
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
gopherbot pushed a commit that referenced this issue Sep 25, 2016
Don't write line info for types, we don't have it.
Otherwise types look like:

type _Ctype_struct_cb struct {
//line :1
      on_test *[0]byte
//line :1
}

Which is not useful. Moreover we never override source info,
so subsequent source code uses the same source info.
Moreover, empty file name makes compile emit no source debug info at all.

Update #17190

Change-Id: I7ae6fa4964520d7665743d340419b787df0b51e8
Reviewed-on: https://go-review.googlesource.com/29713
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@dvyukov dvyukov self-assigned this Sep 25, 2016
@golang golang locked and limited conversation to collaborators Sep 25, 2017
@rsc rsc unassigned dvyukov Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants