Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: panic only with GOGC=on and highly concurrent programs also using cgo #30276

Closed
sharpevo opened this issue Feb 17, 2019 · 6 comments
Closed
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@sharpevo
Copy link

sharpevo commented Feb 17, 2019

What version of Go are you using (go version)?

$ go version
go version go1.11.2 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/yang/.cache/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/yang/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/lib/go"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build049051044=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The app was built to send instructions to devices concurrently with cgo, in Linux and Windows platform. While running the app with GOGC=off, it works fine with many goroutines for a long period of time. Normally, there will be more errors when gc is turned on. IMO, it seems that variables are released by the garbage collector even when there are references. What's worse, runtime.KeepAlive does not help more.

Error 1: standalone select statement

fatal error: >>> parse cur 0
unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xa9f4ff]

goroutine 477493 [running]:
runtime.throw(0x291a505, 0x2a)
    /usr/lib/go/src/runtime/panic.go:608 +0x72 fp=0xc002cf6510 sp=0xc002cf64e0 pc=0xa6ed62
runtime.sigpanic()
    /usr/lib/go/src/runtime/signal_unix.go:374 +0x2f2 fp=0xc002cf6560 sp=0xc002cf6510 pc=0xa84722
runtime.memmove(0xc002cf6750, 0x8, 0x8)
    /usr/lib/go/src/runtime/memmove_amd64.s:168 +0x12f fp=0xc002cf6568 sp=0xc002cf6560 pc=0xa9f4ff
runtime.typedmemmove(0x2608060, 0xc002cf6750, 0x8)
    /usr/lib/go/src/runtime/mbarrier.go:170 +0x45 fp=0xc002cf65a0 sp=0xc002cf6568 pc=0xa55d35
runtime.selectgo(0xc002cf6758, 0xc002cf6728, 0x2, 0x1, 0xa48501)
    /usr/lib/go/src/runtime/select.go:418 +0x62c fp=0xc002cf6700 sp=0xc002cf65a0 pc=0xa8038c
posam/interpreter.bridge.func1()
    /home/yang/go/src/posam/interpreter/interpreter.go:784 +0x101 fp=0xc002cf67b8 sp=0xc002cf6700 pc=0x214fd21

the code:

func bridge(terminatec <-chan interface{}, chanc <-chan <-chan Response) <-chan Response {
    valStream := make(chan Response)
    util.Go(func() {
        defer close(valStream)
        for {
            var stream <-chan Response
            select { // <- this line
            case <-terminatec:
                return
            case mayStream, ok := <-chanc:
// ...

Error 2: variables in fmt.Println

unexpected fault address 0x4000000010
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0x4000000010 pc=0x4d9da9]

goroutine 22323 [running]:
runtime.throw(0xbae045, 0x5)
        /usr/local/go/src/runtime/panic.go:619 +0x88 fp=0xc04220db70 sp=0xc04220db50 pc=0x42eec8
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_windows.go:170 +0x13a fp=0xc04220dba0 sp=0xc04220db70 pc=0x4433ca
fmt.(*pp).printArg(0xc0420b20c0, 0x4000000000, 0xc0494a2000, 0x76)
        /usr/local/go/src/fmt/print.go:637 +0xe9 fp=0xc04220dc18 sp=0xc04220dba0 pc=0x4d9da9
fmt.(*pp).doPrintln(0xc0420b20c0, 0xc04220df28, 0x3, 0x3)
        /usr/local/go/src/fmt/print.go:1146 +0x4c fp=0xc04220dc88 sp=0xc04220dc18 pc=0x4de8ec
fmt.Fprintln(0xc5cba0, 0xc042092008, 0xc04220df28, 0x3, 0x3, 0xaa1640, 0xac0f00, 0xc04976ef68)
        /usr/local/go/src/fmt/print.go:254 +0x5f fp=0xc04220dcf0 sp=0xc04220dc88 pc=0x4d6daf
fmt.Println(0xc04220df28, 0x3, 0x3, 0xc04976ef68, 0x0, 0x0)
        /usr/local/go/src/fmt/print.go:264 +0x5e fp=0xc04220dd40 sp=0xc04220dcf0 pc=0x4d6eae
posam/interpreter.(*StatementGroup).ExecuteSync.func1()
        /media/sf_GOPATH0/src/posam/interpreter/interpreter.go:1000 +0x268 fp=0xc04220dfb8 sp=0xc04220dd40 pc=0x8a91e8

the code:

util.Go(func() {
    defer close(respcc)
    go func() {
        g.ItemList.Cursor <- 0
    }()
    for itemi := range g.ItemList.Next() {
        //runtime.KeepAlive(itemi) // <- not lucky
        completec := make(chan interface{})
        i := itemi.Index
        itemInterface := itemi.Value
        fmt.Println(">>> parse instructions", i, itemInterface) // <- this line
// ...

What did you expect to see?

no panic.

What did you see instead?

The reason might not be data race:

  • I've tried to build the app with -race flag, and there's no error reported.
  • It works fine with GOGC=off.
@odeke-em odeke-em changed the title Panic only when GC is enabled runtime: panic only with GOGC=on and highly concurrent programs also using cgo Feb 17, 2019
@odeke-em odeke-em added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Feb 17, 2019
@odeke-em odeke-em added this to the Go1.13 milestone Feb 17, 2019
@odeke-em
Copy link
Member

Thank you for filling this bug report @sharpevo and welcome to the Go project!

I shall page some experts from the related areas @ianlancetaylor @randall77 @aclements.

@sharpevo
Copy link
Author

sharpevo commented Feb 20, 2019

Thank you odeke-em!

Here are more errors that looks like memory corruption, with GODEBUG=gctrace=1.

Error 2: variables in fmt.Println
Print variables in the generator function to check the original place that triggers the panic. As a result, It occurs just after GC.

CURSOR queue: &blockingqueue.BlockingQueue{lock:sync.Mutex{state:1, sema:0x0}, itemList:[]interface {}{(*interpreter.Statement)(0xc00029ee00)}, Cursor:(chan int)(0xc00035a600), pushCondition:conditionvariable.ChannelCondition{Mutex:sync.Mutex{state:0, sema:0x0}, c:(chan struct {})(0xc00035a660)}, termCondition:conditionvariable.ChannelCondition{Mutex:sync.Mutex{state:0, sema:0x0}, c:(chan struct {})(0xc00035a6c0)}}
CURSOR list: []interface {}{(*interpreter.Statement)(0xc00029ee00)}
gc 77 @12.091s 0%: 0.10+0.75+0.079 ms clock, 0.81+0.27/3.6/0.040>>> parse cur 0
+0.63CURSOR 1 2
 ms cpu, 1->1->1 MB, 17592186044415 MB goal, 50 P (forced)
unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0xafe50a]

goroutine 1165 [running]:
runtime.throw(0x2a42423, 0x5)
	/usr/lib/go/src/runtime/panic.go:608 +0x72 fp=0xc00016d470 sp=0xc00016d440 pc=0xa70f32
runtime.sigpanic()
	/usr/lib/go/src/runtime/signal_unix.go:397 +0x275 fp=0xc00016d4c0 sp=0xc00016d470 pc=0xa868e5
reflect.(*rtype).Kind(...)
	/usr/lib/go/src/reflect/type.go:785
reflect.unpackEface(...)
	/usr/lib/go/src/reflect/value.go:147
reflect.Value.Elem(0x2793a60, 0xc0002b2560, 0x1b4, 0xc000080a80, 0xa4f0f3, 0x7f815b90c6c0)
	/usr/lib/go/src/reflect/value.go:785 +0x12a fp=0xc00016d4f8 sp=0xc00016d4c0 pc=0xafe50a
fmt.(*pp).printValue(0xc000302000, 0x2793a60, 0xc0002b2560, 0x1b4, 0x76, 0x3)
	/usr/lib/go/src/fmt/print.go:787 +0x16bf fp=0xc00016d6d8 sp=0xc00016d4f8 pc=0xb1686f
fmt.(*pp).printValue(0xc000302000, 0x271e040, 0xc0000d2dc8, 0x1b7, 0x76, 0x2)
	/usr/lib/go/src/fmt/print.go:833 +0x8e4 fp=0xc00016d8b8 sp=0xc00016d6d8 pc=0xb15a94
fmt.(*pp).printValue(0xc000302000, 0x283ce60, 0xc0000d2dc0, 0x199, 0xc000000076, 0x1)
	/usr/lib/go/src/fmt/print.go:783 +0x1ce9 fp=0xc00016da98 sp=0xc00016d8b8 pc=0xb16e99
fmt.(*pp).printValue(0xc000302000, 0x284e1a0, 0xc0000d2dc0, 0x16, 0xc000000076, 0x0)
	/usr/lib/go/src/fmt/print.go:853 +0x1b2c fp=0xc00016dc78 sp=0xc00016da98 pc=0xb16cdc
fmt.(*pp).printArg(0xc000302000, 0x284e1a0, 0xc0000d2dc0, 0xc000000076)
	/usr/lib/go/src/fmt/print.go:689 +0x2b7 fp=0xc00016dd10 sp=0xc00016dc78 pc=0xb14a47
fmt.(*pp).doPrintf(0xc000302000, 0x2a4d767, 0x12, 0xc00016df30, 0x1, 0x1)
	/usr/lib/go/src/fmt/print.go:1003 +0x166 fp=0xc00016ddf8 sp=0xc00016dd10 pc=0xb18686
fmt.Fprintf(0x2d6a800, 0xc0000a8008, 0x2a4d767, 0x12, 0xc000063f30, 0x1, 0x1, 0xb, 0x0, 0x0)
	/usr/lib/go/src/fmt/print.go:188 +0x72 fp=0xc00016de60 sp=0xc00016ddf8 pc=0xb110b2
fmt.Printf(0x2a4d767, 0x12, 0xc000063f30, 0x1, 0x1, 0x0, 0x0, 0x0)
	/usr/lib/go/src/fmt/print.go:197 +0x72 fp=0xc00016dec0 sp=0xc00016de60 pc=0xb111c2
posam/util/blockingqueue.(*BlockingQueue).NextLockless.func1(0xc0000d2dc0, 0xc00035a7e0)
	/home/yang/go/src/posam/util/blockingqueue/blockingqueue.go:182 +0x2d5 fp=0xc00016dfd0 sp=0xc00016dec0 pc=0xe49215
runtime.goexit()
	/usr/lib/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc00016dfd8 sp=0xc00016dfd0 pc=0xaa05d1
created by posam/util/blockingqueue.(*BlockingQueue).NextLockless
	/home/yang/go/src/posam/util/blockingqueue/blockingqueue.go:175 +0x67

code:

func (b *BlockingQueue) NextLockless() <-chan Item {
	itemc := make(chan Item)
	go func() {
		runtime.KeepAlive(b.itemList)
		defer close(itemc)
		var cursor int
		for cursor < len(b.itemList) {
			cursor = <-b.Cursor
			fmt.Println("CURSOR", cursor, len(b.itemList))
			fmt.Printf("CURSOR queue: %#v\n", b) // <- this line, but printed as well in the log
			fmt.Printf("CURSOR list: %#v\n", b.itemList)
			if cursor > len(b.itemList)-1 {
				break
			}
			if cursor == -1 {
				break
			}
			var item interface{}
			item = b.itemList[cursor]
			fmt.Printf("BLOCKINGQUEUE: %#v\n", item)
			itemc <- Item{cursor, item}
		}
	}()
	return itemc
}

Error 3: weird value
Type ExecutionType is defined as constant, which should be 0 or 1. But sometimes what I got is 47244640256, 738734374912, 180388626432, etc.

const (
	SYNC ExecutionType = iota
	ASYNC
)

type StatementGroup struct {
	Execution ExecutionType
	ItemList  *blockingqueue.BlockingQueue
}

func (g *StatementGroup) Execute(terminatec <-chan interface{}, completec chan<- interface{}) <-chan Response {
        // ...
	switch g.Execution {
	case SYNC:
		return bridge(terminatec, g.ExecuteSync(terminatec, completec))
	case ASYNC:
		return bridge(terminatec, g.ExecuteAsync(terminatec, completec))
	default:
		fmt.Printf("!!!!!! Execution type not matched\n%#v\n", g) // <- this line
	}
        // ...
}

error:

!!!!!! Execution type not matched
&interpreter.StatementGroup{lock:sync.Mutex{state:1, sema:0x40a08}, Execution:47244640256, ItemList:(*blockingqueue.BlockingQueue)(0xc0001c8230), Stack:(*instruction.Stack)(0xc00000ca60)}

or

!!!!!! Execution type not matched
&interpreter.StatementGroup{Mutex:sync.Mutex{state:1, sema:0x40a08}, Execution:738734374912, ItemList:(*blockingqueue.BlockingQueue)(0xc000228320), Stack:(*instruction.Stack)(0xc000226300)}

Value of Execution was manipulated occasionally:

2019/02/21 09:15:08 'LOOP: [1 c2]' done
>>> statement group: &interpreter.StatementGroup{Mutex:sync.Mutex{state:0, sema:0x0}, Execution:0, ItemList:(*blockingqueue.BlockingQueue)(0xc000180eb0), Stack:(*instruction.Stack)(0xc0001c9fc0)}
...
2019/02/21 09:15:10 ==== SYNC ====
!!!!!! Execution type not matched
&interpreter.StatementGroup{Mutex:sync.Mutex{state:1, sema:0x20a08}, Execution:751619276800, ItemList:(*blockingqueue.BlockingQueue)(0xc000180eb0), Stack:(*instruction.Stack)(0xc0001c9fc0)}

@sharpevo
Copy link
Author

Error 2: memory corruption

Instance is not accessable in the Lock function, even upgrading Golang to 1.11.5

$ go version
go version go1.11.5 linux/amd64
func (b *BlockingQueue) Lock() {
	fmt.Println(b) // <- this line
	b.lock.Lock()
}
unexpected fault address 0x2300000000
[signal SIGSEGV: segmentation violation code=0x1 addr=0x2300000000 pc=0xaa1837]

goroutine 13602 [running]:
runtime.throw(0x2a4355e, 0x5)
	/usr/lib/go/src/runtime/panic.go:608 +0x72 fp=0xc0002e0810 sp=0xc0002e07e0 pc=0xa70f32
runtime.sigpanic()
	/usr/lib/go/src/runtime/signal_unix.go:397 +0x275 fp=0xc0002e0860 sp=0xc0002e0810 pc=0xa868e5
runtime.memmove(0xc00014e780, 0x2300000000, 0x48)
	/usr/lib/go/src/runtime/memmove_amd64.s:194 +0x187 fp=0xc0002e0868 sp=0xc0002e0860 pc=0xaa1837
runtime.typedmemmove(0x283dfa0, 0xc00014e780, 0x2300000000)
	/usr/lib/go/src/runtime/mbarrier.go:170 +0x45 fp=0xc0002e08a0 sp=0xc0002e0868 pc=0xa57e45
reflect.typedmemmove(0x283dfa0, 0xc00014e780, 0x2300000000)
	/usr/lib/go/src/runtime/mbarrier.go:186 +0x3f fp=0xc0002e08c8 sp=0xc0002e08a0 pc=0xa57f0f
reflect.packEface(0x283dfa0, 0x2300000000, 0x199, 0x0, 0x0)
	/usr/lib/go/src/reflect/value.go:119 +0x9f fp=0xc0002e0908 sp=0xc0002e08c8 pc=0xafb85f
reflect.valueInterface(0x283dfa0, 0x2300000000, 0x199, 0x1, 0x0, 0x0)
	/usr/lib/go/src/reflect/value.go:1008 +0xe1 fp=0xc0002e0958 sp=0xc0002e0908 pc=0xaff3e1
reflect.Value.Interface(0x283dfa0, 0x2300000000, 0x199, 0x0, 0x0)
	/usr/lib/go/src/reflect/value.go:978 +0x44 fp=0xc0002e0998 sp=0xc0002e0958 pc=0xaff2d4
fmt.(*pp).printValue(0xc0002a40c0, 0x283dfa0, 0x2300000000, 0x199, 0x2300000076, 0x1)
	/usr/lib/go/src/fmt/print.go:699 +0x293a fp=0xc0002e0b78 sp=0xc0002e0998 pc=0xb17aea
fmt.(*pp).printValue(0xc0002a40c0, 0x2852300, 0x2300000000, 0x16, 0xc000000076, 0x0)
	/usr/lib/go/src/fmt/print.go:853 +0x1b2c fp=0xc0002e0d58 sp=0xc0002e0b78 pc=0xb16cdc
fmt.(*pp).printArg(0xc0002a40c0, 0x2852300, 0x2300000000, 0x76)
	/usr/lib/go/src/fmt/print.go:689 +0x2b7 fp=0xc0002e0df0 sp=0xc0002e0d58 pc=0xb14a47
fmt.(*pp).doPrintln(0xc0002a40c0, 0xc0002e0f58, 0x1, 0x1)
	/usr/lib/go/src/fmt/print.go:1146 +0xb1 fp=0xc0002e0e60 sp=0xc0002e0df0 pc=0xb19a21
fmt.Fprintln(0x2d6bc60, 0xc00000e018, 0xc00027e758, 0x1, 0x1, 0xc00064ac00, 0xc0002f4380, 0x0)
	/usr/lib/go/src/fmt/print.go:254 +0x58 fp=0xc0002e0ec8 sp=0xc0002e0e60 pc=0xb11688
fmt.Println(0xc00027e758, 0x1, 0x1, 0x0, 0xc00027e768, 0xa6fc92)
	/usr/lib/go/src/fmt/print.go:264 +0x57 fp=0xc0002e0f18 sp=0xc0002e0ec8 pc=0xb11787
posam/util/blockingqueue.(*BlockingQueue).Lock(0x2300000000)
	/home/yang/go/src/posam/util/blockingqueue/blockingqueue.go:211 +0x5f fp=0xc0002e0f78 sp=0xc0002e0f18 pc=0xe4879f
posam/util/blockingqueue.(*BlockingQueue).Iter.func1(0xc0001926c0, 0x2300000000)
	/home/yang/go/src/posam/util/blockingqueue/blockingqueue.go:124 +0x59 fp=0xc0002e0fd0 sp=0xc0002e0f78 pc=0xe48ab9
runtime.goexit()
	/usr/lib/go/src/runtime/asm_amd64.s:1333 +0x1 fp=0xc0002e0fd8 sp=0xc0002e0fd0 pc=0xaa05d1
created by posam/util/blockingqueue.(*BlockingQueue).Iter
	/home/yang/go/src/posam/util/blockingqueue/blockingqueue.go:122 +0x62

@randall77
Copy link
Contributor

Looking at that last error, it appears that a bad value for your *BlockingQueue, 0x2300000000, was provided at the start of the goroutine. Where did that value come from? Are BlockingQueues always heap allocated? It might help to do "b.Lock();b.Unlock()" or something else that uses b just before the go statement that spawns the goroutine listed (at blockingqueue.go:122). That way we'll be able to see the stack trace of the spawning goroutine.

Is there any way you can provide code and instructions so that we can reproduce the error ourselves?

@sharpevo
Copy link
Author

sharpevo commented Feb 26, 2019

I've worked on some fixes for couple of days, and lines were changed in BlockingQueue. But function Iter() has not been messed up yet. BlockingQueue manages a slice of interface{} concurrently. b.Lock() and b.Unlock() is also provided in it.

Recently, I found that there're many similar errors in the same app. Variables/channels might be GCed before called. runtime.KeepAlive() can not help more either, since the error occurs even when the variables initialized by make. I still have no clue for these unexpected fault address errors.

Thank you for the patience!

@sharpevo
Copy link
Author

sharpevo commented Feb 26, 2019

@odeke-em @randall77

The problem has been resolved with the help of Tamás Gulácsi.

Here is an analysis of the reason in shot.

As JimB commented:

You are however passing a pointer to a go slice into a C function call, so it's likely that you're just corrupting the program memory and that call happens to be were things fail.

In the begining of the execution, I pass the SliceHeader rather than the first elements pointer to C functions by mistake, then C fills the array and mess up the contiguous addresses starts from given pointer. Everything works fine temporarily until some memory operations happened in Go, e.g., variable declaration or slice growing as I posted. The memory corruption occurs in a high probability since the length of array is 2500 and, the corrupted addresses, kept untouched if the app is running with GOGC=off, are recognized as available both in Go after GC, and in C after the calling in .dll or .so.

It's difficult to figure out the real reason by stacktrace, as Jan Mercl posted:

the crash is caused by memory corruption which happened way before executing those lines.

Thank you again!

@golang golang locked and limited conversation to collaborators Feb 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants