syscall: StartProcess blocked at acquire lock #26836

Closed
wushukai opened this issue Aug 7, 2018 · 9 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. OS-Linux WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided.
Milestone

Comments

@wushukai

wushukai commented Aug 7, 2018

What version of Go are you using (go version)?

go version go1.8.3 linux/amd64

Does this issue reproduce with the latest release?

We have not tested yet.

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/workspace"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build598025583=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

Our service starts subprocesses periodically.

We found some goroutines hung while competing for ForkLock.Lock(); none of them ever acquired it. The stack of one blocked goroutine is listed below:

goroutine 1526643 [semacquire, 8274 minutes]:
sync.runtime_SemacquireMutex(0x1a2aba4)
	/usr/local/go/src/runtime/sema.go:62 +0x34
sync.(*Mutex).Lock(0x1a2aba0)
	/usr/local/go/src/sync/mutex.go:87 +0x9d
sync.(*RWMutex).Lock(0x1a2aba0)
	/usr/local/go/src/sync/rwmutex.go:86 +0x2d
syscall.forkExec(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019f9e8, 0x0, 0x0, 0x0)
	/usr/local/go/src/syscall/exec_unix.go:185 +0x1fd
syscall.StartProcess(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019f9e8, 0x2, 0x4, 0xc4244941e0, 0xc42019f9b8)
	/usr/local/go/src/syscall/exec_unix.go:240 +0x64
os.startProcess(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019fb90, 0xc426b93440, 0x3, 0x3)
	/usr/local/go/src/os/exec_posix.go:45 +0x1a3
os.StartProcess(0xce1c2b, 0x7, 0xc426b93410, 0x3, 0x3, 0xc42019fb90, 0x0, 0x0, 0x28)
	/usr/local/go/src/os/exec.go:94 +0x64
os/exec.(*Cmd).Start(0xc426e70580, 0xc42019fc01, 0xc424fca850)
	/usr/local/go/src/os/exec/exec.go:359 +0x3d2
os/exec.(*Cmd).Run(0xc426e70580, 0xc424fca850, 0xc426e70580)
	/usr/local/go/src/os/exec/exec.go:277 +0x2b

Apart from the blocked goroutines, no other stack contains the function "forkExec".

The problem occurs occasionally in our production services, but I have not found a way to reproduce it reliably.

Please help provide some clues for debugging this problem.

@tklauser tklauser added OS-Linux NeedsInvestigation labels Aug 7, 2018
@tklauser tklauser changed the title syscall.StartProcess blocked at acquire lock syscall: StartProcess blocked at acquire lock Aug 7, 2018
@tklauser
Member

tklauser commented Aug 7, 2018

/cc @ianlancetaylor

@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Aug 7, 2018
@ianlancetaylor
Contributor

What you are describing sounds like a clear bug, but I do not recall any similar reports. The fork lock is only held briefly and I do not know of any code path in which it could be left locked. Is there any way that we can reproduce the problem ourselves? Is it possible that the kernel is sometimes killing a single thread of your program?

@ianlancetaylor ianlancetaylor added the WaitingForInfo label Aug 7, 2018
@wushukai
Author

wushukai commented Aug 8, 2018

Our servers use a customized kernel based on Linux 4.1. I checked the kernel messages and found nothing that seems related to this problem.

I wrote a simple test program that randomly starts child processes, but have not reproduced the problem yet.

I have upgraded Go to 1.10.3 and deployed the new build to some of our production servers. I will check whether the problem reproduces with this version.

@crvv
Contributor

crvv commented Aug 8, 2018

The ForkLock is a public variable. It can be held by any code.
Maybe this isn't a bug in the stdlib but in some other library.

@wushukai
Author

wushukai commented Aug 9, 2018

@crvv
I checked all the source code under GOPATH; no other library uses this lock.

@odeke-em
Member

How's it going @wushukai? Any more returns of this bug? Able to perhaps isolate a reproduction?

@wushukai
Author

@odeke-em The problem went away after we upgraded to 1.10.3. We ran some experiments in our test environment using version 1.8.3, but had no luck reproducing it.

@odeke-em
Member

odeke-em commented Feb 1, 2019

> @odeke-em The problem went away after we upgraded to 1.10.3. We ran some experiments in our test environment using version 1.8.3, but had no luck reproducing it.

Thank you for the update @wushukai!

Given that this bug is elusive and has not been reproduced since Go 1.10, perhaps we could:
a) Move this issue to unplanned and out of the Go1.12 milestone
b) Close this issue as non-actionable

@ianlancetaylor what do you think we should do?

@ianlancetaylor
Contributor

It sounds like it cannot be reproduced in newer versions of Go, so I think we can close this. Thanks.

@golang golang locked and limited conversation to collaborators Feb 1, 2020

6 participants