runtime: syscall hangs during STW #36273

Closed
WangLeonard opened this issue Dec 24, 2019 · 2 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@WangLeonard
Contributor

WangLeonard commented Dec 24, 2019

What version of Go are you using (go version)?

$ go version
go version devel +48ed1e6113 Tue Dec 24 04:59:06 2019 +0000 darwin/amd64

Also reproduced with go1.13.3.

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/wangdeyu/Library/Caches/go-build"
GOENV="/Users/wangdeyu/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/wangdeyu/Project/GOPATH"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/Users/wangdeyu/Local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/Users/wangdeyu/Local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/09/p8vzp3rn55ggpkq_rv1_hg100000gn/T/go-build532421777=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

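package main

//#include <stdio.h>
//#include <stdlib.h>
//#include <unistd.h>
// int mysleep(){
// usleep(1000000);
// return 0;
// }
import "C"

import (
	"log"
	"time"
	_ "unsafe" // required by the go:linkname directives below
)

// stopTheWorld and startTheWorld are runtime-internal functions, reached
// here through go:linkname.

//go:linkname stopTheWorld runtime.stopTheWorld
func stopTheWorld(reason string)

//go:linkname startTheWorld runtime.startTheWorld
func startTheWorld()

// loopDoAdd spins forever to keep a P busy.
func loopDoAdd() {
	for {
		ans := 0
		for i := 0; i < 1000000; i++ {
			ans++
		}
	}
	// Unreachable as written: the loop above never exits.
	time.Sleep(time.Microsecond * 10)
}

func main() {
	// Just keep the application busy.
	for i := 0; i < 20; i++ {
		go loopDoAdd()
	}

	for {
		stopTheWorld("TEST")
		C.mysleep() // blocks in C for one second while the world is stopped
		startTheWorld()
		log.Println("Done")
		time.Sleep(time.Millisecond)
	}
}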

You also need to add an empty .s file (for example empty.s) to the package so that the body-less function declarations compile.

What did you expect to see?

The program prints Done continuously.

What did you see instead?

2019/12/24 22:02:16 Done
2019/12/24 22:02:17 Done

...

and then the program hangs.

I was able to reproduce the problem on macOS and on one Ubuntu machine, but not on another Ubuntu machine (I am not sure of the retake strategy; sysmon's retake is not triggered on that machine).

I found that the hang occurs if a goroutine makes a syscall during STW and the P it is running on is retaken by sysmon.

My analysis is that the problem occurs in exitsyscall: exitsyscallfast fails, so the path is exitsyscall0 -> globrunqput(gp) -> stopm.

But during STW, schedule() has no chance to call globrunqget, so the goroutine is never run again and the program hangs.
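
The sequence described above can be pictured with a toy model. This is a hypothetical simplification, not runtime code: toySched, exitSyscall, and schedule are invented names that only mirror the roles of exitsyscallfast/exitsyscall0, globrunqput, and schedule() as described in this report, and the model assumes that no P is available to a returning goroutine while the world is stopped.

```go
package main

import "fmt"

// toySched is a hypothetical, heavily simplified stand-in for the scheduler
// state discussed in this issue. It is not runtime code.
type toySched struct {
	gcwaiting  bool     // true while the world is stopped
	idlePs     int      // Ps a returning goroutine could pick up
	globalRunq []string // goroutines parked on the global run queue
}

// exitSyscall models a goroutine returning from a syscall after its P was
// retaken. It reports whether the goroutine keeps running.
func (s *toySched) exitSyscall(g string) bool {
	if !s.gcwaiting && s.idlePs > 0 {
		s.idlePs-- // fast path: grab an idle P and continue
		return true
	}
	// Slow path: queue the goroutine globally and park the thread.
	s.globalRunq = append(s.globalRunq, g)
	return false
}

// schedule models one scheduler pass. Nothing is dequeued during STW.
func (s *toySched) schedule() (string, bool) {
	if s.gcwaiting || len(s.globalRunq) == 0 {
		return "", false
	}
	g := s.globalRunq[0]
	s.globalRunq = s.globalRunq[1:]
	return g, true
}

func main() {
	s := &toySched{gcwaiting: true, idlePs: 1}
	fmt.Println("keeps running:", s.exitSyscall("main"))
	_, ok := s.schedule()
	fmt.Println("rescheduled during STW:", ok)
}
```

In this model the parked goroutine is the only one that could end the STW, so nothing ever clears gcwaiting and the queue is never drained.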

I tried to make the following modification in func retake(now int64) uint32 of proc.go to avoid this problem.

But this is not a perfect solution; I suspect handoffp has a problem with how it handles the P's status during STW.

func retake(now int64) uint32 {
   n := 0
   lock(&allpLock)
   for i := 0; i < len(allp); i++ {
      _p_ := allp[i]
      if _p_ == nil {
         continue
      }
      pd := &_p_.sysmontick
      s := _p_.status
      
      // Skip Ps that are in a syscall during STW.
      if s == _Psyscall && sched.gcwaiting != 0 {
         continue
      }
     ……
     ……
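
The effect of the skip can be sketched with another toy model. It rests on assumptions: pStatus, retakeDuringSTW, and canReacquire are hypothetical names, and the model reduces retake and the fast syscall-exit path to a single status check, ignoring the timing and run-queue conditions of the real code.

```go
package main

import "fmt"

// pStatus is a hypothetical two-state stand-in for a P's status.
type pStatus int

const (
	pSyscall pStatus = iota // P belongs to a thread blocked in a syscall
	pGCStop                 // P has been parked for stop-the-world
)

// retakeDuringSTW models what sysmon does to one P while the world is
// stopped. With skip set, it applies the change proposed in this issue.
func retakeDuringSTW(status pStatus, skip bool) pStatus {
	if status != pSyscall {
		return status
	}
	if skip {
		return pSyscall // proposed change: leave the P alone
	}
	return pGCStop // the P is handed off and parked for STW
}

// canReacquire models the fast exit path, which succeeds only if the P is
// still in syscall status when the goroutine returns.
func canReacquire(status pStatus) bool {
	return status == pSyscall
}

func main() {
	for _, skip := range []bool{false, true} {
		after := retakeDuringSTW(pSyscall, skip)
		fmt.Printf("skip=%v fast exit succeeds=%v\n", skip, canReacquire(after))
	}
}
```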
@smasher164 changed the title from "runtime: syscall in STW maybe hangup" to "runtime: syscall hangs during STW" Dec 25, 2019
@smasher164 added the NeedsInvestigation label Dec 25, 2019
@odeke-em
Member

Thank you for this report and reproducer @WangLeonard!

Kindly looping in some runtime and garbage collection folks @aclements @mknyszek

@cherrymui
Member

stopTheWorld is a runtime internal function. Calling stopTheWorld from user code is not supported.

@golang golang locked and limited conversation to collaborators Dec 25, 2020