Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timer related go routine schedule problems under high concurrency #40786

Closed
lwpyr opened this issue Aug 14, 2020 · 7 comments
Closed

Timer related go routine schedule problems under high concurrency #40786

lwpyr opened this issue Aug 14, 2020 · 7 comments

Comments

@lwpyr
Copy link

lwpyr commented Aug 14, 2020

What version of Go are you using (go version)?

$ go version
go version go1.14.7 linux/amd64

Does this issue reproduce with the latest release?

Latest is 1.15, so I think the answer should be No

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/wli1/.cache/go-build"
GOENV="/home/wli1/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/wli1/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build258194693=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I write a server to handle requests from clients. I hope for each connection the server should process it in a limited time, for example 10ms, which means no matter has the server done the work or not, it should give a respones to the client when the time reachs 10ms. Pseudo-codes are here:

func RecieveRequest(req Request) {
    // tic here
    done := Process(req) // done is a channel to get the response to send, process usually will be done in 6 ms
    select {
    case result := <-done:
    // toc here
        SendResponse(result)
    case <-time.After(time.Duration(10 * time.Millisecond):
    // toc here
        SendResponse(ServerTimeOut) // ServerTimeOut is a predefined response
    }
}

func Serve() {
    // ... codes ...
    for {
     // ... codes ...
        req := accept()
        go RecieveRequest(req)
     // ... codes ...
    }
    // ... codes ...
}

What did you expect to see?

I use tic-toc to messure the actual time interval, the actual time interval should be within 10ms, or a little bit more than 10ms.

What did you see instead?

under low concurrency, most request is processed normally, but there are still very few(under 0.01%) requests' actual process intervals are like 15ms.

under high concurrency, situation worsened. There are more(above 0.1%) actual intervals are larger than 10ms, and the worst case could reach 50~500 ms, which will cause the client timeout.

I think timers should be correct since there is nothing like tight-loop in the codes, during ervery schedule event the processor would check the timer heap to fire those should be fired. Then the problem seems to be the scheduler itself, the scheduler seems not be able to schedule the timer related goroutines to run immediately. But this is just my guess, I'm not sure.

So is this phenomenon normal? Or I did something wrong?

If my guess is correct, should the timer related routines be pushed on the front of the queue?

Hope someone could help, thanks.

@AlexRouSg
Copy link
Contributor

func RecieveRequest(req Request) {
    // tic here
    done := Process(req) // done is a channel to get the response to send, process usually will be done in 6 ms
    select {
    case result := <-done:
        SendResponse(result)
    case <-time.After(time.Duration(10 * time.Millisecond):
        SendResponse(ServerTimeOut) // ServerTimeOut is a predefined response
    }
    // toc here
}

Time = ~10ms + time for SendResponse

You should measure it like ...

func RecieveRequest(req Request) {
    // tic here
    done := Process(req) // done is a channel to get the response to send, process usually will be done in 6 ms
    select {
    case result := <-done:
        SendResponse(result)
    case <-time.After(time.Duration(10 * time.Millisecond):
        // toc here
        SendResponse(ServerTimeOut) // ServerTimeOut is a predefined response
    }
}

@lwpyr
Copy link
Author

lwpyr commented Aug 14, 2020 via email

@AlexRouSg
Copy link
Contributor

The only way to be sure it's the timer and not SendResponse that's taking longer than expected is to time it as such.

@lwpyr
Copy link
Author

lwpyr commented Aug 14, 2020 via email

@lwpyr
Copy link
Author

lwpyr commented Aug 14, 2020

@AlexRouSg Hello, thanks for your adivice, I have done the test exactly after the timer is fired. I have edited the description. and the result is same.

@networkimprov
Copy link

See #38860 which is fixed in 1.15 but not in 1.14... #39383

cc @ianlancetaylor

@ianlancetaylor
Copy link
Contributor

Yes, this looks like a dup of #38860. Closing as a dup, but please comment if you disagree.

@golang golang locked and limited conversation to collaborators Aug 14, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants