time: timer gets significantly delayed during long time GC #45632

Open
wangforgo opened this issue Apr 19, 2021 · 16 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@wangforgo

What version of Go are you using (go version)?

$ go version
go1.15.8

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\xxx\AppData\Local\go-build
set GOENV=C:\Users\xxx\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\xxx\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\xxx\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=C:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=C:\Users\xxx\AppData\Local\Temp\go-build791935606=/tmp/go-build -gno-record-gcc-switches

What did you do?

First, I initialized a large number of objects that are never released while the program runs. These objects cause a long GC mark time.
Then I started 6 goroutines:

  • 1 goroutine blocked on a ticker with a 200 ms period; it prints the elapsed time between ticks received from the channel tick.C;
  • 4 goroutines running a busy task (just an infinite loop);
  • 1 goroutine triggering GC every second.

After that, the main goroutine sleeps for 20 seconds while the ticker goroutine prints its output.

The source code of my program is posted below:

package main

import (
	"fmt"
	"runtime"
	"time"
)

var x [][]*int

func prepareGarbage() {
	x = make([][]*int, 100)
	for i := range x {
		x[i] = make([]*int, 10000000)
	}
}

func main() {
	fmt.Println(runtime.Version())
	// GOMAXPROCS should be at least 4, so dedicated GC will be enabled.
	runtime.GOMAXPROCS(4) 

	// prepareGarbage for long GC mark time
	prepareGarbage()

	go ticker(200)

	// for keeping all P busy
	for i := 0; i < 4; i++ {
		go busyTask()
	}

	go func() {
		for {
			time.Sleep(time.Second)
			runtime.GC()
		}
	}()

	// sleep enough time for the timer wakeup delay happening
	time.Sleep(time.Second * 20)
}

func ticker(periodms int64) {
	tc := time.NewTicker(time.Duration(periodms) * time.Millisecond)
	fmt.Printf("timer with %v ms period started...\n", periodms)
	var ticks uint64
	var last time.Time
	for t := range tc.C {
		if ticks == 0 {
			last = t
		} else {
			elapsedTime := t.Sub(last).Milliseconds()
			if elapsedTime > periodms*2 {
				fmt.Printf("\n%v ms=========== WAKE UP TOO LATE!!!\n", elapsedTime)
			} else {
				fmt.Printf("%v, ", elapsedTime)
			}
			last = t
		}
		ticks++
	}
}

func busyTask() {
	for {
		task()
	}
}

//go:noinline
func task() {
	do()
}

//go:noinline
func do() {}

What did you expect to see?

I expect all the elapsed times to be around 200 ms with only small deviation:

go1.15.8
timer with 200 ms period started...
200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 200, 200, 200, 200, .......
Process finished with exit code 0

What did you see instead?

Instead, I found that the timer is randomly delayed by a large amount. In some tests, the 200 ms ticker was delayed by more than one second. The output is posted below:

go1.15.8
timer with 200 ms period started...
200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 200, 190, 200, 200, 200, 200, 200, 201, 200, 200, 200,
541 ms=========== WAKE UP TOO LATE!!!
59, 200, 200, 200, 200,
1516 ms=========== WAKE UP TOO LATE!!!
84, 200, 200, 200, 200,
1458 ms=========== WAKE UP TOO LATE!!!
142, 200, 200, 200, 200,
1403 ms=========== WAKE UP TOO LATE!!!
197, 200, 200, 200, 200,
1378 ms=========== WAKE UP TOO LATE!!!
22, 200, 200, 200, 200,
1528 ms=========== WAKE UP TOO LATE!!!
72, 200, 200, 200, 200,
1511 ms=========== WAKE UP TOO LATE!!!
89, 200,
Process finished with exit code 0

@prattmic added the NeedsInvestigation label Apr 19, 2021
@prattmic
Member

prattmic commented Apr 19, 2021

Thanks for filing the issue. Does this issue reproduce with 1.16 as well? Edit: never mind, I see you said it applies to the latest release as well.

@wangforgo
Author

The listed demo works well with Go 1.13, but it reproduces with Go 1.14 and later. This is probably because the dedicated GC mark worker keeps running and cannot be preempted. Therefore, the P running the dedicated GC worker has no chance to enter schedule, where timers are checked. When the dedicated worker finishes, that P finally checks its timers, but by then they have already been delayed significantly.

As an easy fix, we could make the Ps that are not running the dedicated GC worker help check timers for the P that is, by adding a call such as checkTimerForOtherPs for a randomly chosen other P after the original checkTimers in schedule, as sketched below.
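A minimal sketch of this idea, using stand-in types rather than real runtime internals (the p struct, allp slice, and checkTimerForOtherPs helper below are purely illustrative, not actual runtime code):

package main

import "fmt"

// p is a stand-in for the runtime's P; it carries only what this sketch needs.
type p struct {
	id     int32
	timers []func()
}

var allp []*p // stand-in for the runtime's list of all Ps

// checkTimers stands in for the per-P timer check already done in schedule:
// it runs any expired timers owned by pp.
func checkTimers(pp *p) {
	for _, f := range pp.timers {
		f()
	}
	pp.timers = pp.timers[:0]
}

// checkTimerForOtherPs is the proposed extra step: after a P checks its own
// timers, it also checks the timers of one other, pseudo-randomly chosen P,
// so timers owned by a P stuck in a dedicated GC mark worker cannot starve.
func checkTimerForOtherPs(self *p, seed uint32) {
	if len(allp) == 0 {
		return
	}
	other := allp[int(seed)%len(allp)]
	if other != self {
		checkTimers(other)
	}
}

func main() {
	p0 := &p{id: 0} // imagine p0 is stuck running a dedicated GC mark worker
	p1 := &p{id: 1}
	allp = []*p{p0, p1}
	p0.timers = append(p0.timers, func() { fmt.Println("p0's timer fired") })

	// p1 enters schedule: it checks its own timers, then helps p0.
	checkTimers(p1)
	checkTimerForOtherPs(p1, 0)
}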

It does work, but I am not sure whether it is a proper fix.

@randall77
Contributor

runtime.GC is a blocking operation, including running any goroutines that were enabled because of timers.

GC runs a garbage collection and blocks the caller until the garbage collection is complete. It may also block the entire program.

@wangforgo
Author

@randall77 Hi Randall, if I read this right, calling runtime.GC blocks the caller goroutine. It is reasonable that the caller gets blocked, because it needs to wait for GC to finish. I don't think the other goroutines should also be blocked.

To make this clear, I collected the trace data of the timer demo. We can see that goroutines are scheduled well during GC: each goroutine gets a small scheduling slice, and all the runnable goroutines get a chance to run.
[image: allTrace]

However, the timer-driven goroutine G19, which is expected to be scheduled every 200 ms, gets significantly delayed. It misses many ticks while GC is running, i.e., during the second GC phase in the following graph.
[image: trace]

The timer-driven goroutine behaves completely differently from the other goroutines, and this behavior badly degrades timer precision.
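(For reference, a trace like the ones above can be collected with the runtime/trace package; a minimal sketch, assuming the output file name trace.out:)

package main

import (
	"log"
	"os"
	"runtime/trace"
)

func main() {
	// Write the execution trace to trace.out; inspect it later with
	// `go tool trace trace.out`.
	f, err := os.Create("trace.out")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := trace.Start(f); err != nil {
		log.Fatal(err)
	}
	defer trace.Stop()

	// ... run the reproducer from this issue here ...
}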

@randall77
Contributor

I don't think the other goroutines should also be blocked.

We don't make that promise at the moment.

I'm not sure what might be involved in making runtime.GC not block other goroutines. @mknyszek @prattmic might know more.

@mknyszek
Contributor

We don't make that promise but generally speaking it doesn't block other goroutines except briefly to stop-the-world (just as a GC does).

I don't think the dedicated GC needs to be involved in this solution at all -- ideally, it really does just run unimpeded. We have 3 other Ps (in this reproducer) that should be checking timers regularly, basically whenever they yield back into the scheduler. Your execution trace shows that we're very clearly time slicing these goroutines, so they're calling into the scheduler fairly often. The fact that timers are missed seems like it could be a bug in the scheduler or timer system (but I haven't thought about this enough to have an idea of what it is yet).

@networkimprov

cc @ChrisHines

@ChrisHines
Contributor

I believe the issue here is that when the dedicated GC worker runs on the P that owns the 200ms timer, the timer starves while the GC is running. It starves because the owning P is busy and because all of the other Ps have a steady supply of runnable Gs. Ps only attempt to steal timers from other Ps when they are out of local work, which never happens in this scenario. As a result, the timer isn't checked until its owning P finishes the GC work and checks its own timers.

@Jason7602
Contributor

Jason7602 commented Sep 26, 2021

This problem has also been encountered in our company's products. Because our service has a very long GC cycle and a relatively large number of timers, the probability of serious timer timeouts is particularly high. A lot of our processing logic relies on timer accuracy, so the accuracy of the timers is very important to us.
@ChrisHines analyzed the cause of this problem in detail: when the GC mark time is particularly long, the dedicated P is on the system stack and signals cannot preempt it, so this P cannot check its own local timers while in the dedicated state. The other Ps are very busy and never get around to checking the timers on this P, so the timers on this P time out badly.

@gopherbot

Change https://golang.org/cl/352710 mentions this issue: time: fix timer significantly delayed during long time GC

@aclements
Member

We do have a fairly efficient mechanism for stealing timers from other Ps (stealWork). However, based on the trace diagrams, I think what's happening here is that the non-GC Ps always have something in their run queue, and findrunnable will only call stealWork if both the local and global run queues are empty, so we wind up never attempting to steal timers from the P running the dedicated GC worker. (This all agrees with what @ChrisHines said.)
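To make that ordering concrete, here is a highly simplified, illustrative sketch of the priority order described above (the function bodies are stand-ins for this reproducer's situation, not real scheduler code):

package main

import "fmt"

// g is a stand-in for a runnable goroutine.
type g struct{ name string }

// Stand-ins for the scheduler's work sources as seen by the busy, non-GC Ps
// in this reproducer: the local run queue is never empty.
func checkOwnTimers()   { /* run expired timers owned by this P */ }
func localRunqGet() *g  { return &g{"local G"} } // always succeeds here
func globalRunqGet() *g { return nil }
func stealWork() *g     { return nil } // would also steal timers from other Ps

// findRunnable sketches the order discussed above: because step 2 always
// succeeds on the busy Ps, step 4 (stealWork, where another P's timers would
// be checked) is never reached, so the timers owned by the P running the
// dedicated GC worker starve.
func findRunnable() *g {
	checkOwnTimers() // 1. this P's own timers
	if gp := localRunqGet(); gp != nil {
		return gp // 2. local run queue
	}
	if gp := globalRunqGet(); gp != nil {
		return gp // 3. global run queue
	}
	return stealWork() // 4. steal Gs and timers from other Ps
}

func main() {
	fmt.Println(findRunnable().name) // always "local G" in this scenario
}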

This problem isn't necessarily specific to the dedicated GC worker, though that may be the worst culprit. We could make the scheduler try stealing timers more frequently, sort of like how it currently consults the global run queue periodically. Or we could focus on the dedicated GC worker and have pollWork check for expired timers on the calling P.

(More broadly, I've been thinking for a while about unsharding the timer queue and replacing it with a true concurrent data structure. That would probably solve this, too, because any P could run an expired timer, but that's a big chunk of work to bite off.)

@ChrisHines
Contributor

@aclements I agree that the problem is not specific to the dedicated GC worker. I was going to reply along the same lines, but you beat me to it. :)

I am curious about your ideas for unsharding the timer queue. It seems to me that it could certainly simplify the runtime's model of timers, but how do we avoid reintroducing #15133? Did you have any specific concurrent data structures in mind for this situation?

@wangforgo
Author

@aclements Yes, stealing work via checkTimers has the lowest priority in findrunnable, so making the scheduler steal timers more frequently is a good solution. But as @ChrisHines pointed out, horizontal scalability should also be taken into account, considering the potentially fierce contention between Ps when the CPU count is very large. As for the dedicated-GC case, would it be possible for an extra OS thread (instead of the other Ps) to check the starving timers on the dedicated-GC P?

@Jason7602
Contributor

Jason7602 commented Sep 30, 2021

We do have a fairly efficient mechanism for stealing timers from other Ps (stealWork). However, based on the trace diagrams, I think what's happening here is that the non-GC Ps always have something in their run queue, and findrunnable will only call stealWork if both the local and global run queues are empty, so we wind up never attempting to steal timers from the P running the dedicated GC worker. (This all agrees with what @ChrisHines said.)

This problem isn't necessarily specific to the dedicated GC worker, though that may be the worst culprit. We could make the scheduler try stealing timers more frequently, sort of like how it currently consults the global run queue periodically. Or we could focus on the dedicated GC worker and have pollWork check for expired timers on the calling P.

Starting from Go 1.16, there has been a relatively large optimization of timers. The basic strategy is to increase the frequency of timer stealing. In particular, in findrunnable the timers on other Ps are considered when calculating pollUntil, and stealing timers from other Ps now takes priority over stealing Gs from them. This optimization improves timer accuracy well when the other Ps are not very busy, but it does not help when the Ps are busy, because they never even enter findrunnable.
To address timers being significantly delayed when the Ps are busy, simply increasing the frequency of timer stealing may bring side effects such as more scheduling contention or useless timer stealing, because we do not need to steal timers from other Ps in every case.
I also agree that the problem is not specific to the dedicated GC worker. But in any case, we need a way to efficiently detect whether a timer is seriously delayed somewhere in the system, and then accurately steal the timers from that P.

@ChrisHines
Contributor

As for the dedicated-GC case, would it be possible for an extra OS thread (instead of the other Ps) to check the starving timers on the dedicated-GC P?

Prior to 8fdc79e in Go 1.16 the sysmon thread did exactly that. When it found a starving timer it would try to wake up a sleeping P. But that strategy doesn't work in the situation described in this issue since all the P's are busy.
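As a rough illustration of why that pre-8fdc79e strategy cannot help here, a minimal sketch with simplified stand-in types (nothing below is actual runtime code):

package main

import (
	"fmt"
	"time"
)

// p is a stand-in for a P; each one owns timers, represented here only by
// the next firing time.
type p struct {
	id        int
	idle      bool
	nextTimer time.Time
}

var allp []*p

// wakeIdleP stands in for what waking a P requires: an idle P to hand the
// overdue timer work to. With every P busy, there is nothing to wake.
func wakeIdleP() *p {
	for _, pp := range allp {
		if pp.idle {
			return pp
		}
	}
	return nil
}

// sysmonCheckTimers sketches the old strategy: scan all Ps for an overdue
// timer and try to wake an idle P to run it.
func sysmonCheckTimers(now time.Time) {
	for _, pp := range allp {
		if now.After(pp.nextTimer) {
			if idle := wakeIdleP(); idle != nil {
				fmt.Printf("woke P%d to run P%d's overdue timer\n", idle.id, pp.id)
			} else {
				// The situation in this issue: all Ps are busy, so the
				// overdue timer stays starved.
				fmt.Printf("P%d has an overdue timer but no idle P to wake\n", pp.id)
			}
		}
	}
}

func main() {
	now := time.Now()
	allp = []*p{
		{id: 0, nextTimer: now.Add(-time.Second)}, // P0: overdue timer, stuck in GC
		{id: 1, nextTimer: now.Add(time.Hour)},    // P1..P3: busy, nothing overdue
		{id: 2, nextTimer: now.Add(time.Hour)},
		{id: 3, nextTimer: now.Add(time.Hour)},
	}
	sysmonCheckTimers(now)
}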

This issue boils down to a question of priorities. The system is oversubscribed, with more work than it has Ps to handle. If we want to give more priority to timers, something else will get less priority. Done poorly, something else may starve.

@Jason7602
Contributor

Jason7602 commented Sep 30, 2021

As for the dedicated-GC case, would it be possible for an extra OS thread (instead of the other Ps) to check the starving timers on the dedicated-GC P?

Prior to 8fdc79e in Go 1.16 the sysmon thread did exactly that. When it found a starving timer it would try to wake up a sleeping P. But that strategy doesn't work in the situation described in this issue since all the P's are busy.

This issue boils down to a question of priorities. The system is oversubscribed, with more work than it has Ps to handle. If we want to give more priority to timers, something else will get less priority. Done poorly, something else may starve.

Yes, prior to 8fdc79e in Go 1.16 the sysmon thread did exactly that. It fails here mainly because startm doesn't work (it can't find an idle P to wake up when all Ps are busy), not because of the sysmon thread itself. You can review CL 352710, which takes a different approach to dealing with extremely starved timers, even though it also uses the sysmon thread to mark timers that have seriously timed out.

@mknyszek added this to the Backlog milestone Nov 10, 2021