
runtime: inaccurate trace of GC time #19590

Closed

zhaozhiqianghw opened this issue Mar 17, 2017 · 6 comments

Comments

@zhaozhiqianghw

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go 1.8

What operating system and processor architecture are you using (go env)?

linux amd64

What did you do?

When I use gctrace to look at the time spent in STW pauses, I am puzzled by how STW1 is measured. The runtime prints the clock times with this loop:

		prev := work.tSweepTerm
		for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
			if i != 0 {
				print("+")
			}
			print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
			prev = ns
		}
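
For context, with GODEBUG=gctrace=1 each cycle prints a line in the format documented in the runtime package:

	gc # @#s #%: #+#+# ms clock, #+#/#/#+# ms cpu, #->#-># MB, # MB goal, # P

The loop above produces the #+#+# "ms clock" field; its three components are STW sweep termination (STW1), concurrent mark and scan, and STW mark termination.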

I think work.tMark - work.tSweepTerm is the time of STW1. work.tSweepTerm is recorded as follows:

	now := nanotime()
	work.stwprocs, work.maxprocs = gcprocs(), gomaxprocs
	work.tSweepTerm = now

I think this is not proper, because the function gcprocs acquires sched.lock, so a long wait can occur inside it. I did in fact observe a long delay in this function: in my examples STW1 varied a lot, and when I instrumented the code I found the time was spent waiting on this lock. But this time should not count as STW1, because while waiting on the lock the world has not yet been stopped.

To exclude the time spent waiting on this lock from STW1, the code should be changed to:

	work.stwprocs, work.maxprocs = gcprocs(), gomaxprocs
	now := nanotime()
	work.tSweepTerm = now

The function gcprocs is below:

func gcprocs() int32 {
	// Figure out how many CPUs to use during GC.
	// Limited by gomaxprocs, number of actual CPUs, and MaxGcproc.
	lock(&sched.lock)
	n := gomaxprocs
	if n > ncpu {
		n = ncpu
	}
	if n > _MaxGcproc {
		n = _MaxGcproc
	}
	if n > sched.nmidle+1 { // one M is currently running
		n = sched.nmidle + 1
	}
	unlock(&sched.lock)
	return n
}
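
To make the skew concrete, here is a minimal standalone sketch (not runtime code: sync.Mutex stands in for sched.lock, and sketchGCProcs and contend are made-up helpers) showing how taking the timestamp before a potentially contended lock charges the lock wait to the measured interval:

	package main

	import (
		"fmt"
		"sync"
		"time"
	)

	// sketchGCProcs stands in for gcprocs: it briefly takes a
	// shared lock that another goroutine may be holding.
	func sketchGCProcs(lock *sync.Mutex) {
		lock.Lock() // may block for a long time under contention
		lock.Unlock()
	}

	// contend holds the lock for d, simulating contention.
	func contend(lock *sync.Mutex, d time.Duration) {
		lock.Lock()
		time.Sleep(d)
		lock.Unlock()
	}

	func main() {
		var lock sync.Mutex

		// Current ordering: timestamp first, then the lock-taking
		// call, so the lock wait is charged to the interval.
		go contend(&lock, 50*time.Millisecond)
		time.Sleep(time.Millisecond) // let the contender grab the lock
		now := time.Now()
		sketchGCProcs(&lock)
		fmt.Println("with lock wait included:", time.Since(now))

		// Proposed ordering: lock-taking call first, then the
		// timestamp, so the wait is excluded.
		go contend(&lock, 50*time.Millisecond)
		time.Sleep(time.Millisecond)
		sketchGCProcs(&lock)
		now = time.Now()
		fmt.Println("with lock wait excluded:", time.Since(now))
	}

The first printed interval includes the ~50 ms lock wait; the second does not, which is exactly what moving now := nanotime() after gcprocs() achieves.
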
@ianlancetaylor ianlancetaylor changed the title runtime/gc: STW1 time measured puzzle. runtime: inaccurate trace of GC time Mar 17, 2017
@ianlancetaylor
Contributor

CC @aclements

Do you have a test case that shows the problem more clearly?

@zhaozhiqianghw
Author

I cannot share the code; it belongs to my company. But I did isolate STW1 and found the time being spent in gcprocs. From the definition of gcprocs, it clearly carries this risk of blocking on sched.lock, and sched.lock is a very commonly used lock.

The test case I used starts thousands of goroutines and lots of channels. It's quite an extreme example; I do see this delay in that case but not in other cases. In any case, unless the original ordering was deliberate, I think recording the time after gcprocs returns is better.
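
A rough sketch of that kind of workload (hypothetical, not the original code) would look like the following; run it with GODEBUG=gctrace=1 and GOMAXPROCS=4 and watch the first component of the "ms clock" field across cycles:

	package main

	import (
		"runtime"
		"time"
	)

	func main() {
		runtime.GOMAXPROCS(4)

		// ~100K goroutines, most of them parked on channel receives.
		const n = 100000
		chans := make([]chan int, n)
		for i := range chans {
			chans[i] = make(chan int)
			go func(c chan int) {
				for range c {
				}
			}(chans[i])
		}

		// Keep a little traffic flowing so some goroutines are
		// runnable while the rest stay blocked.
		for {
			for i := 0; i < n; i += 1000 {
				chans[i] <- 1
			}
			time.Sleep(10 * time.Millisecond)
		}
	}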

@aclements
Member

aclements commented Mar 17, 2017 via email

@zhaozhiqianghw
Author

The delay was several seconds in some cases, even though it is normally around 100 µs. Only some GC cycles were affected, not all of them. My program has about 100K goroutines. I did not check how many of them were runnable, but most are blocked on channels waiting for messages from other goroutines. GOMAXPROCS is 4.

@zhaozhiqianghw
Author

This case mainly occurs on the forced GC triggered by sysmon (the forcegc goroutine). I think this information is helpful for your analysis.

@bradfitz bradfitz added this to the Go1.9Maybe milestone Mar 21, 2017
@gopherbot

CL https://golang.org/cl/49612 mentions this issue.

@golang golang locked and limited conversation to collaborators Jul 18, 2018