Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: osyield is expensive on darwin? #19409

Open
josharian opened this issue Mar 5, 2017 · 9 comments
Open

runtime: osyield is expensive on darwin? #19409

josharian opened this issue Mar 5, 2017 · 9 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. OS-Darwin Performance
Milestone

Comments

@josharian
Copy link
Contributor

This is a naive question about an unexpected and striking benchmarking result.

At tip, on my amd64 OS X laptop, I get:

$ go test -bench=BenchmarkSplitSingleByteSeparator -run=NONE bytes
BenchmarkSplitSingleByteSeparator-8   	     500	   2722171 ns/op

If I apply CL 37795, the execution time increases 65%:

$ go test -bench=BenchmarkSplitSingleByteSeparator -run=NONE bytes
BenchmarkSplitSingleByteSeparator-8   	     300	   4518575 ns/op

Note that in that CL, all that really happens is that a single function is removed from the call stack. Index checks the length of its argument, and if it is 1, then Index calls IndexByte.

CPU profiling indicates that basically all of the extra time is spent in runtime.usleep (called from runtime.osyield) and runtime.mach_semaphore_signal (called from runtime.notewakeup).

I'm left wondering:

(1) Is there a cheaper way to do an osyield on darwin that doesn't cost a full microsecond? (Linux appears to make an arch_prctl syscall instead of calling select.)

(2) Why does removing a function call from the stack create additional calls to osyield? Can this be avoided?

cc @ianlancetaylor @aclements

@josharian josharian added this to the Go1.9Maybe milestone Mar 5, 2017
@mundaym
Copy link
Member

mundaym commented Mar 5, 2017

(2) Why does removing a function call from the stack create additional calls to osyield? Can this be avoided?

Possibly related to #10958? Index is preemptible, IndexByte is not. So after CL 37795 the long running loop inside genSplit is no longer preemptible.

@josharian
Copy link
Contributor Author

Possibly related to #10958? Index is preemptible, IndexByte is not.

Oh right, of course. And changing the CL to occasionally call Index instead of IndexByte eliminates the regression:

	if len(sep) == 1 && i&1024 != 0 {

I am still interested in making osyield cheaper. I have definitely seen those stack traces show up prominently in other profiles as well.

@gopherbot
Copy link

CL https://golang.org/cl/37890 mentions this issue.

@bradfitz
Copy link
Contributor

bradfitz commented Mar 7, 2017

Assigning to @rsc for review.

@rsc
Copy link
Contributor

rsc commented Mar 7, 2017

I've seen surprising amounts of usleep in profiles but I assumed the macOS kernel had somehow screwed up profiling again. If this turns out to be the problem instead, great.

@rsc
Copy link
Contributor

rsc commented Mar 7, 2017

I spent a little while playing with this. Adding an explicit preemption check in the genSplit loop fixes the problem, so @dr2chase's back-edge preemption would fix this problem assuming it can be landed without other bad performance effects.

// Generic split: splits after each instance of sep,
// including sepSave bytes of sep in the subslices.
func genSplit(s, sep []byte, sepSave, n int) [][]byte {
	if n == 0 {
		return nil
	}
	if len(sep) == 0 {
		return explode(s, n)
	}
	if n < 0 {
		n = Count(s, sep) + 1
	}

	a := make([][]byte, n)
	n--
	i := 0
	for i < n {
		var m int
		if len(sep) == 1 {
			m = IndexByte(s, sep[0])
		} else {
			m = Index(s, sep)
		}
		if m < 0 {
			break
		}
		a[i] = s[:m+sepSave]
		s = s[m+len(sep):]
		i++
		nop()
	}
	a[i] = s
	return a[:i+1]
}

var b bool

//go:noinline
func nop() {
	if b {
		nop()
	}
}

Overall, the pending CL to replace osyield's usleep(1) with sched_yield() on macOS has a mixed effect on the go1 benchmarks but has an overall -0.29% geomean speedup.
https://perf.golang.org/search?q=upload:20170307.1

However, the effect on Josh's benchmark is devastating, +13% slower.
https://perf.golang.org/search?q=upload:20170307.2

It looks like in the case where we do spin waiting for preemption, using usleep(1) is much better than using sched_yield(). Because of this, I think we should put this change on hold until back-edge preemption is solved, rather than make the problem noticeably worse.

Basically, "fixing" osyield seems to make programs either a tiny bit faster or a lot slower. Most programs won't care but the ones that do will get much worse. Better not to do that.

@rasky
Copy link
Member

rasky commented Mar 7, 2017

To clarify, that slowdown comes from the runtime trying to STW and thus spinning more compared to sleeping 1us at a time? If this is the case, what about Linux? Shouldn't we switch osyield() to usleep(1) there as well to improve performance?

@rsc
Copy link
Contributor

rsc commented Mar 7, 2017

I'd be interested to hear the results of the experiment on Linux too, but I'd rather not change Linux at this point to work around the problem. I'd rather fix the underlying problem #10958. We've got most of a solution and mainly need to make it perform a bit better.

@bradfitz
Copy link
Contributor

This is blocked waiting on #10958, which was moved to Go 1.10.

@bradfitz bradfitz modified the milestones: Go1.10, Go1.9Maybe, Go1.10Early May 24, 2017
@bradfitz bradfitz added early-in-cycle A change that should be done early in the 3 month dev cycle. and removed early-in-cycle A change that should be done early in the 3 month dev cycle. labels Jun 14, 2017
@bradfitz bradfitz modified the milestones: Go1.10Early, Go1.10 Jun 14, 2017
@aclements aclements modified the milestones: Go1.10, Go1.11 Nov 8, 2017
@bradfitz bradfitz modified the milestones: Go1.11, Go1.12 Jun 19, 2018
@aclements aclements removed this from the Go1.12 milestone Jan 8, 2019
@aclements aclements added this to the Go1.13 milestone Jan 8, 2019
@bradfitz bradfitz modified the milestones: Go1.13, Go1.14 Apr 29, 2019
@rsc rsc modified the milestones: Go1.14, Backlog Oct 9, 2019
@rsc rsc removed their assignment Jun 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. OS-Darwin Performance
Projects
None yet
Development

No branches or pull requests

7 participants