Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: Go 1.6 regresses Windows performance due to timeBeginPeriod removal #14790

Closed
jstarks opened this issue Mar 12, 2016 · 27 comments
Closed

Comments

@jstarks
Copy link

jstarks commented Mar 12, 2016

  1. What version of Go are you using (go version)?
    go version go1.6 windows/amd64

  2. What operating system and processor architecture are you using (go env)?
    set GOARCH=amd64
    set GOBIN=
    set GOEXE=.exe
    set GOHOSTARCH=amd64
    set GOHOSTOS=windows
    set GOOS=windows
    set GOPATH=
    set GORACE=
    set GOROOT=C:\Go
    set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64
    set GO15VENDOREXPERIMENT=1
    set CC=gcc
    set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
    set CXX=g++
    set CGO_ENABLED=1

  3. What did you do?
    Run a benchmark from my go-winio package, which uses completion ports to provide non-thread-blocking IO on top of named pipes. It regressed significantly between go 1.5.3 and go 1.6, but only when I don't call timeBeginPeriod(1000). It seems that some aspect of the go scheduler still depends on a fast timer.

    You can see this with:

      go get github.com/Microsoft/go-winio
      cd <wherever>; git checkout go_16_regression_test
      go test -bench .
      set gofasttick=1
      go test -bench .
    
  4. What did you expect to see?
    Same performance before and after setting gofasttick=1 (which makes go-winio call timeBeginPeriod(1000) before doing any IO).

  5. What did you see instead?

    Z:\jostarks\go\src\github.com\Microsoft\go-winio>go test -bench .
    PASS
    BenchmarkPipeIo-8           2000            998036 ns/op
    ok      _/Z_/jostarks/go/src/github.com/Microsoft/go-winio      2.304s
    
    Z:\jostarks\go\src\github.com\Microsoft\go-winio>set gofasttick=1
    
    Z:\jostarks\go\src\github.com\Microsoft\go-winio>go test -bench .
    PASS
    BenchmarkPipeIo-8           3000            517709 ns/op
    ok      _/Z_/jostarks/go/src/github.com/Microsoft/go-winio      1.797s
    

I can try to get a more minimal repro if it would be helpful. This has regressed Docker client performance significantly in Windows when operating over named pipes.

@ianlancetaylor ianlancetaylor changed the title Go 1.6 regresses Windows performance due to timeBeginPeriod removal runtime: Go 1.6 regresses Windows performance due to timeBeginPeriod removal Mar 12, 2016
@ianlancetaylor ianlancetaylor modified the milestones: Go1.6.1, Go1.7 Mar 12, 2016
@alexbrainman
Copy link
Member

Smaller reproduction is always welcome. Thank you.

Alex

@alexbrainman
Copy link
Member

I tried investigating this, but your test crashes:

c:\Users\Alex>go get -v -d github.com/Microsoft/go-winiogithub.com/Microsoft/go-winio (download)

c:\Users\Alex>cd c:\Users\Alex\dev\src\github.com\Microsoft\go-winio

c:\Users\Alex\dev\src\github.com\Microsoft\go-winio>git checkout go_16_regression_test
Branch go_16_regression_test set up to track remote branch go_16_regression_test from origin.
Switched to a new branch 'go_16_regression_test'

c:\Users\Alex\dev\src\github.com\Microsoft\go-winio>go test -bench .
Exception 0xc0000005 0x1 0x0 0x7fefd0ba1a2
PC=0x7fefd0ba1a2

syscall.Syscall9(0x77055ce0, 0x8, 0xac, 0x900c4, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        c:/users/alex/dev/go/src/runtime/syscall_windows.go:150 +0x61
syscall.DeviceIoControl(0xac, 0x900c4, 0x0, 0x0, 0x0, 0xc000000000, 0x0, 0x0, 0x0, 0x0)
        c:/users/alex/dev/go/src/syscall/zsyscall_windows.go:1311 +0x101
github.com/Microsoft/go-winio.makeSparseFile(0x0, 0x0)
        c:/users/alex/dev/src/github.com/Microsoft/go-winio/backup_test.go:206 +0x131
github.com/Microsoft/go-winio.TestBackupSparseFile(0xc082050240)
        c:/users/alex/dev/src/github.com/Microsoft/go-winio/backup_test.go:230 +0x45
testing.tRunner(0xc082050240, 0x62e2a8)
        c:/users/alex/dev/go/src/testing/testing.go:473 +0x85
created by testing.RunTests
        c:/users/alex/dev/go/src/testing/testing.go:581 +0x323

goroutine 1 [chan receive]:
testing.RunTests(0x59b488, 0x62e260, 0x11, 0x11, 0x425cb0)
        c:/users/alex/dev/go/src/testing/testing.go:582 +0x369
testing.(*M).Run(0xc08205bed8, 0x59b548)
        c:/users/alex/dev/go/src/testing/testing.go:515 +0x80
github.com/Microsoft/go-winio.TestMain(0xc08205bed8)
        c:/users/alex/dev/src/github.com/Microsoft/go-winio/backup_test.go:21 +0xd4
main.main()
        github.com/Microsoft/go-winio/_test/_testmain.go:86 +0xbb
rax     0x0
rbx     0x0
rcx     0x0
rdi     0xac
rsi     0x0
rbp     0xac
rsp     0x6fd70
r8      0x6fd68
r9      0xac
r10     0x0
r11     0x0
r12     0x3
r13     0x2
r14     0x2
r15     0x58f0b0
rip     0x7fefd0ba1a2
rflags  0x10246
cs      0x33
fs      0x53
gs      0x2b
exit status 2
FAIL    github.com/Microsoft/go-winio   0.094s

c:\Users\Alex\dev\src\github.com\Microsoft\go-winio>

Alex

@jstarks
Copy link
Author

jstarks commented Mar 14, 2016

Hmm, probably go-winio tests only run on Windows 10. Sorry about that.

In any case, here's a smaller repro: http://play.golang.org/p/Zh8F6iqJOE. Run this with go test -bench ., making sure that your default timer resolution is 15ms.

The results are quite stark for me: there is a 15x performance difference.

@alexbrainman
Copy link
Member

@jstarks your last test reproduces the problem for me. Thank you.

I really don't know what the problem here is. Your test uses mix of syscalls and Go channels to synchronize. I changed your test to use syscalls only (to see if problem goes away) and it does go away. And syscalls only version is much faster than your original version. I also built another version that uses channels only, and that is even faster. Here https://play.golang.org/p/JmZcyYUgTz are my test. If I run them I see this:

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution                    50          31563106 ns/op
Benchmark1ms                                1000           1956129 ns/op
BenchmarkNoChannelDefaultResolution       300000              4322 ns/op
BenchmarkNoChannel1ms                     300000              4287 ns/op
BenchmarkOnlyChannelDefaultResolution    3000000               484 ns/op
BenchmarkOnlyChannel1ms                  3000000               481 ns/op
PASS
ok      issues/issue14790       10.854s

C:\dev\src\issues\issue14790>

So in addition to your original issue, I am wondering why BenchmarkNoChannel1ms is ~500 times faster then Benchmark1ms. BenchmarkOnlyChannel1ms is even ~10 times faster, but that sounds reasonable to me (internal sync comparing to external).

@dvyukov and @aclements if you can explain what is happening here. Or maybe suggestion on how to investigate this further. Thank you.

Alex

@jstarks
Copy link
Author

jstarks commented Mar 15, 2016

@alexbrainman Thanks. Indeed, it seems that for some reason we are sleeping for a tick instead of scheduling a ready goroutine in some code path.

I played around with the runtime a bit, and I observed that if I removed the usleep in sysmon(), the performance got significantly (100x?) better. Of course, the usleep in sysmon() is there for a reason, so this is not a fix, but it might help with the investigation. I don't fully understand what sysmon's role is, but I think it's responsible for deciding that the syscall is going to take a while and that another goroutine should be scheduled to run.

If that's the case, I guess the problem here is that the syscall is blocked until the goroutine gets to run, but the runtime has no way of knowing that and has to apply this (problematic) heuristic. Perhaps there should be a way to annotate a syscall as blocking, in which case the runtime could immediately schedule another goroutine instead of waiting.

But again, I don't really understand the scheduler yet, so I may be off base.

@aclements
Copy link
Member

@jstarks, your explanation sounds entirely plausible. You're right that one of the roles of sysmon is to decide when a system call is probably blocked and that we should start up another thread to execute goroutines, but it's a very coarse mechanism, and the poor resolution of Windows timers really isn't helping here. This isn't usually an issue because most syscalls that block for a long time are socket IO and those go through a central select() mechanism, but you may have come across a case where it is an issue.

In the runtime, we actually do have a way to indicate that a system call is blocking (entersyscall versus entersyscallblock) and we should schedule another goroutine immediately, but it looks like we only use this mechanism on Solaris. I don't know why that is.

@bradfitz
Copy link
Contributor

We have a similar mechanism on Linux etc with the "//sys" vs "//sysnb" comments which generate calls to either https://golang.org/pkg/syscall/#Syscall (maybe blocking) vs https://golang.org/pkg/syscall/#RawSyscall (not blocking). Looks like entersyscallblock is a "definitely blocking" option. Should that be exposed as an option in pkg syscall?

@aclements
Copy link
Member

Should that be exposed as an option in pkg syscall?

I don't know if we want to expose an yet another generic syscall API (CookedSyscall?), but it may make sense to mark the wrappers for things like synchronization operations as "almost certainly blocking" and internally use entersyscallblock. I really have no idea why we only use that on Solaris—it was created before my time—but it looks like it should work fine on any OS.

It would be nice if we didn't need such hacks.

@alexbrainman
Copy link
Member

I followed Brad's advice and changed my test program https://play.golang.org/p/JmZcyYUgTz to use syscall.RawSyscall in setEvent:

diff -r ebb09e700c4e main_test.go
--- a/main_test.go  Tue Mar 15 12:20:13 2016 +1100
+++ b/main_test.go  Tue Mar 15 14:49:35 2016 +1100
@@ -33,7 +33,7 @@
 }

 func setEvent(h syscall.Handle) {
-   r0, _, e0 := syscall.Syscall(procSetEvent.Addr(), 1, uintptr(h), 0, 0)
+   r0, _, e0 := syscall.RawSyscall(procSetEvent.Addr(), 1, uintptr(h), 0, 0)
    if r0 == 0 {
        panic(syscall.Errno(e0))
    }

I also had to change runtime and syscall to implement syscall.RawSyscall:

diff --git a/src/runtime/cgocall.go b/src/runtime/cgocall.go
index 7a683d7..1763935 100644
--- a/src/runtime/cgocall.go
+++ b/src/runtime/cgocall.go
@@ -128,6 +128,46 @@ func cgocall(fn, arg unsafe.Pointer) int32 {
 }

 //go:nosplit
+func rawcgocall(fn, arg unsafe.Pointer) int32 {
+   if !iscgo && GOOS != "solaris" && GOOS != "windows" {
+       throw("cgocall unavailable")
+   }
+
+   if fn == nil {
+       throw("cgocall nil")
+   }
+
+   if raceenabled {
+       racereleasemerge(unsafe.Pointer(&racecgosync))
+   }
+
+   /*
+    * Lock g to m to ensure we stay on the same stack if we do a
+    * cgo callback. Add entry to defer stack in case of panic.
+    */
+   lockOSThread()
+   mp := getg().m
+   mp.ncgocall++
+   mp.ncgo++
+   defer endcgo(mp)
+
+   /*
+    * Announce we are entering a system call
+    * so that the scheduler knows to create another
+    * M to run goroutines while we are in the
+    * foreign code.
+    *
+    * The call to asmcgocall is guaranteed not to
+    * split the stack and does not allocate memory,
+    * so it is safe to call while "in a system call", outside
+    * the $GOMAXPROCS accounting.
+    */
+   errno := asmcgocall(fn, arg)
+
+   return errno
+}
+
+//go:nosplit
 func endcgo(mp *m) {
    mp.ncgo--

diff --git a/src/runtime/syscall_windows.go b/src/runtime/syscall_windows.go
index ebfa32f..6443de7 100644
--- a/src/runtime/syscall_windows.go
+++ b/src/runtime/syscall_windows.go
@@ -172,3 +172,14 @@ func syscall_Syscall15(fn, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11,
    cgocall(asmstdcallAddr, unsafe.Pointer(c))
    return c.r1, c.r2, c.err
 }
+
+//go:linkname syscall_RawSyscall syscall.RawSyscall
+//go:nosplit
+func syscall_RawSyscall(fn, nargs, a1, a2, a3 uintptr) (r1, r2, err uintptr) {
+   c := &getg().m.syscall
+   c.fn = fn
+   c.n = nargs
+   c.args = uintptr(noescape(unsafe.Pointer(&a1)))
+   rawcgocall(asmstdcallAddr, unsafe.Pointer(c))
+   return c.r1, c.r2, c.err
+}
diff --git a/src/syscall/dll_windows.go b/src/syscall/dll_windows.go
index 8da1951..7d50b1a 100644
--- a/src/syscall/dll_windows.go
+++ b/src/syscall/dll_windows.go
@@ -27,6 +27,7 @@ func Syscall12(trap, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12 ui
 func Syscall15(trap, nargs, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13, a14, a15 uintptr) (r1, r2 uintptr, err Errno)
 func loadlibrary(filename *uint16) (handle uintptr, err Errno)
 func getprocaddress(handle uintptr, procname *uint8) (proc uintptr, err Errno)
+func RawSyscall(trap, nargs, a1, a2, a3 uintptr) (r1, r2 uintptr, err Errno)

 // A DLL implements access to a single DLL.
 type DLL struct {

I see not much changed:

C:\dev\src\issues\issue14790>go install -v syscall && go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution                    50          31250200 ns/op
Benchmark1ms                                1000           1956129 ns/op
BenchmarkNoChannelDefaultResolution       300000              4166 ns/op
BenchmarkNoChannel1ms                     300000              4078 ns/op
BenchmarkOnlyChannelDefaultResolution    3000000               479 ns/op
BenchmarkOnlyChannel1ms                  3000000               483 ns/op
PASS
ok      issues/issue14790       10.718s

C:\dev\src\issues\issue14790>

(BenchmarkNoChannelDefaultResolution and BenchmarkNoChannel1ms are ~10% faster, but that is understandable). But main issue is still there.

Alex

@dvyukov
Copy link
Member

dvyukov commented Mar 15, 2016

I think we need to return back timeBeginPeriod. IMO the long-term solution is #7876. It would allow to raise sysmon sleep to that same 15ms.

@minux
Copy link
Member

minux commented Mar 15, 2016 via email

@dvyukov
Copy link
Member

dvyukov commented Mar 15, 2016

@minux We will need the current sysmon as a backoff for OSes that does not support proper user-space scheduling. And windows implementation of that thing will still need timeBeginPeriod. UMS/sleep implementation needs to be chosen at startup depending on presence of necessary syscalls.

But FWIW 2003 appeared before multi-core era and is so inefficient in lots of other fields, so I would not spend too much time on top performance on it. If you care about performance, updating to a newer OS should be the first step.

@alexbrainman
Copy link
Member

I think we need to return back timeBeginPeriod.

All my benchtests ( https://play.golang.org/p/JmZcyYUgTz ) with 1ms in their name run with timeBeginPeriod set. Your suggestion still does not explain why, for example, BenchmarkNoChannel1ms is about 450 times faster than Benchmark1ms:

C:\dev\src\issues\issue14790>go test -bench=1ms
testing: warning: no tests to run
Benchmark1ms                1000           1961788 ns/op
BenchmarkNoChannel1ms     300000              4397 ns/op
BenchmarkOnlyChannel1ms  3000000               498 ns/op
PASS
ok      issues/issue14790       5.580s

C:\dev\src\issues\issue14790>

Why do you think this is? How can I find out?

IMO the long-term solution is #7876. It would allow to raise sysmon sleep to that same 15ms.

I don't want to look for solution until I understand what my particular problem is.

Alex

@dvyukov
Copy link
Member

dvyukov commented Mar 16, 2016

Why do you think this is? How can I find out?

What's the value of GOMAXPROCS? If it is 1, try to set it to 4.

@dvyukov
Copy link
Member

dvyukov commented Mar 16, 2016

@alexbrainman Try the following patch:

diff --git a/src/runtime/proc.go b/src/runtime/proc.go
index 758a0a8..9db79ed 100644
--- a/src/runtime/proc.go
+++ b/src/runtime/proc.go
@@ -3972,7 +3972,7 @@ func runqgrab(_p_ *p, batch *[256]guintptr, batchHead uint32, stealRunNextG bool
                n := t - h
                n = n - n/2
                if n == 0 {
-                       if stealRunNextG {
+                       //if stealRunNextG {
                                // Try to steal from _p_.runnext.
                                if next := _p_.runnext; next != 0 {
                                        // Sleep to ensure that _p_ isn't about to run the g we
@@ -3982,14 +3982,14 @@ func runqgrab(_p_ *p, batch *[256]guintptr, batchHead uint32, stealRunNextG bool
                                        // Instead of stealing runnext in this window, back off
                                        // to give _p_ a chance to schedule runnext. This will avoid
                                        // thrashing gs between different Ps.
-                                       usleep(100)
+                                       //usleep(100)
                                        if !_p_.runnext.cas(next, 0) {
                                                continue
                                        }
                                        batch[batchHead%uint32(len(batch))] = next
                                        return 1
                                }
-                       }
+                       //}
                        return 0
                }
                if n > uint32(len(_p_.runq)/2) { // read inconsistent h and t

@alexbrainman
Copy link
Member

What's the value of GOMAXPROCS? If it is 1, try to set it to 4.

C:\dev\src\issues\issue14790>set GOMAXPROCS=1

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution                    50          31260200 ns/op
Benchmark1ms                                1000           1956730 ns/op
BenchmarkNoChannelDefaultResolution       300000              5053 ns/op
BenchmarkNoChannel1ms                     300000              4259 ns/op
BenchmarkOnlyChannelDefaultResolution    3000000               479 ns/op
BenchmarkOnlyChannel1ms                  3000000               472 ns/op
PASS
ok      issues/issue14790       11.139s

C:\dev\src\issues\issue14790>set GOMAXPROCS=4

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution-4                 100          15630100 ns/op
Benchmark1ms-4                              2000            981784 ns/op
BenchmarkNoChannelDefaultResolution-4     200000              5001 ns/op
BenchmarkNoChannel1ms-4                   300000              4894 ns/op
BenchmarkOnlyChannelDefaultResolution-4  3000000               484 ns/op
BenchmarkOnlyChannel1ms-4                3000000               478 ns/op
PASS
ok      issues/issue14790       10.216s

C:\dev\src\issues\issue14790>

GOMAXPROCS=4 makes both BenchmarkDefaultResolution and Benchmark1ms twice as fast, but still a lot slower than BenchmarkNoChannelDefaultResolution.

Try the following patch:

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution                    50          31268200 ns/op
Benchmark1ms                                1000           1956154 ns/op
BenchmarkNoChannelDefaultResolution       300000              5002 ns/op
BenchmarkNoChannel1ms                     300000              4712 ns/op
BenchmarkOnlyChannelDefaultResolution    3000000               500 ns/op
BenchmarkOnlyChannel1ms                  3000000               492 ns/op
PASS
ok      issues/issue14790       11.407s

C:\dev\src\issues\issue14790>

No change in performance as far as I can tell.

Alex

@gopherbot
Copy link

CL https://golang.org/cl/20834 mentions this issue.

@dvyukov
Copy link
Member

dvyukov commented Mar 18, 2016

Had to blow dust off my windows machine.
You benchmarked my change with GOMAXPROCS=1. The high latency with GOMAXPROCS=1 is explained by the fact that the single P needs to be constantly retaken by sysmon thread to allow the other goroutine to run. Sysmon uses sleeps and so bounded by system time resolution. To fix that we need to restore timeBeginPeriod (at least the delay will go down from 15ms to 1ms).

But my change does help when GOMAXPROCS>1:

BenchmarkDefaultResolution-8 4583372 29720 -99.35%
Benchmark1ms-8 992056 30701 -96.91%

Here we are not even limited by sysmon sleeps because we have enough Ps to execute all runnable goroutines. This fix is even more important than timeBeginPeriod, because GOMAXPROCS=1 is rather esoteric today.

@gopherbot
Copy link

CL https://golang.org/cl/20835 mentions this issue.

@alexbrainman
Copy link
Member

Dmitry I tried your CL 20835, but I don't see any difference to my benchmarks:

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution                    50          31249400 ns/op
Benchmark1ms                                1000           1958859 ns/op
BenchmarkNoChannelDefaultResolution       300000              4270 ns/op
BenchmarkNoChannel1ms                     300000              4218 ns/op
BenchmarkOnlyChannelDefaultResolution    3000000               479 ns/op
BenchmarkOnlyChannel1ms                  3000000               477 ns/op
PASS
ok      issues/issue14790       10.775s

C:\dev\src\issues\issue14790>
C:\dev\src\issues\issue14790>set GOMAXPROCS=1

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution                    50          31249400 ns/op
Benchmark1ms                                1000           1960812 ns/op
BenchmarkNoChannelDefaultResolution       300000              4374 ns/op
BenchmarkNoChannel1ms                     300000              4218 ns/op
BenchmarkOnlyChannelDefaultResolution    3000000               494 ns/op
BenchmarkOnlyChannel1ms                  3000000               494 ns/op
PASS
ok      issues/issue14790       10.934s

C:\dev\src\issues\issue14790>set GOMAXPROCS=4

C:\dev\src\issues\issue14790>go test -bench=.
testing: warning: no tests to run
BenchmarkDefaultResolution-4                 100          15624700 ns/op
Benchmark1ms-4                              2000            976500 ns/op
BenchmarkNoChannelDefaultResolution-4     300000              4374 ns/op
BenchmarkNoChannel1ms-4                   300000              4251 ns/op
BenchmarkOnlyChannelDefaultResolution-4  3000000               505 ns/op
BenchmarkOnlyChannel1ms-4                3000000               496 ns/op
PASS
ok      issues/issue14790       10.460s

C:\dev\src\issues\issue14790>

The affect of CL 20834 we already know. It is difference between BenchmarkDefaultResolution and Benchmark1ms. But I am more concerned about even much larger difference between BenchmarkDefaultResolution and BenchmarkNoChannelDefaultResolution. Why does mixed (channels and syscalls) version perform so bad comparing to syscall only version?

Alex

@gopherbot
Copy link

CL https://golang.org/cl/21294 mentions this issue.

gopherbot pushed a commit that referenced this issue Apr 4, 2016
It appears that windows osyield is just 15ms sleep on my computer
(see benchmarks below). Replace NtWaitForSingleObject in osyield
with SwitchToThread (as suggested by Dmitry).

Also add issue #14790 related benchmarks, so we can track perfomance
changes in CL 20834 and CL 20835 and beyond.

Update #14790

benchmark                             old ns/op     new ns/op     delta
BenchmarkChanToSyscallPing1ms         1953200       1953000       -0.01%
BenchmarkChanToSyscallPing15ms        31562904      31248400      -1.00%
BenchmarkSyscallToSyscallPing1ms      5247          4202          -19.92%
BenchmarkSyscallToSyscallPing15ms     5260          4374          -16.84%
BenchmarkChanToChanPing1ms            474           494           +4.22%
BenchmarkChanToChanPing15ms           468           489           +4.49%
BenchmarkOsYield1ms                   980018        75.5          -99.99%
BenchmarkOsYield15ms                  15625200      75.8          -100.00%

Change-Id: I1b4cc7caca784e2548ee3c846ca07ef152ebedce
Reviewed-on: https://go-review.googlesource.com/21294
Run-TryBot: Alex Brainman <alex.brainman@gmail.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
@dvyukov dvyukov reopened this Apr 5, 2016
@alexbrainman
Copy link
Member

@jstarks, please, try current tip (5c7ae10) to see if it got faster for you? Thank you.

Alex

@jstarks
Copy link
Author

jstarks commented Apr 6, 2016

@alexbrainman It's significantly faster for GOMAXPROCS > 1, as expected.

For GOMAXPROCS=1, it's still much slower unless I explicitly call timeBeginPeriod in my code -- do you still intend to restore the timeBeginPeriod call in the runtime?

@alexbrainman
Copy link
Member

It's significantly faster for GOMAXPROCS > 1, as expected.

Glad to hear that. Thanks for confirming.

For GOMAXPROCS=1, it's still much slower ...

I know that. Unfortunetly it is how Go's scheduler works at this moment. Maybe it will improve one day.

do you still intend to restore the timeBeginPeriod call in the runtime?

Yes. We are going back to calling timeBeginPeriod(1) as before. See https://go-review.googlesource.com/#/c/20834/ for details.

Alex

gopherbot pushed a commit that referenced this issue Apr 9, 2016
This reverts commit ab4c929.

Sysmon critically depends on system timer resolution for retaking
of Ps blocked in system calls. See #14790 for an example
of a program where execution time goes from 2ms to 30ms if
timeBeginPeriod(1) is not used.

We can remove timeBeginPeriod(1) when we support UMS (#7876).

Update #14790

Change-Id: I362b56154359b2c52d47f9f2468fe012b481cf6d
Reviewed-on: https://go-review.googlesource.com/20834
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
@alexbrainman
Copy link
Member

0435e88 restored pre go1.6 behavior. I don't see what else to do here, so closing.

Alex

@jstarks
Copy link
Author

jstarks commented Apr 11, 2016

Thanks everyone for fixing this.

The UMS approach still sounds like the best approach long term -- I may have some time to help with this in a few months if there is interest.

@alexbrainman
Copy link
Member

The UMS approach still sounds like the best approach long term -- I may have some time to help with this in a few months if there is interest.

I am interested. But I know zilch about UMS and Go scheduler. And it is @dvyukov's baby. But if you sent a change, I will push it along. Thank you.

Alex

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

8 participants