Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: cgo performance tracking bug #9704

Open
minux opened this issue Jan 28, 2015 · 12 comments
Open

runtime: cgo performance tracking bug #9704

minux opened this issue Jan 28, 2015 · 12 comments
Milestone

Comments

@minux
Copy link
Member

minux commented Jan 28, 2015

Running this stupid microbenchmark on linux/amd64, with different version of Go.
http://play.golang.org/p/5U0i26sA8U

package main

// int rand() { return 42; }
import "C"

import "testing"

func BenchmarkCgo(b *testing.B) {
    for i := 0; i < b.N; i++ {
        C.rand()
    }
}

func main() {
    testing.Main(func(string, string) (bool, error) {
        return true, nil
    }, nil, []testing.InternalBenchmark{
        {"BenchmarkCgo", BenchmarkCgo},
    }, nil)
}
$ go1 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            30.8 ns/op
$ go112 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            40.9 ns/op
$ go121 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            46.1 ns/op
$ go133 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    50000000            48.3 ns/op
$ go141 run cgobench.go -test.bench=.
testing: warning: no tests to run
PASS
BenchmarkCgo    10000000           160 ns/op
$ go run cgobench.go -test.bench=. # today's Go tip, f4a2617
testing: warning: no tests to run
PASS
BenchmarkCgo    10000000           203 ns/op

Why? Go 1.4 is much worse than any of the previous releases.
And Go tip is even worse than Go 1.4. This might be understandable,
but I wonder why Go 1.4 is that much slower than 1.3.3?

@minux minux added this to the Go1.5 milestone Jan 28, 2015
@dvyukov
Copy link
Member

dvyukov commented Jan 28, 2015

@RLH @randall77 @rsc

It is combined effect of C->Go conversion, write barriers and atomic manipulation of goroutine statuses:

10.45% cgo cgo [.] runtime.cas
8.23% cgo cgo [.] runtime.deferreturn
7.21% cgo cgo [.] runtime.writebarrierptr
6.60% cgo cgo [.] runtime.reentersyscall
6.29% cgo cgo [.] runtime.newdefer
4.07% cgo cgo [.] runtime.getg
3.77% cgo cgo [.] runtime.exitsyscall
3.53% cgo cgo [.] main.BenchmarkCgo
3.50% cgo cgo [.] runtime.systemstack
3.20% cgo cgo [.] runtime.cgocall_errno
3.16% cgo cgo [.] main._Cfunc_rand
2.94% cgo cgo [.] runtime.freedefer
2.93% cgo cgo [.] runtime.deferproc
2.84% cgo cgo [.] runtime.exitsyscallfast
2.69% cgo cgo [.] runtime.casgstatus
2.58% cgo cgo [.] runtime.func.042
2.53% cgo cgo [.] runtime.atomicstore
2.50% cgo cgo [.] runtime.acquirem
2.42% cgo cgo [.] runtime.releasem

@rsc rsc removed cgo labels Apr 10, 2015
@rsc
Copy link
Contributor

rsc commented Jun 8, 2015

Or maybe it's a sinister plot to encourage people to write Go code.

Either way, too late for Go 1.5.

@capnm
Copy link

capnm commented Aug 26, 2015

linux/amd64, linux/arm:

amd64  go-13      BenchmarkCgo  10000000               179 ns/op
amd64  go-14      BenchmarkCgo   3000000               494 ns/op
amd64  go-15      BenchmarkCgo   5000000               354 ns/op ~2x

arm go-13         BenchmarkCgo   2000000               821 ns/op
arm go-14         BenchmarkCgo   1000000              2359 ns/op
arm go-15         BenchmarkCgo    500000              2570 ns/op ~3x

@mwhudson
Copy link
Contributor

mwhudson commented Oct 2, 2016

Going by BenchmarkCgoCall in misc/cgo/test, on my linux/amd64 system, current tip is about twice as fast as go 1.7. Go 1.7, 1.6 and 1.5 are all about the same and Go 1.4 is slightly slower (I lack the energy to get 1.3 cgo working on this system). So the thing the bug specifically complains about ("decrease of cgocall performance") is probably fixed, but is this fast enough? (It's always hard to know when a bug like this can be closed).

@crawshaw
Copy link
Member

crawshaw commented Oct 2, 2016

http://golang.org/cl/30080 ~halved the cgo overhead between 1.7 and tip.

As reported this can be closed, but how about we keep it as a cgo performance tracking bug? I don't expect any more improvements for 1.8, but in the future a runtime accounting overhaul could make it cheaper.

@minux
Copy link
Member Author

minux commented Oct 2, 2016 via email

@minux minux changed the title runtime: steady decrease of cgocall performance runtime: cgo performance tracking bug Oct 2, 2016
@aclements
Copy link
Member

@minux, what benchmark were you running?

@dgryski
Copy link
Contributor

dgryski commented Jan 3, 2018

@aclements I think @minux was running the Go program at the top of the bug that just calls into a C function returning an int.

@navytux
Copy link
Contributor

navytux commented Feb 18, 2018

Note that in addition to serial CGo slowness (i.e. ~ 60ns for 1 call) making several Cgo calls in sequence in presence of other goroutines can bring more slowdown: #19574 (comment).

@thepudds
Copy link
Contributor

thepudds commented Jun 4, 2019

Now that https://golang.org/cl/171758 is merged for 1.13 fixing #6980, is that expected to help here as well?

When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.
...
name     old time/op  new time/op  delta
Defer-4  52.2ns ± 5%  36.2ns ± 3%  -30.70%  (p=0.000 n=10+10)

@thepudds
Copy link
Contributor

thepudds commented Jun 4, 2019

FWIW, from a very quick test, using tip does not seem to be significantly faster than 1.12.5. Also, I don't know if this is already tracked elsewhere or perhaps not a meaningful result, but this quick test seems to show go1.10 improving, but go1.12 slowing down again.

===============
go1.7.6
BenchmarkCgo-8          100000000              197 ns/op
===============
go1.8.7
BenchmarkCgo-8          200000000               79.7 ns/op
===============
go1.9.4
BenchmarkCgo-8          200000000               81.3 ns/op
===============
go1.10.4
BenchmarkCgo-8          200000000               73.4 ns/op
===============
go1.11.4
BenchmarkCgo-8          200000000               73.2 ns/op
===============
go1.12.5
BenchmarkCgo-8          200000000               81.7 ns/op
===============
gotip
BenchmarkCgo-8          152019986               79.0 ns/op

This is running the benchmark from #9704 (comment) via
go run cgobench.go -test.bench=. -test.benchtime=10s on linux/amd64.

gotip is devel +fff4f59 Tue Jun 4 17:35:20 2019 +0000

@navytux
Copy link
Contributor

navytux commented Jun 5, 2019

I confirm what @thepudds reports - there is improvement with go1.10, but slowdown back with go1.12, and things are not getting faster with tip:

$ benchstat go17.txt go18.txt go19.txt go110.txt go111.txt go112.txt gotip.txt 
name \ time/op  go17.txt    go18.txt   go19.txt     go110.txt    go111.txt    go112.txt    gotip.txt
Cgo-4           186ns ± 1%  82ns ± 0%
goos:linux goarch:amd64
Cgo-4                                  84.5ns ± 0%  74.9ns ± 0%  76.6ns ± 0%  81.1ns ± 0%  81.4ns ± 0%

All measurements were done on unloaded machine with CPU frequency fixed and CPU idle states except C1 disabled (see http://navytux.spb.ru/~kirr/neo.html#measurements-stability for details)

$ ./neotest info-local
date:   Wed, 05 Jun 2019 11:44:55 +0300
xnode:  ...
uname:  Linux deco 4.19.0-5-amd64 #1 SMP Debian 4.19.37-3 (2019-05-15) x86_64 GNU/Linux
cpu:    Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz
cpu/[0-3]/freq: intel_pstate/performance [2.60GHz - 2.60GHz]
cpu/[0-3]/idle: intel_idle/menu: POLL·0/0 C1·2/2 !C1E·10/20 !C3·70/100 !C6·85/200 !C7s·124/800 !C8·200/800 !C9·480/5000 !C10·890/5000 # elat/tres µs
...
$ gotip version
go version devel +5f509148b1 Wed Jun 5 00:53:25 2019 +0000 linux/amd64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants