
Puzzling performance of a multithreaded program #22995

Closed
jhrabe opened this issue Dec 5, 2017 · 12 comments

@jhrabe commented Dec 5, 2017


What version of Go are you using (go version)?

go1.9.2 linux/amd64

Does this issue reproduce with the latest release?

This is the latest release available on Gentoo linux

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOROOT="/usr/lib64/go"
GOTOOLDIR="/usr/lib64/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="x86_64-pc-linux-gnu-gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build485060439=/tmp/go-build -gno-record-gcc-switches"
CXX="x86_64-pc-linux-gnu-g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Below is a greatly simplified program that reproduces the problem on at least 2 somewhat different machines, both with 8 CPUs (I assume this means 4 physical cores). The program is of the "embarrassingly parallel" kind (one loop split into several parallel loops).

I see 2 problems:

  1. Increasing the number of threads (the threads constant, ranging from 1 up to 8) reduces
    performance almost exactly linearly: the more threads, the slower it gets, and quite significantly.

  2. The really funny bit: launching several instances (e.g., 2-4) of this program simultaneously in separate terminals increases the performance of each instance! I'd love to understand why.

The problem goes away when math/rand.Float64() is replaced by thread-local and individually seeded RNGs. Profiling shows lots of time spent in some sync.(*map).

If this issue cannot be easily fixed, I would suggest at least warning users in the documentation for the math/rand package that the top-level functions, even though thread-safe, may be unsuitable for multithreading.


// Monte Carlo, originally a light scattering problem but no longer provides valid results of any kind
package main

import (
    "fmt"
    "math"
    "math/rand"
)

const (
    nShells      = 101
    albedo       = 0.91
    shellsPerMFP = 9.945454 * 1e-4
    photons      = int(1e6)
    threads      = 4
)

func main() {
    var heat = make([]float64, nShells)
    hCh := make(chan []float64) // heat channel (for returning heat arrays)
    fmt.Printf("\nStarting ...\n\n")

    rand.Seed(1)

    // launch a goroutine for each subset of photon bundles
    pt := photons / threads
    for i := 1; i < threads; i++ {
        go photonCalc(pt, hCh)
    }
    go photonCalc(photons-pt*(threads-1), hCh)

    // collect the results from all goroutines, sum them up
    for i := 1; i <= threads; i++ {
        h := <-hCh
        for j := range heat {
            heat[j] += h[j]
        }
        fmt.Printf("A thread finishing\n")
    }

    fmt.Printf("All done.\n\n")
}

func photonCalc(nPhotons int, ch chan<- []float64) {
    var h = make([]float64, nShells) // local heat array
    var x, y, z float64
    var u, v, w, t float64
    var weight float64
    var xi, yi, r2 float64

    fmt.Printf("Starting a thread for %d photon bundles\n", nPhotons)

    for i := 1; i <= nPhotons; i++ {
        x, y, z = 0.0, 0.0, 0.0
        weight = 1.0

        for {
            for {
                xi, yi = 2*rand.Float64()-1, 2*rand.Float64()-1
                if r2 = xi*xi + yi*yi; r2 <= 1 {
                    break
                }
            }
            u = xi * 2.0 * math.Sqrt(1-r2)
            v = yi * 2.0 * math.Sqrt(1-r2)
            w = 2*r2 - 1

            t = -math.Log(rand.Float64() + math.SmallestNonzeroFloat64)
            x += t * u
            y += t * v
            z += t * w

            shell := int(math.Sqrt(x*x+y*y+z*z) * shellsPerMFP)
            if shell > nShells-1 {
                shell = nShells - 1
            }
            h[shell] += (1 - albedo) * weight
            weight *= albedo

            if weight < 0.001 {
                if rand.Float64() > 0.1 {
                    break
                }
                weight /= 0.1
            }
        }
    }
    ch <- h
}


What did you expect to see?

I expected:

  1. Improved performance with more threads.
  2. Diminished performance with more simultaneously running instances.

What did you see instead?

The exact opposite.

@davecheney (Contributor)

Thank you for raising this issue. Unlike many projects on GitHub, the Go project does not use its bug tracker for general discussion or asking questions. We only use our bug tracker for tracking bugs and tracking proposals going through the Proposal Process.

Please see https://golang.org/wiki/Questions for good places to ask questions. They will most likely recommend that you profile your code as a first step towards understanding its runtime properties.

@jhrabe (Author) commented Dec 5, 2017

Thanks for the response. Perhaps my title was chosen badly, but I intended to report a bug in math/rand or the scheduler, not to enter into a general discussion or merely ask a question. Also, I did profile the code and reported the results in my original post.

@bradfitz (Contributor) commented Dec 5, 2017

It's already somewhat documented:

https://golang.org/pkg/math/rand/#pkg-overview

Top-level functions, such as Float64 and Int, use a default shared Source that produces a deterministic sequence of values each time a program is run. Use the Seed function to initialize the default Source if different behavior is required for each run. The default Source is safe for concurrent use by multiple goroutines, but Sources created by NewSource are not.

It doesn't spell it out exactly, but writing high performance concurrent code means limiting sharing, so it kinda says it.

And profiling should also say it.
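
(For reference, a minimal sketch of capturing such a CPU profile with runtime/pprof; the file name cpu.prof and the placement inside main are illustrative, not part of the original program.)

package main

import (
    "log"
    "os"
    "runtime/pprof"
)

func main() {
    // Write a CPU profile for the whole run; inspect it afterwards with
    // `go tool pprof cpu.prof`.
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    // ... run the simulation here ...
}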

@jhrabe (Author) commented Dec 5, 2017

Based on that description, I would expect some reduction of the ideally achievable speedup. But would you expect the following results? I think these would take any user by surprise:

threads    time (real) [s], single instance
1          11.3
2          17.3
4          32.1
8          39.1

instances  time (real) [s], using 2 threads each
1          17.3
2          17.2
3          15.5
4          14.0

@davecheney (Contributor)

Here is the profile for your program

[screenshot of the CPU profile, 2017-12-05]

Your program is spending a large % of its time serialised on the lock around the top-level functions in the math/rand package.

Here is a version of your program which does not serialise on a shared random number generator. https://play.golang.org/p/InIox-An1k
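
(The essence of that change, sketched below under the assumption that each goroutine gets its own *rand.Rand seeded independently; the linked playground version may differ in details such as the seeding scheme and remainder handling.)

package main

import (
    "fmt"
    "math/rand"
    "time"
)

const (
    nShells = 101
    photons = int(1e6)
    threads = 4
)

func main() {
    hCh := make(chan []float64)
    pt := photons / threads // remainder handling omitted for brevity
    for i := 0; i < threads; i++ {
        // Each worker gets its own generator, so no lock is shared.
        // The seed scheme (time plus worker index) is illustrative only.
        rng := rand.New(rand.NewSource(time.Now().UnixNano() + int64(i)))
        go photonCalc(rng, pt, hCh)
    }

    heat := make([]float64, nShells)
    for i := 0; i < threads; i++ {
        h := <-hCh
        for j := range heat {
            heat[j] += h[j]
        }
    }
    fmt.Println("All done.")
}

func photonCalc(rng *rand.Rand, nPhotons int, ch chan<- []float64) {
    h := make([]float64, nShells)
    for i := 0; i < nPhotons; i++ {
        // ... the same inner loop as the original, with every call to
        // rand.Float64() replaced by rng.Float64() ...
        _ = rng.Float64()
    }
    ch <- h
}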

@jhrabe (Author) commented Dec 5, 2017

Yes, that is in agreement with my original report:

("The problem goes away when the math/rand.Float64() is replaced by thread-local and
individually seeded RNGs. Profiling shows lots of time spent in some sync.(*map).")

So I suppose we agree on the cause and a workaround. Because the top-level functions are currently so expensive (several hundred percent of CPU time), I would suggest adding the following sentence to the documentation:

The default Source is safe for concurrent use by multiple goroutines, but Sources created by NewSource are not.
Because the Source synchronization implemented by the top-level functions can incur a significant performance penalty, use thread-local Sources when possible.

@ianlancetaylor (Contributor)

@jhrabe That is too strong. The decision whether to use goroutine-local Sources should normally be based on desired behavior rather than on performance. Perhaps we could say "The default Source uses locks and is safe for...."

@jhrabe (Author) commented Dec 5, 2017

OK, so would you be happy with the following?

The default Source uses locks and is safe for concurrent use by multiple goroutines. If the performance penalty incurred by locking becomes too high, use thread-local Sources created by NewSource. These are not safe for concurrent use.

@bradfitz (Contributor) commented Dec 5, 2017

We never say "thread-local" (or even "threads", generally).

@ianlancetaylor (Contributor)

Personally I don't think it's necessary to spell out the performance penalty in the package comment.

@RLH (Contributor) commented Dec 5, 2017 via email

@jhrabe (Author) commented Dec 5, 2017

Sorry, I am in the process of converting from numerical simulations in C. Very well, no mention of threads and locks then:

The default Source is safe for concurrent use by multiple goroutines. If the performance penalty incurred by synchronization becomes too high, use NewSource to create Sources local to each goroutine. These Sources are not safe for concurrent use.

As for not explicitly mentioning the performance penalty, I suppose that depends on what typical overhead you expect. If the locking added something like 10%, it would not be such a surprise. In this case (see the table I posted previously), the overhead can reach hundreds of percent, making it likely dominant for most Monte Carlo simulations. I would therefore like to see a clear warning. When/if the performance is significantly improved, the warning can be removed.

golang locked and limited conversation to collaborators on Dec 5, 2018