runtime: futex contention caused by memory allocation in goroutines #34231
Comments
For what it's worth, I cannot recreate this on my 8-core laptop.
I guess you also do not see anything suspicious in your CPU profiles? I ran the experiments again on my computer, this time with `time` instead of `/usr/bin/time`, and I get the same behavior:

real 0m9.167s
real 0m6.889s
real 0m11.681s

Maybe it depends on the underlying Linux kernel. Here is the information from my /proc/version: Please let me know if you need more information about my setup or whether I should try something out on my computer.
For the record, the E3-1230 V2 has 4 physical cores.
@ALTree: Sorry, I missed this detail; Hyper-Threading is enabled. The following information from the CPU profile (with the flag -cpu 8) might be helpful:

(pprof) peek futex
(pprof) list futex

I find the 1.71s after the SYSCALL extremely high. But as I said, I have no knowledge of the internals of the Go scheduler. It is only an educated guess that this is related to the running times on my computer.
What version of Go are you using (`go version`)?

Go 1.13
Does this issue reproduce with the latest release?
I made similar observations with older Go versions (go1.10.x, go1.11.x, and go1.12.x).
What operating system and processor architecture are you using (`go env`)?

System: Ubuntu 18.04.3 LTS
Kernel: Linux 4.15.0-60-generic
Processor: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz (8 cores)

go env Output

What did you do?
Iteratively solving two tasks in parallel on a multi-core computer in two separate goroutines. The tasks are independent of each other. At the end of an iteration, the tasks' results are combined. Both tasks have similar running times, but they are not identical. Both tasks allocate local memory (this seems to be crucial for the problem!).
Whether using a wait group or channels at the end of an iteration does not seem to matter. The code below uses a wait group. See also
main.txt.
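The structure described above can be sketched as follows. This is a minimal reconstruction, not the reporter's actual main.txt; the `task` function, its arithmetic, and the `work` constant are invented for illustration:

```go
package main

import (
	"fmt"
	"sync"
)

// task simulates one independent task. It allocates a fresh local
// slice on every call (the per-iteration allocation appears to be
// crucial for the reported slowdown) and does some arithmetic on it.
func task(work int) float64 {
	buf := make([]float64, work) // local allocation per iteration
	for i := range buf {
		buf[i] = float64(i % 7)
	}
	var sum float64
	for _, v := range buf {
		sum += v
	}
	return sum
}

func main() {
	const iterations = 1000
	const work = 1 << 16 // hypothetical stand-in for the -work flag

	var total float64
	for i := 0; i < iterations; i++ {
		var wg sync.WaitGroup
		var a, b float64
		wg.Add(2)
		go func() { defer wg.Done(); a = task(work) }()
		// Similar running time, but not identical to the first task.
		go func() { defer wg.Done(); b = task(work + 1024) }()
		wg.Wait()
		total += a + b // combine the tasks' results at the end of the iteration
	}
	fmt.Println(total > 0)
}
```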
What did you expect to see?
I expect that the program runs faster on >=2 CPU cores than on a single CPU core, since the tasks are carried out in parallel. I also expect that the running times are similar on 2 and 8 CPU cores. There might be some scheduling overhead when using 8 CPU cores.
Note that the -work flag allows you to control the hardness of the problem in each iteration. This parameter will depend on the computer you are using. The default parameters reproduced the issue well on my computer.
What did you see instead?
The program runs indeed faster with 2 CPU cores than with a single CPU core. However, on 8 CPU cores it runs significantly slower (13 seconds compared to 9 seconds on 2 cores; the system time as output by /usr/bin/time also increases significantly). See the file output.txt for a summary of multiple runs.
Looking at the CPU profiles, one sees a significant increase in time spent in runtime.futex. It seems that the goroutines are interrupted by a futex syscall, which originates from the memory allocation in one of the goroutines. The Go scheduler then tries to find a runnable goroutine. However, I am not very familiar with futexes and the Go scheduler, so I might be completely wrong here.
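For anyone wanting to reproduce the profile, the standard way to capture it is runtime/pprof; the `spin` workload below is a placeholder for the actual parallel loop, not the reporter's code:

```go
package main

import (
	"log"
	"os"
	"runtime/pprof"
)

// spin burns CPU so the profile has samples to attribute.
// Replace it with the real parallel workload.
func spin(n int) int {
	s := 0
	for i := 0; i < n; i++ {
		s += i % 3
	}
	return s
}

func main() {
	f, err := os.Create("cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		log.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	spin(50000000)
}
```

The resulting file can be inspected with `go tool pprof cpu.pprof`, where `peek futex` and `list futex` show where runtime.futex is entered, matching the output quoted in the comments above.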
Overall, I would have expected a far smaller scheduling overhead, and unused resources (i.e., idle CPU cores) should not slow down the program.