runtime: large initial values for GOMAXPROCS can cause runtime test failures #47246
Comments
TestGcSys has been flaky, see #37331. Are there any failures other than this one? I doubt this is PPC64-specific.
This fails consistently on a system with > 400 processors. That means by default ./all.bash fails on that system.
@laboger do you see any other test failing? TestGcSys is a known flaky test.
I'm still looking... Some tests are taking much longer with more GOMAXPROCS, so I'm trying to determine if they are hanging.
Here is a test that timed out, so I wasn't sure if it was a hang; it turns out it just takes a lot longer as GOMAXPROCS increases. This system has 448 processors. Not sure if that is expected behavior for the test.
@laboger Oof. I'm fairly certain the runtime is going to have some scalability issues at that many processors. The scheduler and allocator don't take advantage of NUMA at all, for instance. As Cherry points out,
I understand; I'm just suggesting that the default for GOMAXPROCS could have an upper limit instead of always being based on the number of processors on the system, since there are known issues and performance hits above a certain point.
Talking with the team, the consensus was that we should just limit GOMAXPROCS in the tests that don't scale well. @prattmic has done benchmarks that show Go itself can scale pretty well to 512 and even 1024 GOMAXPROCS. This is obviously pretty workload-dependent, so I don't think we want to artificially limit GOMAXPROCS. But some of the tests just don't scale well, and we can keep those from timing out.
I definitely don't think we should cap explicitly specified values of GOMAXPROCS. However, it could perhaps be OK to cap the default value of GOMAXPROCS if there are serious performance issues for real-world applications (though hopefully we could work toward fixing those). I don't think this test is a realistic workload for that purpose, but I'd be curious to see other common applications.
@mknyszek @aclements I opened this issue back in 2021 related to GOMAXPROCS, and it was interesting to hear that others are hitting the problem on systems other than Power. After reviewing this, I had a few thoughts following the meeting.
I recommend that users on our big systems use GOMAXPROCS=64.
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (`go env`)?

What did you do?
./all.bash
Run on systems with > 400 processors without a GOMAXPROCS setting.
On systems with fewer, set GOMAXPROCS=400.
What did you expect to see?
Correct test results.
What did you see instead?
When running on a system with > 400 processors, we are seeing some failures in the runtime package tests during ./all.bash that don't happen with an explicit GOMAXPROCS setting such as 300. Some of these failures can be reproduced on systems with fewer processors if the initial GOMAXPROCS value is set to around 400.
On a power9:
It seems that for ppc64le there should be a maximum default initial GOMAXPROCS value so that the runtime tests don't fail with the default setting.
We have seen other failures intermittently in the runtime package based on the GOMAXPROCS setting. Still trying to narrow down the conditions that cause those to fail.