Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/tools/internal/lsp: frequent out-of-memory errors on windows-amd64-race builder #33951

Closed
bcmills opened this issue Aug 29, 2019 · 16 comments
Closed
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge gopls Issues related to the Go language server, gopls. NeedsFix The path to resolution is known, but the work has not been done. Performance Testing An issue that has been verified to require only test changes, not just a test failure. Tools This label describes issues relating to any tools in the x/tools repository.

Comments

@bcmills
Copy link
Contributor

bcmills commented Aug 29, 2019

The tests under x/tools/internal/lsp flake frequently on the windows-amd64-race builder with out of memory errors:
https://build.golang.org/log/23e29f4766fa39ac7b7945a729cec7d6ed82a41d
https://build.golang.org/log/086b928e2af996b8955666acce9c2c36ff7f82e8
https://build.golang.org/log/1c6543c8679c8710e897463a62921a3bddfa9a2e

The flakiness of these tests makes it too easy to ignore failures and miss real bugs (such as #31749) detected by the same builder:
https://build.golang.org/log/2c3d2505bf5145815ce4fc57a449c39a46a32cf2

See also #30309, #32834.

CC @ianthehat @stamblerre

@bcmills bcmills added Performance Testing An issue that has been verified to require only test changes, not just a test failure. Builders x/build issues (builders, bots, dashboards) NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. gopls Issues related to the Go language server, gopls. labels Aug 29, 2019
@gopherbot gopherbot added this to the Unreleased milestone Aug 29, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Aug 30, 2019

I'm not sure whether this problem is getting worse, or we're just having a run of bad luck, but we've got five of what appear to be OOM failures in a row now, four of them on https://golang.org/cl/184165.

(https://build.golang.org/?repo=golang.org%2fx%2ftools#short)

CC @matloob @heschik

@bcmills bcmills added the Soon This needs to be done soon. (regressions, serious bugs, outages) label Sep 3, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Sep 3, 2019

Definitely not just a run of bad luck. x/tools went from mostly passing on windows-amd64-race to mostly failing at some point around CL 192277 or CL 184165. I'm guessing that one of those changes increased the memory footprint of the test above some critical threshold.

@stamblerre stamblerre added the Tools This label describes issues relating to any tools in the x/tools repository. label Sep 9, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Nov 6, 2019

These also occur regularly on the windows-amd64-longtest builder: https://build.golang.org/log/4837bb3076d9cee69a232dba45f4e4a9a939cf03

@matloob
Copy link
Contributor

matloob commented Nov 6, 2019

Okay. Can the builders' sizes be increased?

@bcmills
Copy link
Contributor Author

bcmills commented Nov 6, 2019

Looks like they are currently n1-highcpu-4 ,¹ ² which is only 3.6 GB per instance, so probably they should be made substantially larger.

@dmitshur upsized the corresponding linux builders to n1-highcpu-16 in CL 192679.

(CC @bradfitz @toothrot @cagedmantis)

¹https://github.com/golang/build/blob/0babffaf2095de16932bb07c70d58a2730a161fd/dashboard/builders.go#L2029-L2058
²https://github.com/golang/build/blob/0babffaf2095de16932bb07c70d58a2730a161fd/dashboard/builders.go#L421-L441

@dmitshur
Copy link
Contributor

dmitshur commented Nov 7, 2019

I agree we should increase the memory for the -race builder at the very least. In order to do that, we need to make a decision about the machine type to use next.

Right now all the windows/amd64 hosts use n1-highcpu-4 machine type, since they were added in 2017 in CL 41393. This includes the normal builders, the -race builder, and the -longtest builders.

Highcpu is described as:

High-CPU machine types are ideal for tasks that require a moderate increase of vCPUs relative to memory. High-CPU machine types have 0.90 GB of memory per vCPU.

In contrast to standard:

Standard machine types have 4 GB of memory per vCPU.

So, we can consider changing the windows -race builder to use either n1-highcpu-8, or n1-standard-4. Here's a comparison (based on https://cloud.google.com/compute/docs/machine-types and https://cloud.google.com/compute/vm-instance-pricing):

Machine vCPUs RAM Price
n1-highcpu-4 (Current) 4 3.6 GB $0.14
n1-highcpu-8 (Option 1) 8 7.2 GB $0.28
n1-standard-4 (Option 2) 4 15 GB $0.19

I would prefer for us to make smaller changes, and iterate based on the results. My current impression is that the standard machine type would be a better fit for our needs, so I propose we change all the windows hosts to use n1-standard-4 (Option 2) instead of the current n1-highcpu-4. Both options seem quite close so I'm on the fence. Feedback welcome. Otherwise we can proceed with that plan as the next step.

@dmitshur dmitshur added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Nov 7, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Nov 7, 2019

I would like to have lower-latency longtest slowbots, so (budget permitting) I have a sight preference for the highcpu configuration.

(But really, anything is better than OOMing at head, so I'm not going to hold things up if folks would prefer a different option.)

@dmitshur
Copy link
Contributor

dmitshur commented Nov 7, 2019

Bryan, how does n1-highcpu-8 for -longtest and n1-standard-4 for the rest (normal and -race) sound?

@bcmills
Copy link
Contributor Author

bcmills commented Nov 7, 2019

I would be inclined to do n1-highcpu-8 for -longtest and -race and keep n1-highcpu-4 for the rest, although it's possible that -race will need more than 7.2GB to run x/tools.

(If the “short” builders are doing fine today with only 3.6 GB, I'm doubtful that throwing more RAM at them will do much, although I suppose it could improve filesystem buffering performance.)

@bcmills
Copy link
Contributor Author

bcmills commented Nov 7, 2019

FWIW, it looks like freebsd-amd64-race is doing fine on n1-highcpu-4, so it may be worth some time for the x/tools folks to figure out why Windows isn't managing. (Are some tests being unexpectedly skipped on FreeBSD? Is the platform just that much more efficient?)

@bcmills
Copy link
Contributor Author

bcmills commented Nov 7, 2019

I'm having a hard time following the configuration for linux-amd64-race, but I believe it may be on an n1-standard-4 (due to running in a container).

@dmitshur
Copy link
Contributor

it may be worth some time for the x/tools folks to figure out why Windows isn't managing. (Are some tests being unexpectedly skipped on FreeBSD? Is the platform just that much more efficient?)

Issue #33986 is relevant for that.

I'm having a hard time following the configuration for linux-amd64-race, but I believe it may be on an n1-standard-4 (due to running in a container).

I believe that is accurate.

I would be inclined to do n1-highcpu-8 for -longtest and -race and keep n1-highcpu-4 for the rest,

This sounds good to me as a small incremental step to try here. Will mail a CL.

@gopherbot
Copy link

Change https://golang.org/cl/208500 mentions this issue: dashboard: upsize machine type for windows-amd64-{longtest,race} builders

gopherbot pushed a commit to golang/build that referenced this issue Nov 25, 2019
…ders

Start using n1-highcpu-8 machine type instead of n1-highcpu-4
for windows-amd64-longtest and windows-amd64-race builders.
Continue to use the existing n1-highcpu-4 machine type for all
other windows/386 and windows/amd64 builders for now.

The primary motivation of this change is to increase amount of
RAM available on the longtest builder from 3.6 GB to 7.2 GB,
because it is consistenly failing due to being out of memory.

Also fix minor typos, and increase consistency in comments and notes.

Updates golang/go#33951

Change-Id: Iac5da67fa584059842850a9c8deda98c242741f2
Reviewed-on: https://go-review.googlesource.com/c/build/+/208500
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
@dmitshur
Copy link
Contributor

Looks like the CL above helped; we got our first near-pass of x/tools on windows-amd64-longtest builder in this recent run. It found #35839.

@dmitshur dmitshur removed the Soon This needs to be done soon. (regressions, serious bugs, outages) label Nov 26, 2019
@stamblerre stamblerre modified the milestones: Unreleased, gopls unplanned Dec 4, 2019
@gopherbot
Copy link

Change https://golang.org/cl/213557 mentions this issue: dashboard: upsize windows-amd64-{longtest,race} builders to 14.4 GB RAM

gopherbot pushed a commit to golang/build that referenced this issue Jan 7, 2020
Start using n1-highcpu-16 machine type instead of n1-highcpu-8
for windows-amd64-longtest and windows-amd64-race builders.

The primary motivation of this change is to increase amount of
RAM available on the race builder from 7.2 GB to 14.4 GB.
CL 208500 has helped by going from 3.6 to 7.2, but empirically
8 GB is not enough for all tests to pass, while 16 GB can be.
We already use 16 GB for some other race and/or long builders.

Updates golang/go#35186
Updates golang/go#33951
Updates golang/go#33986

Change-Id: I8031a517e6e44ce5e5faba085e7c781a1fcab92e
Reviewed-on: https://go-review.googlesource.com/c/build/+/213557
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
@dmitshur dmitshur self-assigned this Jan 13, 2020
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Jan 13, 2020
@dmitshur
Copy link
Contributor

dmitshur commented Jan 13, 2020

FWIW, it looks like freebsd-amd64-race is doing fine on n1-highcpu-4, so it may be worth some time for the x/tools folks to figure out why Windows isn't managing. (Are some tests being unexpectedly skipped on FreeBSD? Is the platform just that much more efficient?)

Those are good questions, but they are hard to answer without a resolution to #33986. I think that issue can be used to track this work.

The freebsd-amd64-race builder is no longer okay with just 3.6 GB, as reported in issue #36444. The current plan there is to increase its size to 7.2 GB, which is still half of what Windows needed to be able to pass (see issue #35186 for details).

I'll close this issue as resolved via CL 213557 because there aren't frequent OOMs in x/tools/internal/lsp on windows-amd64-longtest builder anymore. See #35186 (comment) specifically.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Builders x/build issues (builders, bots, dashboards) FrozenDueToAge gopls Issues related to the Go language server, gopls. NeedsFix The path to resolution is known, but the work has not been done. Performance Testing An issue that has been verified to require only test changes, not just a test failure. Tools This label describes issues relating to any tools in the x/tools repository.
Projects
None yet
Development

No branches or pull requests

5 participants