Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/build: add MSAN and ASAN builders #70054

Closed
mknyszek opened this issue Oct 25, 2024 · 20 comments
Closed

x/build: add MSAN and ASAN builders #70054

mknyszek opened this issue Oct 25, 2024 · 20 comments
Assignees
Labels
Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done. new-builder
Milestone

Comments

@mknyszek
Copy link
Contributor

I recently broke ASAN mode because the tests in cmd/cgo/internal/testsanitizers are too simple and can't catch issues with Go's instrumentation of user code or issues with the instrumentation of the Go runtime. In this particular case, it was the latter: some allocation codepaths were broken under ASAN (did not notify ASAN of changes correctly), but more sophisticated allocation patterns were required to trigger the issue.

I propose we set up ASAN and MSAN builders, akin to race mode builders, that run all Go tests under MSAN and ASAN. At the very least, this will catch problems with how Go hooks into the sanitizer runtimes. I suspect these weren't set up in the past because ASAN/MSAN do little to help catch issues in Go code (not much opportunity for a use-after-free with a garbage collector, nor is uninitialized memory access likely to occur in regular Go code (I'm not even sure if we instrument Go code for the latter at all)). As long as we have spare machine resources, I think it's worth it to try and catch these lower-level issues early. However, since they're not super high value builders, I think running them only in postsubmit is sufficient.

@gopherbot gopherbot added the Builders x/build issues (builders, bots, dashboards) label Oct 25, 2024
@gopherbot gopherbot added this to the Unreleased milestone Oct 25, 2024
@gabyhelp
Copy link

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/622322 mentions this issue: main.star: add linux-amd64-asan-clang15 and linux-amd64-msan-clang15

@dmitshur dmitshur added the NeedsFix The path to resolution is known, but the work has not been done. label Oct 25, 2024
@dmitshur dmitshur moved this to In Progress in Go Release Oct 25, 2024
gopherbot pushed a commit to golang/build that referenced this issue Oct 26, 2024
This change adds two new builder types. It depends on
https://chromium-review.googlesource.com/c/infra/infra/+/5967115 to
function.

The goal of this change is to test ASAN and MSAN at a baseline, to make
sure the Go parts work. We recently broke ASAN but didn't notice until
we got feedback from someone else; our internal tests are too
simplistic.

While we're here, sort the RUN_MODS list in the source code.

For golang/go#70054.

Change-Id: Ib515a2c908fbfdc58abf6e9d4c9c7a170bec0499
Reviewed-on: https://go-review.googlesource.com/c/build/+/622322
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
@mauri870
Copy link
Member

Currently there are many tests that fail with ASAN/MSAN enabled, I looked a bit into this in separate issues. It seems that most of the failures are related to the number of allocations.

@mknyszek
Copy link
Contributor Author

Yeah, I expected some issues. The alloc numbers are probably because of when the instrumentation happens (before inlining?) and so some things don't get inlined as expected, and we get some more allocs. (Correct me if I'm wrong, anyone.)

I don't think fixing the allocs is worth it; we should just skip those tests. There are other tests we should just skip as well. I'll send CLs today.

@mknyszek
Copy link
Contributor Author

Actually... that doesn't really make sense. Inlining and escape analysis happen long before SSA. Maybe it's calls in the runtime that aren't properly noescape?

@mknyszek
Copy link
Contributor Author

Okay, drilling down on one of the tests, TestAppendOfMake, reveals at least part of the problem: the instrumentation seems to defeat an optimization to eliminate runtime.makeslice. This doesn't happen with race mode, so there must be some handling for it already.

@mknyszek
Copy link
Contributor Author

Just kidding, I totally missed that the test is skipped entirely if !race. Great, that makes this much easier -- no messing around with compiler optimizations, just skip.

@mauri870
Copy link
Member

Thanks for investigating! I believe I already have a branch with changes to skip the tests, but there are so many skips that, at some point, it felt like I wasn’t doing things right.

I can open a CL with the changes. Let me know if you want to send one yourself.

@mauri870
Copy link
Member

mauri870 commented Oct 28, 2024

That is one of the reasons we introduced internal packages for msan/asan as well

@mknyszek
Copy link
Contributor Author

Thanks. I missed your message; I just sent a CL with (almost) all the skips.

A good chunk of the skips do align with -race so I'm not too worried about it. It might be "better" in some sense to make -asan and -msan not defeat quite so many optimizations in certain specific cases, but my immediate goal is to just stop breaking them. I can file a new issue to track any places where, for example, -asan falls behind other modes (that definitely seems like the outlier).

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/622855 mentions this issue: all: skip allocation counting tests with -asan and -msan

@mknyszek
Copy link
Contributor Author

OK, so the vast majority of test changes are skips, but there were some real bugs with ASAN that just weren't caught because we weren't running all of our tests. I've fixed a bunch of them, but there may be more to fix. Good progress so far, definitely much cleaner than when I started.

@github-project-automation github-project-automation bot moved this from In Progress to Done in Go Release Oct 28, 2024
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/623176 mentions this issue: runtime: skip TestNewOSProc0 with asan and msan

gopherbot pushed a commit that referenced this issue Oct 29, 2024
These fail for the same reason as for the race detector, and is the most
frequently failing test in both.

For #70054.
For #64257.
For #64256.

Change-Id: I3649e58069190b4450f9d4deae6eb8eca5f827a3
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15,gotip-linux-amd64-msan-clang15
Reviewed-on: https://go-review.googlesource.com/c/go/+/623176
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@mknyszek
Copy link
Contributor Author

On a whim I just tried skipping TestMemmoveOverflow since it looked fishy and always seemed to be running in parallel with failing test (=== CONT in the output). Looks like that's the culprit. The runtime tests with stress2 don't fail if I skip it.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/623336 mentions this issue: runtime: skip TestMemmoveOverflow with asan

gopherbot pushed a commit that referenced this issue Oct 29, 2024
On a whim I decided to investigate the possibility of whether the
flakiness on the asan builder was due to a concurrently executing test.
Of the most recent failures there were a few candidates, and this test
was one of them. After disabling each candidate one by one, we had a
winner: this test causes other concurrently executing tests, running
pure Go code, to spuriously fail.

I do not know why yet, but this test doesn't seem like it would have
incredibly high value for ASAN, and does funky things like MAP_FIXED in
recently unmapped regions, so I think it's fine.

For #70054.
For #64257.

Change-Id: Ib9a84d9b69812e76c390d99b00698710ee1ece1a
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-asan-clang15
Reviewed-on: https://go-review.googlesource.com/c/go/+/623336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
@dmitshur
Copy link
Contributor

dmitshur commented Nov 1, 2024

Thanks for adding the builders and fixing tests at tip. The builders are currently failing on release branches 1.23 and 1.22: (CC @cagedmantis)

https://ci.chromium.org/p/golang/g/go-go1.23/console
https://ci.chromium.org/p/golang/g/go-go1.22/console

@mknyszek Do you think it's viable and worthwhile to try to fix the release branches too? Or is it better to just constrain these new builders to only run at 1.24 branch and newer? Making these new builders 1.24+ only seems fine to me, but up to you.

@mknyszek
Copy link
Contributor Author

mknyszek commented Nov 1, 2024

Hm... the benefit of backporting is to make sure that backports don't break MSAN/ASAN. But I think such a situation is quite rare. I also think it would be fine to make them 1.24+ only, though if it turns out the backports are easy then why not. I can try to just backport the changes naively and see what happens.

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/623957 mentions this issue: [release-branch.go1.23] all: skip and fix various tests with -asan and -msan

@gopherbot
Copy link
Contributor

Change https://go.dev/cl/627536 mentions this issue: main.star: remove non-tip ASAN and MSAN builders

gopherbot pushed a commit to golang/build that referenced this issue Nov 22, 2024
Backporting the patches necessary to make these builders pass seem not
worth the effort. The main problem is the patches touch many parts of
std, which results in a lot of conflicts.

Just disable them for now.

For golang/go#70054.

Change-Id: I5a304dbe71b4887d0a59ec2dbc7b1cc6e9129804
Reviewed-on: https://go-review.googlesource.com/c/build/+/627536
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
gopherbot pushed a commit to golang/build that referenced this issue Jan 23, 2025
linux-arm64 is a first class port with no coverage of ASAN or MSAN
modes. We already have a clang15 builder, so adding ASAN and MSAN should
be trivial (fingers crossed).

For golang/go#71395.
For golang/go#70054.

Change-Id: I6a6a636c7a41147b1b22933db946ca838d3696f4
Reviewed-on: https://go-review.googlesource.com/c/build/+/643918
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/643918 mentions this issue: main.star: add linux-arm64 ASAN/MSAN builders

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Builders x/build issues (builders, bots, dashboards) NeedsFix The path to resolution is known, but the work has not been done. new-builder
Projects
Archived in project
Development

No branches or pull requests

5 participants