
cmd/compile: move instrumentation into an ssa pass? #19054

Closed
josharian opened this issue Feb 13, 2017 · 22 comments
Labels
early-in-cycle A change that should be done early in the 3 month dev cycle. FrozenDueToAge RaceDetector
Milestone

Comments

@josharian
Contributor

This is a naive question intended to open discussion.

race/msan instrumentation currently happens between walk and SSA generation. As a result, as we slowly migrate walk rewrites to SSA generation and/or SSA itself, racewalk will slowly fall out of sync and/or be a source of added work. And the simpler ontology of SSA might prove helpful in increasing instrumentation coverage.

Does it make sense to consider moving racewalk to a generic SSA pass? If so, opinions/considerations about implementation details?

I am not volunteering at the moment to do this work, although I did bump into this question while satisfying my curiosity about what it would be like to handle ORANGE directly during SSA generation. (Answer: It's a fair amount of work.)

cc @dvyukov @ianlancetaylor @randall77

@josharian josharian added this to the Go1.9Maybe milestone Feb 13, 2017
@dvyukov
Member

dvyukov commented Feb 13, 2017

Yes, it makes sense.

It should make the pass considerably simpler and hopefully eliminate the corner cases that we still hit episodically.

It may also help us implement some of the goodness we have in LLVM. IIRC, PointerMayBeCaptured alone eliminated ~1/3 of the instrumentation. There is also deduplication, at least within the same basic block:

void ThreadSanitizer::chooseInstructionsToInstrument(
    SmallVectorImpl<Instruction *> &Local, SmallVectorImpl<Instruction *> &All,
    const DataLayout &DL) {
  SmallSet<Value*, 8> WriteTargets;
  // Iterate from the end.
  for (Instruction *I : reverse(Local)) {
    if (StoreInst *Store = dyn_cast<StoreInst>(I)) {
      Value *Addr = Store->getPointerOperand();
      if (!shouldInstrumentReadWriteFromAddress(Addr))
        continue;
      WriteTargets.insert(Addr);
    } else {
      LoadInst *Load = cast<LoadInst>(I);
      Value *Addr = Load->getPointerOperand();
      if (!shouldInstrumentReadWriteFromAddress(Addr))
        continue;
      if (WriteTargets.count(Addr)) {
        // We will write to this temp, so no reason to analyze the read.
        NumOmittedReadsBeforeWrite++;
        continue;
      }
      if (addrPointsToConstantData(Addr)) {
        // Addr points to some constant data -- it can not race with any writes.
        continue;
      }
    }
    Value *Addr = isa<StoreInst>(*I)
        ? cast<StoreInst>(I)->getPointerOperand()
        : cast<LoadInst>(I)->getPointerOperand();
    if (isa<AllocaInst>(GetUnderlyingObject(Addr, DL)) &&
        !PointerMayBeCaptured(Addr, true, true)) {
      // The variable is addressable but not captured, so it cannot be
      // referenced from a different thread and participate in a data race
      // (see llvm/Analysis/CaptureTracking.h for details).
      NumOmittedNonCaptured++;
      continue;
    }
    All.push_back(I);
  }
  Local.clear();
}

@mdempsky
Member

Incidentally, I was looking at this some last week.

The big issue I had was whether to implement instrumentation as an SSA pass or during buildssa. Normally loads don't modify the memory variable, but when instrumented with a sanitizer function call they do. If we were to handle instrumentation as a pure SSA pass, it might need to rewrite memory variable references and insert phi nodes.

I did have a semi-working CL that emitted instrumentation calls during buildssa (basically just emitted a call before each OpLoad/OpStore/OpMove/OpZero), but it produced only about 80-90% as many calls. I haven't yet investigated why.

@dvyukov
Member

dvyukov commented Feb 13, 2017

Normally loads don't modify the memory variable, but when instrumented with a sanitizer function call they do.

Implementing it like a normal pass sounds better long term. What's the problem with loads?

but it produced only about 80-90% as many calls

That may be fine. The first thing is to just run all the race tests and go test -race std. If that works, we just need to look at the cases where the SSA pass does not insert instrumentation.

@josharian
Contributor Author

If reasonable, it'd be nice to do it as an SSA pass rather than during buildssa, for encapsulation and understandability reasons--buildssa is already unwieldy, and it will only grow from here.

Normally loads don't modify the memory variable, but when instrumented with a sanitizer function call they do.

It might take some work to make the scheduler happy with this, but could we simply leave sanitizer function calls as non-memory-modifying?

@mdempsky
Member

mdempsky commented Feb 13, 2017

What's the problem with loads?

Basically the signatures of SSA Load and Store operations are:

Load :: (*T, Memory) -> T
Store :: (*T, T, Memory) -> Memory

SSA also requires that Memory objects be linearly threaded through the computation. (Similar to monads in Haskell.)

So Stores are necessarily linearized, and Loads are freely rearranged between two Stores. Also, Calls behave a lot like Stores:

Call :: (Function, Memory) -> Memory

Instrumenting a Store is easy, because it amounts to just rewriting:

mem2 = Store(ptr, val, mem1)

into:

memtmp = Call(racewrite, ptr)
mem2 = Store(ptr, val, memtmp)

Since Store already produced a new Memory value, introducing a new Memory temporary for the instrumentation call is a purely local rewrite.

Instrumenting Load is trickier though, because if we have something like:

x := 0
if y != nil {
    x = *y
}

that is currently something like:

x := Const(0)
if NEQ(y, Const(nil)) {
    x = Load(y, mem0)
}

But as an SSA pass, we would have to rewrite it to:

x := Const(0)
if NEQ(y, Const(nil)) {
    mem1 = Call(raceread, y, mem0)
    x = Load(y, mem1)
}
mem2 = Phi(mem0, mem1)
// subsequent references to mem0 have to also be updated to mem2

@randall77
Contributor

I support moving race instrumentation to an SSA pass.

I was thinking about introducing a non-memory call for side-effect free calls. This would be generically useful for a bunch of runtime support calls, like complex division. We could use a similar call for instrumenting loads.
I've been waiting to pursue this idea until calling convention changes go in. Part of the reason calls take a memory argument is to make sure argument marshaling happens before the call does. That gets a lot easier with a register-based convention.
Of course, these are all special runtime calls, so we could special-case their calling conventions on an as-needed basis.

@mdempsky
Member

mdempsky commented Feb 13, 2017

Naively (i.e., I still largely treat package ssa as a black box), being able to express side-effect free calls SGTM.

Until we have that though, is there any opposition to moving racewalk into buildssa? My WIP CL was relatively non-intrusive after refactoring out the OpLoad/OpStore construction patterns, and I think could still be migrated to a proper SSA pass later.

@josharian
Contributor Author

Of course, these are all special runtime calls, so we could special-case their calling conventions on an as-needed basis.

I can't decide whether this is exciting or horrifying. It might be helpful for race calls, so that they can check args/results without disrupting the original call.

Until we have that though, is there any opposition to moving racewalk into buildssa?

Fine by me.

@randall77
Contributor

I guess I'm not really for moving the code twice if we can avoid it.
Do the race calls return anything? If not, we can just use a special call that takes a memory arg but returns void. That could be scheduled anywhere the input memory state is live. That's an easy change to the ssa backend.

@mdempsky
Member

Do the race calls return anything?

No. They just take an address and sometimes a size as input. No outputs.

@dvyukov
Member

dvyukov commented Feb 14, 2017

Yes, no outputs, and the calls don't touch anything that could possibly concern the compiler. The only case where the calls call back into the Go world is the symbolization callback during race reporting, but that runs non-instrumented runtime code.
The calls can be moved as long as they don't cross any synchronization operations, but that is probably not very important.

@josharian josharian modified the milestones: Go1.10, Go1.9Maybe Jul 6, 2017
@josharian josharian added the early-in-cycle A change that should be done early in the 3 month dev cycle. label Jul 6, 2017
@griesemer griesemer modified the milestones: Go1.10, Go1.11 Nov 29, 2017
@griesemer
Contributor

Too late. Moving to 1.11.

@gopherbot

Change https://golang.org/cl/102817 mentions this issue: cmd/compile: insert instrumentation during SSA building

@gopherbot

Change https://golang.org/cl/102816 mentions this issue: cmd/compile: refactor memory op constructors in SSA builder

@gopherbot

Change https://golang.org/cl/102815 mentions this issue: cmd/compile: disable instrumentation for no-race packages earlier

@mdempsky
Member

@aclements I tried to dig up my old racewalk-during-buildssa experimental CLs, but couldn't find them. I recreated the general patch series though, and it even seems to work. :)

I've randomly surveyed and manually inspected the resulting instrumented code, and it looks right.

I'm a little curious why the number of race calls for cmd/go has gone down modestly (~5.6%), but the binary size has gone up somewhat (~2.4%). The change to reorder1 to spill all function arguments to temporary variables before assigning them to the parameter slots explains a little of the size growth, but much less than I expected: without that change (i.e., generating broken code) the binary size still goes up ~1.9%.

@dvyukov
Member

dvyukov commented Mar 28, 2018

There seems to be some nice code size reduction! Thanks!

Do you know what was stripped in those ~5.6%? Can you please look at a few functions and confirm that we don't remove anything useful? Are they duplicates?

I think the main motivation for moving this to SSA (besides ending the infinite stream of bugs) was smarter analysis of redundant and unnecessary instrumentation. One of the low-hanging fruits is removing dups within a basic block:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Instrumentation/ThreadSanitizer.cpp#L357
Namely, we collect all reads/writes within a basic block with no intervening calls/atomic ops, then dedup the reads and writes and remove reads where there is a write to the same address.

@dvyukov
Member

dvyukov commented Mar 28, 2018

Re binary size: I would first check that it's indeed due to the .text section (e.g. readelf -a shows the sections). Then use nm -S to find a few functions with the largest size increase, and just look at them.

@mdempsky
Member

mdempsky commented Mar 28, 2018

The binary size contribution seems to come from three sources:

  1. Most significantly, the CL prevents SSA-ing structs with 2-4 fields. (Currently, the SSA construction phase handles loading a single field from an SSA-able struct as a load of the entire struct, and then extracting just the relevant field. Presumably this is meant to help with CSE, but it introduces false dependencies during instrumentation. I tried disabling just this optimization, but it caused other failures, so I went with the heavier solution of simply disallowing SSA of >1-field structs while instrumenting.)
  2. The reorder1 changes I described above.
  3. Additional spilling needed because of the instrumentation calls being inserted immediately before loads. E.g., previously, encoding/binary.bigEndian.Uint64 could still be optimized into an 8-byte load on amd64, but now it's compiled as 8 1-byte loads, most of which need to be spilled to live through the subsequent raceread calls.

1 and 2 could potentially be improved. 3 seems like it would be more involved.

I looked through the removed instrumentation calls. They seemed to be mostly uses of stack variables that isartificial couldn't identify. For example, when creating non-escaping function closures on the stack, we can now omit the racewrite calls while creating them.

The only place I noticed new instrumentation calls is when spilling parameters to the heap. For example:

func f(x int) *int { return &x }

is compiled as:

func f(x int) *int {
    px := new(int)
    *px = x
    return px
}

Instrumenting during SSA construction means we now see *px = x and insert a racewrite call, whereas before we didn't. (This could probably be optimized away too.)

@dvyukov
Member

dvyukov commented Mar 28, 2018

Re (3), won't hoisting all race instrumentation to the beginning of a basic block (or to a preceding function call, whichever comes first) help? Doing analysis and instrumentation on the call-less parts of basic blocks would also help with #19054 (comment). Not sure if it can also help with (1).

@dvyukov
Member

dvyukov commented Mar 28, 2018

The only place I noticed new instrumentation calls is when spilling parameters to the heap.

Why wasn't it instrumented before?
Theoretically we only need instrumentation once a new object is published. Complete analysis can be hard, but a simpler version could be easier and still profitable: remove instrumentation of accesses to the new'ed object only within the same BB as the new, and only before any calls, atomic ops, or assignments of the new'ed pointer to any other location (e.g. to handle typical struct initialization).

@mdempsky
Member

Re (3), won't hoisting all race instrumentation to the beginning of a basic block (or a preceding function call whichever comes first) help?

Probably, but that sounds more involved than I have resources available at the moment. It's not trivial to reschedule function calls currently, or to insert them at arbitrary places.

My more immediate concern is the instrumentation pass is blocking us from moving optimizations from order/walk into SSA construction, where we expect they can be both more powerful and easier to maintain. I think that's worth a modest code size increase for instrumented builds, as long as performance is still acceptable.

Why wasn't it instrumented before?

Because paramstoheap adds the assignment to fn.Func.Enter, which instrument skips over ("nothing interesting for race detector in fn->enter").

Theoretically we only need instrumentation once a new object is published.

Ack, I thought about that, but we don't currently have that level of fine-grained information. Currently, we only track whether or not an object is ever published.

a simpler version that only removes instrumentation after new to the new'ed object in the same BB and only before any calls, atomicops and assignments of the new'ed pointer to any other locations can be easier and still profitable (e.g. handle typical struct initialization).

Yes, I think that would be good too.

gopherbot pushed a commit that referenced this issue Mar 29, 2018
Pulling these operations out separately so it's easier to add
instrumentation calls.

Passes toolstash-check.

Updates #19054.

Change-Id: If6a537124a87bac2eceff1d66d9df5ebb3bf07be
Reviewed-on: https://go-review.googlesource.com/102816
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Apr 2, 2018
Rather than checking for each function whether the package supports
instrumentation, check once up front.

Relatedly, tweak the logic for preventing inlining calls to runtime
functions from instrumented packages. Previously, we simply disallowed
inlining runtime functions altogether when instrumenting. With this
CL, it's only disallowed from packages that are actually being
instrumented. That is, now intra-runtime calls can be inlined.

Updates #19054.

Change-Id: I88c97b48bf70193a8a3ee18d952dcb26b0369d55
Reviewed-on: https://go-review.googlesource.com/102815
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
@golang golang locked and limited conversation to collaborators Apr 9, 2019