Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: SSA sometimes generates far more write barriers #14869

Closed
aclements opened this issue Mar 18, 2016 · 4 comments
Closed

cmd/compile: SSA sometimes generates far more write barriers #14869

aclements opened this issue Mar 18, 2016 · 4 comments

Comments

@aclements
Copy link
Member

I did a comparison of the write barriers produced when compiling cmd/compile with and without SSA. Sometimes it generates dramatically more write barriers than the old backend. Here are the top movers:

+235 cmd/internal/obj/ppc64.buildop(SB)
+72 cmd/internal/obj/mips.buildop(SB)
+55 cmd/internal/obj/arm.buildop(SB)
+14 cmd/compile/internal/amd64.ssaGenBlock(SB)
+10 cmd/compile/internal/ssa.rewriteValuegeneric_OpStructSelect(SB)
+9 cmd/compile/internal/ssa.decomposeBuiltIn(SB)
+8 cmd/compile/internal/ssa.nilcheckelim(SB)
+6 runtime.getitab(SB)
+5 runtime.(_TypeAssertionError).Error(SB)
+4 runtime.assertI2T(SB)
+4 reflect.Value.MapKeys(SB)
+4 fmt.(_pp).printReflectValue(SB)
+4 cmd/compile/internal/ssa.(_Value).LongString(SB)
+4 cmd/compile/internal/ssa.(_edgeState).setup(SB)
+3 runtime.assertE2T(SB)
+3 reflect.(*structType).FieldByNameFunc(SB)
+3 cmd/internal/obj.Linknew(SB)
+3 cmd/compile/internal/gc.typecheck1(SB)
+3 cmd/compile/internal/gc.Main(SB)

I looked into a few of these and it looks like the old backend is generating one call to typedmemmove in cases where SSA is generating a huge number of calls to writebarrierptr. In the smaller movers like runtime.getitab this may actually be the right thing; if there are just a few pointers, writebarrierptr is better. But 235 more write barriers is probably not the right thing.

This was generated with the following script:

#!/bin/bash

awk '
function dump() {
  if (fn != "")
    counts[fn][file] = n
}
/^TEXT/ { dump(); fn=$2; n=0 }
/CALL runtime.writebarrierptr/ { n++ }
END {
  dump();
  for (fn in counts) {
    printf("%+d %s\n", counts[fn]["b"] - counts[fn]["a"], fn)
  }
}
' file=a <(go tool objdump "$1") file=b <(go tool objdump "$2") | sort -gr

/cc @khr @dr2chase

@randall77
Copy link
Contributor

For cmd/internal/obj/ppc64.buildop at least, we're using writebarrierptr instead of typedmemmove for a slice assignment.
With ssa, that function uses 237 writebarrierptrs and 0 typedmemmove.
Without ssa, the same function uses 2 writebarrierptrs and 235 typedmemmoves.
There is no bug in this function, at least. Did you see many writebarrierptrs substituted for a single typedmemmove anywhere else? I believe the worst the SSA code does currently is 4 for 1.

@aclements
Copy link
Member Author

You're right. Second column is diff in typedmemmove calls:

+235 -235 cmd/internal/obj/ppc64.buildop(SB)
+72 -72 cmd/internal/obj/mips.buildop(SB)
+55 -55 cmd/internal/obj/arm.buildop(SB)
+14 -7 cmd/compile/internal/amd64.ssaGenBlock(SB)
+10 -10 cmd/compile/internal/ssa.rewriteValuegeneric_OpStructSelect(SB)
+9 +0 cmd/compile/internal/ssa.decomposeBuiltIn(SB)
+8 -4 cmd/compile/internal/ssa.nilcheckelim(SB)
+6 -6 runtime.getitab(SB)
+5 -5 runtime.(*TypeAssertionError).Error(SB)
+4 -4 runtime.assertI2T(SB)
+4 -4 cmd/compile/internal/ssa.(*Value).LongString(SB)

Looks like SSA is doing much better than the old backend in general. The only weird one is decomposeBuiltIn, where SSA generating two writebarrierptrs for each mapaccess/append/mapassign line instead of one.

@randall77
Copy link
Contributor

Looks like decomposeBuiltIn is generating a write barrier for a write to the stack. I'll see if I can repro in an independent test case.

@randall77
Copy link
Contributor

I'm going to close this issue, follow along with the remaining write barrier optimizations at #14969 (no pointer writeback for non-growing appends) and #15023 (extra write barrier for map access+append).

@golang golang locked and limited conversation to collaborators Mar 29, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants