
cmd/compile: avoiding zeroing new allocations when possible #24926

Open
josharian opened this issue Apr 18, 2018 · 11 comments
Labels: compiler/runtime (Issues related to the Go compiler and/or runtime), Performance

Comments

@josharian
Contributor

Consider:

func f() *int {
  x := new(int)
  *x = 1
  return x
}

The first line gets translated into x = newobject(type-of-int), which calls mallocgc with a "needzero" argument of true. But it doesn't need zeroing: the type has no pointers, and data gets written to the whole allocation.

Same holds for:

func f() *[2]int {
  x := new([2]int)
  x[0] = 1
  x[1] = 2
  return x
}

and more interestingly:

func f() *[1024]int {
  x := new([1024]int)
  for i := range x {
    x[i] = i
  }
  return x
}

We could detect such scenarios in the SSA backend and replace the call to newobject with a call to a (newly created) newobjectNoClr, which is identical to newobject except that it passes false to mallocgc for needzero.
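For reference, the soundness requirement behind this: the language guarantees freshly allocated memory reads as zero, so skipping the clear is only valid when the compiler can prove every byte is overwritten before any other read. A minimal user-level illustration (newobjectNoClr itself is hypothetical and would live in the runtime; the function names below are invented for the example):

```go
package main

import "fmt"

// zeroGuarantee checks the language guarantee that makes needzero=true
// the safe default: freshly allocated memory reads as zero.
func zeroGuarantee() bool {
	return *new(int) == 0
}

// fill overwrites every element of the allocation, so the zeroing done
// by newobject is dead work; a hypothetical newobjectNoClr could skip it.
func fill() *[4]int {
	p := new([4]int) // today: mallocgc with needzero=true
	for i := range p {
		p[i] = i + 1
	}
	return p
}

func main() {
	fmt.Println(zeroGuarantee(), fill()[3])
}
```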

Aside: The SSA backend already understands newobject a little. It removes the pointless zero assignment from:

func f() *[2]int {
  x := new([2]int)
  x[0] = 0 // removed
  return x
}

although not from:

func f() *[2]int {
  x := new([2]int)
  x[0] = 1
  x[1] = 0 // not removed, but could be
  return x
}

Converting to newobjectNoClr would probably require a new SSA pass, in which we put values in store order, detect calls to newobject, and then check whether subsequent stores obviate the need for zeroing. The same pass could also eliminate unnecessary zeroing that the existing rewrite rules don't cover.
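The core check such a pass would perform can be sketched at user level: given the object's size and the constant-offset stores that follow the allocation in store order, the zeroing is removable only if the stores cover every byte. A hedged sketch (the type and function names here are invented, not the compiler's representation):

```go
package main

import (
	"fmt"
	"sort"
)

// store is a constant-offset write that follows an allocation in store order.
type store struct {
	off, size int64
}

// coversObject reports whether the stores fully overwrite an object of the
// given size, i.e. whether the initial zeroing by newobject is dead work.
func coversObject(objSize int64, stores []store) bool {
	s := append([]store(nil), stores...)
	sort.Slice(s, func(i, j int) bool { return s[i].off < s[j].off })
	var covered int64 // bytes [0, covered) are known to be overwritten
	for _, st := range s {
		if st.off > covered {
			return false // gap: these bytes would stay uninitialized
		}
		if end := st.off + st.size; end > covered {
			covered = end
		}
	}
	return covered >= objSize
}

func main() {
	// *[2]int on a 64-bit target: two 8-byte stores cover all 16 bytes.
	fmt.Println(coversObject(16, []store{{0, 8}, {8, 8}}))
	// Only the first element stored: bytes 8..16 still need zeroing.
	fmt.Println(coversObject(16, []store{{0, 8}}))
}
```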

This new SSA pass might also someday grow to understand and rewrite e.g. calls to memmove and memequal with small constant sizes.

It is not obvious to me that this pass would pull its weight, compilation-time-wise. Needs experimentation. Filing an issue so that I don't forget about it. :)

@TocarIP
Contributor

TocarIP commented Apr 18, 2018

We already inline memmove for small constant sizes in generic.rules.
So if we introduce a newobjectNoClr, we could probably also rewrite newobject with rules, which may avoid some of the overhead of a full pass.
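Such a rule might hypothetically look like the sketch below. The syntax is invented for illustration and is not valid in today's rules language; the next comment explains why this approach runs into trouble in practice.

```
// Hypothetical generic.rules entry, not valid today: a rule cannot
// cleanly mutate a call's Aux in place, and it cannot match an
// unbounded chain of covering stores.
(StaticCall {newobject} mem) && followedByFullOverwrite(v)
  => (StaticCall {newobjectNoClr} mem)
```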

@josharian
Contributor Author

There are two problems with doing it with rules.

(1) What we need to do is modify the OpStaticCall's Aux value and leave the overall set of values unchanged; that's hard to do with rules. The first half you can fake by having a condition that has a side-effect, but the second half is not currently possible in a clean way. Maybe we could have a magic "origv" RHS value or some such.

(2) For the *[2]int case, you need nested rules to check that both ints are overwritten. For the *[3]int case you need deeper nested rules. You see the problem. :)

@josharian
Contributor Author

josharian commented Apr 19, 2018

Hmm. Another possible use for an rtcall pass: slicebytetostring.

Consider something like:

func f(b []byte) int {
	s := string(b[:4])
	return len(s) // or do something else non-escaping with s
}

This ends up compiling to runtime.slicebytetostring(SP+48, b.ptr, 4, b.cap). This is equivalent to memmove(SP+48, b.ptr, 4). And that can be simplified further in turn.
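The equivalence is observable at user level: when the string doesn't escape, the conversion is just a fixed-size copy. A sketch (the helper names are invented for the example):

```go
package main

import "fmt"

// viaConversion is the pattern from the comment: the compiler emits a
// call to runtime.slicebytetostring with a constant length of 4.
func viaConversion(b []byte) int {
	s := string(b[:4])
	return len(s) // s does not escape
}

// viaCopy shows what that call reduces to: a 4-byte memmove into a
// buffer whose size is known at compile time.
func viaCopy(b []byte) int {
	var tmp [4]byte
	copy(tmp[:], b[:4])
	return len(tmp)
}

func main() {
	b := []byte("hello")
	fmt.Println(viaConversion(b), viaCopy(b))
}
```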

This optimization is hard to do during walk, because b[:4] gets rewritten into an autotmp during order, and we don't rediscover the constant length until SSA.

@josharian
Contributor Author

cc @randall77 for opinions in general about this

@dotaheor

Please also do this for make: #23905

@randall77
Contributor

I'd like to see removal of zeroing if we know the object will be completely overwritten.
It could be implemented using an upgraded deadstore pass. We can check at each allocation whether all its fields are shadowed.
Pointer fields are a bit tricky because of the write barriers. I think we can only do it for completely scalar objects.
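To illustrate the distinction (the types below are invented for the example): a completely scalar object can have its zeroing elided once every field is provably stored, while a pointer field is risky because the GC could observe a junk slot between mallocgc returning and the (write-barriered) store:

```go
package main

import "fmt"

// scalarOnly has no pointers: if both fields are stored before any other
// use, the allocation's zeroing is dead work and could be skipped.
type scalarOnly struct {
	a, b int
}

// hasPtr contains a pointer: skipping the zeroing would let the GC see a
// junk value in p if it scans the object before the store below, so the
// zeroing must stay.
type hasPtr struct {
	p *int
	n int
}

func newScalarOnly(a, b int) *scalarOnly {
	s := new(scalarOnly) // candidate for needzero=false
	s.a, s.b = a, b      // every byte overwritten
	return s
}

func newHasPtr(p *int, n int) *hasPtr {
	h := new(hasPtr) // must keep needzero=true
	h.p, h.n = p, n
	return h
}

func main() {
	v := 7
	fmt.Println(newScalarOnly(1, 2).b, *newHasPtr(&v, 3).p)
}
```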

@josharian
Contributor Author

Re: ptr-containing objects, do you think #24928 would do the trick?

@randall77
Contributor

I'm not sure. I think it's probably faster to zero a whole object than to selectively zero fields. Unless there are very large sections which don't have pointers, but that seems rare, and detecting subsequent complete overwriting of such objects also seems hard.
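The point about whole-object zeroing can be made concrete: clearing the entire object is one bulk write, while selectively zeroing only the pointer-free gaps takes per-section stores. A sketch with an invented type:

```go
package main

import "fmt"

type mixed struct {
	p   *int
	pad [6]int64
	q   *int
}

// clearWhole zeroes the entire object in one bulk write, which is what
// the runtime's zeroing effectively does and is typically cheapest.
func clearWhole(m *mixed) { *m = mixed{} }

// clearScalarsOnly zeroes just the pointer-free section, the kind of
// selective clearing the comment argues is rarely worth it.
func clearScalarsOnly(m *mixed) { m.pad = [6]int64{} }

func main() {
	v := 1
	m := mixed{p: &v, pad: [6]int64{1, 2, 3, 4, 5, 6}, q: &v}
	clearScalarsOnly(&m)
	fmt.Println(m.pad[0], m.p != nil) // scalars cleared, pointers untouched
	clearWhole(&m)
	fmt.Println(m.p == nil)
}
```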

josharian added a commit to josharian/go that referenced this issue Dec 30, 2018
related: golang#24926

Change-Id: I3015ccf2ad9b3fd7f5db7c31806b95aebeab61e1
@josharian
Contributor Author

Regarding the original topic, another approach is to generate different code in the front-end. See
#29446 (comment).

josharian added a commit to josharian/go that referenced this issue May 7, 2019
related: golang#24926

Change-Id: I3015ccf2ad9b3fd7f5db7c31806b95aebeab61e1
@gopherbot

Change https://golang.org/cl/367496 mentions this issue: cmd/compile: add memcrlelim SSA optimization pass

@quasilyte
Contributor

I mentioned some benchmark results in the CL comment.

Posting the benchmarks themselves here.

package example

import (
	"testing"
)

type smallStruct struct {
	f0 int64
	f1 float64
	f2 int32
	f3 int32
}

type averageStruct struct {
	f0 uint64
	flags [8]bool
	small [4]smallStruct
}

//go:noinline
func newSmallStruct(f0 int64, f1 float64, f2, f3 int32) *smallStruct {
	return &smallStruct{
		f0: f0,
		f1: f1,
		f2: f2,
		f3: f3,
	}
}

//go:noinline
func newAverageStruct(v uint64, s0, s1, s2, s3 *smallStruct) *averageStruct {
	return &averageStruct{
		f0: v,
		flags: [8]bool{true, true, true, true, true, true, true, true},
		small: [4]smallStruct{*s0, *s1, *s2, *s3},
	}
}

//go:noinline
func newInt(v int) *int {
	p := new(int)
	*p = v
	return p
}

//go:noinline
func newIntSlice1(v0 int) []int {
	return []int{v0}
}

//go:noinline
func newIntSlice4(v0, v1, v2, v3 int) []int {
	return []int{v0, v1, v2, v3}
}

//go:noinline
func newIntSlice8(v0, v1, v2, v3, v4, v5, v6, v7 int) []int {
	return []int{v0, v1, v2, v3, v4, v5, v6, v7}
}

//go:noinline
func newIntSlice32(v0, v1, v2, v3, v4, v5, v6, v7 int) []int {
	return []int{
		v0, v1, v2, v3, v4, v5, v6, v7,
		v0, v1, v2, v3, v4, v5, v6, v7,
		v0, v1, v2, v3, v4, v5, v6, v7,
		v0, v1, v2, v3, v4, v5, v6, v7,
	}
}

func BenchmarkNewInt(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newInt(i)
	}
}

func BenchmarkNewIntSlice1(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice1(i)
	}
}

func BenchmarkNewIntSlice4(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice4(i, 0, i, 0)
	}
}

func BenchmarkNewIntSlice8(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice8(i, 0, i, 0, i, 0, i, 0)
	}
}

func BenchmarkNewIntSlice32(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newIntSlice32(i, 0, i, 0, i, 0, i, 0)
	}
}

func BenchmarkNewSmallStruct(b *testing.B) {
	for i := 0; i < b.N; i++ {
		newSmallStruct(int64(i), 1.5, 1, 2)
	}
}

func BenchmarkNewAverageStruct(b *testing.B) {
	var s [4]smallStruct
	for i := 0; i < b.N; i++ {
		newAverageStruct(uint64(i), &s[0], &s[1], &s[2], &s[3])
	}
}

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022