
proposal: Go 2: type annotation ~ for stack allocated variables #37346

Closed
JOT85 opened this issue Feb 20, 2020 · 11 comments

@JOT85 commented Feb 20, 2020

Introduction

Within a typical Go program, a lot of heap allocations are made, which come with significant overhead. One reason for this is interfaces. If an interface method takes a reference to something - whether that be an explicit pointer, a slice, or something else - the compiler must assume that it escapes to the heap.

How much overhead?

Firstly, the benefit of significantly reduced heap allocation is clear when you consider the implications for garbage collection time. If much less is on the heap, less time will be spent performing garbage collection.

To see how much overhead heap allocation actually adds, I made this quick program:

package main

import (
	"fmt"
	"io"
	"time"
)

//Stealing a trick from the reflect package.
var dummy struct {
	b bool
	x []byte
}

//escapes forces p to the heap, without doing much work.
func escapes(p []byte) {
	if dummy.b {
		dummy.x = p
	}
}

//ExampleReader copies as much of Data as it can to dst on a call to Read.
type ExampleReader struct {
	Data []byte
}

//Read simply copies r.Data to dst.
//Don't inline, to prevent inlining from affecting the benchmarks.
//go:noinline
func (r *ExampleReader) Read(dst []byte) (int, error) {
	return copy(dst, r.Data), nil
}

//These functions all do the same: call the read method and check that the last byte is an 'o'
//readFrom directly uses the Read method.
func readFrom(r *ExampleReader) {
	buf := make([]byte, 4096)
	n, _ := r.Read(buf)
	b := buf[n-1]
	if b != 'o' {
		panic("o!")
	}
}

//readFromEscapey is the same as readFrom, except that the buffer escapes so it must be heap allocated.
func readFromEscapey(r *ExampleReader) {
	buf := make([]byte, 4096)
	//force buf to be allocated on the heap
	escapes(buf)
	n, _ := r.Read(buf)
	b := buf[n-1]
	if b != 'o' {
		panic("o!")
	}
}

//readFromInterface is the same as readFrom, except that it uses the Read method from an interface, causing buf to escape.
func readFromInterface(r io.Reader) {
	buf := make([]byte, 4096)
	n, _ := r.Read(buf)
	b := buf[n-1]
	if b != 'o' {
		panic("o!")
	}
}

//Benchmark methods call the corresponding function n times.
func BenchmarkReadFrom(r *ExampleReader, n int) {
	for i := 0; i < n; i++ {
		readFrom(r)
	}
}
func BenchmarkReadFromEscapey(r *ExampleReader, n int) {
	for i := 0; i < n; i++ {
		readFromEscapey(r)
	}
}
func BenchmarkReadFromInterface(r io.Reader, n int) {
	for i := 0; i < n; i++ {
		readFromInterface(r)
	}
}

func main() {
	r := ExampleReader{
		Data: []byte("Hello"),
	}

	//Time each of the functions.
	const n = 10000000
	start := time.Now()
	BenchmarkReadFromInterface(&r, n)
	fmt.Println(time.Since(start) / n)
	start = time.Now()
	BenchmarkReadFromEscapey(&r, n)
	fmt.Println(time.Since(start) / n)
	start = time.Now()
	BenchmarkReadFrom(&r, n)
	fmt.Println(time.Since(start) / n)
}

Then the output with the gc flag -m:

$ go build -gcflags '-m' test.go
# command-line-arguments
./test.go:16:6: can inline escapes
./test.go:49:9: inlining call to escapes
./test.go:93:13: inlining call to fmt.Println
./test.go:96:13: inlining call to fmt.Println
./test.go:99:13: inlining call to fmt.Println
./test.go:16:14: leaking param: p
./test.go:30:7: (*ExampleReader).Read r does not escape
./test.go:30:30: (*ExampleReader).Read dst does not escape
./test.go:36:15: readFrom r does not escape
./test.go:37:13: readFrom make([]byte, 4096) does not escape
./test.go:46:22: readFromEscapey r does not escape
./test.go:47:13: make([]byte, 4096) escapes to heap
./test.go:58:24: leaking param: r
./test.go:59:13: make([]byte, 4096) escapes to heap
./test.go:68:24: BenchmarkReadFrom r does not escape
./test.go:73:31: BenchmarkReadFromEscapey r does not escape
./test.go:78:33: leaking param: r
./test.go:85:2: moved to heap: r
./test.go:86:15: ([]byte)("Hello") escapes to heap
./test.go:92:29: &r escapes to heap
./test.go:93:32: time.Since(start) / n escapes to heap
./test.go:93:13: main []interface {} literal does not escape
./test.go:93:13: io.Writer(os.Stdout) escapes to heap
./test.go:96:32: time.Since(start) / n escapes to heap
./test.go:96:13: main []interface {} literal does not escape
./test.go:96:13: io.Writer(os.Stdout) escapes to heap
./test.go:99:32: time.Since(start) / n escapes to heap
./test.go:99:13: main []interface {} literal does not escape
./test.go:99:13: io.Writer(os.Stdout) escapes to heap
<autogenerated>:1: (*File).close .this does not escape

The important thing to note here is that the slice in readFrom does not escape, so it is stack allocated, while the slice does escape in both readFromEscapey and readFromInterface.

The result of running these benchmarks is:

1.073µs // readFromInterface
1.078µs // readFromEscapey
116ns   // readFrom

As you can see, the interface performs almost identically to the direct call when the slice escapes, so (I think) all the overhead is coming from the heap allocation, not the interface. (Which is amazing!)

Relatively, the time difference here is quite large (>9x); however, if the work being done were greater, or involved IO, this could end up being quite a small difference. But there's a lot of interface use without IO where the difference could be significant.
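For reference, the same comparison could also be expressed with the standard testing package, which reports allocations per operation when run with -benchmem. This is only a sketch of how that might look (placed in a _test.go file next to the program above); it is not part of the original benchmark:

package main

import (
	"io"
	"testing"
)

//BenchmarkDirect calls readFrom directly; buf should stay on the stack.
func BenchmarkDirect(b *testing.B) {
	r := &ExampleReader{Data: []byte("Hello")}
	for i := 0; i < b.N; i++ {
		readFrom(r)
	}
}

//BenchmarkViaInterface calls through io.Reader; buf escapes to the heap.
func BenchmarkViaInterface(b *testing.B) {
	var r io.Reader = &ExampleReader{Data: []byte("Hello")}
	for i := 0; i < b.N; i++ {
		readFromInterface(r)
	}
}

Running go test -bench . -benchmem should then show the allocation difference directly (0 allocs/op for the direct call, 1 alloc/op for the interface call), alongside the timings.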

The other, probably more important, measure is the difference in garbage collection impact that having fewer things allocated on the heap would make. I think this could make a big difference. I'm not sure what the best method to measure this is, though (without creating a test implementation, which I'm willing to attempt if this proposal doesn't get quickly rejected)!
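One rough way to quantify the GC side (my suggestion, not something from the benchmark above) is to snapshot runtime.MemStats around a workload, or to run the program with GODEBUG=gctrace=1. A sketch of a helper, assuming "runtime" is added to the imports of the program above:

//measureGC reports how many GC cycles, how much GC pause time, and how many
//heap bytes a workload causes. Purely illustrative; not part of the proposal.
func measureGC(name string, workload func()) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	workload()
	runtime.ReadMemStats(&after)
	fmt.Println(name,
		"GC cycles:", after.NumGC-before.NumGC,
		"GC pause:", time.Duration(after.PauseTotalNs-before.PauseTotalNs),
		"heap allocated:", after.TotalAlloc-before.TotalAlloc, "bytes")
}

//Example use inside main:
//	measureGC("interface", func() { BenchmarkReadFromInterface(&r, n) })
//	measureGC("direct", func() { BenchmarkReadFrom(&r, n) })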

Quickly back to the intro

The primary goal of this proposal is to reduce the number of heap allocations caused by this, by optionally allowing a function's type signature to carry information for escape analysis.

I've got a couple of ideas, the first one being the key one and the second being a nice feature that could come for free with the first.

Idea 1 - Annotating types with ~ if they won't escape to the heap

The syntax of this proposal may not be ideal. I had the idea first and this is the nicest syntax I could come up with; any suggestions would be appreciated!

The basic idea is that you could define a method as:

func (...) Example(x *~Type) { ... }

The ~ in front of the type of x is a way of indicating that the pointer x can point to the stack. It's a promise that the method won't do anything to x that would require it to be on the heap - for example assigning it to a global variable, or storing it in a struct field - anything that would make the reference live longer than the method call.

If the Example method were defined this way and its implementation did anything which caused escape analysis to determine that x must move to the heap, it would cause a compile time error.
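To make the rule concrete, here is a sketch of what would and wouldn't compile under the proposed syntax (Type, Field, and global are placeholder names used only for illustration):

var global *Type

//Inspect would compile: x never outlives the call.
func Inspect(x *~Type) int {
	return x.Field
}

//Keep would be a compile-time error under the proposal:
//assigning x to a global makes the reference outlive the call.
func Keep(x *~Type) {
	global = x
}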

For example, this interface:

type ExampleInterface interface {
	Example(*~Type)
}

When using the Example method from this interface, escape analysis would know that calling Example won't do anything that would cause its argument to need to be moved to the heap.

In the first program I gave, a new interface could be defined, for example:

type ReaderStack interface { // That's probably a bad name for it, but the name is a minor detail.
	Read([]~byte) (int, error) // Since it's a slice of ~byte, it means that the bytes in the slice can be on the stack.
}

Then, if the Read method of ExampleReader took []~byte instead of []byte, it would satisfy this interface.

Following this, when the interface is used, the byte slice wouldn't need to be moved to the heap, leading to the better performance and fewer heap allocations as seen in the tests.

If this were used throughout a large project, a lot less would perhaps end up on the heap - again, I'm not sure how much of a benefit this would be but I'd be super interested to find out as I think it could be significant.

To maximise compatibility, a method with the signature Example(*~Type) would also implement the Example(*Type) interface, and a value of *Type could be passed as a value of *~Type - since a heap allocated pointer will still work with code which doesn't cause it to escape. But importantly, a value of *~Type cannot be passed as a value of *Type, and a method Example(*Type) cannot implement the Example(*~Type) interface.

This means that implementations of existing interfaces can use the ~ notation to gain these advantages when the new interface is used, while still satisfying the old interface.
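A sketch of these compatibility rules, using placeholder names (and standard type ... interface declarations):

type Escaper interface {
	Example(*Type) //may let its argument escape
}

type NonEscaper interface {
	Example(*~Type) //promises not to let its argument escape
}

type Impl struct{}

//Impl promises not to escape x, so it satisfies both interfaces.
func (Impl) Example(x *~Type) {}

var _ NonEscaper = Impl{} //ok
var _ Escaper = Impl{}    //also ok: the promise is stronger than required

type OldImpl struct{}

//OldImpl makes no promise about x.
func (OldImpl) Example(x *Type) {}

var _ Escaper = OldImpl{}    //ok
var _ NonEscaper = OldImpl{} //compile-time error: no guarantee that x stays off the heap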

Idea 2 - Explicitly defining a variable to be of type ~Type.

Idea 1 already introduces rules about using values of type ~Type - when a reference is passed as a function argument.

Therefore, it might make sense to allow variables to be explicitly declared as stack allocated, for example:

var x ~BigStruct = ...

Then, doing anything that would cause x to escape would be a compile time error. This effectively allows a programmer to guarantee that a variable does not become heap allocated if they don't want it to.

This could also work with :=, perhaps with ~:= to prefix the type with a ~ and allocate it on the stack - I'm not convinced by that syntax yet.
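For example, under this rule (keep and the field N are placeholder names for illustration):

var keep *BigStruct

func fill() {
	var x ~BigStruct //must stay on the stack
	x.N = 42         //fine: ordinary use
	keep = &x        //compile-time error: &x would have to escape to the heap
}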

More discussion and an existing issue

The idea of checking whether something is stack allocated was brought up in proposal #34798 for vet, so there is a demand for it.

The same issue does apply though: whether a program is valid would be determined by the compiler's escape analysis, which means that whether or not a variable escapes would also have to be defined within the language specification. A big step. The spec could be designed in a way which gives compilers a bit of freedom, since a compiler's escape analysis could be better than what's defined in the spec, allowing for more optimisation, but it would still error when the spec's escape analysis is not met and ~ is used. Plus, a compiler could choose to ignore the ~ notation entirely and just treat interfaces as always escaping, and all valid code would still work (a tool to check the spec's escape analysis would be required for the compiler to properly error, though).

In addition, all changes to the escape analysis would be backwards compatible since if it didn't escape before then it won't now. So improvements can be made without breaking code.

A nice side effect of this is that if you pass a reference to a function taking a ~Type, you know the reference cannot still be around after that function call, meaning you know it's safe to recycle the underlying object - without something unintentionally still holding a reference to it.
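For example, combined with something like sync.Pool, a buffer could be returned to the pool immediately after the call, because the callee cannot have retained it. StackWriter and send below are illustrative names under the proposed syntax, not part of any existing API:

type StackWriter interface {
	Write(p []~byte) (int, error) //promises not to retain p
}

var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, 4096) },
}

func send(w StackWriter, data []byte) error {
	buf := bufPool.Get().([]byte)
	n := copy(buf, data)
	_, err := w.Write(buf[:n])
	bufPool.Put(buf) //safe: w cannot still hold a reference to buf
	return err
}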

Proposal template

Would you consider yourself a novice, intermediate, or experienced Go programmer?

I would consider myself to be fairly experienced. I have been using Go on an at least weekly basis for about 3 years and have worked on a project in Go which is running at a small scale - but my production Go experience is very minimal.

What other languages do you have experience with?

JavaScript (a lot), Python (a lot), Haskell (a bit less), TypeScript (a little less), C (a fair bit), Scala (a fair bit), Rust (a little), Java (a little) + various languages I've played around with. (That was not in order of experience or preference!)

Would this change make Go easier or harder to learn, and why?

This idea would add a feature to Go which requires quite a lot of understanding. It would be possible to use Go as it currently is, since it's backwards compatible - and any ~ interfaces work with non-~ types.

The ability to write code without having to think about memory is wonderful. This feature doesn't remove that, since you can happily write a program without having to worry at all about it! You can use libraries that use it to get a free performance boost, but you don't have to define functions that use it yourself unless you are optimising or implementing an interface which uses a ~Type.

So the initial learning experience wouldn't be much different, as you don't have to learn the feature to use the language - but it is something extra to learn eventually.

Has this idea, or one like it, been proposed before?

Not this feature (as far as I'm aware), however a related idea has been discussed. See above.

Who does this proposal help, and why?

This proposal helps programmers who are trying to reduce heap allocations - by giving them the freedom to use interfaces and callbacks without forcing heap allocations and a tool to enforce stack allocation. It also benefits anyone who uses APIs with these types in the signatures since less will need to be moved to the heap.

Is this change backward compatible?

Yep!

Show example code before and after the change.

Before:

//ExampleReader copies as much of Data as it can to dst on a call to Read.
type ExampleReader struct {
	Data []byte
}

//Read simply copies r.Data to dst
func (r *ExampleReader) Read(dst []byte) (int, error) {
	return copy(dst, r.Data), nil
}

After:

//ExampleReader copies as much of Data as it can to dst on a call to Read.
type ExampleReader struct {
	Data []byte
}

//Read simply copies r.Data to dst
func (r *ExampleReader) Read(dst []~byte) (int, error) {
	return copy(dst, r.Data), nil
}

Before:

type Mutator interface {
	DoStuff(*BigStruct) error
}

After:

type Mutator interface {
	DoStuff(*~BigStruct) error
}

If all of your Mutator implementations don't let the struct escape to the heap, which they probably shouldn't, then making a call to this interface would no longer force the struct to escape.
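At the call site, the benefit would look something like this (apply is an illustrative function, not from the proposal):

func apply(m Mutator) error {
	var s BigStruct      //with the annotated interface, s can stay on the stack
	return m.DoStuff(&s) //today, &s must escape because m is an interface
}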

Before:

var ThisShouldBeOnTheStack BigStruct = ...

After:

var ThisShouldBeOnTheStack ~BigStruct = ...
// Now the compiler will error, at compile time, if it gets moved to the heap.

What is the cost of this proposal?

  • This would require a change to gopls, since it would need to helpfully point out where an escape analysis rule is broken.
  • It would also require the compiler to check that an annotated variable does not get moved to the heap, but the compiler already does escape analysis, so I don't think this would be very expensive? (I may be incorrect.)
  • This does not require anything to happen at run time, but it does reduce the heap usage, saving time at run time!

Can you describe a possible implementation?

(Perhaps a naive description)

The compiler already performs escape analysis to determine what escapes to the heap; a check would simply have to be added after the analysis to ensure that a variable marked as being on the stack is not moved to the heap.

The escape analysis would also have to be modified to understand the meaning of a ~ (stack annotation) in the type signature.

Prototype

I do not have a prototype, however I would be willing to attempt it!

How would the language spec change?

The language spec would have to include the concept of escaping, which is a loss of abstraction; however, an implementation could still change the memory management entirely, since ~ does not change the functionality of the code.

Orthogonality: how does this change interact or overlap with existing features?

This is a change to the type system. It doesn't change how types are represented, or how the runtime works. It is just an annotation to the type to hint to escape analysis and get the compiler to enforce things for us.

The annotation is similar to * in terms of appearance and would be placed onto the type of the thing being referenced.

Is the goal of this change a performance improvement?

Yes

What quantifiable improvement should we expect?

I'm not sure. There is a definite improvement simply by not having to allocate on the heap! See the test in the introduction for numbers. But I'm really not sure what the performance difference would be like in a large scale project; I'm not sure how much of a difference in garbage collection times it would make. But I think, if used extensively, or maybe just a little on things that happen a lot, it could make a large difference.

How would we measure it?

By making a test implementation and trying it on something big!

Does this affect error handling?

No

Is this about generics?

No

@gopherbot added this to the Proposal milestone Feb 20, 2020
@ianlancetaylor added the v2 (a language change or incompatible library change) and LanguageChange labels Feb 21, 2020
@ianlancetaylor (Contributor)

It seems to me that this language change has no effect on how the program runs. It only affects efficiency.

The language currently permits people to write code without worrying about whether values are allocated on the stack or the heap. Adding this feature to the language will lead people to start thinking about it. That seems like a significant increase in complexity. As you say, the feature requires quite a lot of understanding. One can say that nobody has to use the feature, which is true, but in practice people will use it, and other people will need to read, understand, and modify their code, so in practice everybody will need to understand it.

In general in Go we prefer to keep the language simple and to put complexity into the implementation. The runtime has to do quite a lot of work to present the relatively simple interface of goroutines and channels. And, for that matter, to avoid the possibility of dangling pointers while keeping memory usage and execution time fairly good.

In my opinion before considering adding this kind of feature to the language we should put a lot of effort into trying to implement it in the compiler and runtime. The key feature here is whether the compiler can prove that a call to an interface method does not escape. I can imagine using whole program optimization for this, or perhaps selecting between alternative code sequences based on information supplied by the implementation. Perhaps those ideas will turn out to be unworkable, but I think we should be putting our effort there rather than into a language change that increases complexity.

@JOT85 (Author) commented Feb 21, 2020

That's pretty much the response I was expecting - in terms of keeping the language simple.

For the people already thinking about where their values are allocated, this would make it a lot easier, but it would also force more people to think about it when they wouldn't have needed to before, making things more complex. There would be a benefit, but at a cost which is high for Go. If, as you say, this could be optimised without needing to add complexity to the language, that would probably be an even better solution.

I'd like to spend some time attempting to make this optimisation - but that's no promise I'll be able to make it work or that my solution will be good enough.

Thinking about alternative code sequences, would you compile 2 versions (one with heap allocation and one with stack allocation) and then have the compiler choose which to use based on what it knows about the value it's passing as an interface? (Obviously only including 1 version if the other isn't used). This could increase binary sizes quite a bit if many combinations were created, although you could set a limit to mitigate the impact of this.
Or, would it be better to somehow choose a different path at runtime based on some metadata stored with the interface?
Or something else?

@mvdan (Member) commented Feb 21, 2020

To add to what @ianlancetaylor said, I think there is still lots that could be improved in the compiler. We should exhaust our options there before making the language spec more complex.

@ianlancetaylor (Contributor)

Yes, I was imagining two sequences with the choice based on metadata stored with the type descriptor in the interface. It's just a thought; I don't know what the code size explosion would be.

@JOT85 (Author) commented Feb 21, 2020

@mvdan I agree now. It makes sense to solve the problem without changing the language if possible.

@ianlancetaylor That solution could work well. It wouldn't work for function values being passed though. Is there any reason that the decision can't (or shouldn't) be made at compile time?

@ianlancetaylor (Contributor)

Of course if the decision can be made at compile time, then we should do that. But it's not hard to imagine cases in which the compiler cannot possibly determine whether a value passed to a method of an interface escapes. Or consider passing a slice to a func([]byte) that is constructed using reflect.MakeFunc.
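For context, here is roughly what such a function looks like; the compiler only ever sees a func([]byte) value and cannot analyse the body behind it. A minimal sketch (makeOpaqueFunc is an illustrative name):

import "reflect"

func makeOpaqueFunc() func([]byte) {
	fnType := reflect.TypeOf(func([]byte) {})
	return reflect.MakeFunc(fnType, func(args []reflect.Value) []reflect.Value {
		//This body is invisible to escape analysis at the call site,
		//so the compiler cannot know whether the []byte argument escapes.
		return nil
	}).Interface().(func([]byte))
}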

@JOT85 (Author) commented Feb 21, 2020

If the compiler can use the existing escape analysis on the implementations of the methods to decide whether or not their arguments escape, then could that not be used to decide which version of the function to use? That method would require entirely different versions of each function, though, rather than branches within one version.
That method would work with methods/functions known at compile time. (With a few complications involving recursion.)

How common are functions constructed using reflect.MakeFunc?

@ianlancetaylor (Contributor)

I'm not sure I understand what you are saying. The point I'm trying to make is that when calling a method of an interface type the compiler cannot always know exactly which method is being called. Therefore it has no escape analysis information at all.

Functions constructed using reflect.MakeFunc are very rare, but they still have to work correctly. It's just another example where the compiler cannot know, when calling a function, whether the value escapes or not.

@JOT85 (Author) commented Feb 21, 2020

Consider the example from the original proposal:

func readFrom(r io.Reader) {
	buf := make([]byte, 4096)
	n, _ := r.Read(buf)
	b := buf[n-1]
	if b != 'o' {
		panic("o!")
	}
}

The compiler could build 2 versions of this. One where buf is allocated on the heap and one where it is allocated on the stack.
Then, when readFrom is called, for example as:

r := ExampleReader{
	Data: []byte("Hello"),
}
readFrom(r)

The compiler could, at that point, check the analysis for the Read method of ExampleReader and, if the argument to Read doesn't escape, call the version of readFrom where buf is allocated on the stack.
If the Read method did have a leaky argument, then the version which allocated buf on the heap would be used.
(Obviously it would only build the versions of readFrom which it needs though and probably limit the number of functions compiled if there were many permutations.)

This method would require having the escape analysis for a function at compile time, so it wouldn't work with dynamically generated functions or ones which aren't compiled with the program; perhaps a more general method using interfaces would be better. Plus, this method requires multiple versions of entire functions as opposed to just branches within them, so binary sizes would possibly be larger. But it has no runtime cost.

In the method mentioned above, there would be functions that wouldn't get optimised since they aren't known at compile time, but they would still function as they do today.
If you were to take it to the extreme, you could do both methods, and choose the interface one when the compiler is unsure but that might be too complicated and make binary sizes even larger.

@ianlancetaylor (Contributor)

For the reasons discussed above, and based on emoji voting on the original proposal, this is a likely decline. Leaving open for four weeks for final comments.

Ideas for better compiler optimizations can be discussed on other issues.

@ianlancetaylor (Contributor)

No further comments. Closing.

@golang locked and limited conversation to collaborators Mar 31, 2021