New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: minimize stack occupation before function calls #36067
Comments
Would this have any advantages beyond reaching the stack size limit slower, such as lower memory usage, or any form of performance gain? |
This is not a matter of liveness as we normally use that term. In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff. |
I think there is some room to adjust the heuristics for when we stack allocate. Right now, we stack allocate any variable <= 10MB. That's regardless of how many such variables live in a frame, whether the frame's function might be recursive, etc. |
Related: #27447. |
A somewhat related problem comes up in the context of inlining-- say you have a program fragment like func foo() int { largebuf := make([]byte, 8192) //... return expr } func bar() { //... if somecondition { foo() } baz() } In the Go compiler, if 'foo' is inlined into 'bar', the local variables in 'foo' get promoted into local variables in 'bar' (this is a fairly standard approach, but it means that inlining tends to drive up stack usage overall). I worked on a C++ compiler back in the late 90's where at one point the inliner would insert alloca() and dealloca() operations around the body of an inlined function, e.g. func bar() { //... if somecondition { stackpointer -= [size of foo's stack frame] ...body of foo... stackpointer += [size of foo's stack frame] } baz() } This forced the use of a frame pointer and (as I understand it) made unwind-gen more complicated, but the back end had to support alloca() ops already, so it wasn't too bad. These days I'm not aware of anything like this in either clang or GCC. |
I suppose we could have heuristics specifically around recursive calls: if a large variable is dead at the point of a recursive call, don't stack allocate it. But I don't know how many real programs that would help. |
I haven't given this a lot of thought, but what I was thinking was something along the lines of:
At runtime this would entail an additional SP change per call, so the overhead wouldn't be 0 - but it should not be significant either (no copying of stack slots would be required). This wouldn't solve all problems, sure, as it would be a best-effort optimization, and in any case it wouldn't apply to the caller arguments (otherwise stack traces would be incomplete), but it would help alleviate heavy stack usage. edit: I added another example in the issue description about other cases in which this would help |
In go 1.13 variables on the stack are kept
aliveallocated until the function returns. This is normally harmless, but can lead to surprising results: e.g. in the following example with a buf size of 1<<15 the process crashes because we reach the maximum stack size, but with a size of 1<<16 or more the process does not crash (as the slice is not allocated on the stack).This is obviously a contrived example, but it would help if the set of
liveallocated variables were pruned as stack variables die, or if that is too expensive, at least immediately before performing a function call (especially if recursive).Such an optimization would not only help in case of recursive functions that allocate a lot on the stack, but would rather help minimize time spent in morestack (#18138) by reducing stack growth on any call tree that does not contain only live variables (as a further example, in API servers it is somewhat common to have handlers wrapped in middleware, each middleware being often used to execute some logic before and/or after the wrapped handler is executed. In the case of logic executed only or mostly before the wrapped handler, it is likely that most variables in the stack frame of the middleware are dead during the call to the wrapped handler)
A more compact stack, besides requiring less time to be spent in morestack, would also help dcache efficiency.
The text was updated successfully, but these errors were encountered: