cmd/compile: minimize stack occupation before function calls #36067

CAFxX · 2019-12-10T11:57:36Z

In go 1.13 variables on the stack are kept ~~alive~~ allocated until the function returns. This is normally harmless, but can lead to surprising results: e.g. in the following example with a buf size of 1<<15 the process crashes because we reach the maximum stack size, but with a size of 1<<16 or more the process does not crash (as the slice is not allocated on the stack).

package main 

func main() {
    rec(0)
}

func rec(n int) (m int) {
	buf := make([]byte, 1<<15)
	foo(buf)
	// buf is dead now
	if n >= 1<<16 {
		return n
	}
    return rec(n+1)
}

//go:noinline
func foo([]byte) {}

This is obviously a contrived example, but it would help if the set of ~~live~~ allocated variables were pruned as stack variables die, or if that is too expensive, at least immediately before performing a function call (especially if recursive).

Such an optimization would not only help in case of recursive functions that allocate a lot on the stack, but would rather help minimize time spent in morestack (#18138) by reducing stack growth on any call tree that does not contain only live variables (as a further example, in API servers it is somewhat common to have handlers wrapped in middleware, each middleware being often used to execute some logic before and/or after the wrapped handler is executed. In the case of logic executed only or mostly before the wrapped handler, it is likely that most variables in the stack frame of the middleware are dead during the call to the wrapped handler)

A more compact stack, besides requiring less time to be spent in morestack, would also help dcache efficiency.

mvdan · 2019-12-10T13:45:33Z

Would this have any advantages beyond reaching the stack size limit slower, such as lower memory usage, or any form of performance gain?

ianlancetaylor · 2019-12-10T14:53:39Z

This is not a matter of liveness as we normally use that term. In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff.

randall77 · 2019-12-10T15:14:48Z

I think there is some room to adjust the heuristics for when we stack allocate. Right now, we stack allocate any variable <= 10MB. That's regardless of how many such variables live in a frame, whether the frame's function might be recursive, etc.
i don't think any such heuristic adjustment would help this example, though.

randall77 · 2019-12-10T15:17:03Z

Related: #27447.

thanm · 2019-12-10T15:25:16Z

A somewhat related problem comes up in the context of inlining-- say you have a program fragment like

func foo() int {
	largebuf := make([]byte, 8192)
	//...
	return expr
}

func bar() {
	//...
	if somecondition {
		foo()
	}
	baz()
}

In the Go compiler, if 'foo' is inlined into 'bar', the local variables in 'foo' get promoted into local variables in 'bar' (this is a fairly standard approach, but it means that inlining tends to drive up stack usage overall).

I worked on a C++ compiler back in the late 90's where at one point the inliner would insert alloca() and dealloca() operations around the body of an inlined function, e.g.

func bar() {
	//...
	if somecondition {
	        stackpointer -= [size of foo's stack frame]
		...body of foo...
	        stackpointer += [size of foo's stack frame]
	}
	baz()
}

This forced the use of a frame pointer and (as I understand it) made unwind-gen more complicated, but the back end had to support alloca() ops already, so it wasn't too bad.

These days I'm not aware of anything like this in either clang or GCC.

ianlancetaylor · 2019-12-10T15:37:43Z

I suppose we could have heuristics specifically around recursive calls: if a large variable is dead at the point of a recursive call, don't stack allocate it. But I don't know how many real programs that would help.

CAFxX · 2020-01-20T08:04:54Z

In your example, buf is dead at the point of the recursive call, and the compiler knows that. The problem is not that buf is live, it's that it continues to occupy space on the stack. Changing that would be difficult, would lead to less efficient code for almost all programs, and would help essentially no real programs. It doesn't seem like a good tradeoff.

I haven't given this a lot of thought, but what I was thinking was something along the lines of:

at compile-time
- sort stack allocations so that slots of variables that die first are at the top of the frame;
- when emitting a callsite (ideally, that we can not prove to be non-recursive) and only if from the previous callsite some variables died, adjust the stack pointer before the call instruction so that the space used by the slots of dead variables can be reused by the stack frame of the callee

At runtime this would entail an additional SP change per call, so the overhead wouldn't be 0 - but it should not be significant either (no copying of stack slots would be required). This wouldn't solve all problems, sure, as it would be a best-effort optimization, and in any case it wouldn't apply to the caller arguments (otherwise stack traces would be incomplete), but it would help alleviate heavy stack usage.

edit: I added another example in the issue description about other cases in which this would help

toothrot added this to the Backlog milestone Dec 10, 2019

toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 10, 2019

CAFxX changed the title ~~cmd/compile: minimize stack live set before a recursive call~~ cmd/compile: minimize stack live set before a function call Feb 18, 2022

CAFxX changed the title ~~cmd/compile: minimize stack live set before a function call~~ cmd/compile: minimize stack occupation before function calls Feb 18, 2022

CAFxX mentioned this issue Apr 29, 2022

runtime: new goroutines can spend excessive time in morestack #18138

Open

gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cmd/compile: minimize stack occupation before function calls #36067

cmd/compile: minimize stack occupation before function calls #36067

CAFxX commented Dec 10, 2019 •

edited

mvdan commented Dec 10, 2019

ianlancetaylor commented Dec 10, 2019

randall77 commented Dec 10, 2019

randall77 commented Dec 10, 2019

thanm commented Dec 10, 2019

ianlancetaylor commented Dec 10, 2019

CAFxX commented Jan 20, 2020 •

edited

cmd/compile: minimize stack occupation before function calls #36067

cmd/compile: minimize stack occupation before function calls #36067

Comments

CAFxX commented Dec 10, 2019 • edited

mvdan commented Dec 10, 2019

ianlancetaylor commented Dec 10, 2019

randall77 commented Dec 10, 2019

randall77 commented Dec 10, 2019

thanm commented Dec 10, 2019

ianlancetaylor commented Dec 10, 2019

CAFxX commented Jan 20, 2020 • edited

CAFxX commented Dec 10, 2019 •

edited

CAFxX commented Jan 20, 2020 •

edited