cmd/compile,runtime: speed up heap allocations in loops #60879
Labels: compiler/runtime, FeatureRequest, NeedsInvestigation, Performance
There is currently a large performance gap between allocating N elements of type T one at a time and allocating a single slice of N elements of type T (even when T contains no pointers and the size of T exactly matches the size class used for T).
Just to give an idea, the following benchmark[^1] shows the performance gap between the two for a varying number of elements N (old is N individual allocations, new is a single allocation of a slice of N elements).
To help close the gap, it may be worthwhile to investigate performing batch allocations for heap allocations that occur inside loops. This pattern comes up frequently in unmarshaling code and, more generally, in data-processing code.
To clarify, I am not suggesting turning N allocations of type T into a single allocation of an array of N elements of type T, as that would prevent garbage collection of the individual elements. By "batch allocation" I instead mean performing the allocation of N individual elements of type T in a single call into the runtime, so that each of the N elements can still be individually garbage-collected.
For a loop with a trip count known at compile time, this could mean replacing the N per-iteration allocations with a single batched allocation hoisted above the loop.
For loops with a trip count known only at runtime, the batched allocation would instead be sized dynamically just before the loop is entered.
For loops with an unknown trip count, elements would be handed out from fixed-size batches that are refilled on demand as the loop proceeds.
In all cases, the runtime function calls inserted by the compiler would be semantically equivalent to performing the N allocations individually, just batched into a single call into the runtime.
In a sense, this would entail having loop-scoped bump allocators to avoid entering the main heap allocator for each individual element.
Some heuristics would certainly be needed to decide when to use batch allocation and when not to: for example, we definitely would not want to use it for loops with very low trip counts. Nevertheless, I think this would be a worthwhile optimization to consider.
An alternative to the above would be to specialize the heap allocator per {size class, has pointers} combination, or otherwise add generic, fast per-P bump allocators (which would have the benefit of being effective for all code, not just loops). These seem like more invasive changes to the runtime, though.
Other alternatives include the (over)use of `sync.Pool`, but, as has been mentioned in the past, almost any use of `sync.Pool` is an indication that the memory management subsystem is lacking in one way or another.
[^1]: go version go1.20.5 darwin/amd64, Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz (turbo boost disabled)