Description
A problem I've run into a fair amount is dealing with APIs that only accept a maximum number of inputs at once, even though I may ultimately have more inputs than that to process.
For example, Amazon S3 can only delete up to 1000 objects in a single DeleteObjects
API request (https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html). I've run into similar issues in the past when injecting a large number of routes into the Linux kernel via route netlink, or when modifying a large number of WireGuard peers at once.
To deal with this situation in a generic way, I've come up with the following:
package slices

// Chunk batches []T into [][]T in groups of size. The final chunk of []T will be
// smaller than size if the input slice cannot be chunked evenly. It does not
// make any copies of slice elements.
//
// As an example, take a slice of 5 integers and create chunks of 2 integers
// each (the final value creates a short chunk):
//
//	slices.Chunk([]int{1, 2, 3, 4, 5}, 2) = [][]int{{1, 2}, {3, 4}, {5}}
func Chunk[T any](slice []T, size int) [][]T {
	var chunks [][]T
	for i := 0; i < len(slice); {
		// Clamp the last chunk to the slice bound as necessary.
		end := size
		if l := len(slice[i:]); l < size {
			end = l
		}

		// Set the capacity of each chunk so that appending to a chunk does not
		// modify the original slice.
		chunks = append(chunks, slice[i:i+end:i+end])
		i += end
	}

	return chunks
}
Which is then used as follows:
objs, err := c.allObjects(ctx)
if err != nil {
	return nil, err
}

// Begin deleting the objects concurrently in batches of 1000 (the
// maximum limit size supported by S3).
const limit = 1000
eg, ctx := errgroup.WithContext(ctx)
eg.SetLimit(10)
for _, chunk := range slices.Chunk(objs, limit) {
	// Capture the range variable for the goroutine (required before Go 1.22).
	chunk := chunk
	eg.Go(func() error { /* deletion logic */ })
}
if err := eg.Wait(); err != nil {
	return nil, err
}
If this proposal is accepted, I'd be happy to send a CL for the above. Thanks for your time.
Edit 1: updated the second parameter name to size int, as inspired by Rust's chunks method: https://doc.rust-lang.org/std/primitive.slice.html#method.chunks.
Edit 2: set capacity for each chunk per next comment.
Activity
JeremyLoy commented on Jul 22, 2022
A small correction to your given implementation:
https://github.com/golang/go/wiki/SliceTricks#batching-with-minimal-allocation
It's important to use the 3-arg form when chunking; otherwise appending to a chunk could unintentionally overwrite elements of the original slice.
https://go.dev/play/p/Zp0Dh5cB1rB
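To make the aliasing hazard concrete, here is a small self-contained sketch (separate from the linked playground) showing how appending to a two-index chunk clobbers the original slice, while the three-index form forces a fresh allocation:

package main

import "fmt"

func main() {
	s := []int{1, 2, 3, 4, 5}

	// Two-index slice: the chunk's capacity extends to the end of s,
	// so append writes into the shared backing array and overwrites s[2].
	chunk := s[0:2]
	chunk = append(chunk, 99)
	fmt.Println(s) // [1 2 99 4 5]

	s = []int{1, 2, 3, 4, 5}

	// Three-index slice: capacity is clamped to the chunk's length,
	// so append allocates a new array and s is left untouched.
	chunk = s[0:2:2]
	chunk = append(chunk, 99)
	fmt.Println(s) // [1 2 3 4 5]
}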
sten4eg commented on Jul 22, 2022
https://go.dev/play/p/EP-wf5dmr3V
earthboundkid commented on Jul 23, 2022
I just needed this functionality last week and almost opened an issue. I think it would be useful.
icholy commented on Jul 25, 2022
I think Batch is a better name.
DeedleFake commented on Jul 25, 2022
I wonder if this might be better to wait on a general iterator API for. If one of the main intentions is to use it with range, this is a fairly inefficient way to do it, but its existence will encourage people to use it anyways. Map() and Filter() were dropped from slices for the same reason. This could be fairly easily done manually with a for loop in an efficient way, however, if there were some easier way to clamp that last index (see the sketch below).
Is there a common use for this that doesn't involve immediately proceeding to loop over it?
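The comment's inline snippet was not preserved here; a minimal sketch of the kind of manual loop being described might look like the following, where deleteBatch is a hypothetical per-batch handler:

// Iterate in place and clamp the last index by hand, so no [][]T is
// ever allocated.
func deleteAllObjects(objs []string, size int, deleteBatch func([]string) error) error {
	for i := 0; i < len(objs); i += size {
		end := i + size
		if end > len(objs) {
			// Clamp the final batch to the slice bound.
			end = len(objs)
		}
		if err := deleteBatch(objs[i:end]); err != nil {
			return err
		}
	}
	return nil
}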
icholy commented on Jul 31, 2022
I've needed this function for 2 out of the last 3 projects I've worked on. An iterator-based approach would not have had any benefit in either of these use cases.
DeedleFake commented on Aug 1, 2022
But would an iterator-based approach have been worse, @icholy? The inner slices don't need to be reallocated in either case, and the outer slice does need to be newly allocated in either case, so I don't think that an iterator-based variant would essentially ever be worse. If there's some iters.ToSlice() function, then it would only take one extra function call to get a slice if that would be easier to work with for a specific use case.
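For comparison, here is one sketch of what an iterator-shaped variant could have looked like at the time. Go had no standard iterator type in 2022, so this uses a callback; ChunkSeq and Collect are hypothetical names, not part of any proposal:

// ChunkSeq yields each chunk to a callback instead of returning [][]T.
// Returning false from yield stops the iteration early. Assumes size >= 1.
func ChunkSeq[T any](s []T, size int, yield func([]T) bool) {
	for i := 0; i < len(s); i += size {
		end := i + size
		if end > len(s) {
			end = len(s)
		}
		if !yield(s[i:end:end]) {
			return
		}
	}
}

// Collect shows the "one extra function call" needed to get a [][]T
// back when a slice really is what the caller wants.
func Collect[T any](s []T, size int) [][]T {
	var chunks [][]T
	ChunkSeq(s, size, func(c []T) bool {
		chunks = append(chunks, c)
		return true
	})
	return chunks
}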
kf6nux commented on Aug 23, 2022
You probably want to be explicit about your panic condition, and to panic in one additional case: where size == 0.
Here's an alternate implementation that some may like more.
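The linked alternate implementation isn't reproduced above, but the suggested guard is small. A sketch of the proposal's Chunk with an explicit panic for a non-positive size, which would otherwise loop forever when size == 0:

// Chunk with an explicit guard, per the suggestion above: size == 0
// would make the loop spin forever, and a negative size is meaningless.
func Chunk[T any](slice []T, size int) [][]T {
	if size < 1 {
		panic("chunk: size must be at least 1")
	}
	var chunks [][]T
	for i := 0; i < len(slice); i += size {
		end := i + size
		if end > len(slice) {
			end = len(slice)
		}
		chunks = append(chunks, slice[i:end:end])
	}
	return chunks
}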
dcormier commented on Jan 6, 2023
Ah, nice to see a proposal for this. I've just been using my personal implementation:
The issue was retitled from "proposal: x/exp/slices: add Chunk function to divide []T into [][]T chunks" to "proposal: slices: add Chunk function to divide []T into [][]T chunks".