Skip to content

slices: add Chunk function to divide []T into [][]T chunks #53987

Closed
@mdlayher

Description

@mdlayher

A problem I've run into a fair amount is dealing with APIs which only accept a maximum number of inputs at once, though I may have more than that number of inputs that I would like to ultimately process.

For example, Amazon S3 can only delete up to 1000 objects in a single DeleteObjects API request (https://docs.aws.amazon.com/AmazonS3/latest/API/API_DeleteObjects.html). I've run into similar issues in the past when injecting a large number of routes into the Linux kernel via route netlink, or when modifying a large number of WireGuard peers at once.

To deal with this situation in a generic way, I've come up with the following:

package slices

// Chunk batches []T into [][]T in groups of size. The final chunk of []T will be
// smaller than size if the input slice cannot be chunked evenly. It does not
// make any copies of slice elements.
//
// As an example, take a slice of 5 integers and create chunks of 2 integers
// each (the final value creates a short chunk):
//   slices.Chunk([]int{1, 2, 3, 4, 5}, 2) = [][]int{{1, 2}, {3, 4}, {5}}
func Chunk[T any](slice []T, size int) [][]T {
	var chunks [][]T
	for i := 0; i < len(slice); {
		// Clamp the last chunk to the slice bound as necessary.
		end := size
		if l := len(slice[i:]); l < size {
			end = l
		}

		// Set the capacity of each chunk so that appending to a chunk does not
		// modify the original slice.
		chunks = append(chunks, slice[i:i+end:i+end])
		i += end
	}

	return chunks
}

Which is then used as follows:

	objs, err := c.allObjects(ctx)
	if err != nil {
		return nil, err
	}

	// Begin deleting the objects concurrently in batches of 1000 (the
	// maximum limit size supported by S3).
	const limit = 1000

	eg, ctx := errgroup.WithContext(ctx)
	eg.SetLimit(10)

	for _, chunk := range slices.Chunk(objs, limit) {
		chunk := chunk
		eg.Go(func() error { /* deletion logic */ })
	}

If this proposal is accepted, I'd be happy to send a CL for the above. Thanks for your time.

Edit 1: updated second parameter name to size int as inspired by Rust's chunks method: https://doc.rust-lang.org/std/primitive.slice.html#method.chunks.

Edit 2: set capacity for each chunk per next comment.

Activity

added this to the Proposal milestone on Jul 21, 2022
JeremyLoy

JeremyLoy commented on Jul 22, 2022

@JeremyLoy

A small correction to your given implementation:

https://github.com/golang/go/wiki/SliceTricks#batching-with-minimal-allocation

it’s important to use the 3 arg form when chunking, otherwise appending to a chunk could overwrite unintentionally

https://go.dev/play/p/Zp0Dh5cB1rB

earthboundkid

earthboundkid commented on Jul 23, 2022

@earthboundkid
Contributor

I just needed this functionality last week and almost opened an issue. I think it would be useful.

icholy

icholy commented on Jul 25, 2022

@icholy

I think Batch is a better name.

DeedleFake

DeedleFake commented on Jul 25, 2022

@DeedleFake

I wonder if this might be better to wait on a general iterator API for. If one of the main intentions is to use it with range, this is a fairly inefficient way to do it, but its existence will encourage people to use it anyways. Map() and Filter() were dropped from slices for the same reason. This could be fairly easily done manually with a for loop in an efficient way, however, if there was some easier way to clamp that last index:

for start := 0; start < len(slice); start += chunkSize {
  end := min(start + chunkSize, len(slice))
  chunk := slice[start:end:end]

  // ...
}

Is there a common use for this that doesn't involve immediately proceeding to loop over it?

icholy

icholy commented on Jul 31, 2022

@icholy

Ive needed this function for 2 out of the last 3 projects I've worked on. An iterator based approach would not have had any benefit in either of these use cases.

DeedleFake

DeedleFake commented on Aug 1, 2022

@DeedleFake

But would an iterator-based approach have been worse, @icholy? The inner slices don't need to be reallocated in either case, and the outer slice does need to be newly allocated in either case, so I don't think that an iterator-based variant would essentially ever be worse. If there's some iters.ToSlice() function, then it would only take one extra function call to get a slice if that would be easier to work with for a specific use-case.

moved this to Incoming in Proposalson Aug 10, 2022
kf6nux

kf6nux commented on Aug 23, 2022

@kf6nux

You probably want to be explicit about your panic condition and panic in one additional case where size == 0

if size < 1 {
	panic("chunk size cannot be less than 1")
}

Here's an alternate implementation that some may like more

func Chunk[T any](slice []T, size int) (chunks [][]T) {
	if size < 1 {
		panic("chunk size cannot be less than 1")
	}
	for i := 0; ; i++ {
		next := i * size
		if len(slice[next:]) > size {
			end := next + size
			chunks = append(chunks, slice[next:end:end])
		} else {
			chunks = append(chunks, slice[i*size:])
			return
		}
	}
}
dcormier

dcormier commented on Jan 6, 2023

@dcormier
Contributor

Ah, nice to see a proposal for this. I've just been using my personal implementation:

func Chunk[E any](values []E, size int) [][]E {
	if size <= 0 {
		panic("size must be > 0")
	}

	var chunks [][]E
	for remaining := len(values); remaining > 0; remaining = len(values) {
		if remaining < size {
			size = remaining
		}

		chunks = append(chunks, values[:size:size])
		values = values[size:]
	}

	return chunks
}
changed the title [-]proposal: x/exp/slices: add Chunk function to divide []T into [][]T chunks[/-] [+]proposal: slices: add Chunk function to divide []T into [][]T chunks[/+] on Apr 7, 2023

52 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @josharian@rsc@dcormier@earthboundkid@DeedleFake

        Issue actions

          slices: add Chunk function to divide []T into [][]T chunks · Issue #53987 · golang/go