Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: add Zip function #65062

Open
piotrkowalczuk opened this issue Jan 11, 2024 · 5 comments
Open

proposal: slices: add Zip function #65062

piotrkowalczuk opened this issue Jan 11, 2024 · 5 comments
Labels
Milestone

Comments

@piotrkowalczuk
Copy link

piotrkowalczuk commented Jan 11, 2024

Proposal Details

Proposal Details

I propose to add the Zip function to the slices package. The zip function is a critical part of the Python toolkit in the context of data science and machine learning.

Zip can be considered a supper set of transpose but is less restrictive regarding input shape. Silently adjusting output on a best-effort basis.

Example

Implementation

// Zip returns a slice, where the i-th slice contains the i-th element from each argument.
// If all the slices are the same length, the operation is equivalent to a matrix transpose.
// Excessive elements are discarded.
// Length of the returned slice equals the length of the shortest slice passed as an argument.
// Calling Zip again on the result will reverse the process. However, the result may be different from the original argument.
func Zip[T ~[]E, E any](unzipped ...T) [][]E {
	if unzipped == nil {
		return nil
	}

	if len(unzipped) == 0 {
		return [][]E{}
	}

	cols := len(unzipped)
	rows := len(unzipped[0])
	for _, slice := range unzipped {
		if len(slice) < rows {
			rows = len(slice)
		}
	}

	zipped := make([][]E, rows)
	arr := make([]E, rows*len(unzipped))
	for i := 0; i < rows; i++ {
		row := arr[:cols:cols]
		arr = arr[cols:]
		for j := 0; j < cols; j++ {
			row[j] = unzipped[j][i]
		}
		zipped[i] = row
	}

	return zipped
}

Edit: The allocation pattern changed as suggested by @adonovan

Test

func TestZip_matrix(t *testing.T) {
	cases := map[string]struct {
		given     [][]int
		exp, exp2 [][]int
	}{
		"nil": {
			given: nil,
			exp:   nil,
			exp2:  nil,
		},
		"empty": {
			given: [][]int{},
			exp:   [][]int{},
			exp2:  [][]int{},
		},
		"one": {
			given: [][]int{
				{1, 2, 3},
			},
			exp: [][]int{
				{1},
				{2},
				{3},
			},
			exp2: [][]int{
				{1, 2, 3},
			},
		},
		"transpose": {
			given: [][]int{
				{1, 3, 5},
				{2, 4, 6},
			},
			exp: [][]int{
				{1, 2},
				{3, 4},
				{5, 6},
			},
			exp2: [][]int{
				{1, 3, 5},
				{2, 4, 6},
			},
		},
		"uneven": {
			given: [][]int{
				{1, 2, 3},
				{4, 5},
			},
			exp: [][]int{
				{1, 4},
				{2, 5},
			},
			exp2: [][]int{
				{1, 2},
				{4, 5},
			},
		},
	}
	for hint, c := range cases {
		t.Run(hint, func(t *testing.T) {
			got := sliceutil.Zip[[]int](c.given...)
			if cmp.Diff(c.exp, got) != "" {
				t.Fatalf("unexpected result, diff: %s", cmp.Diff(c.exp, got))
			}

			got = sliceutil.Zip[[]int](got...)
			if cmp.Diff(c.exp2, got) != "" {
				t.Fatalf("unexpected second result, diff: %s", cmp.Diff(c.exp2, got))
			}
		})
	}
}

func TestZip_tensor(t *testing.T) {
	cases := map[string]struct {
		given     [][][]int
		exp, exp2 [][][]int
	}{
		"nil": {
			given: nil,
			exp:   nil,
			exp2:  nil,
		},
		"empty": {
			given: [][][]int{},
			exp:   [][][]int{},
			exp2:  [][][]int{},
		},
		"uneven": {
			given: [][][]int{
				{{1, 2}, {2, 3}, {4, 5}},
				{{6, 7}, {8, 9}},
			},
			exp: [][][]int{
				{{1, 2}, {6, 7}},
				{{2, 3}, {8, 9}},
			},
			exp2: [][][]int{
				{{1, 2}, {2, 3}},
				{{6, 7}, {8, 9}},
			},
		},
	}
	for hint, c := range cases {
		t.Run(hint, func(t *testing.T) {
			got := sliceutil.Zip[[][]int](c.given...)
			if cmp.Diff(c.exp, got) != "" {
				t.Fatalf("unexpected result, diff: %s", cmp.Diff(c.exp, got))
			}

			got = sliceutil.Zip[[][]int](got...)
			if cmp.Diff(c.exp2, got) != "" {
				t.Fatalf("unexpected second result, diff: %s", cmp.Diff(c.exp2, got))
			}
		})
	}
}

Benchmark

func BenchmarkZip(b *testing.B) {
	data := [][]int{
		{1, 3, 5},
		{2, 4, 6},
	}
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = sliceutil.Zip[[]int](data...)
	}
}
@gopherbot gopherbot added this to the Proposal milestone Jan 11, 2024
@seankhliao
Copy link
Member

how useful is this when all slices have to be of the same type?

@piotrkowalczuk
Copy link
Author

how useful is this when all slices have to be of the same type?

@seankhliao The first thing that comes to my mind is machine learning and, to be more precise, tensor processing. In this context, data must be normalized.

Std lib lacks the capabilities for that field; maybe this function could fit better under some newly formed package.

@thediveo
Copy link

I've seen this coming up as part of the Go iterator activities. Seeing the Python implementation working on iterators I wonder if this would be better move to the Go iterator field?

@piotrkowalczuk
Copy link
Author

I've seen this coming up as part of the Go iterator activities. Seeing the Python implementation working on iterators I wonder if this would be better move to the Go iterator field?

Interesting point @thediveo. I don't know the overall direction or whether the community want iter.Seq to become the way to go for processing collections in general. I would enjoy the iter package if it would be an opt-in bridge between various types. At the same time, having access to a similar API layer in both maps and slices packages. To keep simple things simple.

@adonovan
Copy link
Member

adonovan commented Jan 11, 2024

I agree that this function should wait till the question of generalized sequences and iterators is settled.

BTW: it might be slightly faster (and better locality) to make a single allocation a slice of m*n elements upfront, and then break off pieces (setting the cap each time) for the individual rows of the result (like this).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

5 participants