Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: slices: function for count specific value on slice #62301

Open
dozheiny opened this issue Aug 26, 2023 · 17 comments
Open

proposal: slices: function for count specific value on slice #62301

dozheiny opened this issue Aug 26, 2023 · 17 comments
Labels
Milestone

Comments

@dozheiny
Copy link

The slices package has no functions for counting a specific value for now.
I think it's good the definition of function is like this:

func Counts[S []E, E comparable](s S, e E) int

An example of the Counts function is like this:

names := []string{"Alex", "Gopher", "Bob", "Alice", "Gopher"}
slices.Counts(names, "Gopher") // 2
slices.Counts(names, "Alex") // 1
slices.Counts(names, "James") // 0
@seankhliao
Copy link
Member

seankhliao commented Aug 26, 2023

how common is this operation that it needs to be in the standard library?
https://go.dev/doc/faq#x_in_std

this would just be reduce in #61898

@seankhliao seankhliao changed the title slices: function for count specific value on slice proposal: slices: function for count specific value on slice Aug 26, 2023
@gopherbot gopherbot added this to the Proposal milestone Aug 26, 2023
@seankhliao seankhliao added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 26, 2023
@jimmyfrasche
Copy link
Member

It's possible to define this in term of slices.Index: https://go.dev/play/p/H9HZqXlyI2e

@dozheiny
Copy link
Author

how common is this operation that it needs to be in the standard library? https://go.dev/doc/faq#x_in_std

this would just be reduce in #61898

I think it's widespread.
In multiple base codes, I want to count a specific value in slices and write a function that counts for me.

@dozheiny
Copy link
Author

It's possible to define this in term of slices.Index: https://go.dev/play/p/H9HZqXlyI2e

Also, It's possible to implement the CountsFunc function. https://go.dev/play/p/NunxDFQh8pZ?v=gotip

@jimmyfrasche
Copy link
Member

it's simpler to just define it directly, too: https://go.dev/play/p/6kCIzk-yTgl

how does this come up in your code bases? I've certainly needed to know the count of things before but I usually use a multiset as a histogram because when I need the count of one thing I usually need the count of several others too.

@dozheiny
Copy link
Author

it's simpler to just define it directly, too: https://go.dev/play/p/6kCIzk-yTgl

how does this come up in your code bases? I've certainly needed to know the count of things before but I usually use a multiset as a histogram because when I need the count of one thing I usually need the count of several others too.

Typically, I require the CountsFunc in codebases. Within my codebases, I often deal with large slices containing numerous objects and fields that require counting for specific objects within that slice. This allows me to make comparisons among these objects.

@jimmyfrasche
Copy link
Member

Here's what I usually do: https://go.dev/play/p/PZoRXcl8qc3 that lets me build the counts up for the whole thing and then I query the map. This is better than doing repeated O(n) queries on the slice if you need to run multiple queries. If you just want to run one query the search is faster and it needs to get rebuilt if the slice changes. I'm usually just doing this for one off analysis or a one time ETL so I usually have all the slices "done" before I need to check the counts and I tend to need to check the counts of multiple items.

@dozheiny
Copy link
Author

dozheiny commented Aug 26, 2023

Here's what I usually do: https://go.dev/play/p/PZoRXcl8qc3 that lets me build the counts up for the whole thing and then I query the map. This is better than doing repeated O(n) queries on the slice if you need to run multiple queries. If you just want to run one query the search is faster and it needs to get rebuilt if the slice changes. I'm usually just doing this for one off analysis or a one time ETL so I usually have all the slices "done" before I need to check the counts and I tend to need to check the counts of multiple items.

This implementation is better than mine,
It's better than returns counts of one specific value.

But what about struct slices? Typically, you would require the CountsFunc function for struct slices. However, CountsFunc only returns counts of a specific value.

@seankhliao seankhliao removed the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Aug 26, 2023
@jimmyfrasche
Copy link
Member

Have a func from your value to something comparable. When you build the map use

h[key(v)] += 1

and check the map using h[key(v)]

Or use a different data structure.

@AndrewHarrisSPU
Copy link

But what about struct slices? Typically, you would require the CountsFunc function for struct slices. However, CountsFunc only returns counts of a specific value.

As noted earlier, the Reduce in xiter is enough, maybe a bit dense:

h := make(map[string]int)
Reduce(h, func(h map[string]int, s string) map[string]int { h[s]++; return h }, seq)

Less generally:

// could easily use `vs Seq[V]` instead

func Counts[V comparable](vs []V) map[V]int {
	return CountsFunc(vs, func(v V) V { return v })
}

func CountsFunc[V any, K comparable](vs []V, key func(V) K) map[K]int {
	h := make(map[K]int)
	for _, v := range vs {
		h[key(v)]++
	}
	return h
}

Counting like this is a no-brainer in e.g. a data science library or language, but strategies for optimal counting diverge ... no opinion on whether or where this fits culturally in Go.

@ianlancetaylor
Copy link
Contributor

C++ has std::count and std::count_if that do this operation on iterators. Java has java.utils.Collections.frequency that does this operation on a collection. As far as I know C++ std::vector and Java Vector do not support this operation directly. I think for Go we should hold off on adding this to the slices package, and instead consider whether to add it to the iters package if we adopt #61898.

@earthboundkid
Copy link
Contributor

Python has collections.Counter which consumes an iterator to make a histogram map: https://docs.python.org/3/library/collections.html#collections.Counter

@dozheiny
Copy link
Author

@ianlancetaylor So, will this function be added to the slices package or x/exp/xiter?

@ianlancetaylor
Copy link
Contributor

@dozheiny This is a proposal for adding new API. No decision has been made. I'm saying that I think it would be preferable to add this to x/exp/iter rather than slices. If anybody wants to make an argument that slices, or perhaps both, is better, this is the place to do it.

@DeedleFake
Copy link

I think this definitely makes way more sense as a function for iterators, not slices. It's a function that inherently needs to be implemented via iteration over a slice anyways, so by implementing it for iterators it'll be exactly the same but not limited to just slices.

@earthboundkid
Copy link
Contributor

earthboundkid commented Aug 27, 2023

For what it’s worth, I feel like the simple case of just counting how many times X appears in a slice or iterator is better of just being done in a loop. I have used languages that have a lot of fancy functions to do this stuff for you, and the end result is you spend more time looking up the documentation for the iterator library than it would take to just write the loop. It creates a second way to do things that only saves a trivial amount of typing but takes up a certain amount of cognitive load.

Obviously these things are a spectrum. I’m happy to have slices.Clone even though it’s trivial because it’s shorter and more clear for a very common operation. As the operations become less common though, the value of having them in a library drops off pretty rapidly.

@AndrewHarrisSPU
Copy link

A quirky argument for Count in xiter, precisely using map[K]int: the iteration order of built-in maps is guaranteed to shuffle. I believe Count would be in many cases naive for performance, and using a map would often be part of the reason. But there is utility in testing something fancier against a simply and correctly written solution, and the shuffling is a great property here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests

8 participants