Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: add built-in set data structure #7088

Closed
gopherbot opened this issue Jan 9, 2014 · 16 comments
Closed

proposal: spec: add built-in set data structure #7088

gopherbot opened this issue Jan 9, 2014 · 16 comments
Labels
FeatureRequest FrozenDueToAge LanguageChange Thinking v2 A language change or incompatible library change
Milestone

Comments

@gopherbot
Copy link

by webuser1200:

When working with large diverse data sets in go I wish I had a built in set data
structure. For now, I've made by using map but my code would be cleaner and simpler if
there was a built in set method.
@cznic
Copy link
Contributor

cznic commented Jan 9, 2014

Comment 1:

IMHO, using a map as a set seems easy enough.

@gopherbot
Copy link
Author

Comment 2 by deckarep:

I would love to see a built-in Set in Go as well.  I would argue that a Set is probably
more of a fundamental data structure than even a map.  While we can use a map to model
as a set as I've done here: https://github.com/deckarep/golang-set. I'm still not quite
happy with having to use interface{}.  Also, it's not exactly trivial to come up with a
full implementation that utilizes all the set operations.
Please consider it.

@ianlancetaylor
Copy link
Contributor

Comment 3:

Labels changed: added release-none, repo-main, languagechange.

@davecheney
Copy link
Contributor

Comment 4:

Status changed to Thinking.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@rsc rsc changed the title spec: Add built-in set data structure proposal: spec: add built-in set data structure Jun 20, 2017
@rsc rsc added the v2 A language change or incompatible library change label Jun 20, 2017
@pciet
Copy link
Contributor

pciet commented Nov 20, 2017

https://blog.golang.org/profiling-go-programs shows that slices can be a more efficient representation of an unordered set. In my programming with map[type]struct{} I've used these:

add(set, item) set

combine(sets…) set

reduce(set) set

has(set, item) bool

equal(sets…) bool

diff(setA, setB) set

len(set) int

set.String() string

for item, _ := range set

@pciet
Copy link
Contributor

pciet commented Nov 20, 2017

and delete(set, item)

@mvdan
Copy link
Member

mvdan commented Nov 21, 2017

I don't see an advantage to having this in the language, as opposed to a package somewhere like sync.Map. The two possible backing structures - map and slice - are already there, so the language doesn't need to be involved.

For most use cases, all one needs to do is add, delete, and check elements of a set, for which a map is enough. Bear in mind that adding another builtin type to the spec (and especially such a complex one with so many methods/funcs) would also complicate the language noticeably.

It is also worth noting that the compiler and runtime have and are getting better at dealing with maps, so some of the overhead reported in 2014 may have been removed. For example: fbfc203 ec0e2ed 2a56b02

@pciet
Copy link
Contributor

pciet commented Nov 28, 2017

Here's a benchmark package for the kind of set I'm using: an unordered set of slices. This compares a first-pass implementation of a map and slice unordered set type for a specific use.

https://github.com/pciet/pathsetbenchmark

For operations that may modify the input sets I've added a copy operation which adds significant time to the benchmark. The time in parenthesis is the recorded time minus the benchmarked copy time.

The CPU governor is set to performance so there shouldn't be any frequency scaling.

> go version
go version go1.8.5 linux/amd64
> sudo lshw | grep "product: Intel"
product: Intel(R) i5-6400 CPU @ 2.70GHz
> cat /etc/*release | grep PRETTY_NAME
PRETTY_NAME="Ubuntu 16.04.3 LTS"
> go test -bench=. -benchtime=20s
BenchmarkSliceCopy-4            100000000              216 ns/op
BenchmarkMapCopy-4              10000000              2696 ns/op
BenchmarkSliceAdd-4             50000000               523 ns/op (307)
BenchmarkMapAdd-4               10000000              2785 ns/op (89)
BenchmarkSliceDelete-4          50000000               562 ns/op (346)
BenchmarkMapDelete-4             5000000              5779 ns/op (3083)
BenchmarkSliceCombine-4         50000000               743 ns/op (527)
BenchmarkMapCombine-4            5000000              7992 ns/op (5296)
BenchmarkSliceReduce-4          10000000              3364 ns/op (3148)
BenchmarkMapReduce-4             1000000             24625 ns/op (21929)
BenchmarkSliceHas-4             300000000               90.5 ns/op
BenchmarkMapHas-4               100000000              388 ns/op
BenchmarkSliceEqual-4           20000000              1368 ns/op
BenchmarkMapEqual-4              5000000              6951 ns/op
BenchmarkSliceDiff-4            10000000              2541 ns/op (2325)
BenchmarkMapDiff-4               2000000             16257 ns/op (13561)

@pciet
Copy link
Contributor

pciet commented Nov 28, 2017

For my application changing the sets from map to slice looks like it would save an order of magnitude in processing time according to this benchmark. If this can be shown true for most cases (maybe it's not) perhaps the Go 2 documentation could suggest slices for unordered typed sets instead of maps.

Slices for sets also have the added benefit of a more sensible variable definition, such as for defining test cases:

	{
		Equal: false,
		A:     MapPathSet{},
		B: MapPathSet{
			&Path{{0, 0}, {0, 1}, {0, 2}}: {},
			&Path{{2, 2}}: {},
		},
		C: MapPathSet{},
		D: MapPathSet{
			&Path{{2, 2}}: {},
		},
	},
	{
		Equal: false,
		A:     SlicePathSet{},
		B: SlicePathSet{
			{{0, 0}, {0, 1}, {0, 2}},
			{{2, 2}},
		},
		C: SlicePathSet{},
		D: SlicePathSet{
			{{2, 2}},
		},
	},

@ianlancetaylor
Copy link
Contributor

From a quick glance your benchmarks use fairly small sets. Try benchmarking slices vs. maps with 10,000 entries.

@pciet
Copy link
Contributor

pciet commented Nov 28, 2017

I'll put together a graph for each function with the x-axis as the entry count when I have some time. These small sets are my only use case currently.

@pciet
Copy link
Contributor

pciet commented Dec 6, 2017

Changing my application that uses sets of slices, sets of pointers, and sets of two-int structs from map unordered sets to slice unordered sets reduced my testing time from 1.78-1.89 to 0.54-0.55 seconds with 1.8.5 darwin/amd64. That's 95 chess board test cases for determining all moves available or making a move determined to be legal.

@ianlancetaylor
Copy link
Contributor

Either Go 2 will have some support for generics, in which case you can define your own Set type (perhaps using an underlying map), or it won't, in which case we still won't want to add another special kind of type that is so similar to maps. So, we aren't going to do this. Closing.

@pciet
Copy link
Contributor

pciet commented Dec 13, 2017

@ianlancetaylor is it worth opening a separate issue for a documentation update of https://golang.org/doc/effective_go.html#maps? There's the map[type]struct{} pattern and it seems at first glance and through the 2011 profiling Go blog post that slices are more performant without much change in code complexity.

A set can be implemented as a map with value type bool. Set the map entry to true to put the value in the set, and then test it by simple indexing.

@ianlancetaylor
Copy link
Contributor

@pciet Are you suggesting that we should describe how to use maps as sets in Effective Go? I don't think that's really necessary. The idea seems straightforward enough to me.

I'm sorry, I simply don't believe that slices are more performant than maps. If you want to convince me of that, show me some benchmark code that uses sets with 1,000 different elements.

@pciet
Copy link
Contributor

pciet commented Dec 14, 2017

@ianlancetaylor I'll keep the benchmark in mind for some free time and if it's clear that for most or every case slices out-perform maps then I'll open an issue proposing an Effective Go change or addition for programmers that are looking for an unordered set variation. The current state of things is fair to me though, thanks for taking a look at this issue.

Background on my assertion: my test load profiles (https://github.com/pciet/wichess/tree/master/profiles) showed runtime.mapassign, runtime.mapiternext, and runtime.mallocgc as taking 17 seconds out of 73 total. With slices its runtime.mallocgc, runtime.scanobject, and runtime.duffcopy taking 19 out of 73 but with 3x request throughput. Map random iteration and map hash generation would have me think maps will always take more time for a heavily-iterating use case although my benchmark showed adding an element is faster for maps than slices (maybe I was causing an append reallocation).

@golang golang locked and limited conversation to collaborators Dec 14, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FeatureRequest FrozenDueToAge LanguageChange Thinking v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests

8 participants