Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: map keys should use Equal method if it is defined #21829

Closed
jeromefroe opened this issue Sep 10, 2017 · 15 comments
Closed

proposal: Go 2: map keys should use Equal method if it is defined #21829

jeromefroe opened this issue Sep 10, 2017 · 15 comments
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change
Milestone

Comments

@jeromefroe
Copy link

According to the Go 1 spec, the only requirement for keys in a map is that the comparison operators, == and !=, must be defined for them. If this requirement is satisfied then the keys will be compared using these operators as defined in the Comparison operators section of the spec. As an example, for structs:

Two struct values are equal if their corresponding non-blank fields are equal.

Consequently, irregardless of whether or not the struct has an Equal method defined for it, the struct will be compared on a field by field basis. I'd like to propose that maps should use the Equal method to test for equality if it is defined for the key type before falling back to comparing based on basic types, as this would more closely align with developers' expectations.

As a concrete example, consider the following code snippet (playground link):

package main

import (
	"fmt"
	"time"
)

func main() {
	a := time.Now()
	b:= time.Unix(0, a.UnixNano())
	
	m := make(map[time.Time]struct{})
	m[a] = struct{}{}
	
	_, ok := m[b]
	fmt.Println(ok)
}

Prior to Go 1.9, this snippet would print true. However, with the changes made to time.Time to support monotonic clocks, it now prints false. This seems unnatural since comparing the values using the Equal method would indicate that they are, in fact, equal.

Furthermore, there is precedent for such an approach in other languages as well. For example, the Rust implementation of a Hashmap requires that the key implement the Eq trait. As a result, developers have the ability to add a custom implementation for testing equality or they can use the #[derive] attribute to have the compiler automatically derive an Equality method for the type. It seems we could do the same in Go as well, where the fallback is the current definition of struct equality cited above.

@gopherbot gopherbot added this to the Proposal milestone Sep 10, 2017
@martisch
Copy link
Contributor

martisch commented Sep 10, 2017

This seems to be a backwards breaking change so needs a good reason to do so.

func main() {
	a := time.Now()
	b:= time.Unix(0, a.UnixNano())
	
	m := make(map[int64]struct{})
	m[a.UnixNano()] = struct{}{}
	
	_, ok := m[b.UnixNone()]
	fmt.Println(ok)
}

Seems to be an easy solution for the desired equality as far as i understand the example.

@jeromefroe Do you have another example in mind where mapping keys to a different type that has the desired equality property is not so easy?

Note that types from other packages wont be able to be extended with new methods. So e.g. for time.Time having the option of an Equal method wont work for users that dont change the std lib.

Also it would be a bit inconsistent if == would not use the Equal method too.

Under the hood the map is a hash map that uses a hash function per type to find the bucket where the key can be found if it exists. I would assume the hash function would need to be provided too unless the current implementation is radically changed.

@ccbrown
Copy link

ccbrown commented Sep 10, 2017

How do you propose hashing be done?

@OneOfOne
Copy link
Contributor

In a ghetto implementation that I wrote (not published), it'd check for a Hash() uint64, so maybe instead of Equal it should use that with a big fat warning saying hash must be 100% unique per object?

@jeromefroe
Copy link
Author

jeromefroe commented Sep 10, 2017

@martisch There's certainly a workaround for the case of time.Time and I don't have an example of another case where it wouldn't be easy to develop a workaround. However, I think the point remains that it is unnatural to have two equality methods, namely Equal and == for a type, and that the burden is on the developer to know when they are applied.

I'm not sure I understand your comment on packages not being able to be extended with new methods. I would imagine that the implementor of a type would either define the Equal method or not so that external callers wouldn't have a choice.

I also agree that it would be inconsistent if == did not use the Equal method as well and I thought about including that point in the proposal but decided to not include it so as to restrict the scope of the proposal. I would be in favor of it though.

I don't see why the Hash function would absolutely have to be provided. I think it would be perfectly acceptable to use the default hash, but use the Equal method when comparing values in the map. With that being said, I'm in favor of allowing the implementor to override the default hash function when necessary by defining a Hash method on the type. This is, in fact, the approach that Rust takes as the key in a map must implement both the Hash and Eq traits.

@ccbrown See comments above on hashing.

@OneOfOne I was thinking something along the same lines, the map could perhaps check for both Equal and Hash methods on the type though.

@jeromefroe
Copy link
Author

On further thought, I can't think of a scenario where one would want to define an Equal method but not a Hash method, so I think it would be fine to require both of them.

@creker
Copy link

creker commented Sep 10, 2017

@OneOfOne, it would be better to use Hash only as a hint and optimization as is done in other languages. If Hash gives different results, then objects are guaranteed to be different. If Hash gives equal results, then objects MIGHT be the same and Equal should be used to give definite answer.

@ccbrown
Copy link

ccbrown commented Sep 10, 2017

I don't see why the Hash function would need to be provided. I think it would be perfectly acceptable to use the default hash, but use the Equal method when comparing values in the map.

An important property of hash maps in general is that objects considered equal have the same hash. You end up with outright broken behavior otherwise.

In a ghetto implementation that I wrote (not published), it'd check for a Hash() uint64, so maybe instead of Equal it should use that with a big fat warning saying hash must be 100% unique per object?

I don't think that would be very usable. 64-bit hashes are small enough that collisions are very feasible, and it's not possible to create a 64-bit hash function that guarantees no collisions for inputs with more than 64 bits of entropy.

But given an implementation that requires both Equal and Hash, here are my reservations:

  • Currently, all map keys are immutable. By allowing user-defined hashing / equalities, you introduce a way for programmers to shoot themselves in the foot: It becomes their responsibility to make sure that the fields referenced by keys' Equal and Hash methods never change.

  • I don't think this actually enables you to do anything you can't already do by creating a struct with the values that should make up the key.

    type MyType struct {
        A string
        B string
        C string
    }
    
    func (x *MyType) Key() MyTypeKey {
        return MyTypeKey{x.A, x.B}
    }
    
    type MyTypeKey struct {
        A string
        B string
    }
    
    m[foo.Key()] = "foo"

    That strikes me as much safer and not much (if any) more difficult. So I'm not sure a language change is really warranted.

@martisch
Copy link
Contributor

martisch commented Sep 10, 2017

I think the point remains that it is unnatural to have two equality methods, namely Equal and == for a type, and that the burden is on the developer to know when they are applied.

As i understand the current proposal requires the developer to check if the struct has an Equal method to understand how the map element equality will work instead of relying on knowing always == is the equality. I think it can also lead to cycles if e.g. the Equal method is defined on a pointer type that operates on all the fields of a struct (pointed too) that has a pointer ....

Also the hash function needs to be consistent with the equality which provides another possibility of getting the implementation of hash and equality needed for the map wrong.

My comment about not being able to implement new methods was directed at the issue of even if the magic Equal exists it wont allow implementing arbitrary Equal mappings on existing types e.g. time.Time by someone not having control over the package of the type. So the workaround for the given example in the proposal still needs to be used.

Another issue i see is if the implementor of the type provides a Equal method should there be a way for the user of the type to get back the original behavior of hashing/== in maps on the whole struct?

@ianlancetaylor ianlancetaylor added v2 A language change or incompatible library change LanguageChange labels Sep 10, 2017
@jeromefroe
Copy link
Author

jeromefroe commented Sep 10, 2017

Currently, all map keys are immutable. By allowing user-defined types, you introduce a way for programmers to shoot themselves in the foot: It becomes their responsibility to make sure that the fields referenced by keys' Equal and Hash methods never change.

I don't think this is entirely accurate, consider the example below of a struct which contains a pointer field (playground link):

package main

import (
	"fmt"
)

type foo struct {
	p *int
}

func main() {
	f := foo{p: new(int)}
	m := make(map[foo]struct{})
	
	m[f] = struct{}{}
	_, ok := m[f]
	fmt.Println(ok)
	
	i := 1
	f.p = &i
	_, ok = m[f]
	fmt.Println(ok)
}

The value of ok is true and then false. Although the example is admittedly contrived, it does show that map keys are not immutable and developers already need to be cognisant of that.

I don't think this actually enables you to do anything you can't already do by creating a struct with the values that should make up the key.

That certainly works for types which one defines, but it fails for external types, as in the time.Time example above.

As i understand the current proposal requires the developer to check if the struct has an Equal method to understand how the map element equality will work instead of relying on knowing always == is the equality.

I agree, which I was I think == should be determined by an Equal method on the type if it is defined.

I think it can also lead to cycles if e.g. the Equal method is defined on a pointer type that operates on all the fields of a struct (pointed too) that has a pointer ....

I don't see why this problem wouldn't exist already since Go already checks if pointers are equal by checking if the values they point to are equal. In any case, such problems would be easily found in testing since the first time Equal is invoked it would cause a panic from a stack overflow when the level of recursion gets too deep.

Also the hash function needs to be consistent with the equality which provides another possibility of getting the implementation of hash and equality needed for the map wrong.

This is certainly a possibility, but I think if a developer opts into this behavior by defining the requisite methods then they must guarantee that the methods are compatible. The language cannot, of course, prevent every kind of bug.

My comment about not being able to implement new methods was directed at the issue of even if the magic Equal exists it wont allow implementing arbitrary Equal mappings on existing types e.g. time.Time by someone not having control over the package of the type. So the workaround for the given example in the proposal still needs to be used.

I see, my expectation would be that the time package would implement the requisite Equal and Hash method so that callers could use them as keys. More generally, the developer would define these methods if they saw fit, external callers should not define them.

Another issue i see is if the implementor of the type provides a Equal method should there be a way for the user of the type to get back the original behavior of hashing/== in maps on the whole struct?

My inclination is no, at least I can't imagine a scenario where one would want to use the original implementation if an alternative is defined.

@ccbrown
Copy link

ccbrown commented Sep 10, 2017

I don't think this is entirely accurate, consider the example below of a struct which contains a pointer field (playground link):
The value of ok is true and then false. Although the example is admittedly contrived, it does show that map keys are not immutable and developers already need to be cognisant of that.

@jeromefroe I think you misunderstand my statement. Your code isn't demonstrating that map keys are mutable. You're mutating a local variable and demonstrating that looking up a different key has different results. Here's the point I'm trying to make:

type Bar struct {
    Id string
}

type Foo struct {
    Bar *Bar
}

func (f *Foo) Equal(f2 *Foo) bool {
    return f.Bar.Id == f2.Bar.Id
}

func (f *Foo) Hash() int64 {
    return HashString(f.Bar.Id)
}

func main() {
    m := make(map[Foo]string)
    bar := &Bar{"one"}
    m[Foo{bar}] = "foo"
    bar.Id = "two"
    // At this point, your map may be irreversibly broken.
    // It contains an element that is probably in the wrong bucket.
}

@mvdan
Copy link
Member

mvdan commented Sep 10, 2017

Here's a question - if you'd like m[x] to use x.Hash(), why not just use m[x.Hash()] instead?

Go is about being explicit and predictable - if I see m[x], I expect the operation to be cheap and without side effects. If it calls .Hash(), virtually anything could happen.

@jeromefroe
Copy link
Author

@ccbrown Apologies, I see what you mean now. It seems this problem exists in Java though it doesn't exist in Rust because inserting a key into a HashMap requires a mutable reference to the key so that is cannot be mutated elsewhere while the HashMap is in scope.

Given that it is potentially unsafe the feature could possibly be gated by requiring that the developer import the unsafe package in some way when implementing the Hash method. Alternatively, there has been discussion of integrating reference immutability into Go into which it could be required that only constant keys could be inserted into the map which would get around this issue as well.

@mvdan

Here's a question - if you'd like m[x] to use x.Hash(), why not just use m[x.Hash()] instead?

My true aim isn't really for the map to use a custom hash function but a custom equality function. In order for the custom equality function to work properly, however, it most likely needs a custom hash function as well (at least I can't think of a case where a custom equality function would be compatible with the default hash function). For example, in the case of time.Time you can imagine that two time.Time values would be equivalent if they are the same number of nanoseconds from the Unix epoch, so the hash function should only hash the number of nanoseconds from the epoch and not any other internal state.

Go is about being explicit and predictable - if I see m[x], I expect the operation to be cheap and without side effects. If it calls .Hash(), virtually anything could happen.

I believe the same could be said for fmt.Println(x). If x defines a String method then it could, theoretically, do virtually anything. Furthermore, in my opinion, an explicit aim of this proposal is to make Go more predictable. For example, it seems odd to me that there are, in effect, two definitions of equality for time.Time, one defined by == and another defined by the Equal method.

@martisch
Copy link
Contributor

martisch commented Sep 10, 2017

I don't see why this problem wouldn't exist already since Go already checks if pointers are equal by checking if the values they point to are equal.

As far as i know Go checks that the pointers are equal not that the pointed to values are equal as Equal would be allowed to do.

type t struct {
	i *int
}

func main() {
	m := make(map[t]int)
	ki1 := 1
	ki2 := 1
	m[t{i:&ki1}] = 3
	println(m[t{i:&ki2}])	
}

prints 0 not 3.

In any case, such problems would be easily found in testing since the first time Equal is invoked it would cause a panic from a stack overflow when the level of recursion gets too deep.

I dont think they are always easily found. The cycle might only form in specific paths during runtime. A field with a pointer might be nil most of the time but then be pointer at to one of its "parents" due to a specific chain of events.

Also current optimizations such as not causing a copy for map[string([]byte...)] would be invalid as the Equal method could change the byte slice of the key. Which also brings the question what is the semantics of the map operation when an Equals method is implemented that changes the key while comparing keys inside the map? The answer could be that a deep key copy needs to be made for each map operation to continue comparing as if the key was not changed. However all these changes will have performance impacts.

@jeromefroe
Copy link
Author

jeromefroe commented Sep 10, 2017

@martisch That's correct. However, I believe the problem does still exist for reflect.DeepEqual which compares pointers in the following manner: "Pointer values are deeply equal if they are equal using Go's == operator or if they point to deeply equal values." as this example shows.

Also current optimizations such as not causing a copy for map[string([]byte...)] would be invalid as the Equal method could change the byte slice of the key.

I don't see why those optimizations would be invalid since the compiler would still know that []byte does not and will not ever implement Equal.

Which also brings the question what is the semantics of the map operation when an Equals method is implemented that changes the key while comparing keys inside the map? The answer could be that a deep key copy needs to be made for each map operation to continue comparing as if the key was not changed. However all these changes will have performance impacts.

I touched on this a little in a previous comment. Reproducing here for clarity, though the full context can be found above: "It seems this problem exists in Java though it doesn't exist in Rust because inserting a key into a HashMap requires a mutable reference to the key so that is cannot be mutated elsewhere while the HashMap is in scope. Given that it is potentially unsafe the feature could possibly be gated by requiring that the developer import the unsafe package in some way when implementing the Hash method. Alternatively, there has been discussion of integrating reference immutability into Go into which it could be required that only constant keys could be inserted into the map which would get around this issue as well."

@pjebs
Copy link
Contributor

pjebs commented Dec 3, 2017

Why was this closed?

@golang golang locked and limited conversation to collaborators Dec 3, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests

9 participants