Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: disallow NaN keys in maps #20660

Open
bcmills opened this issue Jun 13, 2017 · 44 comments
Open

proposal: spec: disallow NaN keys in maps #20660

bcmills opened this issue Jun 13, 2017 · 44 comments

Comments

@bcmills
Copy link
Contributor

bcmills commented Jun 13, 2017

Motivation

Go currently supports NaN keys in maps through some clever tricks.

The current spec is ambiguous on this behavior. It says:

[I]f the map contains an entry with key x, a[x] is the map value with key x and the type of a[x] is the value […]

but "with key x" could plausibly mean "with key represented by x". (In practice, today it means "with key equal to x".)

That leads to surprising and subtle bugs (such as #14427) and adds complexity even in correct code (#20138 (comment)).

For example, the following two functions appear to be equivalent, but the latter version silently copies zeroes instead of the intended values for NaN keys:

func copy(m map[float64]int) map[float64]int {
  c := make(map[float64]int, len(m))
  for k, v := range m {
    c[k] = v
  }
}
func copy(m map[float64]int) map[float64]int {
  c := make(map[float64]int, len(m))
  for k := range m {
    c[k] = m[k]
  }
}

Proposal

We already reject incomparable values as map keys:

If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic.

I propose that we should apply that standard to NaNs as well. Specifically, I propose that we add the following sentence to https://golang.org/ref/spec#Index_expressions:
Assigning to an element of a map with a key that is not equal to itself causes a run-time panic.

This proposal is along the lines of #19624, in that it produces panics for arithmetic operations which are otherwise likely to result in silent corruption in typical user code.


Workarounds

Users can easily avoid the proposed panics by calling math.IsNaN(k) checking whether the key is equal to itself before inserting a map entry.

Users who expect NaNs with the same representation to map to the same element can transform the keys using math.Float64bits or math.Float32bits:

  if k == 0 { // -0.0 or +0.0
    m[math.Float64bits(0)] = v
  } else {
    m[math.Float64bits(k)] = v
  }

Users who really do intend to store all of the entries for NaN keys separately can explicitly pair the map with a slice:

  type nanElem struct {
    k float64
    v V
  }
  var (
    m map[float64]V
    nans []nanElem
  )
  if k != k {
    nans = append(nans, nanElem{k, v})
  } else {
    m[k] = v
  }
@gopherbot gopherbot added this to the Proposal milestone Jun 13, 2017
@dsnet dsnet added the v2 A language change or incompatible library change label Jun 13, 2017
@dsnet dsnet changed the title proposal: Go 2: disallow NaN keys in maps proposal: disallow NaN keys in maps Jun 13, 2017
@randall77
Copy link
Contributor

FWIW, there's a fair amount of code in the hashmap implementation that deals with the possibility of k!=k for keys k (search for reflexivekey in runtime/hashmap.go). It would be nice to strip that out.
That said, I don't think it is a big performance impediment. In the common case it's just a load/compare/predictable branch in a few places.

@dsnet
Copy link
Member

dsnet commented Jun 13, 2017

The strange behavior with maps only because == is not symmetric when applied on NaNs.
An alternative proposal would be to define == to be symmetric for NaNs only for maps. Technically there are multiple types of NaNs (signaling and quiet NaNs and a variety of valid bit sequences that represent NaNs). I similarly argue that they should all be equal to each other for maps in the same way that -0.0 == +0.0.

@bcmills
Copy link
Contributor Author

bcmills commented Jun 13, 2017

@randall77 It may be that this proposal would simplify the hashmap implementation, but I would prefer to focus on the correctness and complexity implications for user code.

@bcmills
Copy link
Contributor Author

bcmills commented Jun 13, 2017

@dsnet As @rsc notes in the blog post, allowing m[k1] and m[k2] to be same element when k1 != k2 would also be surprising. It is straightforward to equate NaNs in user code if desired using the math.FloatNbits approach outlined above, and I would argue that the explicit code makes the change in equality less surprising.

m := map[uint64]Vswitch {
case math.IsNan(k):
  m[math.Float64bits(math.NaN())] = v
case k == 0:  // +0.0 or -0.0
  m[math.Float64bits(0)] = v
default:
  m[math.Float64bits(k)] = v
}

@randall77
Copy link
Contributor

@bcmills : I agree, just wanted to let people know what the effects on the implementation would be (tl;dr a small net positive).

@cznic
Copy link
Contributor

cznic commented Jun 14, 2017

We already reject incomparable values as map keys:

If the key type is an interface type, these comparison operators must be defined for the dynamic key values; failure will cause a run-time panic.

All and every value of type {float32,float64} is comparable (guaranteed by the specs). The result of the comparison is irrelevant here.

Map keys must be comparable is a simple, non-complicated rule. I object to making it not only more complex, but suprisingly non-orthogonal.

@rsc
Copy link
Contributor

rsc commented Jun 14, 2017

Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k.

Not true. NaNs can be inside arrays or structs.

All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value. I'd really like to avoid introducing and explaining new value-based restrictions.

@bcmills
Copy link
Contributor Author

bcmills commented Jun 14, 2017

Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k.

Not true. NaNs can be inside arrays or structs.

Good point. But it's still easy to check, especially given the "not equal to itself" phrasing I've proposed:

if k != k {
  … // Can't be a map key. Do something else with it.
} else {
  m[k] = v
}

@bcmills
Copy link
Contributor Author

bcmills commented Jun 14, 2017

The main issue I want to address is the surprising inability to access elements by key in an irreflexively-keyed map. If you don't want to introduce value-based restrictions, perhaps we could at least change map entries from "with key equal to x" to "with key represented by x"?

@rsc
Copy link
Contributor

rsc commented Jun 14, 2017

It's certainly an idea worth evaluating when we get to language changes again (hence the Go2 label). Like all the Go2-labelled things, I'm not going to try to evaluate them now.

@rsc rsc changed the title proposal: disallow NaN keys in maps proposal: spec: disallow NaN keys in maps Jun 16, 2017
@zombiezen
Copy link
Contributor

One other way to address the specific case of copying maps would be to extend the copy builtin function to support maps.

@bcmills
Copy link
Contributor Author

bcmills commented Dec 21, 2017

That would still give surprising behavior for functions that are not copy exactly.

For example, consider this snippet:

	m := map[float64]int{k: 0}
	m[k] += 1
	m[k] += 1

It appears to increment the entry for m[k] twice, but what it actually does (when k is a NaN) is add two new entries mapping k to 1 (https://play.golang.org/p/gIUlA0lg-9b).

Similarly, there is no way to delete a NaN entry from a map: https://play.golang.org/p/Rz2TLn336c8

So special-casing copy isn't, IMO, a viable solution.

@bcmills
Copy link
Contributor Author

bcmills commented Jan 2, 2018

For an example where += matters, consider the case of aggregating two maps into one (https://play.golang.org/p/iwKyUWV3m_U):

func inc(dst, src map[float64]int) {
	for k, v := range src {
		dst[k] += v
	}
}

There is currently no way to implement such a function correctly in the presence of NaNs: because delete doesn't work with NaN keys, there is no way to remove or update the existing entries in the dst map even if you use the Float64bits trick to aggregate into some non-float-keyed intermediate.

@ianlancetaylor
Copy link
Contributor

ianlancetaylor commented Feb 20, 2018

I agree with @bcmills that a map with NaN key values is hard and perhaps even impossible to use. However, the current definition of maps is entirely type based, and I don't see a clear reason to change that. The current behavior is consistent and predictable even though surprising.

@bcmills suggests that you can check for an irreflexive value before adding it to the map, but that is true whether we make this change or not.

Closing.

@griesemer
Copy link
Contributor

cc: griesemer

@bcmills
Copy link
Contributor Author

bcmills commented Feb 21, 2018

The current behavior is consistent and predictable even though surprising.

I would argue that “surprising” is exactly the opposite of “predictable”: you can only predict the current behavior if you stop to think about it, and (empirically) most users don't.

So what about using representation rather than equality (per #20660 (comment))?

That would also be consistent and predictable, and would arguably be a better match for the behavior people already assume for maps with floating-point keys. (For example, it would give reasonable behavior for all of the examples of subtly-broken loops I've listed above.)

The only downside I can see is that it would not equate the entries for +0.0 and -0.0, but it's not at all obvious to me that that matters in practice: given that lookups are already sensitive to rounding, it seems much less surprising to have separate entries for +0 and -0 than to have arbitrarily many entries for math.NaN().


A quick sample of float-keyed maps at Google (because that's an easy corpus for me to search) gives some interesting data:

  • The vast majority are inversions or indexes of other floating-point data, where the set of keys used for lookup are drawn from some fixed data set and lookups from that set are assumed to succeed.
  • Most of the rest are operating on data sets with no negative sign bits, such as percentiles, probabilities in the range [0, 1], or scalar distances.
  • One does operate with arbitrary keys, but special-cases keys equal to 0 before performing the lookup.
  • One is a float64-to-*float64 interning table, which doesn't particularly care about rounding or ±0 but will consume much more memory than intended if you feed it NaNs.

@ianlancetaylor
Copy link
Contributor

Right now maps keys use the same operation as the == operator in the language. If we use representation instead, we break that simple rule. I don't think that is a good idea. In general, the current implementation minimizes special cases in the language. I think that is a good thing.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 22, 2018

In general, the current implementation minimizes special cases in the language. I think that is a good thing.

I agree that minimizing special cases is a good thing, but I don't think it's accurate to say that the current implementation minimizes special cases. The existing semantics already imposes a special case: it just isn't documented.

The spec for range says:

For each iteration, iteration values are produced as follows if the respective iteration variables are present:

Range expression                          1st value          2nd value
[…]
map             m  map[K]V                key      k  K      m[k]       V

But today, range does not produce the iteration value m[k] if k != k. Instead, it produces the value for an element with key represented by k.

It's also worth noting that the current spec does not use the == operator to define map semantics:

  • delete "removes the element with key k" (not "the element with key equal to k").
  • a[x] is "the map element with key x" (not "the map element with key equal to x).
  • For map types, "[t]he comparison operators == and != must be fully defined", but the spec does not specify how they are used.

If we summarize map semantics in terms of equality, the existing special cases become clear:

  • a[x] is the value of the element with key equal to x, unless it is the left-hand operand of an assignment.
  • If there is an existing element with key equal to x, a[x] = v sets the value of that element to v.
    Otherwise, it adds a new element with key x and value v.
  • delete(m, k) removes the element with key equal to k.
  • range m yields the key and value of each element in m.

Using representation instead, the special cases disappear:

  • a[x] is the value of the element with key represented by x.
    • Assigning to a[x] creates that element if it does not yet exist.
  • delete(m, k) removes the element with key represented by k.
  • range m yields the key and value of each element in m.

Enumerating the "equal" rules without the special cases gives the proposed panic semantics:

  • a[x] is the value of the element with key equal to x.
    • Assigning to a[x] creates that element if it does not yet exist.
      (If such an element cannot exist, attempting to create it causes a run-time panic.)
  • delete(m, k) removes the element with key equal to k.
  • range m yields the key and value of each element in m.

@mdempsky
Copy link
Member

All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value.

It's about the dynamic type of the value, which I think is still arguably a value-based restriction from the perspective of the map key type, even if the spec isn't currently worded to emphasize that.

Currently we incompletely support NaN keys (e.g., delete doesn't work, and package reflect can't access NaN-keyed values). I'd argue we should either explicitly disallow them or we should completely support them (e.g., @bcmills's representation-based proposal above).

That said, I'm not particularly fond of the representation-based approach. It feels like an abstraction violation to me, but that's an admittedly rather subjective concern.

@ianlancetaylor This was tagged for Go2, but now it's closed. Does that mean it's been rejected for consideration from Go2?

@cznic
Copy link
Contributor

cznic commented Feb 22, 2018

Currently we incompletely support NaN keys (e.g., delete doesn't work, ...

I don't think delete does not work with NaN keys. I think that delete with a NaN key works actually exactly as specified. Related comment.

@bcmills
Copy link
Contributor Author

bcmills commented Feb 22, 2018

I think that delete with a NaN key works actually exactly as specified.

“Works in a way that is not useful to anybody” is the same as “doesn't work” in terms of user value.

I have yet to see a program in which the current behavior of delete for NaN keys is useful and desired: if you have one in mind, that would be a great example to look at.

@ianlancetaylor ianlancetaylor added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Feb 23, 2018
@bcmills
Copy link
Contributor Author

bcmills commented Feb 23, 2018

I am strongly against changing maps to work on the NaN representation. I care less about having map operations panic on NaN, but I continue to find the arguments above to be unconvincing.

I don't have a strong preference between panicking on NaNs or using representations instead.

If we decide to do neither and maintain the status quo, I do think we would need to address the other NaN-key issues in the standard library (#11104, #14427, and #24075 that I know of).

The exact code that people will have to write if map operations panic on NaN will work today without changing the language.

That's also true of a number of other run-time panics: for example, division by zero, out-of-bounds slice access, failed type-assertions, races on map operations, and attempts to put incomparable keys into interface-keyed maps.
We panic on behaviors because they indicate likely bugs, not because those bugs cannot be avoided.

@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Aug 16, 2019
@gopherbot gopherbot added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Sep 3, 2019
@bcmills
Copy link
Contributor Author

bcmills commented Dec 16, 2020

#43207 is another example of a package in the standard library erroneously assuming that map values can be looked up by key.

@deanveloper
Copy link

deanveloper commented Jul 27, 2021

When discussing the antireflexivity of NaN, I think it’s important to discuss why NaN is antireflexive.

NaN is supposed to bubble out of mathematical operations. You have an invalid-decoded float, bad operation with an Infinity, etc. You did something bad and now you have a non-real or otherwise invalid float. So, all operations on NaN evaluate to NaN. (And if NaN == NaN then that would mean NaN-NaN==0, NaN/NaN==1, etc) Your application should see this NaN and do something about it!

However, instead Go maps essentially hide NaNs away to never be seen again, they cannot be indexed nor deleted (they can only be detected by range and kind-of len()). The entire point of the antireflexivity of NaNs is to make them seen, but instead maps do the opposite, and I consider this a major flaw.

@DeedleFake
Copy link

I've done a quick survey of map implementations and how they handle NaN keys in a number of other languages for reference:

Rust
Rust avoids the problem completely by defining two levels of equality based on the Eq and PartialEq traits. The first is required for using a type as a map key, but only the second is defined for floats.
Python
Python instantiates NaN values as float('NaN'). Values of this kind are not equal, even to themselves, but the same instance of a NaN is equal to itself for purposes of maps. They are not equal to other instances, however.
Java and Kotlin
Java and Kotlin treat NaN values properly as being unequal under normal circumstances, but treats them as equal for the sake of map key usage, allowing them to be looked up and disallowing 'duplicates'. It is possible that it's actually doing the same thing as Python, but NaN is a constant in Java so it's mitigated somewhat in some circumstances.
Ruby
Ruby is the same as Java as far as I can tell, meaning that NaN == NaN is false, but is treated as a distinct value that is equal to itself for the purposes of map keys. It also has a constant that it uses for NaN values, rather than instantiating new ones.
Clojure
Clojure does the same thing as Go, and it even has an entire section of its manual dedicated to explaining the odd behavior.

@rsc
Copy link
Contributor

rsc commented Jul 28, 2021

There are only bad answers.
The activation energy (the coefficient of static friction?)
required to migrate from one bad answer to a different one is very high.
It seems unlikely it would be reached at this point.

@deanveloper
Copy link

To add on to @DeedleFake, here is JavaScript Map:

NaN is considered the same as NaN (even though NaN !== NaN) and all other values are considered equal according to the semantics of the === operator.

@bcmills
Copy link
Contributor Author

bcmills commented Aug 6, 2021

This comment thread on CL 338609 is another interesting data point.

(It seems that the behavior of NaN map keys runs counter to the intuition of even @robpike!)

@robpike
Copy link
Contributor

robpike commented Aug 6, 2021

I wouldn't say that. It was how NaNs worked (or not) in reflect.DeepEqual I didn't know.

@rsc rsc added this to Incoming in Proposals (old) Jan 5, 2022
@rsc rsc removed this from Incoming in Proposals (old) Jul 27, 2022
@docmerlin
Copy link

I don't like the idea of runtime panics on NaN being assigned to maps. Instead we should change the map behavior to allow NaN to be a valid key that can be queried by a NaN.

@docmerlin
Copy link

docmerlin commented Sep 12, 2022

IMO behind the scenes it should be doing something like what was proposed above for a workaround:

  if k == 0 { // -0.0 or +0.0
    m[math.Float64bits(0)] = v
  } else {
    m[math.Float64bits(k)] = v
  }

@robpike
Copy link
Contributor

robpike commented Sep 13, 2022

Users can easily avoid the proposed panics by calling math.IsNaN(k) before inserting a map entry for k.

Not true. NaNs can be inside arrays or structs.

All our current restrictions are type-based, not value-based. Even the interface panic restriction is about the type inside the interface, not the value. I'd really like to avoid introducing and explaining new value-based restrictions.

I'd really like to avoid introducing NaNs, but I missed that boat. :(

@atdiar
Copy link

atdiar commented Sep 29, 2022

It's interesting that if we consider a value to be/have a subtype (if a type is a set of values, a subtype can be represented as a subset... e.g. float64 with the added constraint that the set of values is restricted to NaN), we can reproduce the same logic used for the comparable interface.

I don't know if this would be backward-compatible though (because I think it would require some additional checks, equivalent to implementing a negated constraint such as non-reflexive comparability constraint checks). Could think about it.

@ianlancetaylor
Copy link
Contributor

Just a note that as of the upcoming 1.21 release the language will have a new builtin, clear, which removes a part of the argument for this change. (But there are other arguments for this change, and we could in principle still adopt it.)_

@ianlancetaylor ianlancetaylor removed v2 A language change or incompatible library change NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. labels Aug 23, 2023
@bcmills
Copy link
Contributor Author

bcmills commented Oct 13, 2023

#63312 is another issue of a package in the standard library neglecting to document an exception to its behavior for NaN keys.

The documentation for maps.DeleteFunc says “DeleteFunc deletes any key/value pairs from m for which del returns true”, but today it does not delete such pairs if the key for the pair is not equal to itself.

@candlerb
Copy link

candlerb commented Oct 13, 2023

Panicking on m[k] = v if k != k seems sane and reasonable to me, and although it would be backwards-incompatible, it would be in such a way as to draw attention to itself anywhere that it mattered.

The utility of being able to store multiple values in a hash with the "same" key, which you can't actually retrieve (without iterating over the whole map) or delete (without clearing the whole map) is very low. Anybody who currently does this, very likely does so by accident IMO.

Special-casing NaN when used as a hash key sounds appealing at first, but it would also have to apply to NaNs deeply nested in structs.

Nobody likes the idea of another line of code that could panic at runtime, but this is a similar case to interface comparisons, which are almost always allowed but can panic in certain cases (https://go.dev/play/p/eK9pF8naT8q) - or indeed a divide-by-zero error or a nil pointer dereference.

I suppose a counter-argument is it might encourage more use of panic/recover in a try/catch kind of way, like this, but again, I don't suppose NaN map keys are used much.

There is another possible behaviour: m[k] = v silently does nothing if k != k (i.e. nothing is inserted into the map, and no panic). It gives roughly the same semantics as today: m[k] does not return v; delete(m, k) does nothing. The only difference is that you won't find k/v when iterating over the map.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Incoming
Development

No branches or pull requests