Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: unboundedly expensive equality testing #62488

Open
seebs opened this issue Sep 6, 2023 · 4 comments
Open

runtime: unboundedly expensive equality testing #62488

seebs opened this issue Sep 6, 2023 · 4 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@seebs
Copy link
Contributor

seebs commented Sep 6, 2023

What version of Go are you using (go version)?

playground (1.21, i think?)

Does this issue reproduce with the latest release?

presumably

What operating system and processor architecture are you using (go env)?

N/A, should happen everywhere

What did you do?

https://go.dev/play/p/CcJuPywA867

summary: nest a handful of objects each of which is a struct{a, b, c, d any}, so that each points four times to the next. compare the "top" one. at N=16 or so, this begins taking longer to compare than the playground will let it run.

What did you expect to see?

I don't know.

What did you see instead?

Timeouts.

On the one hand, this is clearly You're Holding It Wrong. Don't Do That Then. Etcetera.

But I think in nearly all cases, I've been well-served by my belief that comparing go objects would have reasonably bounded runtime, because after all, anything that would cause them to explode or whatever would end up with pointer comparisons, and pointer comparisons come out equal-or-unequal quickly and easily. But interfaces provide a way to have a thing which sort of partakes in pointer-ish semantics, but doesn't actually fully copy things. If you weren't using interfaces for this, you'd actually have to populate all 4^16 items, or you'd have to use pointers (and then the comparisons would be fast). Interfaces let us bypass that perfectly good intuition and have copies of an arbitrarily-large nested tree that don't actually require copying it, but also don't enjoy the "we can compare these quickly" behavior of pointers. So each tier of this structure copies the next tier's four-interfaces header, but doesn't dereference the next tier's objects, so you can assemble the structure in time that's pretty much just linear with tree depth, but reading it is exponential.

On the one hand, I don't see a way to exploit this through user input to, say, encoding/json or anything like that. On the other hand, it feels like the way in which interface types are allowing us to, by copying a few words, create a thing that will act like it contains megabytes of data when you do == on it, is... unusual? Distinctive?

Obviously this is extremely susceptible to a small cache. On the other hand, the expense of building that cache for most comparisons would be punitive to every other go program in the world, pretty much. I don't have a good idea how to address this, or even whether it's worth addressing, but it sure is weird.

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Sep 6, 2023
@randall77
Copy link
Contributor

We do code-generate comparison functions, so we could add caches in just the cases that matter.
We'd only need a cache for types with >1 interface field.

It would be nice if we could detect that interfaces are equal if their data pointers are equal. I think that would mostly be correct, except for the pesky NaN. We could add a bit to types to mark them as reflexive (we already do that for map keys), and then data pointer equality implies interface equality. (Not sure if any of this paragraph would help OP's issue though, as different pointers doesn't imply values are different.)

@seebs
Copy link
Contributor Author

seebs commented Sep 6, 2023

Yeah, in this case, i think, the data pointers aren't equal? because i think each interface in a given struct is going to be a new copy of the next tier's struct, copied off and allocated and shoved into the heap for the T2I conversion. So we actually have four copies of each of these structs in memory, at different addresses, but which happen to have identical contents. Hmm. Although I guess we're not actually comparing those; we're never checking that x.a and y.b are equal, we're always comparing x.a to x.a, so they do in fact have identical pointers in that case.

I think that if you want to synthesize one of these that doesn't involve same-pointers, I think you end up needing to actually spend the space and time to build the data, at which point I think it's much more clearly Your Own Fault and you can reasonably be expected to anticipate that comparing two multi-gigabyte data structures is actually that expensive.

@seebs
Copy link
Contributor Author

seebs commented Sep 6, 2023

I've convinced myself this could actually happen by poking at the edges of "what if you used interface wrapping like this as a way to get quasi-immutable behavior for tree structures". consider type bintree struct { left, right any; val int }

you can make a binary tree of these, and later modifications to a node won't affect anything that got a copy of that node, and this is true recursively.

but say you do a slightly fancier thing, with type node interface{} type btree { first, last node children [8]node childCount int val int }

Now a node with a single child will in fact end up comparing that child three times if you compare it to itself, and so on... So I think you could actually hit this in code that someone might write for reasons other than just to cause problems on purpose.

@seebs
Copy link
Contributor Author

seebs commented Sep 7, 2023

So I've been wrestling with why this case confuses my intuition so much, and I think... In general, the cost of comparison closely mirrors the cost of copying, except that for header-like things, the cost of comparison can be unrelated to the cost of copying -- but it'll still be comparable to the cost of making the thing the first time. Nested interfaces allow you to make a thing which goes exponential. Only it's actually integral-of-exponential; it's 1 + b^1 + b^2 + [...] + b^N, rather than just b^N. But for cases where all the objects are different objects, or there's only one interface per level, b is effectively 1, so it's still just linear time comparison from a constant-time assignment.

So, when you create each of these objects, you are allocating four copies of the next-tier-down object... But each of those copies is only paying for the cost of copying the headers, but the resulting comparison will have to pay for the cost of comparing the things the headers point to. So we end up with four copies of each of the structs being made to store in interfaces, but end up with 4^N comparisons to run.

I can't come up with a way to get similar behavior except through interfaces. I looked briefly at gob after someone suggested it in gopherslack performance chat, and I think it might be possible to synthesize a similar case there, but also gob explicitly disclaims security when handling untrusted inputs. I can't find a way to trigger this from encoding/json.

@cherrymui cherrymui added this to the Unplanned milestone Sep 7, 2023
@cherrymui cherrymui added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
None yet
Development

No branches or pull requests

5 participants