Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: container: add weak reference maps #43615

Open
seebs opened this issue Jan 9, 2021 · 54 comments
Open

proposal: container: add weak reference maps #43615

seebs opened this issue Jan 9, 2021 · 54 comments

Comments

@seebs
Copy link
Contributor

seebs commented Jan 9, 2021

This is an outgrowth of #32779, and also of https://mdlayher.com/blog/unsafe-string-interning-in-go/ and ensuing discussion of this on Gopher Slack.

Occasionally, it would be desireable to have weak references. Unfortunately, you can't implement these as a user; you need hooks into the runtime to make a safe weak reference implementation, I think. The string interning hack described above probably works but is not complying with the uintptr conversion rules.

In nearly every case where weak references are desireable, it's desireable to be able to look up specific weak references, which suggests something similar to a map data type. It's also desireable to be able to find out whether the weak reference exists, or to be able to request the reference, getting an actual stable non-weak reference back, without a separation between the query and the retrieval.

Conveniently, a map with a pointer key type already provides exactly this behavior. What it doesn't provide is the potential for GC to remove objects.

Thus, my proposal:

cache := make(weak map[string]*Data)

This is equivalent to make(map[string]*Data), except that things which are in the map but not referenced anywhere else may become deleted from the map at any arbitrary time. However, retrieval from the map is guaranteed to either yield a nil (and a false in the ,ok form) or the pointer last stored for that key, and if it yields a non-nil pointer, that pointer is safe to use.

The internal implementation details aren't 100% obvious to me. You could nearly do it with SetFinalizer, except you need a way to ensure that this doesn't clash with any other finalizers used on the item. (For instance, the same value could be present in two weak maps...)

I'm not totally sure about the syntax, but I'm pretty sure that this wouldn't compile now under any possible circumstances, so it doesn't break any existing code.

Why not sync.WeakMap, parallel to sync.Map and/or sync.Pool? Because type safety is nice, and because the ability to use or not use the ", ok" form of retrieval seems desireable. Plain old maps are easier and friendlier to use, and less error-prone.

That said, sync.WeakMap, which would vaguely combine the behaviors of sync.Map and sync.Pool, would be viable too. The main issue this is intended to address is that "would like to keep this if it's convenient and/or if it's being used, but don't need to retain it if there's memory pressure" seems like a common use case.

@JAicewizard
Copy link

I assume when iterating over a map the map would be "locked" in some way? To avoid async reads/writes to the map.

@seebs
Copy link
Contributor Author

seebs commented Jan 10, 2021

I would assume it'd be the same as any other map. Up to you to ensure that you are not modifying it while the iterator is running.

@JAicewizard
Copy link

What about when the a weak reference gets removed by the runtime while iterating over the map?

@seebs
Copy link
Contributor Author

seebs commented Jan 10, 2021

Same thing that would happen with anything else involving pointers in a moving GC: It's up to the runtime to have suitable barriers on or around their accesses.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Jan 11, 2021
@ianlancetaylor
Copy link
Contributor

CC @bradfitz

@dsnet
Copy link
Member

dsnet commented Jan 11, 2021

Some thoughts:

  • A detriment of using a string as the key is that it probably requires an allocation forming a key to look something up that may not even exist anymore. I see some benefit to an allocation-free key like [16]byte or [32]byte.
  • How complete is a sync.WeakMap? Are entries allowed to be removed even if there may be references to it (i.e., similar to sync.Pool)?

@seebs
Copy link
Contributor Author

seebs commented Jan 11, 2021

You can use any key type you want. It's just a normal map. string is just an example.

No, entries can't be removed if there are references to them. If there's references to them, they still exist, and a weak reference should stay valid. The entire point is to have a reliable lookup for these objects if they still exist, without ensuring that they still exist.

@bcmills
Copy link
Contributor

bcmills commented Jan 11, 2021

@seebs, how would weak references interact with finalizers (if at all)?

Specifically: is the weak reference removed from the map before the finalizer is run, or after, or is it forbidden to obtain a weak reference to an object that has a finalizer?

@seebs
Copy link
Contributor Author

seebs commented Jan 11, 2021

That is a fascinating question. On consideration: If we assume that nothing else ever users finalizers, finalizers might be a credible way to implement this, internally. Runtime can skip the storage for a weak map when marking, and set a finalizer for each item to delete it from the map, roughly. So one solution would be "forbidden to set a finalizer on weak references, or obtain a weak reference to an object with a finalizer". Except... that's horribly brittle, and might result in someone developing a time machine so they can go back to before it was done and add "never use this feature" to Effective Go.

But this is a really interestingly-difficult philosophical question. It is permissible for a finalizer to actually do a thing that prevents a thing from being collected. So if we remove things from the map first, the object will potentially still exist but we will report it missing (and thus, in most use cases, create a new one). If we run finalizers first, we can have objects in the map which have had their finalizers run, and which can thus end up referenced again post-finalizer. And you can't tell, from the perspective of the caller of the finalizer, whether that finalizer may have stashed the object somewhere so that it won't really be collected.

And it might be possible to fix this if finalizers had a way to indicate whether or not they were vetoing collection, but that could nonetheless be wrong...

I think the answer is: Finalizers run after deletion from the map, but if your finalizer preserves the object, it becomes possible that the object exists but the map doesn't refer to it. So don't do that. But not in the sense of "runtime is allowed to explode in a shower of demons", just "yes, so, that previous object is around, and the map doesn't refer to it but might now have a new entry for that key."

But philosophically, in general, a finalizer is supposed to run "after the object is dead and never referenced again", and it should not do something that would make the object get referenced again, it's just after-the-fact cleanup, and it seems fine for the object to no longer be in the map, and then get cleaned up, even if a new object for its former key ends up getting created.

(Although if you did, say, "directory named after the key" and the finalizer was "delete the directory" that would be a Bad Idea, and also your own fault, don't do that.)

@ingvarm-gr
Copy link

From a purely philosophical view, is it proper to impose "at most one" on weak references to an object?

@rsc
Copy link
Contributor

rsc commented Jan 13, 2021

This is a major language and implementation change on the garbage collector side.
It is highly unlikely that we are going to add weak references any time soon.
That's an amazing amount of new complexity.

/cc @aclements @mknyszek

@seebs
Copy link
Contributor Author

seebs commented Jan 13, 2021

The "major change" thing is both why it's a hard sell to put it in runtime, and why it's impossible to put it somewhere else.

@aclements
Copy link
Member

This is something I've thought about quite a bit in the past. My past thinking has led to a bit of a counter-proposal: don't allow iteration over weak maps. Right now, the garbage collector is invisible in the language semantics (other than finalizers, which are an unfortunate wart at best). The language specification is built on the illusion of infinite memory. If you have a map that allows lookups and not iteration, we maintain that illusion because the GC can delete unreachable keys from that map without any effect on program execution. It remains invisible. An un-iterable map still provides most functionality that people typically want from a weak map, including string interning and carrying side information for objects without forcing them to remain live. There is also exactly how ECMAScript's WeakMap works.

I would argue that this should at least wait on generics. At that point, we could provide this as a regular type with all the type-safety of the built-in map. I'm not sure where it should live; it's not about synchronization, so it seems odd to put it in sync.

Implementing the above for pointer keys is actually pretty easy. We'd mark the key storage as non-pointer data, use allocation specials to track the set of weak maps containing an object as a key, and delete keys from those weak maps during sweeping. Disallowing iteration simplifies a lot of questions around races with the GC.

Implementing it for non-pointer keys is harder, but probably necessary to make it useful. I think the map would have to find the pointers in a key when it's added and attach specials to each of those allocations. During tracing, if the GC reaches any of these objects, it would have to trace across to everything else referenced by that key as well. Object tracing doesn't currently have to look at specials, so I'm not sure how to do this without a performance penalty to every object scanned during GC.

@seebs
Copy link
Contributor Author

seebs commented Jan 13, 2021

That's a fascinating idea. My vague intuition is there might be cases where you'd want to iterate the map to do something about the data, but in all those cases, I think you'd also want it to be non-weak at that point.

But I think "unreachable keys" feels like a slightly different question; I was assuming ordinary keys, which might be strings or integers, and potentially-unreachable values. So for instance, with a weak map[string]*Node, if someone has associated a given string with a *Node in the map, then we want any attempt to look up that same string -- even at a different address and constructed differently -- to necessarily find that *Node, if it is still referenced anywhere. But if nothing references that *Node except for this map, then we no longer care; there's no way for anyone to detect that a new node we return isn't the one we had for that string before, in theory.

I guess the answer is that either keys or values could in principle be weak references.

@josharian
Copy link
Contributor

Implementing it for non-pointer keys is harder

Could the implementation allocate for these and implicitly unbox when returning them? Then the GC could continue to deal only in pointers.

@ianlancetaylor
Copy link
Contributor

A weak map has two types: a key type and a value type. Only the value type is weak. That is, if m is a weak map, then m.Set(k, v) stores v with key k. If m is the only reference to v, then the garbage collector is permitted to release v, after which m.Get(k) will indicate that there is no such key. Moreover, if v is released, then k can be released.

My point is that with those semantics I think it's fine to restrict the value type of a weak map to be a pointer. It's hard to even understand what it means to store a non-pointer in a weak map. What does it mean to say that there are no remaining references to a non-pointer value? Isn't that equivalent to saying that there are no pointers to the value?

Are people thinking of different semantics?

@josharian
Copy link
Contributor

What does it mean to say that there are no remaining references to a non-pointer value? Isn't that equivalent to saying that there are no pointers to the value?

I was thinking of the value being of type (say) [2]*int. There are two semantics possible: (1) Each of the pointers is weak; they can be set to nil independently, when the int that they refer to is GC'd. (2) The entire object is GC'd or not; only when both ints referred to by the pointers are GC'd may both be set to nil.

That said, I'd be perfectly happy to restrict the value type of a weak map to be a pointer. I imagine it'd have the same restrictions as the first argument to runtime.SetFinalizer, which are even stronger than "must be a pointer".

@aclements
Copy link
Member

It was pointed out to me that my point about non-pointer keys was really unclear and also wrong. I don't mean pointer-free keys; I'm not even sure what that would mean in a weak-keyed map (probably it would just act like a normal map). I mean keys that contain pointers, like string or struct {a, b *int}. However, in my post above I was thinking this was hard because you couldn't remove the key until all of the pointers were unreachable, but you can delete the key as soon as any of the pointers are unreachable (since you can no longer construct that key). That makes non-pointer keys not much harder to support than pointer-typed keys.

But I think "unreachable keys" feels like a slightly different question; I was assuming ordinary keys, which might be strings or integers, and potentially-unreachable values.

Ah, that is quite different from what I was talking about. I much less on-board with a weak-valued map than a weak-keyed (non-interable) map because it exposes the behavior and timing of the garbage collector, exactly the way finalizers do. Also, I understand concrete use cases of weak-keyed maps, but I don't actually understand concrete uses of weak-valued maps. Your example of a string-to-*Node map was rather abstract. Can you give a concrete example of what you would use that for?

Are people thinking of different semantics?

I was definitely thinking of different semantics, but maybe I'm the only one.

@seebs
Copy link
Contributor Author

seebs commented Jan 13, 2021

Well, hmm. You can't GC the int pointed to by one of the pointers unless you set that pointer to nil, because otherwise you'd have an easy way to get an invalid pointer unexpectedly. I think my intuition is that as long as the [2]*int exists, it is in fact a valid reference and the things it points to aren't collectible, but if nothing refers to it, then it can be collected, at which point the things it points to are no longer referenced (by it, anyway). But that implies that the value type really ought to be *[2]*int, because otherwise it's not really meaningful to talk about collecting it.

I am definitely leaning towards Ian's interpretation; the keys are not "weak" in any way, they're just keys, if they contain pointers they are references to them. The only difference from a regular map is that, if the values are pointers, they don't count as references, but if the thing they point to gets GCd, the key/value pair is deleted from the map when that happens.

So, a concrete example of utility: Interning strings. You want to have a lot of objects which should contain a string, but you don't want to have multiple different allocations for the same string, and you want comparisons of them for sameness to be cheap. So...

type Intern string
var interned weak map[string]*Intern
var internMu sync.Mutex

// intern a string:
func intern(k string) *Intern {
  internMu.Lock()
  defer internMu.Unlock()
  if i, ok := interned[k]; ok { return i }
  internMe := Intern(k)
  interned[k] = &internMe
  return internMe
}

The result of calling intern(foo) for any given string is a *Intern, with the convenient guarantee that anyone else who interned the same string got, not just a pointer to an equal Intern, but exactly the same pointer. You can use these as map keys, for instance, and get single-word compares instead of string comparisons. If you have a million objects that were created using slices of different strings, and all the slices compared equal, you have a single string that all of them intern to, and the things those were sliced out of aren't being pinned. So you get much better performance than you would using the original string object(s).

So why not just use a normal map? Because if the set of strings is largeish, and changes over time, and sometimes a string will stop being used, a non-weak map[string]*Intern will eventually grow without bound. You can't safely remove anything from it because you can't prove that nothing's still using that value, and having two different *Intern that actually refer to identical strings would break using them as map keys, etcetera.

With the weak map, you can discard any values that are no longer referenced anywhere, and you still have the guarantee that no two users who try to intern the same string will ever see different interned values.

This can also be used for some kinds of caching. For instance, say you were doing a naive cache of fetched URLs. If you do it as weak map[url]*body, then if you fetch a URL, and dump it in the map, another fetch while that body is still around can just grab the copy in the weak-reference map, avoiding re-fetching the body that you already have, but if no one is using the body, it can be dropped silently from the cache. (I am aware that this wouldn't be a great choice for a lot of use cases, for a number of reasons. The first hard problem is called cache invalidation.)

So basically, the use case is: (1) it would make sense to have a map with these things as values, (2) we definitely don't want to throw them away if anyone's using them (3) it's fine to throw them away if no one's using them (4) it would be annoying to try to track who might be using them.

I note that "not iterable" seems fine to me for a weak-valued map as well. It's not really useful to iterate it, generally.

I'm not getting a clear picture of how a weak-keyed map would be used. Like, for interning, what would I be storing as keys? The addresses of strings can't be right, because the whole point is that I want to get the same thing from the map given two distinct strings which happen to compare equal.

@aclements
Copy link
Member

Okay, I think I understand your interning example (I certainly understand the code and intent; not sure I've completely wrapped my head around the implications :). And I think interning like this does require a structure that exposes GC behavior, since it has to distinguish whether objects still exist. So weak-keyed maps wouldn't work for interning.

But my basic concern with a weak-valued map is that its semantics are deeply tied to GC behavior. It can't be described without explaining what it means for things to become unreachable. Finalizers are the only other thing in the language like that right now, and those are an unfortunate and necessary evil. In my opinion, the bar for adding another is extremely high. (sync.Pool's semantics don't expose GC behavior. It can be described simply as a lossy set. GC behavior is key to Pool's performance characteristics, but that's true for a lot of things.)

Also, I believe weak-valued maps are ultimately no different from weak references, and I'm not sure why we would add weak-valued maps rather than just weak references. You can build a weak reference from a weak map using a single-entry map, and you can build a weak map from a weak reference by using weak references for the values (plus either finalizers or phantom reference queues, depending on the details of the weak reference API).

My other concern is implementation. First, a weak-valued map (or any form of weak reference) cracks open the assumptions Go's current write barrier is based on. It might be possible to work around that by synchronizing carefully with the garbage collector during the "Get" and "Put" operations, but I'd have to work out those details. It's particularly hairy if an unreachable value is actively being swept during a Get. Sweeping would probably have to lock the map. (Weak-keyed maps have none of these problems.)

Second, weak-valued maps have the problem I was (incorrectly) getting at above for values containing multiple pointers, like [2]*int. It doesn't make sense to collect the map entry until all of the pointers are unreachable simultaneously, but I don't know how to do that without slowing down the GC on everything. Maybe we would only allow pointer values, but does that restrict the utility of a weak-valued map? I don't know.

@seebs
Copy link
Contributor Author

seebs commented Jan 14, 2021

So, as a user, I mostly want a thing like this in a case where I don't care whether the object never existed or used to exist but got GCd, I just don't want to make an object if it already exists, but I don't want to pin a zillion objects that I might never use again. So, it sort of exposes a GC behavior, but in a very loose sense; nothing prevents the implementation from continuing to hold things for a long time. (Strictly speaking, existing maps already satisfy the expected usage behavior, in that they meet the constraint that the retrieved pointer is valid as long as the object hasn't been collected, it's just that it always turns out that the object hasn't been collected because the map counts as a reference to it. But the GC would be allowed to never get around to collecting objects anyway...)

The reason I wrote this proposal with a map rather than with references is sort of glossed over in the initial proposal. Basically:

(1) I can't think of a real use case where I want a weak-reference, and I don't want to be able to look it up by some kind of key.
(2) Weak references are naturally well suited to the , ok idiom of maps.

And then, thinking about it, it occurred to me that a lot of the things that make weak references seem like an annoying mess to track are potentially significantly mitigated by grouping them all together. If all the weak references are being kept in one object, then we have one object/region that needs special treatment. If you have a million structs each of which has a weak reference to something, you have to do whatever your special processing is for every one of those weak references, separately from whatever you do for pointers in those structs. If you have a weak map, you can do your special processing for everything in the map and nothing not in the map, and that seems like it should be simpler.

I think a weak-valued map only makes sense if the values in it are themselves pointers to things. Maps aren't addressable; nothing can be pointing-to the map entry, so the map entry can't itself be collectable or pinned. So I think it's only meaningful for a map to be weak when the value type is a pointer type itself.

My vague intuition, which I haven't fully thought through, is that if a weak-valued map didn't mark the things it contains, but did follow pointers within them and mark those, and it were prohibited to use SetFinalizer on things that live in a weak-valued map, and things lived in only one such map, it would be adequate for the runtime to SetFinalizer() on them to delete them from the map. Except, as you note, the sync issue.

I'm still not quite able to make sense of a weak-keyed map. I can describe it but I'm not sure what I'd use it for. I guess it would be a more implicit usage. With the weak-valued map, I have a key and I want to know whether there's an associated value. With the weak-keyed map, if I have a key, I know the value can't have been deleted.

@josharian
Copy link
Contributor

I can't think of a real use case where I want a weak-reference, and I don't want to be able to look it up by some kind of key.

“Looking up by key” and “stored in a field alongside other data” often end up being interchangeable. Imagine an expensive immutable cached result. You store it with a weak ref alongside the raw data and return it. If no one is using it any more, it returns to nil, and you recalculate again later.

At the end of this conversation, I don’t see the advantage that maps have over plain old weak refs, except possibly in syntax. But with generics, we could have a runtime.WeakRef[T] type.

@ianlancetaylor
Copy link
Contributor

Weak references are basically isomorphic to finalizers. It is possible to implement weak references using finalizers, it's just that the result is awkward to use because you can't use ordinary pointers. So if you don't like finalizers, you won't like weak references.

@josharian
Copy link
Contributor

It is possible to implement weak references using finalizers, it's just that the result is awkward to use because you can't use ordinary pointers.

That was not my experience. Please see the discussion at https://github.com/go4org/intern/blob/3eb7198706b254cddea46d066da7316b2620ba7a/intern.go#L169. I’d love to be wrong about this, but the conclusion I came to was that weak refs are not implementable using finalizers.

@josharian
Copy link
Contributor

@bcmills made a similar point on Gopher Slack, so cc’ing him here.

@ianlancetaylor
Copy link
Contributor

@josharian I sketched it out at #41303 (comment). The problem is that everybody has to use Strong pointers, they can't use ordinary pointers. And they have to be careful about using runtime.KeepAlive.

@bcmills
Copy link
Contributor

bcmills commented Jan 15, 2021

I don't see why that would be the case for a native, runtime-supported implementation of weak references in Go either.

If you implement weak references in terms of finalizers, then it is certainly true that they share all the same drawbacks. However, that is a conclusion that follows from the assumption that the two are equivalent — it is not evidence for their intrinsic equivalence.

@ianlancetaylor
Copy link
Contributor

In a garbage collected language that doesn't use reference counting, what can you do with weak references that you can't do with finalizers? And vice-versa?

@bcmills
Copy link
Contributor

bcmills commented Jan 15, 2021

With weak references but not finalizers, you can have structs with weakly-referenced fields without rewriting all transitive references to the values to use a “strong reference” wrapper type.

With weak references but not finalizers, you can implement interning tables with weak values, but you have to manually prune the stale keys as values are collected.

With weak references plus finalizers, you can implement interning tables with weak values with automatic pruning of stale keys.

That is not to say that weak references are essential, but they are not redundant with finalizers.

@dsnet
Copy link
Member

dsnet commented Jan 15, 2021

Maybe I'm lost in the discussion, but what's the exact relationship between "weak references versus finalizers" and what the issue was originally about, which is weak references maps? Are we suggesting we should provide weak references instead of weak reference maps or are we talking about how a weak reference map would be implemented?

It seems that that weak references alone is insufficient since a key feature of a weak reference map is that the garbage collector takes care of deleting entries that are dead. I'm not sure I know how to implement a weak reference map with weak references alone.

@bcmills
Copy link
Contributor

bcmills commented Jan 15, 2021

@dsnet

Given that Go already has finalizers, I claim that we could enable users to implement weak-reference maps by adding only ordinary weak references to the language.

That seems simpler to me, because the semantics of ordinary weak references seem straightforward (just the invariants described in #43615 (comment)), in comparison to the confusion and ambiguity of “weak keys” vs. “weak values”.

@ianlancetaylor
Copy link
Contributor

I don't think this addresses the point I was trying to make but as a practical matter I think it's unimportant.

@aclements
Copy link
Member

For example, cyclical references don't work with finalizers, and they won't work with weak references either.

@ianlancetaylor, I don't believe this is correct. Cycles through a weak reference can be collected. (There are other forms of finalizers that can also deal with cycles, such as phantoms.)

Maybe I'm lost in the discussion, but what's the exact relationship between "weak references versus finalizers" and what the issue was originally about, which is weak references maps?

@dsnet, that was partly my fault. I brought up weak references vs weak maps here.

There are many variations on weak references. One variation is in how you tell an application that the weak reference has been collected. The runtime can just nil out the reference and leave the application to discover that later, or it can provide a callback or a "reference queue" that lets the user do finalization-style things. Usually when used with weak references, the callback or queue doesn't provide the original object, but some other bit of information you associated with it, for example, the map and key it was stored under. Doing this with weak references and callbacks/queues is nice because it's clear what happens when there are multiple weak references. While there's nothing impossible about having multiple finalizers for an object, we disallow it because it gets messy quickly (what order do they run in? if one of them revives the object, do you run the rest? how do you remove a finalizer?).

One reasons Go finalizers are so tricky in the runtime is that they provide the original object to the callback, which means the runtime has to revive the object just to run the callback. This is what makes objects with finalizers in cycles un-collectable, delays reclaiming of memory for finalized objects, and means that a chain of N objects with finalizers takes N GC cycles to clean up. If weak references had a callback, but didn't provide the original object, they don't have these problems.

@seebs
Copy link
Contributor Author

seebs commented Jan 15, 2021

Part of the appeal, to me, of the map is that you don't have to think about a specific reference. If you have a reference, it's a strong reference. The weak-reference nature is restricted to grabbing things from a map, and that already has the , ok behavior.

@RLH
Copy link
Contributor

RLH commented Jan 17, 2021 via email

@dsnet
Copy link
Member

dsnet commented Mar 29, 2021

I was looking into a memory leak that stemmed from dynamically created types not being garbage collected (#28783). In that issue, @randall77 comments:

Yep. This is a hard one to fix.
We depend on type uniqueness, so we need to keep track of all the types we've currently made.
To GC the type, we need some way of telling reflect we've done so, so it can remove that type from its list of created types. Probably something akin to Java's weak references.
All the interface tables are allocated in persistent memory, not on the heap, so that would need to change also.
The compiler has a bunch of assumptions about types and interface tables never being GCd (e.g. the first word of an interface isn't a pointer that GC needs to traverse), that would need finding and fixing also.

I don't know much about the implementation of reflect, but there might be overlap in fixing #28783 with this issue.

Even if dynamically constructed types could be garbage collected, we still have the problem where some libraries (e.g., encoding/json, google.golang.org/protobuf, and others) have a global sync.Map that acts like a cache keyed on reflect.Type to some expensively computed data structure. The use of a global sync.Map or even a map[reflect.Type]V would keep dynamically created types alive forever (even if #28783 were fixed) unless there was some weak reference map that allowed for the removal of certain reflect.Type entries when they are GC'd.

@seebs
Copy link
Contributor Author

seebs commented Oct 26, 2021

I ran into another example of a case where I want a thing that behaves like this, only in this case, I think I actually want the keys to be the weak references. I have a set of test hooks that basically allow tracking open/close pairs (or any equivalent "must be exactly paired" operation, really) on arbitrary objects. So, when you mark an object open, I grab a stack trace, and when you mark it closed, I grab a stack trace.

If you double-close it, I can report where it was created, when it was originally closed, and when it got double-closed.
If you forget to close it, I can report where it was created, and that it wasn't closed.

So I have a map, keyed by *T for some type, of the data about when that pointer was allocated-and-opened, and when it was closed.

But... That also means that, for the entire life of the program, I'm keeping all of these objects alive in an enormous map.

And I have to keep them, forever, in case of a double-close. The only way it would be "safe" (for purposes of this functionality) not to keep an item in the map would be if there were no other references to it.

But if I could have the keys of the map be weak references, such that if they are the only reference they are deleted from the map, that would actually work perfectly. If nothing has any references to an object anymore, I don't need to worry about one of those references being used for a double-close, so it's fine if that entry goes away. At the end of whatever operation I'm testing, I can scan the map for things. I still have to check whether or not they've been marked as closed, because they might be closed but not yet garbage-collected, but all the ones that got garbage collected go away.

So I think I'm now sold on both "keys are themselves weak references" and "values are weak references, and if the value gets collected, the key stops referring to it", or something.

@coder543
Copy link

@seebs couldn't you set a finalizer on each instance of *T that would remove itself from a map[uintptr]V? You could convert from *T to uintptr for keying on the map, and this would avoid having the map itself keep *T alive.

@ianlancetaylor
Copy link
Contributor

If I understand the suggestion correctly, the conversion to uintptr would violate the unsafe.Pointer rules (https://pkg.go.dev/unsafe#Pointer) and would break if Go ever introduces a moving garbage collector.

@coder543
Copy link

Ah, that is an unfortunate point.

@coder543
Copy link

coder543 commented Oct 29, 2021

I'm working with an application where some medium-size strings (dozens of characters) are often individually repeated tens of times across a graph structure that is hundreds of megabytes in size. The significance of possible memory savings from string interning for this specific use-case have had me reviewing the options yet again today, but I have been unable to find an appealing option.

The go4.org/intern library referenced by Tailscale is probably among the best options for this, but I hate to depend on something that is actively thwarting the garbage collector's best efforts, and could theoretically break in a future Go release. I really, really wish that Go had a built in weak map, so I could simply use a weak map[string]string to perform optimal string interning on these specific string values.

@josharian
Copy link
Contributor

@coder543 have you played with github.com/josharian/intern?

@coder543
Copy link

@josharian thanks, it looks interesting, I think I'll have to give it a closer look soon. Based on a quick review of the (very small!) code, it looks like it won't provide optimal string interning, but it'll probably be a lot better than the nothing that I'm currently using.

@xeus2001
Copy link

This is a really interesting problem, I implemented my own version of a weak-reference using go 1.18 and will try to do the same for a weak map, but theoretically you could use this already for a weak map. I could as well change the implementation so that I do not "reattach" the finalizer, but let the reference die, but this way it works better, at least with the current GC. In any case, interesting topic.

@navytux
Copy link
Contributor

navytux commented Apr 7, 2022

I implemented my own version of a weak-reference using go 1.18

@xeus2001, for the reference, you might be interested to check how GC can crash when a pointer/interface is recreated from uintptr even if pointed object is known to be still alive and referenced via another regular pointer: #41303 (comment).

@MagicalTux
Copy link

I also have my own weakref implementation for go1.18 that seems to be doing fine, based on @xeus2001 's code but with a map object too.

@navytux I've looked at the #41303 issue and it looks like to be an issue with using direct rematerialization of pointer through transforming uintptr[2] into interface{} which is likely not a good thing. Using unsafe.Pointer to store the object directly seems a better way to do it, and go1.18's generics makes the whole process a lot nicer.

I've done some load testing and so far I haven't had any crash, even with GOGC=0.

@rsc rsc changed the title proposal: weak reference maps proposal: container: add weak reference maps Aug 10, 2022
@xeus2001
Copy link

xeus2001 commented May 31, 2023

I implemented my own version of a weak-reference using go 1.18

@xeus2001, for the reference, you might be interested to check how GC can crash when a pointer/interface is recreated from uintptr even if pointed object is known to be still alive and referenced via another regular pointer: #41303 (comment).

@MagicalTux : I added a comment to the linked issue why I think that the implementation is buggy and what has to be done to fix it (adding a load barrier). However, you could directly try out my version, which uses two atomics where memory ordering is guaranteed (when I get the specs correct).

@ammario
Copy link

ammario commented Mar 17, 2024

Tried my hand at a weakmap implementation here that uses a single persistent finalizer per non-empty map to automatically evict old entries based on memory pressure. The approach may solve the weak map for memory caching use case but not others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests