proposal: sync: Add UnlockToRLock() to RWMutex #38891
For the reference: pygolang settled on
Over on #23513 (comment) I said:
Optimizing the “holding for writing and dropping to holding for reading” makes it sound like “holding for writing” is something that can be done efficiently, but my understanding is that on today's machines, if an RWMutex is held for writing with any appreciable frequency, you don't see performance wins beyond having a plain Mutex. The contention from the writers dominates everything else. If my understanding is correct, then this new API surface would add complexity without any performance benefit, and so we shouldn't add it. Does anyone have numbers to address this concern, one way or the other?
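For anyone who wants to put numbers on that claim, here is a minimal benchmark sketch (not from the thread; the function names, the shared counter, and the ~10% write ratio are all arbitrary choices) comparing a plain Mutex against an RWMutex under a mixed load:

```go
package lockbench

import (
	"sync"
	"testing"
)

// BenchmarkMutexMixed guards both reads and writes with a plain Mutex.
func BenchmarkMutexMixed(b *testing.B) {
	var mu sync.Mutex
	n := 0
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			mu.Lock()
			if i%10 == 0 { // ~10% writes
				n++
			} else {
				_ = n // read
			}
			mu.Unlock()
			i++
		}
	})
}

// BenchmarkRWMutexMixed uses RLock for reads and Lock for writes.
func BenchmarkRWMutexMixed(b *testing.B) {
	var mu sync.RWMutex
	n := 0
	b.RunParallel(func(pb *testing.PB) {
		i := 0
		for pb.Next() {
			if i%10 == 0 { // ~10% writes
				mu.Lock()
				n++
				mu.Unlock()
			} else {
				mu.RLock()
				_ = n // read
				mu.RUnlock()
			}
			i++
		}
	})
}
```

Running something like `go test -bench=Mixed -cpu=1,4,16` shows how the gap behaves as the write ratio and parallelism vary.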
@pjebs, you could get some of the semantic benefit of UnlockAndRLock() with TryRLock(). Any thoughts?
@rsc, on my side it is not the performance in the first place, but the semantic guarantee of not releasing the lock while downgrading it. If there is no such guarantee, a writer that has done the modification part of its job and no longer needs the lock to be held exclusively has to recheck the data for consistency after Unlock->RLock, because the data might have been changed by other mutators while the lock was not held. With an atomic downgrade there is no such need. And contrary to upgrade, which is not possible to implement without deadlocks, atomic downgrade is a well-defined operation with well-defined semantics. Leaving the performance topic for others.
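A minimal sketch of the recheck being described, assuming a hypothetical cache type with a version counter (all names invented here) so the staleness check has something to hang on:

```go
package cacheexample

import "sync"

type cache struct {
	mu      sync.RWMutex
	entries map[string]string
	version uint64 // bumped on every mutation (invented invariant)
}

// updateThenRead does its "write" portion under the write lock, then needs
// only read access. Without an atomic downgrade it must Unlock and RLock,
// and another mutator may run in that window, so it has to detect that.
func (c *cache) updateThenRead(key, val string) string {
	c.mu.Lock()
	c.entries[key] = val
	c.version++
	seen := c.version
	c.mu.Unlock()

	// --- window: other writers may run here ---

	c.mu.RLock()
	defer c.mu.RUnlock()
	if c.version != seen {
		// The data changed while the lock was dropped: recheck or redo
		// whatever the "read" portion depends on before trusting it.
	}
	return c.entries[key]
}
```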
@navytux The question that @rsc is asking is: if you need to downgrade the lock, would you get better performance overall by always using a plain Mutex?
There are other reasons to use an RWMutex, including to wait for all potentially-concurrent tasks to cease (and block new ones), e.g. before shutdown/upgrade. Such tasks acquire a long-lived read lock. If a situation implies an RWMutex, you should use that, even if there's no apparent performance issue. As the app evolves, or its workloads in the field increase, you avoid the need to replace a Mutex with an RWMutex. And you have more meaningful code. You might also be able to replace a collection of Mutexes covering separate resources with a single RWMutex covering them all, if most of the access to them is read-only.
@networkimprov Those are nice examples, but do any of them require downgrading a write lock to a read lock?
I experimented with this concept here: https://github.com/as/lock/blob/master/lock.go. In practice, it had very little benefit, for reasons similar to those @rsc mentioned.
My point is that performance is really not the correct focus of this discussion. @navytux makes a good point re the need to recheck the data for consistency after Unlock->RLock. That operation may not be lightweight.
@networkimprov I disagree; the consistency can only be violated if you release the lock. Why do you release the lock and re-acquire the read lock?
Performance really does come in, because the code can simply replace the RWMutex with a plain Mutex.
The temptation that leads to these types of proposals is that there are many readers and few writers. The writer is finished writing in the critical section and wants to unblock the readers, but for some reason needs read access in the critical section. However, there is usually a way to avoid reading in the critical section altogether. For example, a writer stores a value into a map, and then loads it and returns it. There is no real reason to load the element, since the writer was the one who put it there. In more complex cases this is desired when there are multiple maps, but the result often has negligible performance benefits upon systematic benchmarking. I suspect this is because it is easy to underestimate the complexity of the RWMutex itself.
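A sketch of that map example with invented names; the second function shows the usual way to avoid any read access after the write:

```go
package mapexample

import "sync"

var (
	mu sync.RWMutex
	m  = map[string]int{}
)

// putAndGetRoundabout writes under the write lock, then takes a read lock
// just to load the value this goroutine stored a moment ago.
func putAndGetRoundabout(k string, v int) int {
	mu.Lock()
	m[k] = v
	mu.Unlock()

	mu.RLock()
	defer mu.RUnlock()
	return m[k] // may even observe a later write by someone else
}

// putAndGet returns the value we already have; no read access is needed.
func putAndGet(k string, v int) int {
	mu.Lock()
	m[k] = v
	mu.Unlock()
	return v
}
```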
I think we should let @pjebs and @navytux explain their use cases before we claim that they're weak :-) I'm disappointed to see the semantic value of RWMutex brushed off. Switching to a Mutex for this situation may make the code "simpler" in a way, but ultimately less meaningful. This is also the third (at least) request for this feature, so maybe there's something to it.
@networkimprov I found these, but the first one is talking about upgrading an RWMutex reader to writer. Do you have the issue for the third request?
@ianlancetaylor thanks for clarifying the context of this question. In short my answer is "no" - RWMutex - also known as a shared-readers / exclusive-writer lock - brings a real performance benefit because it allows many readers to run simultaneously, instead of running only a single reader at a time as would be implied by a regular sync.Mutex. Now let me try to explain: In the filesystem I write, at both the server and client sides, a particular database view is maintained for a transactional data snapshot. There are readers of the data which need the data snapshot to remain stable throughout their operations, and tasks that update the database view (e.g. as new transactions are being committed and the database sends corresponding notifications). Here it is natural to use a shared-readers/single-writer lock - aka sync.RWMutex - to organize synchronization between readers and the snapshot updater. However, contrary to usual scenarios with a plain mutex, the lock is not short-lived, because readers perform network round-trips while accessing the database under the lock so that the data snapshot remains stable. I think in this case it goes without saying that using a plain sync.Mutex instead would hurt performance. The above could work without atomic downgrade if the updater could work with only taking the lock for write and then completely releasing it when done. That is, however, not the case for me.
Organization of client locking: https://lab.nexedi.com/kirr/wendelin.core/blob/9f5f8bab/wcfs/client/wcfs.cpp#L117-184

Overall I think that for a well-defined and well-understood operation it is better to put the complexity into sync.RWMutex instead of exposing it to programmers and forcing them to solve this task by themselves many times, instead of only once, properly, and for everyone.

For naming my rationale for

Kirill
#4026 asked for both downgrade & upgrade, and it was clarified that upgrade isn't possible without a failure mode. |
The performance argument is interesting to me, because on the one hand it seems like a plausible optimization. (And, to be honest, most of the uses of RWMutex that I have seen would probably be better served by some other mechanism.)
@navytux, note that for “snapshot”-style uses in particular, an atomically swapped pointer to an immutable snapshot is often a better fit than an RWMutex.
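One way to read that suggestion, sketched with made-up names: readers load an immutable snapshot without taking any lock, and writers publish a whole new snapshot atomically (serialized by a plain Mutex):

```go
package snapexample

import (
	"sync"
	"sync/atomic"
)

type snapshot struct {
	data map[string]string // never mutated after being published
}

var (
	current atomic.Value // holds *snapshot
	writeMu sync.Mutex   // serializes writers; readers never take it
)

func init() {
	current.Store(&snapshot{data: map[string]string{}})
}

func read(key string) (string, bool) {
	s := current.Load().(*snapshot)
	v, ok := s.data[key]
	return v, ok
}

func write(key, val string) {
	writeMu.Lock()
	defer writeMu.Unlock()
	old := current.Load().(*snapshot)
	next := &snapshot{data: make(map[string]string, len(old.data)+1)}
	for k, v := range old.data { // copy-on-write
		next.data[k] = v
	}
	next.data[key] = val
	current.Store(next) // readers atomically switch to the new snapshot
}
```

How well this fits depends on how much state has to be copied for each new snapshot, which is roughly the objection raised below.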
Bryan, how is that worse than a Mutex in the scenario you give, or are you saying that any Mutex is wrong for it?
For example?
@networkimprov RCU
I endorse this. I think we should avoid adding features to rwmutexes that make them more tempting to use than they already are. I like @nelhage's blog post as a reference for the problems with read-write locks.
@bcmills, by "snapshot" I meant something different than just a one-value snapshot. In my case it is something like a database connection with many objects obtained through it and exposed to the user. When updating, we need to update both the connection and the already-exposed objects in a consistent way. Maybe I'm missing something, but it is hard for me to see how that approach could be used here. The part of the writer that performs the network round-trip:
It is true that whether a shared lock is writer-priority, reader-priority, or of some other kind makes a difference. Even more: wcfs implements an additional custom locking protocol on top of the writer-priority sync.RWMutex to avoid deadlocks due to a potential locking cycle through the page lock in the OS kernel: https://lab.nexedi.com/kirr/wendelin.core/blob/7dbddf9d/wcfs/notes.txt#L186-205 That does not mean shared locks are useless. It is true that RWMutex locking patterns can be implemented with just channels (see e.g. here (2) for a writer-preference version). It is also, however, true that a plain Mutex can be implemented with just a channel of capacity 1. Does that mean that Mutex and/or RWMutex are useless and should not be there?
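For reference, a minimal sketch of the "plain Mutex from a channel of capacity 1" point (illustrative only; no fairness or fast path):

```go
// chanMutex is a mutex built from a buffered channel of capacity 1.
type chanMutex chan struct{}

func newChanMutex() chanMutex { return make(chanMutex, 1) }

func (m chanMutex) Lock()   { m <- struct{}{} } // blocks while the slot is taken
func (m chanMutex) Unlock() { <-m }             // frees the slot
```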
So Caleb, Bryan, and "as" suggest that one must roll one's own sync scheme with atomics when UnlockToRLock() is required, because an RWMutex isn't optimal in certain (possibly unrelated) scenarios? Are there any widely used, well-tested third-party packages with alternate sync schemes? Roll-your-own sync may not turn out well. Isn't safeguarding Go users against incorrect code a higher priority than directing them towards optimal code? (And isn't there a proverb about premature optimization? :-)
@networkimprov, converting a write-lock to a read-lock is never “required”: you can keep holding the write-lock instead. Our argument is that if you need an optimization beyond that, an RWMutex is probably not the right tool in the first place.
@bcmills's last comment makes clear why I brought up performance: if there's not a performance win, then just keep holding the write lock. @navytux's scenario is fascinating. I am honestly a bit surprised that you've gotten as far as you have without wanting control over the read-write contention policy. I'd have expected that a database would care about exactly how to resolve those conflicts and not be at the mercy of whatever the RWMutex implementation does, especially since the RWLock is tuned for in-memory protection (with very different concerns than a database). Thanks @as for reporting back on past experience with this operation (and that it didn't help much). Overall, the general sentiment seems to be that we can do without adding this operation, except for @navytux's example. Am I reading that right?
@rsc, thanks for feedback and kind words. Writer-preference shared/exclusive lock semantics fit my need for a contention policy; that's why I'm using sync.RWMutex directly. While I understand that the majority of uses for sync.RWMutex are short-lived in-memory protection, mine is different. What I'm trying to say is: if the atomic downgrade is not added, I will most likely have to implement my own shared/exclusive lock just to get this one operation. From what I've read above, my understanding is that it is very likely for my case to be declared "special", with "implement your own locking" as the suggested way forward. While certainly doable, I think that would be a pity as an overall outcome. Kirill /cc @nixprime, @dsymonds, @josharian
@navytux, I think you're reading more into the phrase “critical section” than at least I intended. I interpret the “critical section” as a contiguous segment of the program's execution trace, not its source code. (That is: the “critical section” is the portion of the execution of the program during which the lock is held.) If you don't like that term, the same argument applies to “mutual exclusion”: as far as I can tell, the downgrade does not change which operations are mutually exclusive with which.
Yes, in my mind a "critical section" can begin in one goroutine and be finished in another goroutine. That's why sync.Mutex allows unlocking in a different goroutine. |
With the understanding that @navytux still disagrees, it does seem like the overall consensus otherwise is in favor of not adding UnlockToRLock. If someone came back with a clear performance win compared to Unlock+RLock then that might be worth revisiting. But for now, this seems like a likely decline.
In my past life I used to be a barrister. I found arguing in court so emotionally draining that I became a Go developer instead. I guess I've come full circle.

From what I read above, I see confusion and misunderstandings amongst the key proponents, so I will attempt to elucidate why people's arguments are sliding past each other. But firstly I will restate below the scenario that my proposal was catering for. There was a mistake in my original formulation which I've now corrected above.

========

The scenario is an operation that has a well delineated "write" portion and a "read" portion: the goroutine acquires the write lock, performs the "write" portion, then performs the "read" portion, and finally releases the lock.

The objective is: during the "read" portion, other goroutines that only need read access should be able to proceed, while no other writer may modify the data between the "write" portion and the "read" portion.

The alternative is: either (1) keep holding the write lock for the "read" portion as well, or (2) call Unlock() followed by RLock() and accept that another writer may interject between the two calls.
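To make the two variants concrete, here is how I picture the proposed method being used next to today's workaround (UnlockToRLock does not exist; the state map and the arithmetic in the "read" portion are invented for illustration):

```go
import "sync"

var (
	mu    sync.RWMutex
	state = map[string]int{}
)

// opProposed: the downgrade is atomic, so no other writer can run between
// the "write" portion and the "read" portion, while other readers are free
// to proceed during the "read" portion.
func opProposed(k string) int {
	mu.Lock()
	state[k]++         // "write" portion
	mu.UnlockToRLock() // hypothetical method from this proposal
	defer mu.RUnlock()
	return state[k] * 2 // "read" portion
}

// opToday: Unlock then RLock. Another writer may interject in the gap, so
// the "read" portion can observe data the "write" portion did not produce.
func opToday(k string) int {
	mu.Lock()
	state[k]++
	mu.Unlock()

	mu.RLock()
	defer mu.RUnlock()
	return state[k] * 2
}
```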
========

**Performance**

A lot of the discussion is related to the performance implications of the proposal.

@rsc believes there may be no gain by implementing the proposal, hence it's not worth the team's limited resources implementing it and/or complicating the API surface.

@ianlancetaylor believes there is definitely no gain, hence it should not be implemented. He advocates for Alternative #1 stated above.

@bcmills believes the proposal is probably innocuous so there is no loss, but no practical gain either, hence it should not be implemented.

I'm not aware of any empirical evidence either way for these assertions.

**My response**

I contend that this proposal does improve performance in terms of overall throughput. If an operation has a well delineated "write" and "read" portion, AND there are other (hypothetically many) independent operations that only want to acquire a "read", then quite clearly a write lock will block those independent read operations from running when that need not be the case. This proposal allows those independent "read-only" operations to perform their task while the current operation is still proceeding (and in its "read" portion). This desired functionality is the reason why this proposal has come up numerous times.

Now @bcmills and @rsc are knee deep in low-level areas of Go that I and some of the other proponents have no understanding of. Perhaps they see something that we don't from a low-level implementation perspective. In @bcmills's case, he assumes that the proposal won't work because the "read" locks from the independent read operations will potentially be queued behind a higher-priority "write" lock, so there will not be an increase in throughput. On the other hand, all the proponents of the proposal assume that those independent "read-only" operations can "jump" the queue (bypass the awaiting "write" lock) and thus acquire the "read" once the original operation downgrades. That is another source of confusion.

**Labelling Angle**

There is another angle to this proposal that @navytux and @networkimprov have highlighted. Irrespective of the "contention" performance issues, they believe that such a function is an important addition to the semantics of RWMutex. From this perspective, it doesn't matter if there is a throughput gain, but they would hope for one as a bonus.

**My response**

I believe @bcmills is probably right about the current queueing behaviour. In that case I would still be in favour of the proposal so that we can write correct code that relies on the downgrade guarantee.

**Better ways to do it**

There is another argument made by some people. They claim that for the scenario listed, the same result can be achieved by restructuring the code or by using a different synchronization scheme.

**My response**

I disagree. The developer can instead be notified of how and when the feature should be used, similar to the instructions/recommendations in the new sync package Map.
Implementation-wise, it is not difficult. However, we intentionally do not allow readers to jump because that can lead to writer starvation. (That is: if new readers could keep acquiring the lock ahead of a waiting writer, that writer might never get to run.)
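A small runnable sketch of that behaviour as I understand the current implementation; the sleeps are only there to order the events, and "writer acquired" should print before "late reader acquired":

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.RWMutex

	mu.RLock() // an initial reader holds the lock

	go func() {
		mu.Lock() // writer: queues behind the initial reader
		fmt.Println("writer acquired")
		mu.Unlock()
	}()
	time.Sleep(50 * time.Millisecond) // let the writer start waiting

	go func() {
		mu.RLock() // late reader: does not jump ahead of the waiting writer
		fmt.Println("late reader acquired")
		mu.RUnlock()
	}()
	time.Sleep(50 * time.Millisecond)

	mu.RUnlock() // initial reader releases; the waiting writer goes first
	time.Sleep(100 * time.Millisecond)
}
```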
I don't think there would be any point in writing the application as if readers could jump the queue, given that sync.RWMutex intentionally does not allow that. On the other hand, a read-preferring lock would be straightforward to implement as a third-party library. (I would not be at all surprised if one already exists somewhere in the open-source world.) And since mutexes should never be part of a package's exported API (concurrency should be an implementation detail!), it is easy to switch any given package to use a third-party alternative. You could argue, perhaps, that such a lock belongs in the standard library.
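A minimal sketch of such a read-preferring lock (the textbook construction, not a vetted library; readers can starve writers by design):

```go
package rplock

import "sync"

// ReadPreferRWLock lets new readers acquire the lock even while a writer is
// waiting, at the cost of possible writer starvation.
type ReadPreferRWLock struct {
	mu      sync.Mutex // guards readers
	readers int        // number of active readers
	writeMu sync.Mutex // held by the writer, or by the readers as a group
}

func (l *ReadPreferRWLock) RLock() {
	l.mu.Lock()
	l.readers++
	if l.readers == 1 {
		l.writeMu.Lock() // first reader locks out writers
	}
	l.mu.Unlock()
}

func (l *ReadPreferRWLock) RUnlock() {
	l.mu.Lock()
	l.readers--
	if l.readers == 0 {
		l.writeMu.Unlock() // last reader lets writers in again
	}
	l.mu.Unlock()
}

func (l *ReadPreferRWLock) Lock()   { l.writeMu.Lock() }
func (l *ReadPreferRWLock) Unlock() { l.writeMu.Unlock() }
```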
What about if the use of UnlockToRLock() allowed waiting readers to jump ahead of a waiting writer at that point?
Personally, I would classify that as a “more sophisticated (read: complex and difficult-to-reason-about) heuristic”.
(It also wouldn't help for readers starved before the first call to UnlockToRLock.)
I still think implementing UnlockToRLock() is worthwhile.
I can speak briefly to #23513. The implementation for this is available as gvisor.dev/gvisor/pkg/sync.RWMutex if anyone would like to play with it. This lock is used to protect what are (in effect) page tables. The write lock is held to create/delete entries, and the read lock is held during any I/O using those mappings. This is one of the key uses of the downgrade operation in gVisor: entries are installed under the write lock and I/O then proceeds against them under the read lock. The approach of using a full Unlock followed by RLock would mean re-validating the mappings after re-acquiring the lock. That said, the writer-priority behavior @bcmills noted is absolutely going to be a limiting factor on our performance. Today, this lock protects all memory mappings for a single process, so all I/O or memory operations in the process must take the same lock. In all likelihood, gVisor will eventually change the design to use different locks for small regions of the address space (the upstream Linux kernel is also considering this). Since that design would significantly reduce contention, the downgrade operation would matter much less there.
I wouldn't take this as a given, for a number of reasons.
@prattmic, would you say that the downgrade operation is essential in early production releases?
No, it is not "essential in early production releases". We went a long time without it.
@pjebs, you are saying it will lead to an increase in throughput. Do you have concrete numbers? What we've been saying all along is that if there is a clear performance win then that would be good evidence for doing something. In the absence of any clear evidence for what to do, we typically hold back and don't make changes. Will leave as likely decline for another week. If it is declined next week and we get a clear performance rationale after that, we can always revisit.
I think we heard two use cases, for gVisor (where it improved performance) and Wendelin.core (where it's required for data coherence). We also heard the semantic (vs performance) value of RWMutex. Is that not sufficient evidence to address this? If not, what precisely is required?
I'd be happy to reopen this discussion with specific numbers from gVisor or any other production app. We only have an anecdote, not production numbers we can analyze (and in particular compare to a plain non-RW Mutex). For now, it seems there is no change in consensus here, so declined.
There is a scenario that can't be done efficiently (or at all) from outside the sync package. This is with regards to RWMutex.

If you have an operation that has a well delineated "write" portion and a "read" portion:

1. Lock()
2. perform the "write" portion
3. Unlock()
4. RLock()
5. perform the "read" portion
6. RUnlock()

The objective is: allow other goroutines that only need read access to proceed during the "read" portion, while no other writer modifies the data between the "write" portion and the "read" portion.

My proposal is a function called UnlockToRLock() that is simple to implement from inside the sync package and that covers the scenario above. The alternative is: accept a goroutine that wants to acquire the lock interjecting between Step 3 & 4 (the scenario assumes an operation where the "read" portion must follow the "write" portion, and it is not correct for another goroutine to potentially interfere before the "read").