
runtime,iter: the interaction of Pull with runtime.LockOSThread is currently buggy, and generally not well-defined #65889

Open
mknyszek opened this issue Feb 22, 2024 · 53 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. release-blocker

@mknyszek
Contributor

Problem

The current implementation of iter.Pull in the rangefunc experiment creates a new goroutine and uses an optimized goroutine switch path during iteration.

The problem with this implementation, as it stands, is that it interacts poorly with runtime.LockOSThread. If the goroutine spawned for the iterator (let's call it the iterator goroutine) calls runtime.LockOSThread, first and foremost, that call will not be respected. For example, if the next function is passed to a different goroutine, the runtime will not schedule the iterator goroutine on the thread it was locked to. Bad things might happen. Simultaneously, even in the easy case, where the iterator goroutine always switches with the same calling goroutine on the same thread, once that iterator goroutine exits, it will take down the current thread with it. That is expected, but it happens before the calling goroutine gets placed back into the scheduler's ownership. The calling goroutine instead stays blocked forever, with no hope of ever running again.

One possible solution

The most straightforward fix to this is to fall back to a slower kind of switching of goroutines if either the iterator goroutine or the calling goroutine locks itself to a thread. This slower kind of switching would probably be done with channels, and would have semantics that are equivalent to the existing implementation of iter.Pull. Because at least one of the goroutines is locked to a thread, the overhead of OS context switching also comes into play, so this implementation would likely be dramatically slower.
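
For illustration, a channel-based fallback along these lines could look roughly like the following. This is a hedged sketch, not the runtime's or the iter package's actual code: Seq, pullChan, and the channel protocol are all invented here. Because every handoff goes through ordinary unbuffered channels, the scheduler remains free to honor a thread lock on either goroutine.

```go
package main

import "fmt"

// Seq mirrors iter.Seq: a push-style iterator.
type Seq[V any] func(yield func(V) bool)

// pullChan converts a push-style iterator into next/stop functions
// using only goroutines and channels, the "slow" switching discussed
// above.
func pullChan[V any](seq Seq[V]) (next func() (V, bool), stop func()) {
	vals := make(chan V)
	resume := make(chan bool)
	done := false
	go func() {
		if !<-resume { // stopped before the first next()
			close(vals)
			return
		}
		seq(func(v V) bool {
			vals <- v
			return <-resume // block until next() or stop()
		})
		close(vals)
	}()
	next = func() (V, bool) {
		if done {
			var zero V
			return zero, false
		}
		resume <- true
		v, ok := <-vals
		if !ok {
			done = true
		}
		return v, ok
	}
	stop = func() {
		if !done {
			done = true
			resume <- false // ask the iterator to unwind
			<-vals          // wait for it to finish
		}
	}
	return next, stop
}

func main() {
	next, stop := pullChan(Seq[int](func(yield func(int) bool) {
		for i := 1; i <= 3; i++ {
			if !yield(i) {
				return
			}
		}
	}))
	defer stop()
	for v, ok := next(); ok; v, ok = next() {
		fmt.Println(v)
	}
}
```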

This fix, then, has a really unfortunate performance cliff. One could imagine that a UI library (for example) locks a goroutine to a thread because it needs to interact with C libraries that require staying on the main thread. Using iter.Pull on such a goroutine would be a very, very bad idea if this workaround were adopted. The resulting performance regression would be very surprising.

The proposed solution

After discussing with @aclements, @dr2chase, and @cherrymui, we have an alternative fix that avoids the performance cliff. Specifically, we would restrict iterators from calling runtime.LockOSThread by panicking if it is called from an iterator goroutine (the runtime handles the iterator goroutine's creation, and thus can enforce this). Meanwhile, the calling goroutine (the caller of next, that is) is fully free to be locked to a thread.

If the iterator is prevented from locking itself to a thread from the iterator goroutine, then it's totally fine for that iterator goroutine to just execute on the locked calling goroutine's thread. That iterator goroutine effectively gets locked to the same thread while it's executing, and we know that it will eventually switch back to the calling goroutine. It's fine for the calling goroutine to send next to another goroutine as well, even if that other goroutine is locked, because it can only execute once the iterator goroutine has ceded control of the thread (by the semantics and implementation of iter.Pull).

One hazard of this approach is that the iterator, running on the iterator goroutine, may care about thread-local state, and goroutines tend to be locked to threads precisely because there is some important thread-local state that needs to be preserved. However, if the iterator goroutine really cares about thread-local state, it should call runtime.LockOSThread, which, in this case, will panic. That does mean the iterator is slightly restricted in what it can do, but we feel it's better for the code author to find that out up front than to debug some subtle issue where thread-local state is not what they expect.

Furthermore, we think it's very unlikely that an iterator will want to lock itself to a thread anyway, so the extra restriction is unlikely to matter to the vast majority of Go users. Simultaneously, making this choice lets us keep iter.Pull fast for many more (more realistic) use cases, lets us avoid the pain of maintaining two implementations of iter.Pull, and prevents surprising performance cliffs for users of iter.Pull.

@mknyszek mknyszek added this to the Go1.23 milestone Feb 22, 2024
@mknyszek mknyszek added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. compiler/runtime Issues related to the Go compiler and/or runtime. labels Feb 22, 2024
@mknyszek mknyszek changed the title iter: the interaction of Pull with runtime.LockOSThread is currently buggy, and generally not well-defined runtime,iter: the interaction of Pull with runtime.LockOSThread is currently buggy, and generally not well-defined Feb 22, 2024
@mknyszek
Contributor Author

CC @rsc

@thediveo

Just to get this right, as I'm often working on system-related Go code dealing with Linux kernel namespaces that needs OS-level thread locking: the caller creating the iterator and asking to iterate needs to lock the thread, as the iterator itself cannot, correct?

Is there a way for the iterator to ensure its caller has locked the thread?

@prattmic
Member

Specifically, we would restrict iterators from being able to call runtime.LockOSThread by panicking if called from an iterator goroutine

I worry this is too restrictive, as some libraries call LockOSThread temporarily as part of normal operation. Being unable to use these libraries seems rather strange.

A few examples I have found:

Admittedly I haven't found a ton of cases like this, but from a user perspective it seems like a limitation that is entirely orthogonal to for-loop iterators.

@mknyszek
Contributor Author

Just to get this right, as I'm often working on system-related Go code dealing with Linux kernel namespaces that needs OS-level thread locking: the caller creating the iterator and asking to iterate needs to lock the thread, as the iterator itself cannot, correct?

Correct. The caller creating the iterator and any caller of next or stop is allowed to be locked to the thread. Just the iter.Seq passed to iter.Pull would not be allowed.

Is there a way for the iterator to ensure its caller has locked the thread?

I'm not sure I understand. The iter.Seq implementation is executed on a separate goroutine when passed to iter.Pull. Because it's a different goroutine altogether, there's already no way to ensure anything about the calling goroutine's state. The iter.Seq implementation would have to ensure it is locked to an OS thread itself, if it needed that (and if it were allowed, which the proposed fix does not permit).

I worry this is too restrictive, as some libraries call LockOSThread temporarily as part of normal operation. Being unable to use these libraries seems rather strange.

That's a good point. You might not be in control of, or be aware of, what the library is doing. I think the glog example is especially interesting, though I'm not sure the git2go example makes a lot of sense to me (calling into C already temporarily binds the goroutine to the thread, and that will continue to work).

We could also punt the question, pick a more restrictive implementation to begin with, and see what happens in practice. We can always backwards-compatibly lift the restriction in the future.

Thinking out loud, could we have a slow implementation and fall back to it only when the iterator goroutine locks itself to a thread? That might be OK? However, we then have a bit of a "worst of both worlds" situation. Not only do you still have a performance cliff (it's just in a different place now), you also keep the pitfall wherein the iterator goroutine (not locked) may see thread state the code author didn't expect it to.

@thediveo

slow would be better than panic.

@prattmic
Member

I'm not sure the git2go example makes a lot of sense to me (calling into C already temporarily binds the goroutine to the thread, and that will continue to work).

I was confused as well initially. The problem here is that MakeGitError calls C.git_last_error(), which presumably uses something thread-local similar to errno.

	ecode := C.git_blame_options_init(&opts, C.GIT_BLAME_OPTIONS_VERSION)
	if ecode < 0 {
		return BlameOptions{}, MakeGitError(ecode)
	}

@chad-bekmezian-snap

Imo, it's better to start with something more restrictive (such as panicking for now), and then lift some of those restrictions over time as issues arise and solutions appear more clear

@eliasnaur
Contributor

Very interesting. I believe pull iterators may well alleviate the need for #64777 (and #64755), but I ran into issues with runtime.LockOSThread and Cgo callbacks. I documented my findings in #65946 in case they're not a duplicate of this issue.

@mknyszek
Contributor Author

mknyszek commented Feb 27, 2024

@aclements, @cherrymui, @dr2chase, and I discussed this offline a little bit more, and if hypothetically panics are off the table, I think we have an idea of what the implementation would have to look like.

If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine. If next moves to a different goroutine on a different thread, the iterator might run there, and that's fine. If the iterator really needs to read thread-local state, it must call runtime.LockOSThread. This case is still fast.

If the iterator goroutine calls runtime.LockOSThread, it immediately sets a flag in the coro implementation to fall back on a channel-based implementation. It then wakens the calling goroutine, which must check whether that flag is set. (This can be checked non-atomically, because the calling goroutine must be blocked when this happens.) The iterator goroutine then calls into the scheduler. Upon executing later, it locks itself to whatever thread it ends up running on. This ensures that both the calling goroutine and iterator goroutine, if both locked, are guaranteed to be locked to different threads.

Even if the calling goroutine is not locked to a thread, for simplicity, we'll always fall back to the channel-based implementation.

In short: the iterator goroutine can share threads with thread-locked goroutines, but the calling goroutine cannot.

The reason for this asymmetry is that you can pass the iterator goroutine's context to another thread, and it would be difficult to support a calling goroutine being able to share a thread with the iterator goroutine when the iterator goroutine is locked specifically. (The calling goroutine would have to be able to force itself to switch threads to the iterator goroutine's thread. This might require a substantial amount of work in the scheduler.)

We suspect that an iterator wanting to lock itself to a thread is less common in general than the calling goroutine being locked to a thread. There's still a performance pitfall here, but it hopefully shouldn't show up most of the time. That is, if a thread-locked goroutine is iterating over, say, some straightforward all-in-Go data structures with iter.Pull (iterating over a pair of trees, for example), that will still be fast.
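
The common case described in that last paragraph can be written out concretely. This is a hedged example using the released iter API (Go 1.23+): a thread-locked goroutine pulling from a plain, all-in-Go iterator, with nothing on the iterator side touching cgo or LockOSThread, so under the scheme above it would stay on the fast switching path.

```go
package main

import (
	"fmt"
	"iter"
	"runtime"
)

// pairs adapts a map to a push-style key/value iterator.
func pairs(m map[string]int) iter.Seq2[string, int] {
	return func(yield func(string, int) bool) {
		for k, v := range m {
			if !yield(k, v) {
				return
			}
		}
	}
}

// sumLocked pulls all values while the calling goroutine is locked to
// its OS thread; the iterator itself never locks anything.
func sumLocked() int {
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	next, stop := iter.Pull2(pairs(map[string]int{"a": 1, "b": 2}))
	defer stop()
	sum := 0
	for {
		_, v, ok := next()
		if !ok {
			break
		}
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(sumLocked()) // 3
}
```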

@eliasnaur
Contributor

What happens when the iterator goroutine is switched? Will this program successfully hand over the OS main thread to a different goroutine?

package main

import (
	"fmt"
	"iter"
	"runtime"
)

func init() {
	// Lock main thread to main goroutine.
	runtime.LockOSThread()
}

func main() {
	next, _ := iter.Pull(switchThread)
	// Pass thread to iterator, and get back a different
	// thread.
	next()

	// Continue on a different, non-main thread.
	fmt.Println("successfully passed thread")
}

func switchThread(yield func(struct{}) bool) {
	// Unblock the waiting pull iterator with different
	// thread.
	go yield(struct{}{})

	// Perform main-thread work.
	fmt.Println("received thread")
}

If so, #64777 and #64755 can be closed.

@aarzilli
Contributor

My understanding is that this is what would happen

@eliasnaur
Contributor

eliasnaur commented Mar 3, 2024

My understanding is that this is what would happen

I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states:

That iterator goroutine effectively gets locked to the same thread while it's executing

But to your argument the proposal also states

and we know that it will eventually switch back to the calling goroutine.

which is not necessarily true with the go yield() hack.

@aarzilli
Contributor

aarzilli commented Mar 3, 2024

I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states

I was talking about @mknyszek's comment that immediately precedes yours.

@eliasnaur
Contributor

And of course, as long as the iterator goroutine runs on the locked main thread, I can force the runtime to let me keep it:

package main

import (
	"fmt"
	"iter"
	"runtime"
)

/*
extern void switchBack(void);

static void main_thread_api(void) {
    switchBack();
    call_main_thread_api_that_never_returns();
}
*/
import "C"

func init() {
	runtime.LockOSThread()
}

func main() {
	next, _ := iter.Pull(switchThread)
	// Runs on the main thread.
	next()
	// Continues on some other thread.
	fmt.Println("successfully passed thread")
}

//export switchBack
func switchBack() {
	// Runs in a Cgo callback which effectively locks the main
	// thread so the runtime can't give it back through the call
	// to yield.
	go yieldMain(struct{}{})
}

var yieldMain func(struct{}) bool

func switchThread(yield func(struct{}) bool) {
	yieldMain = yield
	fmt.Println("received main thread, running main thread API")
	C.main_thread_api()
}

@eliasnaur
Contributor

I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states

I was talking about @mknyszek's comment that immediately precedes yours.

Which states:

If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine.

which to me seems to support my use-case: my calling goroutine is locked to the main thread, the iterator goroutine isn't, so they can (will?) share threads.

@aarzilli
Contributor

aarzilli commented Mar 3, 2024

I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states

I was talking about @mknyszek's comment that immediately precedes yours.

Which states:

If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine.

which to me seems to support my use-case: my calling goroutine is locked to the main thread, the iterator goroutine isn't, so they can (will?) share threads.

My understanding is that the iterator goroutine can run on the locked thread but it isn't locked to it. The scheduler can still migrate it to a different thread if it wants to. So you don't have any guarantees as to which thread will execute any of the instructions in switchThread.

It would be weird if it worked the way you are proposing; it would mean that, in general, people writing iterators shouldn't pass yield to a different goroutine, because doing so could result in removing a thread lock.

To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of switchThread calls something that blocks the thread (some blocking cgo code or a syscall)? Couldn't that potentially lead to a deadlock? (In theory the main goroutine should be able to run because yield was called, but it cannot, because its locked thread is unavailable forever.)

@thediveo

thediveo commented Mar 3, 2024

yield confusion

@eliasnaur
Contributor

I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states

I was talking about @mknyszek's comment that immediately precedes yours.

Which states:

If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine.

which to me seems to support my use-case: my calling goroutine is locked to the main thread, the iterator goroutine isn't, so they can (will?) share threads.

My understanding is that the iterator goroutine can run on the locked thread but it isn't locked to it. The scheduler can still migrate it to a different thread if it wants to. So you don't have any guarantees as to which thread will execute any of the instructions in switchThread.

It would be weird if it worked the way you are proposing; it would mean that, in general, people writing iterators shouldn't pass yield to a different goroutine, because doing so could result in removing a thread lock.

I see. Your analysis makes sense, but then I'm surprised that the iterator does not run on the same thread as the caller, regardless of iter.Pull. In particular:

package main

import "runtime"

func main() {
	runtime.LockOSThread()

	it := func(yield func() bool) {
		// Surprisingly, this body of code may run on
		// a completely different thread.
	}
	for range it {
	}
}

@aarzilli
Contributor

aarzilli commented Mar 4, 2024

I see. Your analysis makes sense, but then I'm surprised that the iterator does not run on the same thread as the caller, regardless of iter.Pull

No, iter.Pull would be what makes the difference, IIUC. If you are using for range f everything will run on the same goroutine (and therefore thread if LockOSThread is used).

@mknyszek
Contributor Author

mknyszek commented Mar 4, 2024

As @aarzilli says, this does not affect for range f. You'd have to explicitly pass the function next returned by iter.Pull to a different goroutine.

To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of switchThread calls something that blocks the thread (some blocking cgo code or a syscall)? Couldn't that potentially lead to a deadlock? (In theory the main goroutine should be able to run because yield was called, but it cannot, because its locked thread is unavailable forever.)

That example is breaking my brain. I need time to digest it.

It may well be that having the iterator call into cgo or a syscall means we need to treat it the same as calling LockOSThread, meaning it'd be kicked to a different thread immediately.

@eliasnaur
Contributor

As @aarzilli says, this does not affect for range f. You'd have to explicitly pass the function next returned by iter.Pull to a different goroutine.

Are you saying that iter.Pull behaves differently than for range f? And only some of the time, in a nondeterministic way? In particular:

package main

import "runtime"
import "iter"

func main() {
	runtime.LockOSThread()

	it := func(yield func(struct{}) bool) {
		// Always on the main thread if called in a
		// `for range f`, (most of?) the time on the
		// main thread if called by `iter.Pull`.
	}

	for _ = range it {
	}

	next, _ := iter.Pull(it)
	for {
		next()
	}

	// Bonus question: what happens when
	// yield is called from another goroutine?
	it2 := func(yield func(struct{}) bool) {
		go yield(struct{}{})
	}
	for _ = range it2 {

	}
}

To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of switchThread calls something that blocks the thread (some blocking cgo code or a syscall)? Couldn't that potentially lead to a deadlock? (In theory the main goroutine should be able to run because yield was called, but it cannot, because its locked thread is unavailable forever.)

That example is breaking my brain. I need to time to digest that.

It may well be that having the iterator call into cgo or a syscall means we need to treat it the same as calling LockOSThread, meaning it'd be kicked to a different thread immediately.

Yeah, considering cgo/syscalls as implicitly calling LockOSThread for the duration of their calls seems consistent.

@mknyszek
Contributor Author

mknyszek commented Mar 4, 2024

Are you saying that iter.Pull behaves differently than for range f? And only some of the time, in a nondeterministic way?

iter.Pull is defined as creating a new goroutine to run the iterator, so I'm not surprised it behaves differently. for range f, meanwhile, is defined as iterating on the same goroutine: it's just a bunch of function calls. iter.Pull has the same semantics as passing values back and forth between goroutines across a channel (approximately; it's really more of an unbuffered channel that is used to synchronize a shared memory location, but close enough). It's just very tightly optimized, meaning that goroutines pass control between each other directly instead of going through actual channels.
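
That description, "an unbuffered channel used to synchronize a shared memory location," can be modeled in ordinary Go. This is an illustrative sketch only, not the runtime's optimized code: a shared slot holds the value, and an unbuffered channel acts purely as a baton that transfers control back and forth, so exactly one side runs at a time.

```go
package main

import "fmt"

// handoff alternates strictly between the "caller" (this goroutine)
// and an "iterator" goroutine. The unprotected slot is safe because
// the unbuffered channel operations provide the synchronization:
// control is always handed over before the other side touches it.
func handoff(n int) []int {
	slot := 0
	baton := make(chan struct{}) // unbuffered: a pure control handoff
	go func() {                  // plays the role of the iterator goroutine
		for i := 0; i < n; i++ {
			<-baton // wait for the caller to cede control
			slot = i
			baton <- struct{}{} // give control back
		}
	}()
	var out []int
	for i := 0; i < n; i++ {
		baton <- struct{}{} // cede control to the "iterator"
		<-baton             // it is now parked again
		out = append(out, slot)
	}
	return out
}

func main() {
	fmt.Println(handoff(3)) // [0 1 2]
}
```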

@eliasnaur
Contributor

Are you saying that iter.Pull behaves differently than for range f? And only some of the time, in an indeterministic way?

iter.Pull is defined as creating a new goroutine to run the iterator

FWIW, the documentation doesn't mention goroutines:

 % GOEXPERIMENT=rangefunc go doc iter.Pull
package iter // import "iter"

func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())
    Pull converts the “push-style” iterator sequence seq into a “pull-style”
    iterator accessed by the two functions next and stop.

    Next returns the next value in the sequence and a boolean indicating whether
    the value is valid. When the sequence is over, next returns the zero V and
    false. It is valid to call next after reaching the end of the sequence
    or after calling stop. These calls will continue to return the zero V and
    false.

    Stop ends the iteration. It must be called when the caller is no longer
    interested in next values and next has not yet signaled that the sequence is
    over (with a false boolean return). It is valid to call stop multiple times
    and when next has already returned false.

    It is an error to call next or stop from multiple goroutines simultaneously.

In any case it seems better to have consistent behaviour between for range func and iter.Pull. Here's one way:

  • Guarantee that the iterator goroutine runs on the same thread as the calling goroutine, same as for for range func.
  • If the thread is locked, yield called from another goroutine will panic (similar to how calling yield after iteration has stopped also panics).

This has several advantages:

  • Same thread locking behaviour as for range func.
  • The yield panic guarantees that a locked thread cannot be stolen by the iterator goroutine, only temporarily transferred while the caller is blocked.
  • No need for a slow channel-based fallback for iter.Pull: calling LockOSThread from the iterator goroutine behaves the same as if the calling goroutine had called it.
  • Bonus: fixes proposal: runtime: add RunOnMainThread to take control of the main thread #64777.

@eliasnaur
Contributor

eliasnaur commented Mar 5, 2024

Here's another reason yield should panic if called from another goroutine than the locked one:

package main

import "runtime"

func main() {
	rng := func(yield func(struct{}) bool) {
		go yield(struct{}{})
		select {}
	}
	runtime.LockOSThread()
	for range rng {
		// This body is *not* running on the locked thread!
	}
}

As far as I can tell from experiments, the body of the for range func loop is not running on the locked thread, despite LockOSThread.

@mknyszek
Contributor Author

mknyszek commented Mar 5, 2024

FWIW, the documentation doesn't mention goroutines:

Hm... OK. That's a good point.

Guarantee that the iterator goroutine runs on the same thread as the calling goroutine

I don't think we can guarantee this in general. I assume you mean just in the LockOSThread case.

In that case, that's kind of interesting. Is the idea that if one of these two goroutines locks to the thread, they're always both locked (and if there are more, nested, calls to iter.Pull, effectively all goroutines in the chain are locked)? That may simplify things a little bit, but I am worried about the next point (which seems to be necessary to make this work without a lot of complexity). I think this is something similar to what @cherrymui suggested, at some point.

There may be something I'm missing here to implementing this though. Offline, we've discussed a few different ideas and it's all getting pretty jumbled for me.

If the thread is locked, yield called from another goroutine will panic (similar to how calling yield after iteration has stopped also panics).

Calling yield on another goroutine is generally allowed, and I feel like this may introduce a surprising divergence. (Some iterators will simply be unusable with goroutines locked to OS threads, though I suppose it'll probably be rare.)

On the one hand, your example is compelling that this behavior would be surprising. Simultaneously, it's already true that the loop body could execute on another goroutine, and it's overall more consistent to say that the new goroutine isn't locked to the thread.

I do not know what the right answer is here, and I don't think I alone could decide. I think we need to take a step back, write up all the different possible semantics, and get a big picture view.

@mknyszek
Contributor Author

mknyszek commented Mar 5, 2024

Another dimension to all of this is: what happens if the iterator and/or yield function calls runtime.Goexit? Normally, that's supposed to destroy the thread if a goroutine is locked to it. Can we actually do that? And how does that align with the semantics of for range f?

@eliasnaur
Contributor

eliasnaur commented Mar 5, 2024

Guarantee that the iterator goroutine runs on the same thread as the calling goroutine

I don't think we can guarantee this in general. I assume you mean just in the LockOSThread case.

Yes. The interesting cases I can think of are LockOSThread, Cgo callbacks, (Windows) syscall callbacks.

(FWIW, I tend to think of "same thread" equivalent to "any thread" if no thread is locked, because the runtime is free to move an unlocked goroutine to any existing or new thread.)

In that case, that's kind of interesting. Is the idea that if one of these two goroutines locks to the thread, they're always both locked (and if there are more, nested, calls to iter.Pull, effectively all goroutines in the chain are locked)?

Exactly. Multiple goroutines locked to the same thread is ok, because the coroutine semantics and yield panic ensure that at most one goroutine may run on the locked thread.

@AndrewHarrisSPU

AndrewHarrisSPU commented Mar 5, 2024

In that case, that's kind of interesting. Is the idea that if one of these two goroutines locks to the thread, they're always both locked (and if there are more, nested, calls to iter.Pull, effectively all goroutines in the chain are locked)? That may simplify things a little bit, but I am worried about the next point (which seems to be necessary to make this work without a lot of complexity). I think this is something similar to what @cherrymui suggested, at some point.

I had some thoughts along these lines. I'm unclear about whether the impetus towards a guaranteed-different-thread locking solution comes from a performance perspective or from a correctness perspective (and I'm sure I'm unclear about other things as well).

I think I understand how degenerate scheduling patterns could arise from a usually-same-thread solution and from passing yields around. I do wonder whether, to some degree, there's always been some risk of this with the existing runtime tools.

I'm less clear about the correctness problems. I agree there is a possibility of confusion in that a configuration with the caller and the iterator locked to different threads can't be eliminated. What else is there?

@eliasnaur
Contributor

eliasnaur commented Apr 9, 2024

If the thread is locked, yield called from another goroutine will panic (similar to how calling yield after iteration has stopped also panics).

Calling yield on another goroutine is generally allowed, and I feel like this may introduce a surprising divergence. (Some iterators will simply be unusable with goroutines locked to OS threads, though I suppose it'll probably be rare.)

Good point.

On the one hand, your example is compelling that this behavior would be surprising. Simultaneously, it's already true that the loop body could execute on another goroutine, and it's overall more consistent to say that the new goroutine isn't locked to the thread.

I do not know what the right answer is here, and I don't think I alone could decide. I think we need to take a step back, write up all the different possible semantics, and get a big picture view.

Given your point above, I sketched the three semantics I see when the calling goroutine is locked to a thread:

  1. The iterator goroutine is always a separate goroutine, as if iter.Pull were implemented without runtime magic, and as such runs on a different OS thread. As far as I can see, this is the worst of the options: it is hard to get optimal performance from, and it doesn't match the thread behaviour of a range-over-func.
  2. Always run the iterator goroutine on the same thread as the calling goroutine, but panic if yield is called from a goroutine not locked to the calling thread. This is my proposal above. It achieves optimal performance, and matches the behaviour of range-over-func except for the awkward panic condition.
  3. Same as option 2, but don't panic. Optimum performance, exactly matches range-over-func behaviour.

This leaves the one remaining issue with option 3: it's possible for the iterator goroutine to "steal" a locked thread by calling yield from a different goroutine. E.g.:

func stealThread() {
	runtime.LockOSThread()

	it := func(yield func(struct{}) bool) {
		go yield(struct{}{})

		// The locked OS thread continues execution from here.
		...
	}

	next, _ := iter.Pull(it)
	next()

	// From here on, another OS thread continues execution.
	...
}

My question is then: what's the big deal? Note that a limited form of thread stealing is already allowed by a range-over-func, as demonstrated by a previous comment:

func stealThread2() {
	rng := func(yield func(struct{}) bool) {
		go yield(struct{}{})
		// Locked thread continues here.
	}
	runtime.LockOSThread()
	for range rng {
		// This body is *not* running on the locked thread!
	}
	// Locked thread continues here.
}

The difference is that iter.Pull allows you to open-code the loop, and as such the amount of code that would run without the thread locked seems larger than the range-over-func body. But that hardly matters: the range-over-func body can be arbitrarily large, because it includes function and method calls, recursively.

So: assuming the thread behaviour of range-over-func is fixed, I propose option 3 for iter.Pull to get optimal performance and semantics consistent with range-over-func, including thread stealing.

@eliasnaur
Contributor

eliasnaur commented Apr 10, 2024

I forgot to mention that in practice, the stop function will limit the scope of accidental thread stealing to the function body creating the iter.Pull:

func f() {
	runtime.LockOSThread()
	stealThread()
	// The locked thread continues here.
}

func stealThread() {
	next, stop := iter.Pull(it)
	defer stop() // stop blocks until the iterator goroutine returns, along with the locked thread.
	// Here, the locked thread may or may not execute.
}

@eliasnaur
Contributor

Here are two more surprising cases caused by an iter.Pull that is not consistent with range-over-func:

  1. Emulating range-over-func with reflection.

Just like reflect.MapIter, some users would like to obtain range-over-func behaviour from reflection code without the explicit for {} loop. I imagine ssa/interp is one such user. This is trivially achieved with a consistent iter.Pull, but very difficult otherwise.

In other words, I expect a call for a reflect.Iter if iter.Pull isn't it.

  2. Iterators expecting a particular thread

Some situations with a locked thread will not be able to use an inconsistent iter.Pull. Consider an iterator that expects to run on the locked thread. This works:

func f() {
	runtime.LockOSThread()

	lockedIter := func(yield func(Element) bool) {
		// This body *must* run on the locked thread.
	}

	for elem := range lockedIter {
		// Process elem, also on the locked thread.
	}
}

Now, assume that the requirements change so that a simple for loop is no longer sufficient, prompting the conversion to iter.Pull:

func f() {
	runtime.LockOSThread()

	lockedIter := func(yield func(Element) bool) {
		// This no longer runs on the locked thread!
	}

	next, stop := iter.Pull(lockedIter)
	defer stop()

	// Use next and process elements.
}

However, the converted code is (subtly) broken: lockedIter no longer runs on the locked thread.

Consistency is important in Go's design process, so I must have missed the reason(s) or discussions why it's ok for iter.Pull to be inconsistent with range-over-func. It may be obvious to some but not to me.

Thank you for bearing with me.

@dr2chase
Contributor

Just want to point out that the right way to think about this is that Pull can be implemented with goroutines and channels, and anything else is an optimization, that must only be used when its behavior does not differ from what you would get from goroutines and channels. And if we can't figure out how to detect the differing uses, then the implementation is goroutines and channels. (One approach is to revert to goroutines and channels any time a LockOSThread goroutine touches the Pull value handoff.)

@eliasnaur
Contributor

Just want to point out that the right way to think about this is that Pull can be implemented with goroutines and channels, and anything else is an optimization, that must only be used when its behavior does not differ from what you would get from goroutines and channels.

Can you please qualify this assertion that iter.Pull should deliberately behave differently than range-over-func? I offered two examples where an inconsistent iter.Pull would make its use difficult or impossible, and asked whether avoiding "thread stealing" is worth sacrificing consistency for. I also pointed out above that "thread stealing" is also possible with range-over-func, albeit limited to the loop body.

@dr2chase
Contributor

I don't think I asserted that. We can implement iter.Pull with goroutines and channels, and that was the first proposed version of it. "coroutines" are an optimization. This works:

func GoPull[T any](seq Seq[T]) (iter func() (T, bool), stop func()) {
	next := make(chan struct{})
	yield := make(chan T)

	go func() {
		defer close(yield)

		_, ok := <-next
		if !ok {
			return
		}

		seq(func(v T) bool {
			yield <- v
			_, ok := <-next
			return ok
		})
	}()

	return func() (v T, ok bool) {
			select {
			case <-yield:
				return v, false
			case next <- struct{}{}:
				v, ok := <-yield
				return v, ok
			}
		}, sync.OnceFunc(func() {
			close(next)
			<-yield
		})
}

@eliasnaur
Contributor

"coroutines" are an optimization

With respect to LockOSThread, "coroutines" behaves differently than "goroutines + channels". My examples are cases where this difference matters.

Range-over-func is implemented as "coroutines". Are you suggesting range-over-func should change to behave like "goroutines + channels" too? If not, why the inconsistency?

@dr2chase
Contributor

dr2chase commented Apr 15, 2024

In range-func iterators, there are neither goroutines nor coroutines. Those only show up to implement "Pull", when two range-over-function iterators need to be paired up.

"Coroutines" don't have a dependable semantics yet, and can't be relied on to "define" any other behavior;. They are an optimization for a particular goroutine-channel idiom, and it looks like LockOSThread might not be compatible with that optimization. In this context, "behave differently" is actually "behave broken".

PS/edit -- in general, a programmer doesn't get to know, or complain, about the specific binding between goroutine and thread. Certain choices are faster, but faster is preferred, not guaranteed. LockOSThread is an exception to this rule, forced by various host platforms attaching wonky/interesting semantics to thread identity. "Are pull-er and pull-ee goroutines bound to the same thread?" is a question you may ask, but the Go implementation is not required to answer it in any particular way, especially if requiring that answer creates weird deadlocks.

@eliasnaur
Contributor

In range-func iterators, there are neither goroutines nor coroutines. Those only show up to implement "Pull", when two range-over-function iterators need to be paired up.

Are you saying that the LockOSThread problem doesn't exist for range-over-func? This is not so, as demonstrated by #65889 (comment).

In other words, both range-over-func and iter.Pull must have some behaviour with respect to threads locked to goroutines, and so the "coroutine vs goroutine" question is relevant for both.

"Coroutines" don't have a dependable semantics yet, and can't be relied on to "define" any other behavior;. They are an optimization for a particular goroutine-channel idiom, and it looks like LockOSThread might not be compatible with that optimization. In this context, "behave differently" is actually "behave broken".

PS/edit -- in general, a programmer doesn't get to know, or complain, about the specific binding between goroutine and thread. Certain choices are faster, but faster is preferred, not guaranteed. LockOSThread is an exception to this rule, forced by various host platforms attaching wonky/interesting semantics to thread identity. "Are pull-er and pull-ee goroutines bound to the same thread?" is a question you may ask, but the Go implementation is not required to answer it in any particular way, especially if requiring that answer creates weird deadlocks.

What deadlocks are you referring to? In particular, are there deadlocks particular to iter.Pull that doesn't show up in a range-over-func?

Regardless of the particular behaviour with respect to locked threads, can we at least agree that range-over-func and iter.Pull should behave the same? That is, iter.Pull being the open-coded version of a range-over-func. If not, why not?

@dr2chase
Contributor

Rangefunc iterators are not required to use an additional goroutine, and if they do, they may cause interesting problems with LockOSThread. Programmers might not want to do that in general-purpose iterators that might be used where LockOSThread is active. For Pull, there is no choice -- there must be a second goroutine. Pull forces the issue, whereas for rangefunc, "maybe you don't want to do that" (the root cause is LockOSThread, but it already exists).

Deadlocks occur if we start requiring that Pull must use coroutines, provisionally defined as "two goroutines sharing a single OS thread". Combining that with LockOSThread seems to create the possibility of two goroutines both demanding to use the same OS thread, which is where things start to go wrong.

I don't understand "iter.Pull being the open-coded version of a range-over-func", nor am I sure I understand what you mean by "behave the same". An iterator should deliver its values in the same order, whether it is rangefunc or passed to Pull, that is the same, but the plumbing that delivers those values is quite a bit different. And the problem is in the plumbing.

@eliasnaur
Contributor

Rangefunc iterators are not required to use an additional goroutine, and if they do, they may cause interesting problems with LockOSThread. Programmers might not want to do that in general-purpose iterators that might be used where LockOSThread is active.

Surely it's useful to specify the behaviour when the caller or the iterator calls LockOSThread? Or are you suggesting that iterators cannot be used in combination with LockOSThread? If so, you're also excluding code called from a Cgo callback as well as library code that has no chance to know its calling context.

Deadlocks occur if we start requiring that Pull must use coroutines,
provisionally defined as "two goroutines sharing a single OS thread". Combining that with LockOSThread seems to create the possibility of two goroutines both demanding to use the same OS thread, which is where things start to go wrong.

Please provide an example of a deadlock. I claim that two goroutines can share a specific thread because the calling goroutine and iterator goroutine can be cooperatively scheduled: the running (and thread-locked) goroutine explicitly hands over control (and therefore the locked thread) when calling yield or next, goes to sleep and allows the other goroutine to continue (on the locked thread).

I don't understand "iter.Pull being the open-coded version of a range-over-func", nor am I sure I understand what you mean by "behave the same". An iterator should deliver its values in the same order, whether it is rangefunc or passed to Pull, that is the same, but the plumbing that delivers those values is quite a bit different. And the problem is in the plumbing.

By "behave the same" I mean that a rangefunc must be interchangeable with an equivalent iter.Pull and vice versa, without any change in behaviour visible to the programmer (except performance). That includes the behaviour of thread locked by the caller or iterator. For example, a Go interpreter should be able to faithfully implement rangefunc with iter.Pull.

Note that I'm not even saying what the locked thread behaviour should be, just that the behaviour should be equal.

@dr2chase
Contributor

Suppose goroutine A creates a pull function P, and passes it to goroutine B. B calls P, P calls LockOSThread and returns a value to B. B calls LockOSThread (just to be really unambiguous that B and P now both have a lock on the same OS thread), and acks A in a channel, and then blocks in a cgo call waiting for some almost-never-happen event, normally B would still be waiting there on process exit.

(Alternate scenario for glueing B and P to the same thread, B is already LockOSThreaded, P then calls LockOSThread when B calls it. Normally in this situation we would increment a counter.)

A calls LockOSThread. A calls P. What happens?
I don't think we can get the OS Thread loose from the cgo call.

It's not formal deadlock, but it's not great.

Another interesting case occurs if, A is locked to an OSThread, and creates two pull functions P and Q (and for sake of non-ambiguity, pulls a value out of each one and they each also call LockOSThread), then hands P and Q to separate goroutines B and C, who attempt to call P and Q concurrently. We can time slice those, of course, so no deadlock, but that's new and interesting for the Go scheduler.

@dr2chase
Contributor

I've been trying to figure out what should/shouldn't work and where something must change. I have a default bias that LockOSThread is the pig at the party here, but maybe that's wrong.

I think that these things should by-default work, with possible exceptions in the case of LockOSThread "somewhere":

  • a rangefunc iterator type Seq[V any] func(yield func(V) bool) is a function, and a function is a value, there are no restrictions on how it is used.
  • the result of calling iter.Pull[V any](seq Seq[V]) is a pair of functions, and functions are values, there are no restrictions on how they are used, other than "be sure to call stop() after you are done calling next()".
  • a goroutine that has already called LockOSThread should be able to use rangefunc iterators and Pull functions. In general there's no way to know where iterators might appear.

The last case can create problems, depending on why it called LockOSThread. Did the goroutine need it for reasons unrelated to the iteration (context), or was it because of iteration (iterator), or was it because the loop body needed to run on a particular OS thread (body)?

If it's just context, it doesn't matter what the range function does or where the loop body runs, they are incidental. And if Pull is called, that is not a problem either, whether it uses goroutine or coroutine because the iterator is insensitive to goroutine/thread.

If it's the loop body, then Pull is not an issue, because Pull works on iterators. In the default case of a rangefunc that does not explicitly do interesting things with goroutines (range functions can do things with goroutines, but it's not the default, and it looks like running yield (the loop body) in a different goroutine should perhaps be discouraged), this will also "just work": the range function runs in the same goroutine as the body and as its callers, so it all works there, too.

The interesting case is the iterator. I think this breaks down into three subcases, which differ in how they deal with undoing LockOSThread.

  1. the caller is expected to call LockOSThread, before the iteration, and then undo it afterwards. In this case, the need for LockOSThread and the special binding is not a secret, so the programmer is to some extent "on notice".
  2. the caller is not expected to call LockOSThread, but the same call that provides the range function also provides a dispose function to be called after the iterator is done, to undo the LockOSThread. There are two subcases, in both, there is a clue for the programmer that this is not a standard range function:
    a. Lock/UnlockOSthread occur in the caller's context
    b. the range function or its generator creates a separate goroutine, which is Locked, and communicates with it via channels.
  3. the rangefunc-maker implements cleanup with a finalizer. The programmer may not notice that anything special is going on, and might treat the iterator function like any other value. HOWEVER, finalizers run in a different goroutine; to make the OS-thread unlock happen, there must be an explicitly created goroutine that is locked to an OS thread, generating values and stuffing them into a channel, while also waiting on its own "go away" channel for notification that it is no longer needed and should unlock the OS thread; that is the channel the finalizer writes to. This means that the range function is actually insensitive to what goroutine it is run in.

In the first case, it's obvious that LockOSThread is needed, and thus the special rule applies, "don't pass the range function to other goroutines". This case is also sensitive to where the Pull function executes, so a coroutine implementation is preferred. 2a is the same.

For 2b and 3, because the locked goroutine is hidden within the range function, it doesn't matter where the range function itself runs.

I think the only tricky case is for an iterator that requires locking to an OS thread, but these are either obvious when they are created, or else insulate iterator value generation inside another goroutine.

It seems like a range function that runs "yield" (the loop body) in a different thread than the outer caller is officially a bad idea unless this is documented (i.e., it has a purpose).

I think it is also a bad idea if a range function calls LockOSThread (in the caller's goroutine, so, 2a) without unlocking before it returns to the caller. That implies a need for a balancing UnlockOSThread that is not necessarily provided. GoodStyle™ would be to lock the OS thread when the range function is created, not when it is called.

This is the best I've got on this much time. I'll think about it some more, but maybe other people will have better thoughts, improve it, or shoot it down.

@eliasnaur
Contributor

eliasnaur commented Apr 16, 2024

Thank you for the detailed example. I believe I've identified where our "coroutine" semantics differ. Hopefully, the following explanation is clearer.

First, assumptions:

  • There should be no observable difference between rangefunc and iter.Pull behaviour, except for performance.
  • Since goroutines have no IDs, the only observable difference is the underlying OS thread.
  • Since the Go runtime is free to move non-threadlocked goroutines among non-locked threads at any time, the only observable difference is the underlying locked thread, if any.

Then my proposal: let the locked thread (if any) and its lock count follow goroutine switches between caller and iterator, for both rangefunc and iter.Pull.

The consequences are:

  • No more than one goroutine has a particular thread locked at any one time, eliminating any deadlock conditions.
  • A locked thread follows execution, as expected: once LockOSThread is called by either caller or iterator, the execution trace has that thread locked from then on.
  • It's now possible to "steal" locked threads, e.g. if the caller calls LockOSThread and the iterator yields from another goroutine. This is perhaps surprising, but notably already possible with the current implementation of rangefunc:
func main() {
	// This iterator runs on the locked thread.
	rng := func(yield func(struct{}) bool) {
		go yield(struct{}{}) // Run the loop body with a different thread.
		select {}            // Or a blocking Cgo function.
	}
	runtime.LockOSThread()
	for range rng {
		// The loop body is *not* running on the locked thread!
	}
}

Suppose goroutine A creates a pull function P, and passes it to goroutine B. B calls P, P calls LockOSThread and returns a value to B. B calls LockOSThread (just to be really unambiguous that B and P now both have a lock on the same OS thread), and acks A in a channel, and then blocks in a cgo call waiting for some almost-never-happen event, normally B would still be waiting there on process exit.

With my proposal, the following happens:

  • A creates a pull function P, and passes it to goroutine B. B calls P.
  • P calls LockOSThread and is locked to thread T1.
  • P returns a value (and the locked thread T1) to B and blocks in yield.
  • B continues locked to T1, calls LockOSThread which increments the lock count on T1.
  • B calls a Cgo function and is blocked forever (on T1).

(Alternate scenario for glueing B and P to the same thread, B is already LockOSThreaded, P then calls LockOSThread when B calls it. Normally in this situation we would increment a counter.)

A calls LockOSThread. A calls P. What happens? I don't think we can get the OS Thread loose from the cgo call.

  • A calls LockOSThread and is locked to thread T2.
  • A calls P, thereby passing T2 to P, and blocks in next.
  • P continues locked on T2.

It's not formal deadlock, but it's not great.

By passing locked threads between caller and iterator, LockOSThread deadlocks are no longer possible.

Another interesting case occurs if, A is locked to an OSThread, and creates two pull functions P and Q (and for sake of non-ambiguity, pulls a value out of each one and they each also call LockOSThread), then hands P and Q to separate goroutines B and C, who attempt to call P and Q concurrently. We can time slice those, of course, so no deadlock, but that's new and interesting for the Go scheduler.

  • A locks to thread T.
  • A calls P, passing T to it.
  • P runs locked on T, calls LockOSThread and increments the lock count of T.
  • P returns some value to A, passing T (and lock count 2) back to A.
  • A continues locked to T.

The same happens for Q, leaving A locked to T.

Now, B and C have no threads locked so P and Q will run on some arbitrary thread when called by them.

@dr2chase
Contributor

This seems problematic:

  • P calls LockOSThread and is locked to thread T1.
  • ...
  • A calls P, thereby passing T2 to P, and blocks in next.
  • P continues locked on T2.

Since P was previously locked to T1, and might have made that call expecting that subsequent calls would also be locked to T1, running on T2 seems wrong. However in my "how is this used anyway?" scenario, the iterator inside the pull function "should not" have exported (left unreversed) a call to LockOSThread; that should have occurred in the creating context, perhaps, and this is even something we can check for in the Pull implementation, and panic.

Locking the OS thread when the iterator is created in A/T2 creates the opposite problem of "what happens in B/T1 when B calls P", and how do we distinguish that "bad" case from the "not bad" case where the iterator does not care that it is thread-locked, and just happens to be embedded within code that A called, and A was locked to T2 for other reasons?

I am inclined to assert that if there is an iterator that requires thread-locking to work properly, that it must be coded to hide that from its callers by creating its own goroutine, locking that to an OS thread, and using it to generate values, which are then passed to the yield function in the caller's context (that way, the calling function can also require thread-locking for the body, perhaps even to a different thread). Probably such an iterator would be paired with a dispose/stop function and might also have a finalizer, to recover the goroutine and locked OS thread.

So if I were to try to get closer to a proposal: Pull functions use coroutines, and if the locked count changes between entry and exit of the yield function (i.e., an unpaired lock/unlock), panic; and if an iterator (a range function) wishes to run on a locked OS thread, it must create a separate goroutine for that purpose; the calling context is not guaranteed.

@eliasnaur
Contributor

I am inclined to assert that if there is an iterator that requires thread-locking to work properly, that it must be coded to hide that from its callers by creating its own goroutine, locking that to an OS thread, and using it to generate values, which are then passed to the yield function in the caller's context (that way, the calling function can also require thread-locking for the body, perhaps even to a different thread). Probably such an iterator would be paired with a dispose/stop function and might also have a finalizer, to recover the goroutine and locked OS thread.

This only applies to iterators that require a locked thread separate from their caller, correct? That is, iterators that must be locked to their caller's locked thread will just work like the equivalent rangefunc does.

So if I were to try to get closer to a proposal, Pull functions use coroutines, and if the locked count changes between entry and exit to the yield function (i.e., unpaired lock/unlock), panic,

Would rangefunc iterators also panic? If not, why not?

and if an iterator (a range function) wishes to run on a locked OS thread, it must create a separate goroutine for that purposes, the calling context is not guaranteed.

By "calling context is not guaranteed" do you mean "the calling context is what's calling next"? That is, the context is well-defined but under control of the caller and thus may change during each call to yield.

@dr2chase
Contributor

This only applies to iterators that require a locked thread separate from their caller, correct? That is, iterators that must be locked to their caller's locked thread will just work like the equivalent rangefunc does.

I think so, including "if you pass the range function to a different goroutine, it will see a different context" (don't do that if you want it to run in a constant context -- in this situation the programmer is/should-be aware that thread locking is in play).

Would rangefunc iterators also panic? If not, why not?

Probably not, and the reason is cost. We're one optimization and some boosted inlining away from range function iterators running really well, and all the checking code that gets inserted into the generated yield function constant-propagates away. Checking the locked-thread-state won't go away like that. We might include a more-checking mode for this, maybe we would always check it anyway.

By "calling context is not guaranteed" do you mean "the calling context is what's calling next"? That is, the context is well-defined but under control of the caller and thus may change during each call to yield.

Right.

The underlying problem here is that LockOSThread is probably the wrong primitive, it would have been better expressed as func WithLockedOSThread(runsInLockedContext func()), but, oops, mistakes were made. This would have made some of the problematic cases unsayable. We had the go-func and defer-func examples right there to copy, and we blew it, oh well.

One performance-related question -- if we declare that it is GoodStyle™ for range functions that need locking to create a separate locked goroutine and access that through channels, that's going to be even more expensive when passed to Pull. A problem with range function iterators is that they're opaque -- they don't provide the option of offering a cheaper implementation of Pull (e.g., a SliceIterator has a much cheaper implementation of Pull, but that's not available). It seems like in this case if you know that the iterator will require a thread locked goroutine, we might be better off inverting the relationship, have Pull be the default, and the range function derived from that. (I am not at all sure about this, but I don't like the pile of expensive layers.)

@dr2chase
Contributor

The other reason that rangefunc iterators likely won't panic is that the range function is written by a human and really is "just a function". The compiler rewrite of

for x := range somerangefunction {
   body
}

is

   <declare and initialize some state variables>
   yield := func(x T) bool {
      <fiddle with state variables>
      body // with tweaked rewrites for break/continue/goto/return
      return true
   }
   somerangefunction(yield)

The most we could check is that thread locking state doesn't change between initialization and each iteration of the loop body, but the bare range function is just a function.

And strictly speaking, nobody's required to use iter.Pull, a somewhat slower non-coroutine version would also work. I think the interesting question here is whether we want to sign up for the different, extra, and potentially useful semantics of passing OS-thread-locked state across a coroutine switch. Originally, this had just been a performance optimization.

@eliasnaur
Contributor

Would rangefunc iterators also panic? If not, why not?

Probably not, and the reason is cost. We're one optimization and some boosted inlining away from range function iterators running really well, and all the checking code that gets inserted into the generated yield function constant-propagates away. Checking the locked-thread-state won't go away like that. We might include a more-checking mode for this, maybe we would always check it anyway.

May I suggest not panicking for iter.Pull either? It's a subtle panic condition, it seems ok to require LockOSThread users to be careful, and most importantly I think it's important to be able to refactor rangefuncs into iter.Pull equivalents and be certain the behaviour is equal.

On the other hand, starting with the panic and relaxing it later is backwards compatible.

And strictly speaking, nobody's required to use iter.Pull, a somewhat slower non-coroutine version would also work.
I think the interesting question here is whether we want to sign up for the different, extra, and potentially useful semantics of passing OS-thread-locked state across a coroutine switch. Originally, this had just been a performance optimization.

iter.Pull is less useful if it doesn't match rangefunc behaviour. Go interpreters come to mind, as well as my reflect example.

In addition, starting with a non-context-passing iter.Pull cannot later be changed to pass context in a backwards compatible way.

@cherrymui
Member

I'm not sure I understand the reflect example. Could you explain more about it, maybe with a concrete example?

As I understand, rangefunc is just a function, so the way to use reflection to run a rangefunc loop is just Call the rangefunc, passing in the "body" written in terms of reflection.

@eliasnaur
Contributor

I'm not sure I understand the reflect example. Could you explain more about it, maybe with a concrete example?

Please ignore the reflect example; I failed to find an instance that is not a case of the general second example in my comment. That is, cases where (1) iter.Pull is required, and (2) the iterator must run on the same locked thread as the caller.

@timothy-king
Contributor

I am very sympathetic to wanting to implement range-over-func using iter.Pull. I spent a couple of weeks in December trying [unsuccessfully] to do this for x/tools/go/ssa and interp. I was eventually convinced that I was trying to pound a square peg through a round hole. x/tools/go/ssa has a proposed implementation based on function rewriting today.

The FAQ already covers that iter.Pull and similar approaches are not equivalent: https://go.dev/wiki/RangefuncExperiment#why-are-the-semantics-not-exactly-like-if-the-iterator-function-ran-in-a-coroutine-or-goroutine . Let me try to give an expanded version. This explanation is mostly outside of the scope of this issue, but I think it addresses what you are asking.

There are two main proposed semantics/implementations for range-over-func statements:

for x := range somerangefunction {
  body
}

The first implementation is function rewriting. This takes a range-over-function statement and constructs a new synthetic function that communicates across function boundaries via hidden state, calls the iterator function on the synthetic function, and reestablishes control flow after the function call. Adjusting what David had a bit:

   <declare and initialize some state variables>
   yield := func(x T) bool {
      <fiddle with state variables>
      body // with tweaked rewrites for break/continue/goto/return
      return true
   }
   somerangefunction(yield)
   <finish tweaked break/continue/goto/return and panicking>

What exactly those tweaks are is a bit complicated, but they should be mostly invisible to regular users. They are not invisible to compilers and interpreters, though. The overwhelming advantage of this approach is that it is as fast as users expect. This is mostly accomplished by inlining somerangefunction and yield. See https://go.dev/wiki/RangefuncExperiment#can-range-over-a-function-perform-as-well-as-hand-written-loops

The second implementation is to concurrently execute somerangefunction and to communicate between the concurrent routines. In practice, concurrent routines are implemented via goroutines in go. This leads to a candidate semantics based around iter.Pull or similar.

next, stop := iter.Pull(somerangefunction) // execute somerangefunction in another goroutine
defer stop()
for x, ok := next(); ok; x, ok = next() {
  body
}
stop()

This is a bit oversimplified, but the key point is that somerangefunction executes concurrently with body in a different goroutine. What is great about iter.Pull is that it can be a library function. It does not need extra compiler help to change body or somerangefunction. This keeps the lowering above simple.

There are an enormous number of additional possible tweaks to both of the above approaches (runtime.coro, for example), but those are the two big categories of proposed semantics for range-over-func statements.

These semantics do differ. Stacks and goroutines are observable in Go. It is easier to see this with an example.

var count int
func somerangefunc(y func(x int) bool) {
  defer func() { if r := recover(); r != nil { count++; panic(r) } }()
  for y(rand.Int()) {}
}
func f() {
  defer func() { _ = recover() }()
  for x := range somerangefunc {
    switch x {
    case 1: panic(x)
    case 2: runtime.Goexit()
    case 3: break
    }
  }
}

Let's suppose rand.Int() returns 1 first, so panic(x) happens. What are the stacks at the panic(x) call in the different semantics?

  • Function rewriting has one stack [panic(x), yield, somerangefunc, f]
  • Concurrent execution has two stacks [panic(x), f] and [yield_pull, somerangefunc, iter.Pull]. (yield_pull is an internal lambda in iter.Pull.)

Because the stacks are different, the panic executes differently in these two cases.

  • Function rewriting: panic(x) goes through yield, then somerangefunc, count is incremented, r is rethrown, and the panic is recovered by f.
  • Concurrent execution: panic(x) is recovered by f. It does not go through yield_pull or somerangefunc, so count is not incremented.

So the value of count is an example of an observable difference in the two semantics. The root cause is that the goroutines and calls are structured differently in the two proposals, and this is observable.

It is also worth considering how a panic within the iterator function (somerangefunc in the example) plays out differently: under one semantics it terminates the program. runtime.Goexit() within the iterator function likewise plays out differently. Exercises left to the reader. This is not even getting into whether the runtime's stack traces are considered observable. The runtime library is particularly prone to having functions like LockOSThread that make these differences even more observable.

In the end, range-over-func was chosen to be equivalent to the function rewrite semantics. This is mostly due to efficiency (and partially debugging and partially robustness). So users of range-over-func should have the function rewrite semantics in mind when they use it (the simple version above), not iter.Pull. This gives iter.Pull a bit of a sharp edge that needs to be documented.

You can then ask whether anything can be done to narrow the divide. Maybe. But I would recommend against trying to do this on the issue tracker. You need to poke some fundamental stuff like "what is a stack in Go?" or "what if there was a second kind of coroutine in the runtime?" and do this in a backward-compatible fashion. I recommend taking a few days [or weeks] to try to come up with something offline.

I hope this helped.


With background out of the way, on to specific questions.

#65889 (comment) :

Just like reflect.MapIter, some users would like to obtain range-over-func behaviour from reflection code without the explicit for {} loop. I imagine ssa/interp is one such user.

ssa/interp does not need this. The ssa package does the function rewrite so the interpreter did not need serious modification. It did need modification to handle defer elegantly. (Alternatives for the defer changes exist but they are really ugly.)

This is trivially achieved with a consistent iter.Pull, but very difficult otherwise.

It is not trivial. And it probably will be difficult. Range-over-function is a big language change enabled by compiler magic. This places a burden on tools sensitive to detailed semantics like compilers and interpreters.

FWIW ssa/builder.go is now ~17% more lines of code.

func f() {
  runtime.LockOSThread()
  lockedIter := func(yield func(Element) bool) {
    // This no longer runs on the locked thread!
  }
  next, stop := iter.Pull(lockedIter)
  defer stop()
  // Use next and process elements.
}

lockedIter is called from a different goroutine than f. If you inline iter.Pull, the go statement becomes clear and this ceases to be surprising. So I would say this example works as intended. Yes, this is a sharp edge, and that is unfortunate.

For example, a Go interpreter should be able to faithfully implement rangefunc with iter.Pull.

I do not know how to do this. Maybe someone more clever than me has already solved this? Based on my own experience updating an interpreter, one can get something like 90% of an implementation of range-over-func using iter.Pull, but it will not end up 100% equivalent to the function rewrite semantics. And interpreters need to be >= 99% to be useful. The stacks are different, and trying to work around this is hard.

range-over-func is a big language change, and I expect interpreters to need to do a big adjustment. I cannot accurately predict what these will be for other interpreters. Interpreters likely need to understand the code transformation to understand the semantics of a range-over-func statement. A good place to start is the comments here https://go.googlesource.com/go/+/refs/heads/master/src/cmd/compile/internal/rangefunc/rewrite.go (and the tests in rangefunc_test.go). Interpreter maintainers can then determine what support they need from the Go standard library to implement range-over-func in their tool. They can file proposals requesting these features.

FWIW I suspect what they need is to attach state equivalent to #next to range-over-func yield functions, to attach meaning to #next based on the program locations, and to handle the "<finish tweaked break/continue/goto/return and panicking>" step based on this value. Again, they need to understand the details.

iter.Pull is less useful if it doesn't match rangefunc behaviour.

Fair enough. But hopefully it is still useful enough.

That is, cases where (1) iter.Pull is required, and (2) the iterator must run on the same locked thread as the caller.

Maybe try to rephrase how you think of this requirement as 'On the statement go g() the function g must run on the same locked thread as the go g() statement'. I'll leave it to the folks that work on the runtime to address whether/how this is doable.

@eliasnaur

Thank you very much for your detailed comment.

Your section about function rewriting and Go interpreters made me realize that function rewriting is irrelevant for iter.Pull in a basic sense: there's no control flow to rewrite.

Given iter.Pull is implemented with goroutines, the remaining question relevant to this issue is the LockOSThread behaviour.

That is, cases where (1) iter.Pull is required, and (2) the iterator must run on the same locked thread as the caller.

Maybe try to rephrase how you think of this requirement as 'On the statement go g() the function g must run on the same locked thread as the go g() statement'. I'll leave it to the folks that work on the runtime to address whether/how this is doable.

I tried and failed to understand this rephrasing. Can you elaborate? What I'm proposing is not that two goroutines must run on the same locked thread. I'm proposing that next and yield both act as cooperative scheduling points. That is, the thread entering next or yield is the same thread that resumes execution from the return of its counterpart. Calling next from (locked) thread T will see yield return (locked) on thread T and vice versa.

I don't have enough knowledge of the Go runtime to know whether such behaviour is easy to implement, but I presume the runtime scheduler is already involved every time a goroutine sends or receives from a channel, which is what next and yield do.

@timothy-king

I'm proposing that next and yield both act as cooperative scheduling points. That is, the thread entering next or yield is the same thread that resumes execution from the return of its counterpart. Calling next from (locked) thread T will see yield return (locked) on thread T and vice versa.

Seems like I misunderstood what you were proposing. I will let others comment on the plausibility of implementing this. The internal coroswitch seems like a very special channel, but I don't know enough details.

I will point out that seq might call yield from another goroutine. So even if next and stop are guaranteed to switch to yield on T when LockOSThread is true, seq may still execute on a different goroutine after yield returns. seq is just some user provided function so it could do this. So a caller of iter.Pull(seq) that cares about LockOSThread would need be aware of some details of the implementation of seq. Maybe good enough?

@eliasnaur

I'm proposing that next and yield both act as cooperative scheduling points. That is, the thread entering next or yield is the same thread that resumes execution from the return of its counterpart. Calling next from (locked) thread T will see yield return (locked) on thread T and vice versa.

Seems like I misunderstood what you were proposing. I will let others comment on the plausibility of implementing this. The internal coroswitch seems like a very special channel, but I don't know enough details.

I will point out that seq might call yield from another goroutine. So even if next and stop are guaranteed to switch to yield on T when LockOSThread is true, seq may still execute on a different goroutine after yield returns. seq is just some user provided function so it could do this. So a caller of iter.Pull(seq) that cares about LockOSThread would need be aware of some details of the implementation of seq. Maybe good enough?

Indeed. On the other hand, without cooperative scheduling, a caller of iter.Pull that cares about LockOSThread would need to be aware of the different behaviour from rangefunc: from the perspective of a LockOSThread user, rangefunc's function rewriting effectively schedules cooperatively between the caller and the iterator.

So it seems to me it's a trade-off between two kinds of sharp edges.
