Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: add block/wake function #46431

Closed
GuhuangLS opened this issue May 28, 2021 · 19 comments
Closed

proposal: runtime: add block/wake function #46431

GuhuangLS opened this issue May 28, 2021 · 19 comments

Comments

@GuhuangLS
Copy link

GuhuangLS commented May 28, 2021

Background: channel is designed to be a data transfer pipe between
go routines. A scenario in gVisor is: block/wake a G (stands for a
task) by using channel, wherein there's no need for data transfer.
Channel is relatively too heavy for this case.
Channel manages Gs in a list, and has capacity. When the channel is
full, the sender will bock, and the channel is clear, the receiver
is blocked. The receiver pushed to chan list, then schedule to run
next. The sender uses goready to wake Gs in the chan list.

A summary on our idea: introduce a new set of APIs for goroutine
wake/block.
For example, Waitlist at gVisor(such as futex.waiter:
pkg/sentry/kernel/futex/futex.go) used to task IPC; Waiter uses
GetG() to get the address of G, then saved at waiter struct; then
call BlockG to schedule the M to run next G. The coming waker
uses the G(waiter) address to wake it by WakeG(G).

Details:

We propose three new APIs:

GetG() - used to get the address of G by itself, which can be saved
at waitlist.By address, the waker find the G easier, then to wake it at
go runtime.

WakeG() - can be used to wake one G, which can be in running/blocked
status.
If the waiter has been blocked aready, then waker wakes it by
goready().
If the waiter is in waiting process, the one got sched.wakeLock first
processes first:if waiter is later, then waker wakes it by goready(). If the
waiter got the lock after the waker,then no need to block, because
"gp.wakeStatus == gStatusWaked".
If the waker coming, waiter not blocked, then if the waiter is at list, then
modifies gp.wakeStatus = gStatusWaked. when waiter coming, no need
bo block. if no waiter at the list, no need to WakeG().

BlockG() - can be used to block goroutine by itself.
modify:
GuhuangLS@109c8f5

How we use this in gVisor:

GuhuangLS/gvisor@97e0e6c

In futex()/epoll_wait(), we can modify it to use the new mechanism
for block and wake. Between sentry and go runtime, we maintain the
status of task Gs. Let's use futex as an example, add running status
at goruntime, NoWake,Waked,Blocked.

At sentry, one task/G can use BlockG() to block, like <-chan. Other
tasks/Gs can use WakeG() to wake the task/G which is blocked by BlockG()
, like chan <-. Based on a basic prototype of Go and gVisor, we use the
program in google/gvisor#2033(comment) as the test program.

We can see 22% improvement by test case:
google/gvisor#2033.
cc @prattmic @ianlewis @QinChenggang @tanjianfeng @lubinszARM

@seankhliao seankhliao changed the title schedule: add block/wake function proposal: runtime: add block/wake function May 28, 2021
@gopherbot gopherbot added this to the Proposal milestone May 28, 2021
@seankhliao
Copy link
Member

cc @aclements @randall77 @mknyszek

@ianlancetaylor

This comment has been minimized.

@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) May 28, 2021
@GuhuangLS

This comment has been minimized.

@mknyszek
Copy link
Contributor

mknyszek commented Jun 1, 2021

Channel is relatively too heavy for this case.

If this is the case, can't we just improve the implementation of chan struct{} (or more generally, chan <any zero-sized type>) in the compiler+runtime? Have a special streamlined implementation for this use-case? That doesn't require adding a new API, and everyone benefits. It's not going to be as fast as the caller knowing exactly which G to wake up, but channels have lots of room for improvement in particular, I think.

Beyond that, does the sync package's primitives help here at all? And if not, could they be improved to help out here?

Also, is there anything we could in the runtime to improve the situation? In google/gvisor#2033 @prattmic noted performance issues in the scheduler such as #43997. I suspect there are still ways we can improve the scheduler without having to resort to new APIs that could help improve your (and gVisor's) situation.

Finally, I just want to say that in general, exposing the concept of a G to the user (beyond the concept of a goroutine I mean) brings with it potentially a lot of conceptual overhead. There are good arguments against something like goroutine-local-storage and the GetG function of this proposal seems to provide a direct mechanism to expose a unique identifier for a goroutine.

@prattmic
Copy link
Member

prattmic commented Jun 1, 2021

For this case, I think there is potential for general runtime optimizations on channels.

One such proposed optimization is #32113, which I prototyped in #32113 (comment). That optimization applies well to the futex case, where wake+wait is common. IIRC, I applied this prototype google/gvisor#2033 and saw similar improvements to this prototype (~20%). Though I don't seem to have posted the benchmark results anywhere, so I may be misremembering and need to dig up the actual results.

@prattmic
Copy link
Member

prattmic commented Jun 1, 2021

cc @nixprime @amscanne

@GuhuangLS
Copy link
Author

GuhuangLS commented Jun 2, 2021

Channel is relatively too heavy for this case.

If this is the case, can't we just improve the implementation of chan struct{} (or more generally, chan <any zero-sized type>) in the compiler+runtime? Have a special streamlined implementation for this use-case? That doesn't require adding a new API, and everyone benefits. It's not going to be as fast as the caller knowing exactly which G to wake up, but channels have lots of room for improvement in particular, I think.

Beyond that, does the sync package's primitives help here at all? And if not, could they be improved to help out here?

Also, is there anything we could in the runtime to improve the situation? In google/gvisor#2033 @prattmic noted performance issues in the scheduler such as #43997. I suspect there are still ways we can improve the scheduler without having to resort to new APIs that could help improve your (and gVisor's) situation.

Finally, I just want to say that in general, exposing the concept of a G to the user (beyond the concept of a goroutine I mean) brings with it potentially a lot of conceptual overhead. There are good arguments against something like goroutine-local-storage and the GetG function of this proposal seems to provide a direct mechanism to expose a unique identifier for a goroutine.

For #43997, shows the case clearly.
Like gVisor, it mangages Gs by task struct; futex/lock realised at sentry, uses waitlist. Waiter and waker communicate each other by channel which saved at waiterlist. chan can be seen by gVisor and go runtime. if no chan, need other struct for communication.
The GetG likes chan, tells the waiter at runtime. Maybe change name of GetG to GetLightweightChan?
Except for task program/chan/scheduler, one more lightwait function could
enrich runtime.

@GuhuangLS
Copy link
Author

GuhuangLS commented Jun 2, 2021

For this case, I think there is potential for general runtime optimizations on channels.

One such proposed optimization is #32113, which I prototyped in #32113 (comment). That optimization applies well to the futex case, where wake+wait is common. IIRC, I applied this prototype google/gvisor#2033 and saw similar improvements to this prototype (~20%). Though I don't seem to have posted the benchmark results anywhere, so I may be misremembering and need to dig up the actual results.

#32113 shows wake/block by M waked and idle p got, it's good optimization. One G to goready other G, then to get idle p, to wake M, could the concurrence be improvement between goready and wake M?

#46431 optimise chan list managetime, it calls goready() at WakeG(), so #32113 optimsization and #46431 is not confliction. The two optimsization could work togerther.

@rsc
Copy link
Contributor

rsc commented Jun 9, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

@GuhuangLS
Copy link
Author

GuhuangLS commented Jun 10, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

if add the function to channel , is it feasible? Let channel owns the fast path.

@cocotyty
Copy link

cocotyty commented Jun 29, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

if add the function to channel , is it feasible? Let channel owns the fast path.

Great idea ! Maybe fast path on 'chan struct{}' ?

@GuhuangLS
Copy link
Author

GuhuangLS commented Jun 30, 2021

It would definitely be much more preferable to focus on optimizations that apply to particular common uses of channels than to introduce new APIs.

if add the function to channel , is it feasible? Let channel owns the fast path.

Great idea ! Maybe fast path on 'chan struct{}' ?

Hope it is; Maybe need any other modification to use 'chan struct{}' directly;

@rsc
Copy link
Contributor

rsc commented Nov 10, 2021

OK, so what is on the current 'chan struct{}' path that needs optimizing away?

@rsc
Copy link
Contributor

rsc commented Nov 10, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Nov 10, 2021
@rsc
Copy link
Contributor

rsc commented Dec 1, 2021

It sounds like we are waiting on @GuhuangLS or someone else who knows about the use case (@prattmic?) to say exactly what needs optimizing in the chan struct{} path, but it also sounds like doing the optimizations should remove the need to add new API to runtime.

Do I have that right?

@ros201712
Copy link

It sounds like we are waiting on @GuhuangLS or someone else who knows about the use case (@prattmic?) to say exactly what needs optimizing in the chan struct{} path, but it also sounds like doing the optimizations should remove the need to add new API to runtime.

Do I have that right?

Hi rsc
Sorry to reply so late, as I am working other works recently.
I can find the method with no API modification, and welcome everybody to realize the new function.

@rsc
Copy link
Contributor

rsc commented Dec 8, 2021

OK, so it sounds like API modifications are not needed anymore?

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Dec 15, 2021
@rsc
Copy link
Contributor

rsc commented Dec 15, 2021

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Likely Decline to Declined in Proposals (old) Jan 5, 2022
@rsc
Copy link
Contributor

rsc commented Jan 5, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed Jan 5, 2022
@golang golang locked and limited conversation to collaborators Jan 5, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

9 participants