proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. #18846

go101 · 2017-01-29T19:34:37Z

The intention of the proposal is to resolve two problems in the current stateless implementation.

performance problem

Select is slow.
Statefullizing select blocks will reduce memory allocations, so less time is consumed.
Besides less memory allocations, we can use a simpler cases sorting method.
For example, assume there are 5 cases, the initial cases order is [0 1 2 3 4].
If case 2 is selected in the first pass, then the order changes to [0 1 3 4 2].
In other words, last selected case will get the smallest chance to be selected in the next pass.
By using this method, no cases will be in starvation status.

The cases order is cached in the hselect struct and the order can be stored as a linked list.

delay of response to stop-doing-it signal problem

Following example is a common use pattern of select.

// one-select version
go func() {
	for {
		select {
		case <-stopSignal: 
			return
		case dataCh <- newValue():
			doSomething()
		}
	}
}()

Once the channel stopSignal becomes unblocked, the goroutine should exit in the next pass.
However when dataCh is a buffered channel, even if stopSignal becomes unblocked,
the second case may be still selected for several passes.
To make the goroutine exit as early as possible,
following version is adopted in practice instead :

// two-selects version
go func() {
	for {
		select {
		case <-stopSignal: 
			return
		default:
		}
		
		select {
		case <-stopSignal: 
			return
		case dataCh <- newValue():
			doSomething()
		}
	}
}()

Select is already slow, more selects will make programs even slower.

By using the cases sorting method mentioned above,
the stopSignal case will be selected once stopSignal becomes unblocked in the one-select version,
no needs to adopt the two-selects version.

=====================================

Caching any select blocks would be hard,
caching one would be not able to cover some common select use patterns.
So caching two would be a good compromise.
Or only cache the cases orders, which will use much less cache memory than cache the whole hselect struct.

This optimization may change the select behavior a bit,
but will not break the compatibility.

The text was updated successfully, but these errors were encountered:

minux · 2017-01-30T14:34:49Z

I read the issue a few times but frankly I couldn't understand what are you proposing. Are you proposing adding prioritizes to select cases?

go101 · 2017-01-30T14:52:28Z

Are you proposing adding prioritizes to select cases?

part of. But only for select blocks in for loops.
Runtime will try to select the first case if it never get selected before, in a for loop.
But once it is selected in one pass, it will get the lowest chance to be selected in the next pass.

minux · 2017-01-30T14:55:21Z

That's definitely an incompatible behavior change. Select cases are specified to be selected pseudorandomly and there are programs whose correctness depends on that property.

go101 · 2017-01-30T15:09:55Z

I think pseudorandom is not the fundamental purpose of the select mechanism, the fundamental purpose is to distribute chances on all cases averagely.
In fact the proposed sorting method will get a smaller variance than the current one when the number of passes in a loop is small.

go101 · 2017-01-30T15:47:53Z

Maybe statefullizing all direct select blocks in a function calling is easier.
A function calling will not maintain states of the select blocks in its children callings.

rsc · 2017-01-30T20:47:42Z

Before making a proposal, it's important to first define the problem you are going to solve. Then the proposal is a potential solution to that problem.

It sounds like the problem you want to solve has to do with the performance of select. Can you post a benchmark showing the specific performance issue?

go101 · 2017-01-31T04:37:56Z

The original intention for this proposal is to (efficiently) solve the second problem I mentioned in the first comment.

In the following program, it is expected no sends and receives performed.
But it is possible several sends and receives will still be performed.

package main

import "fmt"
import "time"

func main() {
	var stopSignal = make(chan struct{})
	var dataCh = make(chan int, 10)
	
	// just a test. In practice, this channel would be closed at any time.
	close(stopSignal)
	
	// sender
	go func() {
		for {
			select {
			case <-stopSignal: 
				return
			case dataCh <- 1:
				fmt.Println("sent")
				time.Sleep(time.Second / 2)
			}
		}
	}()
	
	// receiver
	go func() {
		for {
			select {
			case <-stopSignal: 
				return
			case <-dataCh:
				fmt.Println("received")
				time.Sleep(time.Second)
			}
		}
	}()
		sent
	time.Sleep(time.Second * 10)
}

// one possible output:
sent
received
sent
received
sent
sent
sent
sent

In other words, there are many sends and receives wasted in the above program.
(in theory, the two goroutines would never exit even if stopSignal is closed already, :), right?)
Sometimes it may be not a big problem, but sometimes it is.

To solve the problem, the below version is often adopted in practice.
There will be most one send and receive wasted in the below version.
(the proposed method will waste most 0.5 send and receive)

package main

import "fmt"
import "time"

func main() {
	var stopSignal = make(chan struct{})
	var dataCh = make(chan int, 10)
	
	// just a test
	close(stopSignal)
	
	// sender
	go func() {
		for {
			//>>> to make the sender exit as early as possible
			select {
			case <-stopSignal: 
				return
			default:
			}
			//<<<
			
			select {
			case <-stopSignal: 
				return
			case dataCh <- 1:
				fmt.Println("sent")
				time.Sleep(time.Second / 2)
			}
		}
	}()
	
	// receiver
	go func() {
		for {
			//>>> to make the receiver exit as early as possible
			select {
			case <-stopSignal: 
				return
			default:
			}
			//<<<
			
			select {
			case <-stopSignal: 
				return
			case <-dataCh:
				fmt.Println("received")
				time.Sleep(time.Second)
			}
		}
	}()
	
	time.Sleep(time.Second * 10)
}

However, in practice, for the most passes in the sender and receiver loop, stopSignal is not closed,
so the first select in each loop will do harm for performance.

The first problem I mentioned in the fist comment is really a by-product in solving the second problem.

go101 · 2017-01-31T04:45:59Z

Ok, I benchmarked the two versions in the above comment, it is a surprise that the two-selects-in-loop version performs quite well, much better than I expected.

package main

import "fmt"
import "time"
import "runtime"
import "reflect"

func f1(dataCh chan int, stopSignal chan struct{}) {
	n := len(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

func f2(dataCh chan int, stopSignal chan struct{}) {
	n := len(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		default:
		}
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

func test(f func (chan int, chan struct{})) time.Duration {
	const N = 1000000
	var dataCh = make(chan int, N)
	for i := 0; i < N; i++ {
		dataCh <- 1
	}
	var stopSignal = make(chan struct{})
	
	// 
	runtime.GC()
	time.Sleep(2 * time.Second)
	t := time.Now()
	f(dataCh, stopSignal)
	d := time.Since(t)
	fmt.Printf("%s, %s \n", runtime.FuncForPC(reflect.ValueOf(f).Pointer()).Name(), d)
	return d
}

func main() {
	d1 := test(f1)
	d2 := test(f2)
	fmt.Printf("ratio: %d%% \n", 100 * d2 / d1)
}

// the output:
main.f1, 284.381026ms 
main.f2, 298.757031ms 
ratio: 105%

@rsc which version do you recommend to use in practice?

davecheney · 2017-01-31T04:48:50Z

@golang101 please write a proper testing.Benchmark benchmark.

go101 · 2017-01-31T05:39:16Z

Here is a benchmark:

package main

import "testing"

const N = 100000

func BenchmarkOneSelect(b *testing.B) {
	var dataCh = make(chan int, N)
	close(dataCh)
	var stopSignal = make(chan struct{})
	
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		f1(dataCh, stopSignal)
	}
}

func BenchmarkTwoSelects(b *testing.B) {
	var dataCh = make(chan int, N)
	close(dataCh)
	var stopSignal = make(chan struct{})
	
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		f2(dataCh, stopSignal)
	}
}

func f1(dataCh chan int, stopSignal chan struct{}) {
	n := cap(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

func f2(dataCh chan int, stopSignal chan struct{}) {
	n := cap(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		default:
		}
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

I don't like writing go benchmarks, the results of go benchmarks wave too much. And sometimes, I must make some modifications to my program code to make it benchmarkable.

rsc · 2017-02-13T20:59:39Z

@golang101 That's fine but it's very helpful for us if you can write them in Go benchmark format because it means we can vary basic execution settings using the usual mechanisms.

The fact is, the language definition requires that f1 sometimes pick dataCh, even though stopSignal is always ready. There's no way around that without a language change, and I don't think there's nearly enough evidence for that.

Also the proposed change has flaws: select in a loop is special but select in a function called from within a loop is not? What if the function is inlined (either by the programmer or by the compiler)? Then the semantics change? This is not easy to understand or explain.

Also:

Once runtime finds the first one unblocked channel operation, just selects the corresponding case.

This is already true today. Select makes up an order and then picks the first one it finds that is ready to run.

The non-blocking select in f2 is also already optimized: it's not a real select, which is why f2 is only 5% slower than f1 despite having 100% more selects.

If there's any work here, I'd rather make the f1 select 5% faster, so that you will be happy with the overall performance of f2, instead of changing the semantics of select in a way that is subtle and difficult to explain.

go101 · 2017-02-18T05:16:27Z

Yes, there are many details hard to handle in the proposed method if it is used as an overall method.

Would it be a good idea to let programmers assign a sort method for some special select blocks?
(which is discussed in this thread https://groups.google.com/forum/#!topic/golang-nuts/ZrVIhHCrR9o)

select (random | ordered | stateful) {
case <- stopCh:
    return
case value := <-dataCh:
}

random: the current implementation
ordered: by the case appearance order
stateful: the one proposed in this issue (but only for the specified select blocks).

I'm still not familiar with go compiler source, I would make some hacking and write some benchmarks to check if it is worth supporting the custom sort methods later.

go101 changed the title ~~Proposal: statefullize most 2 select blocks at run time for each goroutine.~~ Proposal: statefullize most 2 select blocks in loops at run time for each goroutine. Jan 30, 2017

rsc changed the title ~~Proposal: statefullize most 2 select blocks in loops at run time for each goroutine.~~ proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. Jan 30, 2017

rsc closed this as completed Feb 13, 2017

rsc added the Proposal-Declined label Feb 13, 2017

go101 mentioned this issue Apr 3, 2017

proposal: runtime: select should panic sending to closed channels #19820

Closed

go101 mentioned this issue Sep 14, 2017

proposal: add a runttime.SetGoroutineOption API #21894

Closed

golang locked and limited conversation to collaborators Sep 14, 2018

gopherbot added the FrozenDueToAge label Sep 14, 2018

FiloSottile added the Proposal label May 31, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. #18846

proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. #18846

go101 commented Jan 29, 2017 •

edited

minux commented Jan 30, 2017 via email

go101 commented Jan 30, 2017

minux commented Jan 30, 2017 via email

go101 commented Jan 30, 2017 •

edited

go101 commented Jan 30, 2017 •

edited

rsc commented Jan 30, 2017

go101 commented Jan 31, 2017 •

edited

go101 commented Jan 31, 2017 •

edited

davecheney commented Jan 31, 2017

go101 commented Jan 31, 2017 •

edited

rsc commented Feb 13, 2017

go101 commented Feb 18, 2017 •

edited

proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. #18846

proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. #18846

Comments

go101 commented Jan 29, 2017 • edited

minux commented Jan 30, 2017 via email

go101 commented Jan 30, 2017

minux commented Jan 30, 2017 via email

go101 commented Jan 30, 2017 • edited

go101 commented Jan 30, 2017 • edited

rsc commented Jan 30, 2017

go101 commented Jan 31, 2017 • edited

go101 commented Jan 31, 2017 • edited

davecheney commented Jan 31, 2017

go101 commented Jan 31, 2017 • edited

rsc commented Feb 13, 2017

go101 commented Feb 18, 2017 • edited

go101 commented Jan 29, 2017 •

edited

go101 commented Jan 30, 2017 •

edited

go101 commented Jan 30, 2017 •

edited

go101 commented Jan 31, 2017 •

edited

go101 commented Jan 31, 2017 •

edited

go101 commented Jan 31, 2017 •

edited

go101 commented Feb 18, 2017 •

edited