Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. #18846

Closed
go101 opened this issue Jan 29, 2017 · 12 comments

Comments

@go101
Copy link

go101 commented Jan 29, 2017

The intention of the proposal is to resolve two problems in the current stateless implementation.

  1. performance problem

Select is slow.
Statefullizing select blocks will reduce memory allocations, so less time is consumed.
Besides less memory allocations, we can use a simpler cases sorting method.
For example, assume there are 5 cases, the initial cases order is [0 1 2 3 4].
If case 2 is selected in the first pass, then the order changes to [0 1 3 4 2].
In other words, last selected case will get the smallest chance to be selected in the next pass.
By using this method, no cases will be in starvation status.

The cases order is cached in the hselect struct and the order can be stored as a linked list.

  1. delay of response to stop-doing-it signal problem

Following example is a common use pattern of select.

// one-select version
go func() {
	for {
		select {
		case <-stopSignal: 
			return
		case dataCh <- newValue():
			doSomething()
		}
	}
}()

Once the channel stopSignal becomes unblocked, the goroutine should exit in the next pass.
However when dataCh is a buffered channel, even if stopSignal becomes unblocked,
the second case may be still selected for several passes.
To make the goroutine exit as early as possible,
following version is adopted in practice instead :

// two-selects version
go func() {
	for {
		select {
		case <-stopSignal: 
			return
		default:
		}
		
		select {
		case <-stopSignal: 
			return
		case dataCh <- newValue():
			doSomething()
		}
	}
}()

Select is already slow, more selects will make programs even slower.

By using the cases sorting method mentioned above,
the stopSignal case will be selected once stopSignal becomes unblocked in the one-select version,
no needs to adopt the two-selects version.

=====================================

Caching any select blocks would be hard,
caching one would be not able to cover some common select use patterns.
So caching two would be a good compromise.
Or only cache the cases orders, which will use much less cache memory than cache the whole hselect struct.

This optimization may change the select behavior a bit,
but will not break the compatibility.

@minux
Copy link
Member

minux commented Jan 30, 2017 via email

@go101
Copy link
Author

go101 commented Jan 30, 2017

Are you proposing adding prioritizes to select cases?

part of. But only for select blocks in for loops.
Runtime will try to select the first case if it never get selected before, in a for loop.
But once it is selected in one pass, it will get the lowest chance to be selected in the next pass.

@minux
Copy link
Member

minux commented Jan 30, 2017 via email

@go101
Copy link
Author

go101 commented Jan 30, 2017

I think pseudorandom is not the fundamental purpose of the select mechanism, the fundamental purpose is to distribute chances on all cases averagely.
In fact the proposed sorting method will get a smaller variance than the current one when the number of passes in a loop is small.

@go101 go101 changed the title Proposal: statefullize most 2 select blocks at run time for each goroutine. Proposal: statefullize most 2 select blocks in loops at run time for each goroutine. Jan 30, 2017
@go101
Copy link
Author

go101 commented Jan 30, 2017

Maybe statefullizing all direct select blocks in a function calling is easier.
A function calling will not maintain states of the select blocks in its children callings.

@rsc
Copy link
Contributor

rsc commented Jan 30, 2017

Before making a proposal, it's important to first define the problem you are going to solve. Then the proposal is a potential solution to that problem.

It sounds like the problem you want to solve has to do with the performance of select. Can you post a benchmark showing the specific performance issue?

@rsc rsc changed the title Proposal: statefullize most 2 select blocks in loops at run time for each goroutine. proposal: runtime: statefullize most 2 select blocks in loops at run time for each goroutine. Jan 30, 2017
@go101
Copy link
Author

go101 commented Jan 31, 2017

The original intention for this proposal is to (efficiently) solve the second problem I mentioned in the first comment.

In the following program, it is expected no sends and receives performed.
But it is possible several sends and receives will still be performed.

package main

import "fmt"
import "time"

func main() {
	var stopSignal = make(chan struct{})
	var dataCh = make(chan int, 10)
	
	// just a test. In practice, this channel would be closed at any time.
	close(stopSignal)
	
	// sender
	go func() {
		for {
			select {
			case <-stopSignal: 
				return
			case dataCh <- 1:
				fmt.Println("sent")
				time.Sleep(time.Second / 2)
			}
		}
	}()
	
	// receiver
	go func() {
		for {
			select {
			case <-stopSignal: 
				return
			case <-dataCh:
				fmt.Println("received")
				time.Sleep(time.Second)
			}
		}
	}()
		sent
	time.Sleep(time.Second * 10)
}

// one possible output:
sent
received
sent
received
sent
sent
sent
sent

In other words, there are many sends and receives wasted in the above program.
(in theory, the two goroutines would never exit even if stopSignal is closed already, :), right?)
Sometimes it may be not a big problem, but sometimes it is.

To solve the problem, the below version is often adopted in practice.
There will be most one send and receive wasted in the below version.
(the proposed method will waste most 0.5 send and receive)

package main

import "fmt"
import "time"

func main() {
	var stopSignal = make(chan struct{})
	var dataCh = make(chan int, 10)
	
	// just a test
	close(stopSignal)
	
	// sender
	go func() {
		for {
			//>>> to make the sender exit as early as possible
			select {
			case <-stopSignal: 
				return
			default:
			}
			//<<<
			
			select {
			case <-stopSignal: 
				return
			case dataCh <- 1:
				fmt.Println("sent")
				time.Sleep(time.Second / 2)
			}
		}
	}()
	
	// receiver
	go func() {
		for {
			//>>> to make the receiver exit as early as possible
			select {
			case <-stopSignal: 
				return
			default:
			}
			//<<<
			
			select {
			case <-stopSignal: 
				return
			case <-dataCh:
				fmt.Println("received")
				time.Sleep(time.Second)
			}
		}
	}()
	
	time.Sleep(time.Second * 10)
}

However, in practice, for the most passes in the sender and receiver loop, stopSignal is not closed,
so the first select in each loop will do harm for performance.

The first problem I mentioned in the fist comment is really a by-product in solving the second problem.

@go101
Copy link
Author

go101 commented Jan 31, 2017

Ok, I benchmarked the two versions in the above comment, it is a surprise that the two-selects-in-loop version performs quite well, much better than I expected.

package main

import "fmt"
import "time"
import "runtime"
import "reflect"

func f1(dataCh chan int, stopSignal chan struct{}) {
	n := len(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

func f2(dataCh chan int, stopSignal chan struct{}) {
	n := len(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		default:
		}
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

func test(f func (chan int, chan struct{})) time.Duration {
	const N = 1000000
	var dataCh = make(chan int, N)
	for i := 0; i < N; i++ {
		dataCh <- 1
	}
	var stopSignal = make(chan struct{})
	
	// 
	runtime.GC()
	time.Sleep(2 * time.Second)
	t := time.Now()
	f(dataCh, stopSignal)
	d := time.Since(t)
	fmt.Printf("%s, %s \n", runtime.FuncForPC(reflect.ValueOf(f).Pointer()).Name(), d)
	return d
}

func main() {
	d1 := test(f1)
	d2 := test(f2)
	fmt.Printf("ratio: %d%% \n", 100 * d2 / d1)
}

// the output:
main.f1, 284.381026ms 
main.f2, 298.757031ms 
ratio: 105% 

@rsc which version do you recommend to use in practice?

@davecheney
Copy link
Contributor

@golang101 please write a proper testing.Benchmark benchmark.

@go101
Copy link
Author

go101 commented Jan 31, 2017

Here is a benchmark:

package main

import "testing"

const N = 100000

func BenchmarkOneSelect(b *testing.B) {
	var dataCh = make(chan int, N)
	close(dataCh)
	var stopSignal = make(chan struct{})
	
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		f1(dataCh, stopSignal)
	}
}

func BenchmarkTwoSelects(b *testing.B) {
	var dataCh = make(chan int, N)
	close(dataCh)
	var stopSignal = make(chan struct{})
	
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		f2(dataCh, stopSignal)
	}
}

func f1(dataCh chan int, stopSignal chan struct{}) {
	n := cap(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

func f2(dataCh chan int, stopSignal chan struct{}) {
	n := cap(dataCh)
	for n > 0 {
		n--
		
		select {
		case <-stopSignal: 
			return
		default:
		}
		
		select {
		case <-stopSignal: 
			return
		case <-dataCh:
		}
	}
}

I don't like writing go benchmarks, the results of go benchmarks wave too much. And sometimes, I must make some modifications to my program code to make it benchmarkable.

@rsc
Copy link
Contributor

rsc commented Feb 13, 2017

@golang101 That's fine but it's very helpful for us if you can write them in Go benchmark format because it means we can vary basic execution settings using the usual mechanisms.

The fact is, the language definition requires that f1 sometimes pick dataCh, even though stopSignal is always ready. There's no way around that without a language change, and I don't think there's nearly enough evidence for that.

Also the proposed change has flaws: select in a loop is special but select in a function called from within a loop is not? What if the function is inlined (either by the programmer or by the compiler)? Then the semantics change? This is not easy to understand or explain.

Also:

Once runtime finds the first one unblocked channel operation, just selects the corresponding case.

This is already true today. Select makes up an order and then picks the first one it finds that is ready to run.

The non-blocking select in f2 is also already optimized: it's not a real select, which is why f2 is only 5% slower than f1 despite having 100% more selects.

If there's any work here, I'd rather make the f1 select 5% faster, so that you will be happy with the overall performance of f2, instead of changing the semantics of select in a way that is subtle and difficult to explain.

@go101
Copy link
Author

go101 commented Feb 18, 2017

Yes, there are many details hard to handle in the proposed method if it is used as an overall method.

Would it be a good idea to let programmers assign a sort method for some special select blocks?
(which is discussed in this thread https://groups.google.com/forum/#!topic/golang-nuts/ZrVIhHCrR9o)

select (random | ordered | stateful) {
case <- stopCh:
    return
case value := <-dataCh:
}

random: the current implementation
ordered: by the case appearance order
stateful: the one proposed in this issue (but only for the specified select blocks).

I'm still not familiar with go compiler source, I would make some hacking and write some benchmarks to check if it is worth supporting the custom sort methods later.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants