runtime: blocked channel operations starvation #21187

Closed
johnsnow4514 opened this issue Jul 27, 2017 · 8 comments

johnsnow4514 commented Jul 27, 2017

What version of Go are you using (go version)?

1.8.0

What operating system and processor architecture are you using (go env)?

windows/amd64

What did you do?

Related to #11506
I have a messaging app that generates messages and passes them over a channel to goroutines whose job is to transform those messages into small packets of data. The packets are then passed over another channel to goroutines whose job is to read them and send them over a socket.
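
For context, a minimal sketch of the pipeline shape described above; all names and types here are illustrative assumptions, not code from the actual application:

package main

import (
	"fmt"
	"sync"
)

// Message and Packet are placeholder types standing in for the app's real data.
type Message struct{ Body string }
type Packet struct{ Chunk string }

func main() {
	messages := make(chan Message)
	packets := make(chan Packet)

	// Transformer goroutines: read messages, emit packets.
	var transformers sync.WaitGroup
	for i := 0; i < 4; i++ {
		transformers.Add(1)
		go func() {
			defer transformers.Done()
			for m := range messages {
				// The real app splits a message into several small packets.
				packets <- Packet{Chunk: m.Body}
			}
		}()
	}

	// Sender goroutine: read packets and "send" them (stdout here instead of a socket).
	done := make(chan struct{})
	go func() {
		for p := range packets {
			fmt.Println("send:", p.Chunk)
		}
		close(done)
	}()

	for i := 0; i < 10; i++ {
		messages <- Message{Body: fmt.Sprintf("msg-%d", i)}
	}
	close(messages)
	transformers.Wait()
	close(packets)
	<-done
}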

What did you expect to see?

Packets that are generated at some point in time should be sent around that same time: there shouldn't be more than a few milliseconds between sending a packet on a channel and receiving it in the receiving goroutine.
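
A minimal sketch of how that expectation could be checked, assuming packets are timestamped when they are handed to the channel (the timedPacket type and the 5ms threshold are illustrative assumptions):

package main

import (
	"fmt"
	"time"
)

// timedPacket carries the time at which it was handed to the channel.
type timedPacket struct {
	sentAt  time.Time
	payload []byte
}

func main() {
	ch := make(chan timedPacket, 64)

	done := make(chan struct{})
	go func() {
		for p := range ch {
			// Measure the delay between sending and receiving.
			if latency := time.Since(p.sentAt); latency > 5*time.Millisecond {
				fmt.Println("slow delivery:", latency)
			}
		}
		close(done)
	}()

	for i := 0; i < 100; i++ {
		ch <- timedPacket{sentAt: time.Now(), payload: []byte("data")}
	}
	close(ch)
	<-done
}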

What did you see instead?

In some cases a goroutine might starve for around 30 seconds while other packets are transferred just fine. This causes problems in my app, and I had to implement my own wrapper around a channel to ensure FIFO ordering, which fixed the problem.
Are there any plans to fix #11506? @rsc @randall77
It really causes weird and rare behavior in production environments.
Here's my implementation of FIFO channeling, in case anyone else is facing the same problem:

package utils

import (
	"sync"
	"sync/atomic"
)

// FifoChannel wraps a channel and releases blocked senders in the order in
// which they called Add (first come, first served), rather than in the
// unspecified order the runtime happens to pick.
type FifoChannel struct {
	innerChan                       chan interface{}
	conditions                      map[uint64]*sync.WaitGroup // per-sender gates, keyed by ticket number
	mapLock                         *sync.RWMutex
	currentPriority, latestPriority *uint64 // ticket currently allowed to send / next ticket to hand out
}

func NewFifoChannel(innerChan chan interface{}) *FifoChannel {
	zero1 := uint64(0)
	zero2 := uint64(0)
	return &FifoChannel{
		innerChan:       innerChan,
		currentPriority: &zero1,
		latestPriority:  &zero2,
		conditions:      make(map[uint64]*sync.WaitGroup),
		mapLock:         new(sync.RWMutex),
	}
}

// Add sends item on the wrapped channel, letting senders through in the
// order in which they took their tickets.
func (fc *FifoChannel) Add(item interface{}) {
	// Take a ticket.
	myPriority := atomic.AddUint64(fc.latestPriority, 1) - 1
	if myPriority != atomic.LoadUint64(fc.currentPriority) {
		// Not our turn yet: register a gate and wait for the previous
		// sender to open it.
		gate := new(sync.WaitGroup)
		gate.Add(1)
		fc.mapLock.Lock()
		fc.conditions[myPriority] = gate
		fc.mapLock.Unlock()
		// Re-check in case our turn arrived while we were registering.
		if myPriority != atomic.LoadUint64(fc.currentPriority) {
			gate.Wait()
		}
		fc.mapLock.Lock()
		delete(fc.conditions, myPriority)
		fc.mapLock.Unlock()
	}
	fc.innerChan <- item
	// Hand the turn to the next ticket and open its gate if it is already waiting.
	atomic.AddUint64(fc.currentPriority, 1)
	fc.mapLock.RLock()
	if gate, exist := fc.conditions[myPriority+1]; exist {
		gate.Done()
	}
	fc.mapLock.RUnlock()
}

// Get receives the next item from the wrapped channel.
func (fc *FifoChannel) Get() interface{} {
	return <-fc.innerChan
}
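
For reference, a minimal usage sketch of the wrapper above; the import path is a hypothetical placeholder for wherever the utils package lives:

package main

import (
	"fmt"
	"sync"

	"example.com/yourapp/utils" // hypothetical import path for the snippet above
)

func main() {
	fc := utils.NewFifoChannel(make(chan interface{}))

	// One consumer draining the channel.
	done := make(chan struct{})
	go func() {
		for i := 0; i < 8; i++ {
			fmt.Println(fc.Get())
		}
		close(done)
	}()

	// Several producers competing to send; blocked Adds are released in ticket order.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fc.Add(fmt.Sprintf("packet from sender %d", id))
		}(i)
	}

	wg.Wait()
	<-done
}
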
mvdan changed the title from "blocked channel operations starvation" to "runtime: blocked channel operations starvation" on Jul 27, 2017
mvdan (Member) commented Jul 27, 2017

Have you tried on 1.9rc1? What about other operating systems, like Linux?

Also CC @aclements

johnsnow4514 (Author) commented:

@mvdan Not yet; I find it hard to reproduce in my program.
The thing is, I am using my FIFO channel only with the packets producer/consumer logic.
After 6 months in production the problem occurred on a higher-level channel (the messages channel, instead of the packets channel).
The symptom was that some messages starved for around 30-40 seconds and caused some kind of timeout.
Fortunately I also found issue #6205, which linked to reproducible playground code.
I'll soon test this code on 1.9rc1 and Linux.

ianlancetaylor (Contributor) commented:

I don't see a bug in the playground code in the last comment. It's true that the number of values sent on the different channels can be very unequal, but that is not a bug. It also doesn't seem to directly correspond to the original problem description, which was about multiple receiving goroutines rather than multiple sending goroutines.

With multiple receiving goroutines, it's still not a bug if one of them is starved. Why does it matter as long as all the values sent on the channel are being delivered to some goroutine?

johnsnow4514 (Author) commented:

@ianlancetaylor I was talking about multiple sending goroutines. One can starve, and its message will not be passed on for a long time.

ianlancetaylor (Contributor) commented:

Your playground example does show that if the sending goroutines do a busy loop, then one or more of them can starve sending a value. I'm not sure what, if anything, we can do about that. As you say above, it's possible to restructure the program to avoid the problem.

Other than that, it's usually a good idea for a single processor to keep running the same set of goroutines, and it's usually a good idea to start the goroutine receiving from a channel when a channel is full (though I'm not sure the current scheduler actually does this). For an example like your playground example, it's possible for the sending and receiving goroutine to be on the same processor, and for that processor to keep flipping back and forth and, apparently, to tie up the channel sufficiently that no other goroutine gets in.

But anything we do to change that will tend to slow down the scheduler and may penalize other programs. It's not obvious to me that we want to optimize for the case of multiple goroutines sending to the same channel in a busy loop. Of course if you have any suggestions I would like to hear them.
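
The playground example referred to in this thread is not quoted here; the following is only an approximate sketch of the scenario described above: several goroutines send to the same unbuffered channel in a tight loop, and the number of values each sender gets through can come out very unequal.

package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int)
	counts := make([]int, 4)

	// Four senders competing on the same channel in a busy loop.
	for i := 0; i < 4; i++ {
		go func(id int) {
			for {
				ch <- id
			}
		}(i)
	}

	// Receive for one second and tally which sender got through each time.
	deadline := time.After(1 * time.Second)
	for {
		select {
		case id := <-ch:
			counts[id]++
		case <-deadline:
			fmt.Println("values received per sender:", counts)
			return
		}
	}
}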

johnsnow4514 (Author) commented:

I think @rsc described it best (#11506):

Especially since so many Go programs care about latency, making channels first come, first served seems worth doing. And it looks to me like it can be done trivially and at no performance cost. The current behavior looks more like a bug than an intended result.

I believe that a channel should be FIFO when multiple senders are blocked.
The first sender to get blocked should be the first to be released.

RLH (Contributor) commented Jul 29, 2017 via email

ianlancetaylor (Contributor) commented:

As far as I can tell there is nothing to do here. Please comment if you disagree.

golang locked and limited conversation to collaborators Mar 30, 2019