runtime: blocked channel operations starvation #21187

Closed
johnsnow4514 opened this issue Jul 27, 2017 · 8 comments

johnsnow4514 commented Jul 27, 2017

What version of Go are you using (go version)?

1.8.0

What operating system and processor architecture are you using (go env)?

windows/amd64

What did you do?

Related to #11506
I have a messaging app that generates messages and passes them over a channel to goroutines whose job is to transform those messages into small packets of data. The packets are then passed over another channel to goroutines whose job is to read them and send them over a socket.
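
For context, a minimal sketch of the pipeline shape described above; all names and types here are illustrative assumptions, not code from the actual application:

package main

import (
	"fmt"
	"sync"
)

// Message and Packet are placeholder types standing in for the app's real data.
type Message struct{ Body string }
type Packet struct{ Chunk string }

func main() {
	messages := make(chan Message)
	packets := make(chan Packet)

	// Transformer goroutines: read messages, emit packets.
	var transformers sync.WaitGroup
	for i := 0; i < 4; i++ {
		transformers.Add(1)
		go func() {
			defer transformers.Done()
			for m := range messages {
				// The real app splits a message into several small packets.
				packets <- Packet{Chunk: m.Body}
			}
		}()
	}

	// Sender goroutine: read packets and "send" them (stdout here instead of a socket).
	done := make(chan struct{})
	go func() {
		for p := range packets {
			fmt.Println("send:", p.Chunk)
		}
		close(done)
	}()

	for i := 0; i < 10; i++ {
		messages <- Message{Body: fmt.Sprintf("msg-%d", i)}
	}
	close(messages)
	transformers.Wait()
	close(packets)
	<-done
}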

What did you expect to see?

Packets that are generated at some point in time should be sent around that same time: there shouldn't be more than a few milliseconds between sending a packet on a channel and receiving it in the receiving goroutine.
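
A minimal sketch of how that expectation could be checked, assuming packets are timestamped when they are handed to the channel (the timedPacket type and the 5ms threshold are illustrative assumptions):

package main

import (
	"fmt"
	"time"
)

// timedPacket carries the time at which it was handed to the channel.
type timedPacket struct {
	sentAt  time.Time
	payload []byte
}

func main() {
	ch := make(chan timedPacket, 64)

	done := make(chan struct{})
	go func() {
		for p := range ch {
			// Measure the delay between sending and receiving.
			if latency := time.Since(p.sentAt); latency > 5*time.Millisecond {
				fmt.Println("slow delivery:", latency)
			}
		}
		close(done)
	}()

	for i := 0; i < 100; i++ {
		ch <- timedPacket{sentAt: time.Now(), payload: []byte("data")}
	}
	close(ch)
	<-done
}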

What did you see instead?

In some cases a goroutine might starve for around 30 seconds while other packets are transferred just fine. This causes problems in my app, and I had to implement my own wrapper around a channel to ensure FIFO ordering, which fixed the problem.
Are there any plans to fix #11506? @rsc @randall77
It really causes weird and rare behavior in production environments.
Here's my implementation of FIFO channeling, in case anyone else is facing the same problem:

package utils

import (
	"sync"
	"sync/atomic"
)

// FifoChannel wraps a channel and releases blocked senders in the order in
// which they called Add (first come, first served), rather than in the
// unspecified order the runtime happens to pick.
type FifoChannel struct {
	innerChan                       chan interface{}
	conditions                      map[uint64]*sync.WaitGroup // per-sender gates, keyed by ticket number
	mapLock                         *sync.RWMutex
	currentPriority, latestPriority *uint64 // ticket currently allowed to send / next ticket to hand out
}

func NewFifoChannel(innerChan chan interface{}) *FifoChannel {
	zero1 := uint64(0)
	zero2 := uint64(0)
	return &FifoChannel{
		innerChan:       innerChan,
		currentPriority: &zero1,
		latestPriority:  &zero2,
		conditions:      make(map[uint64]*sync.WaitGroup),
		mapLock:         new(sync.RWMutex),
	}
}

// Add sends item on the wrapped channel, letting senders through in the
// order in which they took their tickets.
func (fc *FifoChannel) Add(item interface{}) {
	// Take a ticket.
	myPriority := atomic.AddUint64(fc.latestPriority, 1) - 1
	if myPriority != atomic.LoadUint64(fc.currentPriority) {
		// Not our turn yet: register a gate and wait for the previous
		// sender to open it.
		gate := new(sync.WaitGroup)
		gate.Add(1)
		fc.mapLock.Lock()
		fc.conditions[myPriority] = gate
		fc.mapLock.Unlock()
		// Re-check in case our turn arrived while we were registering.
		if myPriority != atomic.LoadUint64(fc.currentPriority) {
			gate.Wait()
		}
		fc.mapLock.Lock()
		delete(fc.conditions, myPriority)
		fc.mapLock.Unlock()
	}
	fc.innerChan <- item
	// Hand the turn to the next ticket and open its gate if it is already waiting.
	atomic.AddUint64(fc.currentPriority, 1)
	fc.mapLock.RLock()
	if gate, exist := fc.conditions[myPriority+1]; exist {
		gate.Done()
	}
	fc.mapLock.RUnlock()
}

// Get receives the next item from the wrapped channel.
func (fc *FifoChannel) Get() interface{} {
	return <-fc.innerChan
}
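
For reference, a minimal usage sketch of the wrapper above; the import path is a hypothetical placeholder for wherever the utils package lives:

package main

import (
	"fmt"
	"sync"

	"example.com/yourapp/utils" // hypothetical import path for the snippet above
)

func main() {
	fc := utils.NewFifoChannel(make(chan interface{}))

	// One consumer draining the channel.
	done := make(chan struct{})
	go func() {
		for i := 0; i < 8; i++ {
			fmt.Println(fc.Get())
		}
		close(done)
	}()

	// Several producers competing to send; blocked Adds are released in ticket order.
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			fc.Add(fmt.Sprintf("packet from sender %d", id))
		}(i)
	}

	wg.Wait()
	<-done
}
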
mvdan changed the title from "blocked channel operations starvation" to "runtime: blocked channel operations starvation" on Jul 27, 2017
mvdan (Member) commented Jul 27, 2017

Have you tried on 1.9rc1? What about other operating systems, like Linux?

Also CC @aclements

johnsnow4514 (Author) commented:

@mvdan Not yet; I find it hard to reproduce in my program.
The thing is, I am using my FIFO channel only with the packets producer/consumer logic.
After 6 months in production the problem occurred on a higher-level channel (the messages channel, instead of the packets channel).
The symptom was that some messages starved for around 30-40 seconds and caused some kind of timeout.
Fortunately I also found issue #6205, which linked to reproducible playground code.
I'll soon test this code on 1.9rc1 and Linux.

ianlancetaylor (Contributor) commented:

I don't see a bug in the playground code in the last comment. It's true that the number of values sent on the different channels can be very unequal, but that is not a bug. It also doesn't seem to directly correspond to the original problem description, which was about multiple receiving goroutines rather than multiple sending goroutines.

With multiple receiving goroutines, it's still not a bug if one of them is starved. Why does it matter as long as all the values sent on the channel are being delivered to some goroutine?

johnsnow4514 (Author) commented:

@ianlancetaylor I was talking about multiple sending goroutines. One can starve, and its message will not be passed on for a long time.

ianlancetaylor (Contributor) commented:

Your playground example does show that if the sending goroutines do a busy loop, then one or more of them can starve sending a value. I'm not sure what, if anything, we can do about that. As you say above, it's possible to restructure the program to avoid the problem.

Other than that, it's usually a good idea for a single processor to keep running the same set of goroutines, and it's usually a good idea to start the goroutine receiving from a channel when a channel is full (though I'm not sure the current scheduler actually does this). For an example like your playground example, it's possible for the sending and receiving goroutine to be on the same processor, and for that processor to keep flipping back and forth and, apparently, to tie up the channel sufficiently that no other goroutine gets in.

But anything we do to change that will tend to slow down the scheduler and may penalize other programs. It's not obvious to me that we want to optimize for the case of multiple goroutines sending to the same channel in a busy loop. Of course if you have any suggestions I would like to hear them.
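
The playground example referred to in this thread is not quoted here; the following is only an approximate sketch of the scenario described above: several goroutines send to the same unbuffered channel in a tight loop, and the number of values each sender gets through can come out very unequal.

package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int)
	counts := make([]int, 4)

	// Four senders competing on the same channel in a busy loop.
	for i := 0; i < 4; i++ {
		go func(id int) {
			for {
				ch <- id
			}
		}(i)
	}

	// Receive for one second and tally which sender got through each time.
	deadline := time.After(1 * time.Second)
	for {
		select {
		case id := <-ch:
			counts[id]++
		case <-deadline:
			fmt.Println("values received per sender:", counts)
			return
		}
	}
}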

johnsnow4514 (Author) commented:

I think @rsc described it best (#11506):

Especially since so many Go programs care about latency, making channels first come, first served seems worth doing. And it looks to me like it can be done trivially and at no performance cost. The current behavior looks more like a bug than an intended result.

I believe that a channel should be FIFO when multiple senders are blocked.
The first sender to get blocked should be the first to be released.

RLH (Contributor) commented Jul 29, 2017 via email

ianlancetaylor (Contributor) commented:

As far as I can tell there is nothing to do here. Please comment if you disagree.

golang locked and limited conversation to collaborators Mar 30, 2019