Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sync: inconsistent mutex state #41603

Open
UnightSun opened this issue Sep 24, 2020 · 21 comments
Open

sync: inconsistent mutex state #41603

UnightSun opened this issue Sep 24, 2020 · 21 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@UnightSun
Copy link

My code face "sync: inconsistent mutex state" panic, and I'm trying to fix it.

throw("sync: inconsistent mutex state")

I wonder should this line be old = atomic.LoadInt32(&m.state)?

old = m.state

It seems like there will be concurrent write in line: 212, and this line cannot guarantee old value is correct.

I found https://groups.google.com/g/golang-nuts/c/Eq4CWTB6LhM/m/F0Lbgnfq4hQJ this post said it's a bug,
but 7 years past, it's still there, any purpose? Or is it safe to read int value from a pointer's field?

sorry for my poor english

@UnightSun UnightSun changed the title sync: inconsistent mutex state sync/mutex sync: inconsistent mutex state Sep 24, 2020
@UnightSun UnightSun changed the title sync/mutex sync: inconsistent mutex state sync/mutex: sync: inconsistent mutex state Sep 24, 2020
@davecheney
Copy link
Contributor

This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector .

@UnightSun
Copy link
Author

This looks like memory corruption. Have you tried running your program under the race detector? See https://blog.golang.org/race-detector .

unfortunately my code running in produce environment that I have no fully control, and It cannot be reappeared in dev environment. I wrote a little test case but no race found.

@davecheney
Copy link
Contributor

Can you please provide the full panic message. Thank you

@UnightSun
Copy link
Author

UnightSun commented Sep 24, 2020

fatal error: sync: inconsistent mutex state

goroutine 3571787 [running]:
runtime.throw(0x4e6ccbc, 0x1e)
        /usr/local/go/src/runtime/panic.go:1116 +0x72 fp=0xc0021dd378 sp=0xc0021dd348 pc=0x439512
sync.throw(0x4e6ccbc, 0x1e)
        /usr/local/go/src/runtime/panic.go:1102 +0x35 fp=0xc0021dd398 sp=0xc0021dd378 pc=0x46cbb5
sync.(*Mutex).lockSlow(0x8313a38)
        /usr/local/go/src/sync/mutex.go:147 +0x165 fp=0xc0021dd3e0 sp=0xc0021dd398 pc=0x47d2a5
sync.(*Mutex).Lock(...)
        /usr/local/go/src/sync/mutex.go:81
sync.(*Pool).pinSlow(0xc00007ec68, 0x0, 0x0)
        /usr/local/go/src/sync/pool.go:213 +0x271 fp=0xc0021dd460 sp=0xc0021dd3e0 pc=0x47de31
sync.(*Pool).pin(0xc00007ec68, 0xc00bb1fab2, 0x4)
        /usr/local/go/src/sync/pool.go:206 +0x65 fp=0xc0021dd488 sp=0xc0021dd460 pc=0x47db85
sync.(*Pool).Get(0xc00007ec68, 0x113, 0x1)
        /usr/local/go/src/sync/pool.go:128 +0x2f fp=0xc0021dd4d0 sp=0xc0021dd488 pc=0x47d82f

then will be business logic, take something from pool.

my go version is 1.14

@cagedmantis cagedmantis changed the title sync/mutex: sync: inconsistent mutex state sync: inconsistent mutex state Sep 28, 2020
@cagedmantis cagedmantis added the WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. label Sep 28, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Sep 28, 2020
@gopherbot
Copy link

Timed out in state WaitingForInfo. Closing.

(I am just a bot, though. Please speak up if this is a mistake or you have the requested information.)

@beaquant
Copy link

what's the final solution? I catch this bug again.
go version go1.16.5 linux/amd64

fatal error: sync: inconsistent mutex state

goroutine 3295163 [running]:
runtime.throw(0xa5268d, 0x1e)
	/home/ubuntu/go/src/runtime/panic.go:1117 +0x72 fp=0xc000623ec8 sp=0xc000623e98 pc=0x436292
sync.throw(0xa5268d, 0x1e)
	/home/ubuntu/go/src/runtime/panic.go:1103 +0x35 fp=0xc000623ee8 sp=0xc000623ec8 pc=0x467575
sync.(*Mutex).lockSlow(0x1582030)
	/home/ubuntu/go/src/sync/mutex.go:125 +0x1fb fp=0xc000623f38 sp=0xc000623ee8 pc=0x483f5b
sync.(*Mutex).Lock(...)
	/home/ubuntu/go/src/sync/mutex.go:81
sync.(*RWMutex).Lock(0x1582030)
	/home/ubuntu/go/src/sync/rwmutex.go:111 +0x90 fp=0xc000623f60 sp=0xc000623f38 pc=0x485570
github.com/hftquanter/hftquant/dump.Update(0x405ea91d9571ecac, 0x0, 0x3fb70a3d70a3d70a, 0xc014000000000000, 0x4032a872b020c49c, 0x0, 0x40329d4fdf3b645a, 0x0)
	/home/ubuntu/gopath/src/github.com/hftquanter/hftquant/dump/dump.go:27 +0x3e fp=0xc000623fa0 sp=0xc000623f60 pc=0x7e385e
runtime.goexit()
	/home/ubuntu/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc000623fa8 sp=0xc000623fa0 pc=0x46bfa1

@networkimprov
Copy link

@beaquant can you provide a complete program that demonstrates the problem?

cc @ianlancetaylor

@ianlancetaylor
Copy link
Contributor

It's very unlikely that we will be able to fix this issue if we can't reproduce it.

I strongly encourage you to find a way to run the program under the race detector, as that is the most likely cause of the problem. Nobody else is reporting this problem, so the odds are against it being a bug in the standard library. In any case, if it is a bug in the standard library, we will need a way to reproduce the problem so that we can analyze it.

The lines old = m.state in sync/mutex.go are fine. The old value is only used as an input to atomic.CompareAndSwapInt32.

@ianlancetaylor ianlancetaylor added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. and removed WaitingForInfo Issue is not actionable because of missing required information, which needs to be provided. labels Jul 13, 2021
@ianlancetaylor
Copy link
Contributor

gopherbot, please don't do that.

@ianlancetaylor ianlancetaylor modified the milestones: Backlog, Unplanned Jul 13, 2021
@darellwu
Copy link

darellwu commented Aug 2, 2021

for
"
The lines old = m.state in sync/mutex.go are fine. The old value is only used as an input to atomic.CompareAndSwapInt32.
"

It's very unlikely that we will be able to fix this issue if we can't reproduce it.

I strongly encourage you to find a way to run the program under the race detector, as that is the most likely cause of the problem. Nobody else is reporting this problem, so the odds are against it being a bug in the standard library. In any case, if it is a bug in the standard library, we will need a way to reproduce the problem so that we can analyze it.

The lines old = m.state in sync/mutex.go are fine. The old value is only used as an input to atomic.CompareAndSwapInt32.

this said The old value is only used as an input to atomic.CompareAndSwapInt32.
it may ignore line 97
"
if old&(mutexLocked|mutexStarving) == mutexLocked && runtime_canSpin(iter) {
// Active spinning makes sense.
// Try to set mutexWoken flag to inform Unlock
// to not wake other blocked goroutines.
if !awoke && old&mutexWoken == 0 && old>>mutexWaiterShift != 0 &&
atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken) {
awoke = true
}
runtime_doSpin()
iter++
old = m.state
continue
}
"

old>>mutexWaiterShift != 0

here also use old value.

and the old of state may also change when awakened.
and the number of waiter will also change.

I think direct old = m.state is not safe in a concurrent scenario,why not old = atomic.LoadInt32(&m.state)?
This makes me quite confused.

@ianlancetaylor
Copy link
Contributor

The expression old>>mutexWaiterShift != 0 may use an older value of old. That is true. But it doesn't matter because the same condition also calls atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken). If the value of old is out of date, the call to atomic.CompareAndSwapInt32 will fail and the code will continue executing.

and the old of state may also change when awakened.
and the number of waiter will also change.

I'm sorry, I don't understand how this could happen. The lockSlow method only sleeps at one point, at the call to runtime_SemacquireMutex. It immediately reloads old after it wakes up. And while it may possibly pick up an out-of-date value of m.state, it still doesn't matter.

I want to repeat that we are not seeing any bug reports about mutexes other than this one. Perhaps there is something wrong with this code, but it is not showing up as problems that people are reporting. If there is a real problem here, it should be possible to write a program that reproduces the problem.

@darellwu
Copy link

darellwu commented Aug 6, 2021

The expression old>>mutexWaiterShift != 0 may use an older value of old. That is true. But it doesn't matter because the same condition also calls atomic.CompareAndSwapInt32(&m.state, old, old|mutexWoken). If the value of old is out of date, the call to atomic.CompareAndSwapInt32 will fail and the code will continue executing.

and the old of state may also change when awakened.
and the number of waiter will also change.

I'm sorry, I don't understand how this could happen. The lockSlow method only sleeps at one point, at the call to runtime_SemacquireMutex. It immediately reloads old after it wakes up. And while it may possibly pick up an out-of-date value of m.state, it still doesn't matter.

I want to repeat that we are not seeing any bug reports about mutexes other than this one. Perhaps there is something wrong with this code, but it is not showing up as problems that people are reporting. If there is a real problem here, it should be possible to write a program that reproduces the problem.


After all, using a variable is for reading and writing.

If there is a real problem here, it should be possible to write a program that reproduces the problem.

The point of attention in the sentence you said above is completely written in this respect.
I think the starting point here is correct.

But you have a lot of reading judgments that depend on this variable,
for example:

            // Don't try to acquire starving mutex, new arriving goroutines must queue.
	if old&mutexStarving == 0 {
		new |= mutexLocked
	}
	if old&(mutexLocked|mutexStarving) != 0 {
		new += 1 << mutexWaiterShift
	}
	// The current goroutine switches mutex to starvation mode.
	// But if the mutex is currently unlocked, don't do the switch.
	// Unlock expects that starving mutex has waiters, which will not
	// be true in this case.
	if starving && old&mutexLocked != 0 {
		new |= mutexStarving
	}

If there are even more cases, this is an expired value,is this judgment meaningful?(Does this fit the contextual business semantics?)
And, will a goroutine(suppose it can no longer spin) always fail to meet the conditions of line 129 ‘atomic.CompareAndSwapInt32(&m.state, old, new)’?

Will it be in a for loop instead of waiting in runtime_SemacquireMutex?
so I still think it is dangerous to read without synchronization 'old = m.state'.

I think the reason why bugs do not often appear is because of the previous spin, and the other is that the goroutine is scheduled once every 10ms.

Thank you!

@ianlancetaylor
Copy link
Contributor

Changing the variable new does nothing by itself. The only way that new affects anything is if the atomic.CompareAndSwapInt32 succeeds. And that will only succeed if m.state still holds the value old, the value that caused the changes to new to occur.

@darellwu
Copy link

darellwu commented Aug 7, 2021

I don't think we're getting to the same point. If the atomic.CompareAndSwapInt32 is successful, the state will become the new value. This means that the new value also becomes the old value. We need to loop and think again, and go back to what I said above.

Thanks.

@bingzhuo2008
Copy link

hello, has this been resolved?

@emptyfort
Copy link

I also had this issue by using sync.pool

@andig
Copy link
Contributor

andig commented Sep 25, 2022

I also had this issue by using sync.pool

I don't think that will help any. Note comment above:

I want to repeat that we are not seeing any bug reports about mutexes other than this one. Perhaps there is something wrong with this code, but it is not showing up as problems that people are reporting. If there is a real problem here, it should be possible to write a program that reproduces the problem.

@bingzhuo2008
Copy link

I changed 'throw("sync: inconsistent mutex state")' to 'throw("sync: inconsistent mutex state, new="+strconv.FormatInt(new,10))' and it printed a huge number such as 2064149489, does that mean there is hundred millions of goroutines using the mutex?

@randall77
Copy link
Contributor

Either that, or there is some corruption happening that is overwriting the mutex. I would bet on the latter.

@mountaincleanwater
Copy link

I don't think we're getting to the same point. If the atomic.CompareAndSwapInt32 is successful, the state will become the new value. This means that the new value also becomes the old value. We need to loop and think again, and go back to what I said above.

Thanks.

have you solve this problem?

@bingzhuo2008
Copy link

bingzhuo2008 commented Feb 2, 2023

I solved by removing some concurrent function call which may cause data race. But I don't know how this affected sync.mutex?

func GetCostTreeFromCtx(ctx context.Context) *CostTree {
	switch c := ctx.(type) {
	case *blademaster.Context:
		i, e := c.Get(_ctxTag)
		if e {
			ct := i.(*CostTree)
			ct.CheckPoint = time.Now()  // this line may cause data race
			return ct
		}
	}
	return NewCostTree()
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests