sync: mutex gets stuck in locked state #60723

Closed
andrewhodel opened this issue Jun 10, 2023 · 17 comments

@andrewhodel

andrewhodel commented Jun 10, 2023

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	var a = &sync.Mutex{}
	var m = 0  // iterations counted by the main goroutine
	var mm = 0 // iterations counted by the background goroutine

	// Background goroutine: lock and unlock the shared mutex in a tight loop.
	go func() {
		for {
			fmt.Println("mm", mm)
			a.Lock()
			a.Unlock()
			mm = mm + 1
		}
	}()

	// Main goroutine: yield once, then lock and unlock five times per pass.
	for {
		runtime.Gosched()
		fmt.Println("m", m)
		a.Lock()
		a.Unlock()
		m = m + 1
		fmt.Println("m", m)
		a.Lock()
		a.Unlock()
		m = m + 1
		fmt.Println("m", m)
		a.Lock()
		a.Unlock()
		m = m + 1
		fmt.Println("m", m)
		a.Lock()
		a.Unlock()
		m = m + 1
		fmt.Println("m", m)
		a.Lock()
		a.Unlock()
		m = m + 1
	}
}

The a mutex stops working and stays locked after some time.

uname -a
Linux ip-a.ec2.internal a.amzn2.x86_64 #1 SMP Mon Apr 24 23:34:06 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz
stepping	: 2
microcode	: 0x49
cpu MHz		: 2399.966
cache size	: 30720 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit mmio_stale_data
bogomips	: 4800.01
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

@seankhliao
Member

Doesn't appear to be reproducible.
We also don't support 1.18 anymore.

seankhliao closed this as not planned on Jun 10, 2023
@andrewhodel
Author

andrewhodel commented Jun 10, 2023

It is reproducible.

The problem is that each time a new thread is started, there may be a situation causing it to fail or to succeed, which means that a long time may need to pass before the situation that causes the freeze actually occurs.

That is why the problem is deeper, or longer than the code that does nothing other than constantly create new threads that are likely to fail.

@andrewhodel
Author

Reopen the issue. If I remove runtime.Gosched() it does not fail.

@andrewhodel
Author

@seankhliao please read the comments replying to you and re-open the issue.

@randall77
Contributor

@seankhliao is right, 1.18 is no longer supported. We'll need a reproducer on 1.19 or later. Does it reproduce on 1.19, 1.20, or tip?
I also cannot reproduce with 1.18, on linux or darwin. If you can reproduce, please provide us with some more detailed instructions. Maybe it has something to do with ec2 servers? I'm using a linux laptop and a darwin desktop.

> The a mutex stops working and stays locked after some time.

How much time? 2 seconds? 3 days?
How do you know the mutex stays locked? Can you attach a debugger and show us a backtrace?

randall77 changed the title from "affected/package: sync mutex go version go1.18.9 linux/amd64" to "sync: mutex gets stuck in locked state" on Jun 10, 2023
@andrewhodel
Author

I told you already, if you remove runtime.Gosched() it works.

If you don't know how that code explains that the mutex stayed locked by its console output, then you are not reading the code and are repeating a list of non-answers.

The timing of it could be dependent on the temperature reading at the maximum number of zeroes immediately after the decimal in an IEEE 754 float with 15 final numbers that are not 0, for all I know. It takes some time to test each breakpoint, and at the end of the breakpoint testing this is the result.

If you don't support Go 1.18, close every issue that is before the version you do support.

@andrewhodel
Author

@randall77 @seankhliao every answer is provided, please reopen the issue to provide a path to fixing and informing users of Go that runtime.Gosched() does not work correctly with mutexes.

@randall77
Contributor

randall77 commented Jun 11, 2023

> I told you already, if you remove runtime.Gosched() it works.

"it works" = the bug happens, or "it works" = the bug no longer happens? I was assuming the latter, but maybe you mean the former?

> If you don't know how that code explains that the mutex stayed locked by its console output, then you are not reading the code and are repeating a list of non-answers.

I assume you mean that eventually it stops printing things. I understand that, I read your program and understand exactly what it does (or is supposed to do). But when I run your program it never stops printing things. So you're seeing something that I'm not seeing.

If you want us to help you figure out this bug, you're going to have to help us. We can't reproduce what you are seeing. We need detailed, exact instructions that get you to a failed program execution. Exact source code, exact build instructions, exact instructions to run it. There's clearly something different about what you are doing and what we are doing, and without detailed information about what you're doing we'll never figure that out.

For instance, let's start with what program you're running that demonstrates the problem. Is it the program listed in the original post, or is it that one minus the runtime.Gosched call?

@randall77
Contributor

Some questions left unanswered. It would really help if you could answer them:

> Does it reproduce on 1.19, 1.20, or tip?

> How much time? 2 seconds? 3 days?

> How do you know the mutex stays locked?

In particular, how do you know that it isn't, say, a problem with the fmt package? There are surely other reasons why it might stop printing.

> Can you attach a debugger and show us a backtrace?

@andrewhodel
Author

@randall77 none of those questions have value.

Your arguments are as invalid as the temperature is outside the operating ranges of the CPU.

Please stop the littering in the discussion.

@ianlancetaylor
Contributor

@andrewhodel Please follow the Go Community Code of Conduct. Please be respectful and charitable.

@andrewhodel
Author

@ianlancetaylor I agree, I am helping you keep everything clean and tidy as requested.

@andrewhodel
Author

It was not runtime.Gosched().

The problem was not being logged, here it is - #60765

@andrewhodel
Author

Major changes to mutexes in Go 1.19

https://go-review.googlesource.com/c/go/+/34310

@ianlancetaylor
Contributor

> Major changes to mutexes in Go 1.19

Note that the changes you link to first appeared in Go 1.9, released in 2017. They were not new in Go 1.19.

@andrewhodel
Author

The major change is in that commit.

> Also fix a long standing bug that goroutines were requeued at the tail of the wait queue. That lead to even more unfair acquisition times with multiple waiters.

@andrewhodel
Author

It wasn't #60765 either.

It was an extra lock in a non reviewed function. I went back and listened to the recorded audio assault at the building while typing and the word lock was continually played across the electrical grid.
