Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: cmd/vet: warning on some cases using declared loop variables of 3-clause for-loops #66156

Closed
go101 opened this issue Mar 7, 2024 · 22 comments
Labels
Milestone

Comments

@go101
Copy link

go101 commented Mar 7, 2024

Proposal Details

Go 1.22 changed the semantics of 3-clause for-loops. The loop variables declared in 3-clause for-loops will be (implicitly) copied once at the start of each iteration. So

  1. it is dangerous to declare no-copy values as loop variables,
  2. declaring large-size values as loop variables may silently degrade program performance.
  3. some concurrency bugs might be subtle.

So it would be great to let go vet warn on such cases.

Here, some examples for such cases are listed below (from this blog article).

  1. it is dangerous to declare no-copy values as loop variables,
package main

import (
	"sync"
	"time"
)

func process() (wait func()) {
	for wg, i := (sync.WaitGroup{}), 0; i < 3; i++ {
		if (wait == nil) {
			wait = wg.Wait
		}

		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			if (v > 0) {
				time.Sleep(time.Second/8) // this line is to make the problem easily to detect
			}
			println(v)
		}(i)
	}
	return
}

func main() {
	process()()
}

Another one: #66070

  1. declaring large-size values as loop variables may silently degrade program performance.
package main

import (
	"fmt"
	"time"
)

type Large [1<<10]byte

func foo(constFunc func(*Large, int)) {
	a := Large{}
	for i := 0; i < len(a); i++ {
		constFunc(&a, i)
	}
}

func bar(constFunc func(*Large, int)) {
	for a, i := (Large{}), 0; i < len(a); i++ {
		constFunc(&a, i)
	}
}

func main() {
	readonly := func(x *Large, k int) {}
	bench := func(f func(func(*Large, int))) time.Duration {
		start := time.Now()
		f(readonly)
		return time.Since(start)
	}
	fmt.Println("foo time:", bench(foo))
	fmt.Println("bar time:", bench(bar))
}

Similar cases: a serious one and a less serious one.

  1. some concurrency bugs might be subtle.
package main

import (
	"fmt"
	"sync"
)

const NumWorkers = 3

func isGold(num uint64) bool {
	return num & 0xFFFFF == 0
}

func main() {
	var c = make(chan uint64)
	var m sync.Mutex
	for n, i := 0, uint64(0); n < NumWorkers; n++ {
		go func() {
			for {
				m.Lock()
				i++
				v := i
				m.Unlock()

				if isGold(v) {
					c <- v
				}
			}
		}()
	}

	for n := range c {
		fmt.Println("Found gold", n)
	}
}

The 3rd is actually a continuation of #16520

@go101 go101 added the Proposal label Mar 7, 2024
@gopherbot gopherbot added this to the Proposal milestone Mar 7, 2024
@go101 go101 changed the title proposal: go/vet: warning on some cases using declared as loop variables of 3-clause for-loops proposal: go/vet: warning on some cases using declared loop variables of 3-clause for-loops Mar 7, 2024
@timothy-king
Copy link
Contributor

it is dangerous to declare no-copy values as loop variables

There is a strong case for the copylock analyzer to warn on this.

Example from https://go.dev/design/60078-loopvar#language-specification

for i := 0; i < 3; i++ {
    prints = append(prints, func() { println(i) })
}

is equivalent to:

{
	i_outer := 0
	first := true
	for {
		i := i_outer 
		if first {
			first = false
		} else {
			i++
		}
		if !(i < 3) {
			break
		}
		prints = append(prints, func() { println(i) })
		i_outer = i
	}
}

A copy clearly happens at i_outer := 0 and i_outer = i. i := i_outer is somewhat subject to the compiler's ssa choices. The semantics require a new copy of i before the increment.

The variable used by each subsequent iteration is declared implicitly before executing the post statement and initialized to the value of the previous iteration's variable at that moment.

copylock reports on range statements already. (This covers the possible 'exactly 0 iterations is known' objection [not a compelling objection. just a possible one]).

declaring large-size values as loop variables may silently degrade program performance.

This is not a correctness issue and is outside of cmd/vet's criteria (README). Note this is issue applies almost equally well to range statements, which vet does not warn on.

some concurrency bugs might be subtle.

It is sufficiently subtle I am not sure what the example is supposed to illustrate. Is it that the semantics are not identical?

@timothy-king
Copy link
Contributor

timothy-king commented Mar 8, 2024

https://go.dev/cl/569955 has a proposed solution for the copying a loop variable case.

@seankhliao seankhliao changed the title proposal: go/vet: warning on some cases using declared loop variables of 3-clause for-loops proposal: cmd/vet: warning on some cases using declared loop variables of 3-clause for-loops Mar 8, 2024
@go101
Copy link
Author

go101 commented Mar 8, 2024

declaring large-size values as loop variables may silently degrade program performance.

This is not a correctness issue and is outside of cmd/vet's criteria (README). Note this is issue applies almost equally well to range statements, which vet does not warn on.

For range statements, if there is a copy cost, then it is inevitable and compiler is helpless to avoid the cost.
For 3-clause ones, it can be avoided if the copy cost is not intended, with the help of either coder or compiler.

So this is a new kind of problem introduced by the new semantics of 3-clause for-loops.
If it should not be handled by vet, there it should be handled by another toolchain builtin tool.

some concurrency bugs might be subtle.

It is sufficiently subtle I am not sure what the example is supposed to illustrate. Is it that the semantics are not identical?

Do you mean the code is correct?

@go101
Copy link
Author

go101 commented Mar 8, 2024

In fact, the 2nd point is not only performance degradation related, it is often also logic correctness related.

@timothy-king
Copy link
Contributor

So this is a new kind of problem introduced by the new semantics of 3-clause for-loops.
If it should not be handled by vet, there it should be handled by another toolchain builtin tool.

I agree that the example linked from "A similar case" is indeed a problem. Committing to a new tool in the toolchain is a fairly heavy hammer. Both for the go team and the community. It would be nice to try to find a more natural home for performance issues first.

@dominikh This looks like a plausible candidate for a staticcheck Performance issues check. (Good luck guessing the minimum size to start reporting! 1kb?)

Do you mean the code is correct?

I had not read the blog post so I did not have a great grasp on what you were trying to illustrate.

Yes I would agree that that is a non-obvious data race. My suggestion for this would be to basically have a loopclosure style vet check to look for loop variable writes that race with the implicit loop read like this.

In fact, the 2nd point is not only performance degradation related, it is often also logic correctness related.

To keep the discussion productive, do you have a non-crafted example? Crafted examples have been known for a while and so far they are not compelling in the way a real correctness issue would be.

@go101
Copy link
Author

go101 commented Mar 9, 2024

So you agree that the 1st and 3rd points should be vetted, right?

In fact, the 2nd point is not only performance degradation related, it is often also logic correctness related.

To keep the discussion productive, do you have a non-crafted example? ...

Maybe you are right. Logic correctness problems often relate to loopvar capturing.
So when there is a logic correctness problem, the main cause is loopvar capturing, not large-size values.

But I still don't think it is a good idea to let 3rd party tools to detect such problems.

@timothy-king
Copy link
Contributor

I agree on vetting the 1st point, and I have proposed an implementation.

For the 3rd point, we still need to decide on the criteria vet would use to detect such issues. In the previous loopclosure range case, the loop did a write to the loop variables each iteration so any read or write of the variable would race. This is subtly different as loop is doing an implicit read and we need to find a write to the variable. So we need a narrower criteria to detect issues in the closure. Once someone proposes solid criteria (low false positive, low false negative, intuitive enough, fast enough), I would be in favoring of adding this to vet.

@dr2chase
Copy link
Contributor

Does vet warn on any performance issues? I thought it was for correctness foot-guns.

@adonovan
Copy link
Member

adonovan commented Mar 14, 2024

I'm not convinced this is a problem that needs to be solved. To address the original points:

it is dangerous to declare no-copy values as loop variables,

It's true that one could construct a loop in which the only copy of the nocopy value is the original initialization, but why would anyone do that? I have never seen anyone initialize a WaitGroup that way and I would certainly flag it during review as bad style.

declaring large-size values as loop variables may silently degrade program performance.

That's true of any assignment. Why is this one special? (And as @dr2chase points out, that's not vet's concern anyway.)

some concurrency bugs might be subtle.

Can you give an example? If you're copying the value once in v := f() for the assignment, the extra implicit copy is slower but no less correct.

@go101
Copy link
Author

go101 commented Mar 15, 2024

it is dangerous to declare no-copy values as loop variables,

It's true that one could construct a loop in which the only copy of the nocopy value is the original initialization, but why would anyone do that? I have never seen anyone initialize a WaitGroup that way and I would certainly flag it during review as bad style.

The syntax allows it, which means the possibility is there. In my honest opinion, the rarer it happens, the more dangerous it is.

declaring large-size values as loop variables may silently degrade program performance.

That's true of any assignment. Why is this one special? (And as @dr2chase points out, that's not vet's concern anyway.)

The specialty is well explained in the blog article. It may cause silent performance degradation.

This is a new kind of problem created by the new semantics.

some concurrency bugs might be subtle.

Can you give an example? If you're copying the value once in v := f() for the assignment, the extra implicit copy is slower but no less correct.

The first comment gives such an example.

@adonovan
Copy link
Member

The syntax allows it, which means the possibility is there. In my honest opinion, the rarer it happens, the more dangerous it is.

That's the opposite from the ranking used to evaluate vet checks. According to the vet documentation, "the criteria applied when selecting which checks to add are: ... Frequency: ... A new check that finds only a handful of problems across all existing programs, even if the problem is significant, is not worth adding to the suite everyone runs daily."

The specialty is well explained in the blog article. It may cause silent performance degradation.

It's not an asymptotic change in performance, just a single additional copy, so it's not a correctness issue, and thus not vet's concern. A single extra copy is also well beneath the threshold of what we usually consider to be a performance-affecting change in library code.

The first comment gives such an example.

I meant an example from merged code, not one constructed to trigger the analyzer.

@go101
Copy link
Author

go101 commented Mar 15, 2024

The first comment gives such an example.

I meant an example from merged code, not one constructed to trigger the analyzer.

I don't appreciate that attitude of facing the problem. Developing a language is a serious thing.

If you want a case from some Go projects. just wait. I believe there will some reported later.

BTW, why do you think the example is not real. Will you think it is also constructed when a case is reported later? :D

It's not an asymptotic change in performance, just a single additional copy, so it's not a correctness issue, and thus not vet's concern.

I fully acknowledged that the performance problem doesn't fit vet's criteria. The problem is a way is need to handle the new problem caused by the new semantics.

The syntax allows it, which means the possibility is there. In my honest opinion, the rarer it happens, the more dangerous it is.

That's the opposite from the ranking used to evaluate vet checks. According to the vet documentation, "the criteria applied when selecting which checks to add are: ... Frequency: ... A new check that finds only a handful of problems across all existing programs, even if the problem is significant, is not worth adding to the suite everyone runs daily."

I don't have any words. Okay, just let some rare old code in some private projects be the victim.

@adonovan
Copy link
Member

Let me try to summarize my understanding of the whole:

Case 1. In a loop of this form:

	for x := f(); ...; ... {
		// x' := x
		use(&x)			// implicitly refers to x'
		// x = x'
	}

the compiler makes a copy (x') of x at the start of each loop and writes back its value at the end. The reference to x in the loop body is actually to x'. So, if the loop body depends on the identity of x being the same for each iteration, the language change will have introduced a bug.

You are correct that there is a theoretical possibility of programs such as your example having the potential for new bugs. But in practice it never happens: during the compiler team's analysis, there was exactly one instance in all of the hundreds of thousands of loops in Google's Go code base where the semantic change introduced a regression. But in a large fraction of cases, the change fixed a latent bug. (In other words, it's much, much more common to implicitly assume that the identity of the variable is different for each iteration--which it now is.) That's why I was asking for evidence from a real program.

I agree with @timothy-king that we should extend the copylocks check to detect no-copy types in the "init" clause.

Case 2 concerns the additional cost of the implicit loop variable copies, x' = x and x = x'. When the loop body doesn't assign to x, the variables x and x' can be fused, so there is no additional cost; this covers the majority of cases. In practice, nearly all variables declared within the "init" clause are either indexes or elements, not arbitrary large values. It is possible to contrive an example in which the performance change is detectable. You could say: the language got slower. But if this specific code shows up in a profile, it's pretty clear what to do: allocate fewer variables by declaring x outside the loop; or refer to large values by pointer, not copying.

In any case, it's not a vet problem.

Case 3 seems to be a concurrent variant of case 1. The loop body implicitly assumes that there is only one variable i, and that is no longer the case. The same argument as in case 1 applies: the language change is overwhelmingly more likely to fix a data race in a loop than to introduce one. (Also, these errors may be detected by the test -race flag.)

So, I'm not saying these kinds of problems are impossible, just that they are vanishingly rare, and that alone means vet is not the appropriate tool to detect them.

@go101
Copy link
Author

go101 commented Mar 15, 2024

Case 1. In a loop of this form: ....
...
You are correct that there is a theoretical possibility of programs such as your example having the potential for new bugs. But in practice it never happens: during the compiler team's analysis, there was exactly one instance in all of the hundreds of thousands of loops in Google's Go code base where the semantic change introduced a regression. But in a large fraction of cases, the change fixed a latent bug. (In other words, it's much, much more common to implicitly assume that the identity of the variable is different for each iteration--which it now is.) That's why I was asking for evidence from a real program.

So you only scanned the code in Google, and got the conclusion that "in practice it never happens"?
You may think this is sufficient to get the conclusion. But in my opinion, it is far to sufficient.
And by my understanding, you didn't analyze all the potential problems mentioned in this issue.

My example is real, certainly, in my opinion. :) I don't know why you think it is unreal.

Case 2 .... It is possible to contrive an example in which the performance change is detectable. You could say: the language got slower. But if this specific code shows up in a profile, it's pretty clear what to do: allocate fewer variables by declaring x outside the loop; or refer to large values by pointer, not copying.

If looks you haven't understood the problem well of case 2. Let's explain it again. The compiler may or may not succeed to make optimizations to avoid loop variable copies, depending on how the code is written. When you change the code a bit, the performance regression may appear or disappear.

And the performance regression degree may be serious or not easy to detect. When it is not so serious, it might be not detected in time.

Case 3 seems to be a concurrent variant of case 1. The loop body implicitly assumes that there is only one variable i, and that is no longer the case. The same argument as in case 1 applies: the language change is overwhelmingly more likely to fix a data race in a loop than to introduce one. (Also, these errors may be detected by the test -race flag.)

All you said is just your willing. Now, it is not the time to make conclusion. We need to wait, maybe for several years.

The problem of case 3 is that, the case is valid before Go 1.22, but become invalid since Go 1.22. What I mean here is that people may still think the code is valid in Go 1.22 semantics. And depending on specific code, the effects of the invalid code might be exposed in time or not. This is what the subtlety is.

@timothy-king
Copy link
Contributor

Can you give an example? If you're copying the value once in v := f() for the assignment, the extra implicit copy is slower but no less correct.

Case 3 seems to be a concurrent variant of case 1. The loop body implicitly assumes that there is only one variable i, and that is no longer the case. The same argument as in case 1 applies: the language change is overwhelmingly more likely to fix a data race in a loop than to introduce one. (Also, these errors may be detected by the test -race flag.)

IMO what case 3 is really is an instance of a loopclosure bug with the twist that vet would only warn when a variable is WRITTEN to instead of READ or WRITTEN to (i.e. any reference in the closure). The previous range check is based on a WRITE x' = next() operation (which races with both READ and WRITE operations on x). The 3-clause for loop post 1.22 has an implicit copy of the loop variables before the post-clause x' = x. This is a READ that can race with any WRITE in the closure. READ operations in the closure are fine.

vet reporting race conditions when it can with sufficient accuracy and low cost is relatively strong. I think loopclosure can be extended to this case.

FWIW I am not that nervous about examples that transition from happened to have no races to having races. I am actually more nervous about patterns like the following:

// iterate over a linked list.
for prev, curr := nil, l.Next(); curr != nil; prev, curr = curr, l.Next() {
  go func() {
    ...
    curr.val = foo() // problematic write
    ...
  }()
  // prev', curr' = prev, curr (implicit copy)
}

The user likely heard/remembers 'each loop gets its own copy of the variable', slightly skims the details of how this works, decides on this basis to assign to curr.val (or just skipped the previous two steps), does not write good enough unit tests (or run unit tests with -race), and ends up with a potentially nasty bug. This is a hypothetical example, but this sort of situation seems plausible to me.

just that they are vanishingly rare,

I am not sure if this will end up too rare to justify the frequency requirement or not. 3-clause for loop races definitely seem rarer than range races. And the WRITE requirement will make this less frequent still. The linked list thought experiment above does seem plausible enough to me that I'd say this is at least borderline to support in loopclosure.

@adonovan
Copy link
Member

adonovan commented Mar 18, 2024

[@go101] So you only scanned the code in Google, and got the conclusion that "in practice it never happens"?

The compiler team scanned all the Go corpora hosted by Google, which means not only Google's proprietary code, but also the much larger set of modules in the Go Module Mirror, which to a first approximation is all the world's open-source Go code.

[@go101] If looks you haven't understood the problem well of case 2.

Thanks, I understand the problem, but I come back to the question of whether this theoretical possibility of a slowdown is ever actually noticeable in programs that weren't designed to exhibit the problem.

The problem of case 3 is that, the case is valid before Go 1.22, but become invalid since Go 1.22. What I mean here is that people may still think the code is valid in Go 1.22 semantics. And depending on specific code, the effects of the invalid code might be exposed in time or not. This is what the subtlety is.

Yes, this language change has the potential to introduce bugs in bug-free code. But this issue is about whether vet can and should attempt to detect them, and that decision should be based on evidence that this is a real problem. You are right that we may not have such evidence for a while.

[@timothy-king] I am actually more nervous about patterns like the following [...]

I don't understand the problem in the example. The statement annotated as a "problematic write" is not a write, it's only a read of curr, so there's no race. If it was a write, then of course there would be a race between the goroutine and the loop increment, but that was always the case before go1.22. You're correct that the nature of that race would have changed from W-W to W-R, but I don't see why that matters to the user.

@timothy-king
Copy link
Contributor

I filed #66387 and #66388. These proposals are scoped down and they each give pretty specific criteria. My hope is to allow for each to make progress independently.

@go101
Copy link
Author

go101 commented Mar 19, 2024

@adonovan

The compiler team scanned all the Go corpora hosted by Google, which means not only Google's proprietary code, but also the much larger set of modules in the Go Module Mirror, which to a first approximation is all the world's open-source Go code.

Sorry, there is a large quantity of private Go code in the world. And again, not all patterns are analyzed.

In fact, what I mean is that the manner to prove something by scanning all existing code in the world is totally wrong. If you don't want programmer to write some code you don't like, just forbid it from the syntax level. But the current reality is, when some people write the code you don't like but the syntax allows, then you just say it is constructed. It is ridiculous to me and an ugly objective fact, it makes Go less rigorous (the trend just happened since recent Go versions). Being less and less rigorous, I don't know what are the consequences for Go.

[@go101] If looks you haven't understood the problem well of case 2.

Thanks, I understand the problem, but I come back to the question of whether this theoretical possibility of a slowdown is ever actually noticeable in programs that weren't designed to exhibit the problem.

Is it lucky or unlucky if it is not noticeable?

The problem here is that such a performance degradation problem may appear/disappear in an unexpected way, depending on whether the compiler optimization works or not. Objectively, this raise the bar to be a qualified Go programmer.

Yes, this language change has the potential to introduce bugs in bug-free code. But this issue is about whether vet can and should attempt to detect them, and that decision should be based on evidence that this is a real problem. You are right that we may not have such evidence for a while.

Again, being less and less rigorous, I don't know what are the consequences for Go.

I don't understand the problem in the example. The statement annotated as a "problematic write" is not a write, it's only a read of curr, so there's no race. If it was a write, then of course there would be a race between the goroutine and the loop increment, but that was always the case before go1.22. You're correct that the nature of that race would have changed from W-W to W-R, but I don't see why that matters to the user.

The problem is, prior to Go 1.22, when there is a problem, it is explicit to users, but since Go 1.22, it is implicit so that people tend to not admit it is a problem.


Up to now, the proven benefits of the semantic change on for;; loops are rare and tiny, even without considering the damage for the rigor of Go. But what

And I haven't seen any efforts from Go official to notify Go programmers on the potential damage of the change. What I saw are all to try to hide the potential problems instead.

Let time tell whether or not the change is worth it.

@adonovan
Copy link
Member

adonovan commented Mar 19, 2024

It sounds like you are frustrated with the fact that the language made a backward-incompatible change in go1.22, despite the Go project's longtime commitment to not breaking existing bug-free programs. In this case there were good reasons for it; they were carefully evidenced and publicly debated, and you can easily look them up. But this is not the place to re-litigate them.

As I said several times already: if and when there is evidence of a real problem, we will investigate making changes to vet to help mitigate them.

Let time tell whether or not the change is worth it.

Exactly.

@ianlancetaylor
Copy link
Contributor

@go101 We understand that you do not like the loop variable change. You expressed strong opposition before the change was made. You continue to express strong opposition. We heard you. We considered your arguments. And we made a decision. We are not going to revisit that decision at this time. I encourage you to let this matter rest. Thank you.

@go101
Copy link
Author

go101 commented Mar 19, 2024

@ianlancetaylor
As I have said in my blog:

What's done is done. In the end, I hope this article will help you write professional Go code in the Go 1.22+ era.

I'm sick of disputing on the rational of the change now.
I'm sorry I did it again here, but I really don't want to do it.

@adonovan
Copy link
Member

@timothy-king has split the two main ideas into separate issues. I'm going to close this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants