Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: redefine range loop variables in each iteration #20733

Closed
bisgardo opened this issue Jun 19, 2017 · 32 comments
Closed

proposal: spec: redefine range loop variables in each iteration #20733

bisgardo opened this issue Jun 19, 2017 · 32 comments
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) LanguageChange NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Proposal v2 A language change or incompatible library change
Milestone

Comments

@bisgardo
Copy link

As an alternative to #20725 (which was originally about go vet), let the variables of range loops be implicitly redefined in each iteration like in Dart's for loops. That is,

for k, v := range vals {
  // ...
}

should be equivalent to

for k, v := range vals {
  k := k
  v := v
  // ...
}

This will make it "safe" to take the loop variable's addresses as well as capturing the loop variables in nested functions (see #16520).

The proposal could be expanded to vanilla for loops, although that would make it diverge semantically from other languages.

@gopherbot gopherbot added this to the Proposal milestone Jun 19, 2017
@bradfitz bradfitz added v2 A language change or incompatible library change LanguageChange labels Jun 19, 2017
@bradfitz
Copy link
Contributor

We considered this prior to Go 1, but considered it too quickly. We'll definitely reconsider it for Go 2. I actually think there's a duplicate tracking bug for this.

@davecheney
Copy link
Contributor

davecheney commented Jun 19, 2017 via email

@dsnet
Copy link
Member

dsnet commented Jun 19, 2017

And I would argue even catches seasoned Go programmers occasionally. See https://golang.org/cl/40937. It took me a while staring at that logic trying to figure out why the code wasn't doing what it naturally read as.

@cznic
Copy link
Contributor

cznic commented Jun 20, 2017

Adopting this proposal can make performance of perfectly valid code worse, sometimes dramatically. For example, when escape analysis cannot determine it's safe to keep the {k,v} value on stack while it's address is taken. Then we get O(n) heap allocations instead of the current O(1).

@davecheney
Copy link
Contributor

@cznic can you give some examples of this. I would expect that most of these cases would actually be bugs because taking the address of the same variable over and over again was the problem in the first place.

@cznic
Copy link
Contributor

cznic commented Jun 20, 2017

@davecheney Consider #20725 (comment) and imagine escape analysis is not able to prove foo only reads via the passed pointer.

@davecheney
Copy link
Contributor

@cznic

Why would you want to take the address of a copy of the current range value? If i'm not mistaken, any modification to it would be lost.

@cznic
Copy link
Contributor

cznic commented Jun 20, 2017

@davecheney

Why would you want to take the address of a copy of the current range value?

  • To avoid copying a big value when passing it as an argument.
  • To call a pointer receiver method on the value.

If i'm not mistaken, any modification to it would be lost.

Taking the address of a value does not imply intent to modify the pointee.

@davecheney
Copy link
Contributor

To avoid copying a big value when passing it as an argument.

But the value is already copied as part of the range iteration. Having a separate value escape on every iteration is sub optimal, but hopefully you'll agree it's reasonable to trade a potential performance problem for a very common correctness problem.

To call a pointer receiver method on the value.

True, that is asking a lot of escape analysis and I really don't know the ways that escape analysis and the implicit taking of address interact. But I will note that your example explicitly took the address of the copy of the current index value, so if the intention was to pass a pointer to the value down to a function, you'd want to make sure you were passing a pointer to the original slice element, not a copy. In the code that I have read that is more commonly written as

for i := range vals {
     foo(&vals[i])
}

Taking the address of a value does not imply intent to modify the pointee.

Respectfully, generally it does, as in your method example above.

@cznic
Copy link
Contributor

cznic commented Jun 20, 2017

foo(&vals[i])

Iff vals is a slice/array. There are also maps and channels.

Respectfully, generally it does, as in your method example above.

Even the innocent looking fmt.Println(value) always takes the address of 'value'. As does every function taking an interface (not only interface{}, any interface) argument. Every assignment to an interface variable takes the address of the RHS (if not already a pointer), etc.

Edit: Added '(if not already a pointer)'.

@davecheney
Copy link
Contributor

davecheney commented Jun 20, 2017

Iff vals is a slice/array. There are also maps and channels.

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received. Again i'm falling back on arguments of correctness over performance.

Even the innocent looking fmt.Println(value) always takes the address of 'value'.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

@cznic
Copy link
Contributor

cznic commented Jun 20, 2017

If it's an array, it's already copied during the evaluation of the range statement. If it's a channel, it's a copy of the value received.

Right. But with this proposal there will be, in certain situations, an additional allocation per iteration.

I believe a copy of value is passed to fmt.Println. That copy is almost always heap allocated due to a limitation introduced in the Go 1.5 garbage collector.

You're right. The interface runtime struct has the address, but that's not different with this proposal. I misspoke, sorry.

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2017

@cznic

with this proposal there will be, in certain situations, an additional allocation per iteration.

Sure, but if it actually shows up on a profile there should be an easy workaround:

var bigValue someType
for _, v := range bigValues {
        bigValue = v  // Compiler should be smart enough to elide this copy.
        foo(&bigValue)  // Only one copy of bigValue escapes.
}

It is possible to write correct, efficient code with either definition of range loops; the question is mainly which should be the "default" or "typical" case. I believe that, in real usage, the proposed change would fix far more issues — and remove more workarounds — than it would introduce.

@cznic
Copy link
Contributor

cznic commented Jun 21, 2017

@bcmills

It is possible to write correct, efficient code with either definition of range loops;

True, but still the proposal has the potential to cripple existing correct and efficient code. Also, your comment does not consider v.foo(). We've agreed earlier with @davecheney that for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack. And, again, "Then we get O(n) heap allocations instead of the current O(1)."

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2017

@cznic

still the proposal has the potential to cripple existing correct and efficient code.

For this proposal (and Go 2 proposals in general), if the proposal is accepted we should either compile existing Go 1 programs as-is or provide a tool that converts them preserving existing semantics. That is: no "existing" Go 1 code should be affected either way.

for escape analysis of pointer receivers it is sometimes, if not often, impossible to prove they are safe-for-stack

Escape analysis of pointer methods is mostly straightforward; you only really need to worry about arguments to interface methods. And note that iterating over a container of n values is already O(n), so doing O(n) memory-management work changes only the constant factors (not the asymptotic behavior).

@cznic
Copy link
Contributor

cznic commented Jun 21, 2017

@bcmills

That is: no "existing" Go 1 code should be affected either way.

I don't think this can be achieved without solving the halting problem. But let's assume it can be done. Even then we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language. That can be done, no doubt. Isn't it better to avoid such leaks of abstractions? I think so.

@bcmills
Copy link
Contributor

bcmills commented Jun 21, 2017

@cznic

we would have to put the particular compiler optimization that you've mentioned earlier, directly in the specification of the language.

Why? Generally, the only time an optimization needs to go into a language spec is if it affects the asymptotic behavior of the program, the canonical example being tail-call-elimination in functional languages (which changes memory usage from O(1) to O(N) with the depth of recursion). As I noted above, per-iteration memory-management overhead affects only constant factors: running time for loops is already O(N) with the number of iterations, and peak memory usage is O(1) either way (because the value can be garbage-collected at each iteration).

@bisgardo
Copy link
Author

bisgardo commented Jun 22, 2017

@cznic

the proposal has the potential to cripple existing correct and efficient code

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture. I find it quite hard to imaging cases where this would be a requirement for correctness (would appreciate examples). But even if it is, it's easy to make a tool (see #20725; original formulation) that would flag all relevant code for scrutinization as part of a migration process.

As to the performance argument, I believe that if a good escape analysis can't figure out where a pointer is going, then neither can programmers (or it does in fact escape the loop). Thus, if such code is correct, it's probably by accident and not for performance reasons.

@martisch
Copy link
Contributor

martisch commented Jun 22, 2017

If by "cripple" you mean introduce bugs, this will only happen in code that depends on a variable having been overwritten after its capture.

I thought maybe something like the below just with := can be constructed where this happens too:

	x := []int{1, 2, 3, 4, 5}
	var i int
	for x[i], i = range x {
		//...
		i = 2
	}

Edit: on closer look i guess currently := allows only simple variable names so this is not an issue.

@leonklingele
Copy link
Contributor

As this does not require changes to the for-range syntax, it is still valid Go 1 code.
What if one tries to run some Go 2 code using such a for-range loop with a Go 1 compiler? I can already see people complaining "this code works fine in Go 2 but misbehaves for whatever reason in Go 1".
This proposal also should add a way to break compilation with Go 1 (e.g. a // +build !go1 annotation) which no one will ever use.

@bcmills
Copy link
Contributor

bcmills commented Nov 29, 2017

Ran across this interesting example today (https://play.golang.org/p/DUstLropfJ):

func (k *key) f(v string) {
	fmt.Printf("%#v.f(%#v)\n", k, v)
}

func main() {
	…
	for k, v := range m {
		defer k.f(v)
	}
}

The interaction between loop variables and implicit pointer receivers makes it possible to accidentally close over a loop variable without taking its address or writing a function literal.

@Merovius
Copy link
Contributor

For posterity, today someone posted on golang-nuts about stumbling into this with (*testing.T).Parallel:
https://groups.google.com/d/msg/golang-nuts/SAZ6wCSLXU0/F-TtRwDCAAAJ
As a further datapoint to fix this in Go2.

With regards to the performance-concerns: I'd expect the difference, in general, to be optimized out. Even if there are still edge-cases: Given the subtlety and prevalence of this problem, I'd strongly favor correctness over performance. If your code is too slow, you can always benchmark and hand-optimize. If your code is subtly incorrect, that's a lot harder to remedy.

@ianlancetaylor ianlancetaylor added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Feb 27, 2018
@davecb
Copy link

davecb commented May 9, 2018

On 08/05/18 06:13 PM, Bryan C. Mills wrote:

So I see the cost/benefit tradeoff of a breaking change as:
Costs
Update code generators.
Reprogram users' muscle memory.
go fix existing code, which may or may not be necessary anyway (depending on what else goes into Go 2).

Benefits
Prompt users to omit redundant workarounds.
(This proposal.) Make the common / default cases more concise.

It's up to the proposal committee to weigh those costs and benefits; the balance is not obvious to me. 
The point of this proposal is to lay out some benefits on the “breaking change” side that might not be obvious otherwise.

From previous experience with managed change (which dates back to Multics and dinosaurs roaming the earth (:-)) one of the largest problems of trying to maintain compatibility is that it can lead to persons depending on a bug, and thereby preventing it from ever being fixed.

I would argue that when this kind of deranged dependency occurs, that in order one

  1. introduces the new capability,
  2. provides a means of migration from old to new,
    
  3. marks the old as deprecated, and
    
  4. withdraws the old
    

In the special case of linked-library changes, the caller may have lost the source code they need to change, so we instead made the last step

  1. require positive action to continue to use the old (in our case that was a linker option)

Go's case is not as hard: if you can do steps 1-4, then I strongly recommend you do so.

--dave

@gopherbot
Copy link

Change https://golang.org/cl/164119 mentions this issue: cmd/link: do not close over the loop variable in testDWARF

gopherbot pushed a commit that referenced this issue Feb 27, 2019
Fixes #30429
Updates #16520
Updates #20733

Change-Id: Iae41f06c09aaaed500936f5496d90cefbe8293e4
Reviewed-on: https://go-review.googlesource.com/c/164119
Run-TryBot: Bryan C. Mills <bcmills@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Aug 16, 2019
@gopherbot gopherbot added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Sep 3, 2019
@bronze1man

This comment has been minimized.

@danielchatfield
Copy link

I'm very supportive of this change. For code that really needs to be optimised then the allocation can be avoided by declaring the variables outside the loop like this:

var (
	i int
	v string
)
for i, v = range []string{"foo", "bar"} {
	println(i, v)
}

@candlerb
Copy link

candlerb commented Jan 2, 2021

Last year there was high profile example of this class of bug:
https://community.letsencrypt.org/t/2020-02-29-caa-rechecking-bug/114591
https://www.theregister.com/2020/03/03/lets_encrypt_cert_revocation/
Detailed analysis: https://bugzilla.mozilla.org/show_bug.cgi?id=1619047
Fix: https://github.com/letsencrypt/boulder/pull/4690/files#diff-d02067a9f9a2bed1110fd4e98641c2effcf5d1d5f18461e35d6ac1535f6e2c21L1411-R1414

@earthboundkid
Copy link
Contributor

earthboundkid commented Mar 16, 2022

I think this can be done now by keeping it behind a go.mod version flag. When you run go mod edit version -go=1.XX it should scan for existing closure captures and convert them to safe equivalents as part of the upgrade. I think it's doable. E.g. it would change this:

for i := 0; i < n; i++ {
  use(&i)
}

to this as part of the upgrade:

{ 
  i := 0
  for ; i < n; i++ {
    use(&i)
  }
}

@thepudds
Copy link
Contributor

thepudds commented Mar 16, 2022

Hi @carlmjohnson, FWIW, the Go language changes design document uses this issue here as an example:

I think the only feasible safe approach is to not permit language redefinitions.

We are stuck with our current semantics. This doesn't mean we can't improve them. For example, for issue 20733, the range issue, we could change range loops so that taking the address of a range parameter, or referring to it from a function literal, is forbidden. This would not be a redefinition; it would be a removal. That approach might eliminate the bugs without the potential of breaking code unexpectedly.

That document also describes using the go directive in go.mod to control the version of the language used by each module (with some additional discussion & refinement in #28221).

So I suspect it is at least possible to remove this as a feature. One minor comment is I suspect go mod edit would be not used to change source code; it would more likely be go fix from what I understand.

@zpavlinovic zpavlinovic added the Analysis Issues related to static analysis (vet, x/tools/go/analysis) label Mar 21, 2022
@mpx
Copy link
Contributor

mpx commented Oct 4, 2022

FYI, there is a discussion covering how to redefine "for" loop variable semantics on #56010.

rytaft added a commit to rytaft/cockroach that referenced this issue Oct 19, 2022
This commit fixes a bug that was recently introduced when registering
the costfuzz and unoptimized-query-oracle tests. This bug was due to
the well-known range loop variable issue described here:
golang/go#20733.

This commit also adds some additional logging and debug information to
help with future issues with these test cases.

Fixes cockroachdb#90010

Release note: None
craig bot pushed a commit to cockroachdb/cockroach that referenced this issue Oct 19, 2022
90221: roachtest: fix costfuzz and unoptimized-query-oracle setups r=rytaft a=rytaft

This commit fixes a bug that was recently introduced when registering the costfuzz and unoptimized-query-oracle tests. This bug was due to the well-known range loop variable issue described here: golang/go#20733.

This commit also adds some additional logging and debug information to help with future issues with these test cases.

Fixes #90010

Release note: None

Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
@rsc
Copy link
Contributor

rsc commented May 9, 2023

Closing as duplicate of #60078.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Analysis Issues related to static analysis (vet, x/tools/go/analysis) LanguageChange NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. Proposal v2 A language change or incompatible library change
Projects
None yet
Development

No branches or pull requests