New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: spec: allow range-over-func to omit iteration variables #65236
Comments
What if it's just a vet check that looks for errors not silenced with |
I don't think I understand what is being proposed. Doesn't the spec text of #61405 already "allow range-over-func to omit iteration variables"? The example :
The proposed text:
Anyways I am confused about what is being proposed here.
If we think there is a subset of functions like |
I don't see how this buys anything other than ceremony. |
We know that the design of We can guess that range-over-func will become the dominant way to iterate over all sorts of sequences where failure is possible: file lines, DB rows, RPCs that list resources, and so on. I'm not saying that we should require the iteration variables, but I am saying that if we don't, we need a good story for how to avoid introducing many bugs into Go code. Like a vet check for when the second return type is |
This comment was marked as outdated.
This comment was marked as outdated.
I agree that not handling the error is a real problem. I do not think that iterators yielding errors solves that problem very well, especially if that means having to immediately buttress that in a way that forces everyone to use range-over-func differently than native iterators. Even if I'm wrong and yielding errors is a boon, then static analysis is more than capable of linting it only in the case when an error is iterated over. @jba here's https://pkg.go.dev/bufio#example-Scanner.Bytes rewritten to use a hypothetical iterator that returns |
The only benefit of passing an I see a few possible patterns related to errors: for line, err := range FileLines("/etc/passwd") {
if err != nil {
// log or handle error
break
}
...
} for line, err := range FileLines("/etc/passwd") {
if err != nil {
return err
}
...
} Both of these are IMO better solutions than handling the error after the loop as with the The other possible pattern ( for line, err := range FileLines("/etc/passwd") {
if err != nil {
// ignore errors
continue
}
...
} If the intent of the developer is to ignore errors, then the above could instead be written as (same semantic meaning): for line, _ := range FileLines("/etc/passwd") {
...
} The However, I would advice against allowing the following pattern if the iterator returns an error, because the for line := range FileLines("/etc/passwd") {
...
} If you want to handle/log individual errors and continue, you would still need to use this pattern: for line, err := range FileLines("/etc/passwd") {
if err != nil {
// handle or log error
continue
}
...
} Handling and logging individual errors is not possible with the handling after the loop approach. |
Looks like they finally fixed it, but for months the marketing page for Copilot showed code with a missing call to |
@meling To clarify: I fully support yielding I oppose an error that signals an abnormal end to (or failure to start) the iteration being handled within the loop. In other words, I think if you can |
What about an iterator that's using a I've been writing such an iterator, and I've changed my mind about the answers to these questions several times now. |
This is a really interesting question - to me a practically useful solution has been to delay passing |
That's already the case for my approach, but it doesn't obviate the question of whether the iterator should yield |
What would be a reason for the iterator to not return the |
This seems certainly an interesting API design question, but I don't think it's something that needs to be answered here. If it doesn't yield an error, then there is no question. If it does, that doesn't seem different from any other error, as far as this issue is concerned. Personally, I'd just leave that up to the third party package, in any case, if there is no clearly correct answer. |
It could just stop the iteration upon detecting that the |
Maybe pure blasphemy, but for err, line := range FileLines("/etc/passwd") {...} helps force the issue for error handling. |
It's a moderately common mistake to write |
It probably could be changed the way loop iteration variable semantics were changed? Different types being iterated over having different rules for omitting loop variables feels very surprising. I think if it gets addressed, it should be addressed for all types that can be iterated over in |
Then it was just re-introduced with range-over-int for range 5 {
...
} (errors are simple values, nothing special) +1 for the vet rule though |
Here is an experience report that bears on this issue. Previously, I advocated that iterators that can fail should return
Over the weekend I converted a hobby project to use the experimental range-over-func feature. Part of the project is a persistent map: I started by defining
As I anticipated, to iterate over keys-value pairs I would need a type for those pairs, since there is no 3-variable range statement. That was always going to be a little awkward. In my mind I was imagining loops like
where I could at least use good names for the keys and values. But these maps are generic, which means
to
And that started to really feel ugly. So I started looking for another way. First I used pointer-to-error arguments, as mentioned above:
That's an improvement, but it still made me uncomfortable. For one thing, although you do have to pass an argument, the compiler doesn't help you check it—it considers passing the pointer to be using the variable, so it doesn't fail with "err declared and not used" if you don't check Another problem is that you might check the error too soon:
That's definitely a mistake— Where I ended up was something that is obvious in hindsight:
You use it like this:
(Like the pointer-to-error approach, this assumes that the first error ends the loop.) This might look like it still has the problems of Having the compiler complain about an unused Why is this obvious in hindsight? Remember that
to functions over sequences:
so we move from errors:
to functions over errors:
A Haskell programmer would say that we've "lifted" our values into the space of functions. (I think.) For every utility function that "collapses" a sequence to a value like a slice, we can have a variant that takes an error function:
So simple uses of iterators, ones where you just want the get the slice (or map or whatever), still can be simple:
For many other functions on iterators, like Map, Filter and Reduce, getting the error out of the Unfortunately functions that take multiple sequences will need special attention. For example, consider the function that Python calls Writing an iterator that returns an error function involves some boilerplate that we can encapsulate in a type:
Here is how you would use it to write an iterator over the lines of a file:
|
@jba IMO that is still problematic. Note that an Not that this isn't also a problem with taking a pointer (which I also don't like, FWIW). I just don't feel the same level of "this solves everything" optimism I read in your post. |
@jba Thank you for the report. Comparing this: for item, err := range productsByID.Items() {
if err != nil {
// handle error or return or continue
}
id, product := item.Key, item.Value
...
} to this: seq, errf := productsByID.Items()
for id, product := range seq {
...
}
if err := errf(); err != nil {
return err
} There are no extra lines. For the drawbacks of the latter, I'm still very much in favor of the former approach; it is simple and easy to understand. Regarding the functional part of your post; the boilerplate and complexity doesn't seem worth it. I think a simpler approach is necessary, if I'm going to like it. But I confess that I have not thought carefully about how iterator functions can be used and combined when there are errors involved. edit: added missing range keyword in first example. |
@Merovius, good point about using the sequence multiple times. The
@meling, I'm not sure what this refers to. What boilerplate? |
I am a bit confused by the examples. Isn't this an issue of nested iterations? I thought the error returning function could perhaps be a good idea until I saw how it would be used and it made me think. I guess my question is whether the error is about the iterator failing (?) , which should not be an error but a fault/panic probably, or the value returned by the iterator being wrong? Said otherwise, is it possible to consider that iterators never fail while it is possible that some of them just return erroneous values (the error value being used as a sentinel)? For some errors, as soon one such error is returned, while designing the iterator, one might decide that this error is always returned so as to indicate that the object of the iteration is in an error state, or one might decide to make the iterator fault-tolerant? In which case the initial error returning code would be fine? Just check for the error within the loop and either continue or break? re. ItemJust thinking that maybe, you might want to compose your generic constructors to have specific maps with specific types that implement your desired interface. Just an idea, not sure it works. |
Interesting. Let me be the first to say, welcome to |
@jba making an iterator function that can be safely called multiple times in parallel with error functions would be a pain (this follows up on @Merovius 's concern about multiple calls - it's a nastier instance of the same problem). It might be that these two things will rarely occur at once, but when they will - the error function approach will not work out. AFAIU a key concern of range-over-func is to reduce the mental tax on programmers reading code that iterates over smth else than a slice or map. If iteration-related errors need to be reported differently in different situations, that might still be a win. But it's a smaller one than if everyone uses the same way to report errors. |
@jba I think this is more pointing to a language problem of restricting range to only support functions with 1 or 2 parameters to range over. 3 parameters would handle |
There was only one missing range (I think). I edited my comment to add it.
I think that some iterators might enumerate all values, even "bad" ones, and never actually fail. In that case I'd expect the error to be stored in the value somehow. The iteration would look normal, error-free; you'd only look at the error if you cared. Example: you parse the code of a Go module and iterate over the packages. The type is That's not the case I'm concerned about, and it's not common. The usual case is that something goes wrong while the iterator is doing its thing, like reading from a file or network. I guess you could call that the iterator failing, but I don't know why you'd want to panic there any more than you would if you didn't use an iterator and were just reading data in an ordinary loop. You want to access the error, and almost always, you want to stop iterating.
If you're using the So that leaves the first two cases: yield an error and stop, or keep yielding the same error. The first seems simpler and cleaner. That is what the |
I think we should focus on this proposal: allow range-over-func to omit iteration variables and not discuss what functions/methods that return sequences should return. No matter the outcome of this proposal there could be (also
If we focus on error it depends when the error could happen:
In my opinion a |
@szabba, I agree that calling an iterator concurrently will interact badly with error functions. But that seems like a bad idea, or maybe a useless one? When would you want to iterate over the same sequence concurrently? One case is where you're dealing with something simple an immutable, like a sequence of integers, but there are no errors there. Errors usually come into play when you're iterating over something stateful, or slow, or both: files, networks, databases, etc. Why would I want to make the same sequence of calls to my cloud service's List RPC concurrently? That seems slow and expensive. I don't think I've ever written that sort of thing. What I have done often is to loop over something once, then farm out the individual items to different goroutines. But errors in those goroutines can be handled by existing mechanisms, like I would like to see real examples where it's useful to concurrently iterate over a sequence that can have errors. |
@timothy-king, a 3-value range would be nice, but I think it's the least likely outcome. |
They're related, because I think the only good reason to disallow omitting range variables is when the second variable is an error. My point is that if that is going to happen very rarely, then it would be okay to keep the language as it is and avoid making a special case for range-over-func.
I would recommend that the sequence-creating function returns
I would rather see the error incorporated into the returned value. This case seems very rare to me. It would be nice to have some concrete examples to discuss.
Both of these are handled well by both |
I see. Makes sense. Still a bit unsure. Seems that it is easy to forget checking the iteration state after it is done. I was leaning toward the alternative of simply pausing the iteration on error (and yielding the same error unless the standard loop constructs are used, i.e. break, continue etc.). That should allow for retries, fault-tolerance, etc. depending on the error type. Basically the client would remain in control of the iteration lifetime in this case, if I'm not mistaken. |
If you are invoking a set of state machine replicas (e.g., Paxos or Raft) that should receive the same sequence of RPC calls, but some may return an error. Here you want to invoke the replicas concurrently, because doing it sequentially is too slow. You want to wait for a quorum to reply, but for liveness, you don't want to wait for all. Some of the details can be viewed here. I've experimented with a similar idea using range-over-func for another project. I've also done scanning of many different prefix-ranges over a leveldb database, which is substantially faster when done concurrently. Of course, I'm using the leveldb API iterators for this, but hopefully future DB APIs should be able to use range functions. |
This sounds like you're looping over the list of replicas and maybe the list of RPCs, both of which are in memory, so there is no error in the iteration. The replica calls happen concurrently, each in its own goroutine, but that is not iteration, just the usual pattern that It sounds like you want this function, which I have implemented as part of my hobby project:
Each prefix-range is different, so the iterators are different. We are looking for cases where the same iterator is invoked concurrently:
|
Yes, that is a feature of the
There is no way for the client—the loop body—to talk back to the iterator, so I don't see how it could ask for a retry. Actually there is a way, if the things you're iterating over contain the data but also methods that interact with the iterator function. But that that point you might as well put the error on those things too. |
IMHO the main problem with errf functions, is that you have to return it all the way to the level where iterator is actually used. Consider, for example, a Seq created on storage level, but ranged over on http handler level. You will have to pass errf function all the way to the handler level. If other levels try to wrap Seq with other fallible operations you will have to return their errf functions too. |
One scenario is parsing a log file where each line contains a structured log. But it happens that the file will get polluted with other lines that do not have structured logs, and most of the times these lines are ignored. Having err as part of the sequence it might look something like this. type MyStr struct
func parseLine(line string) (*MyStr, error)
var lines Seq2[string, error] = FileLines("/var/logs") // reuse a file reading function
parsedLines := iter.Map2(lines, func (line string, err error) (*MyStr, error) { // some sequence operator we might have in iter package
if err != nil {
return nil, err // error from FileLines, no more items will be yield
}
p, perr := parseLine(line)
// this error will not stop the iteration, maybe report it but continue, more items will be yield
return p, nil
})
parsedLines = iter.Filter2(parsedLines, func (p *MyStr, err error) bool { // another iter operator
return !(p == nil && err == nil) // ignore the items where there was a parsing error
})
for parsed, err := range parsedLines {
if err != nil {
// the error when reading file, do something, return
}
// process the parsed line
} I wonder what it might be with Maybe (which I kinda even like 🤔): type MyStr struct
func parseLine(line string) (*MyStr, error)
lines, errf = FileLines("/var/logs") // lines is Seq[string]
parsedLines := iter.Map(lines, func (line string) *MyStr { // some sequence operator we might have in iter package
p, perr := parseLine(line)
// this error will not stop the iteration, maybe report it but continue, more items will be yield
return p
})
parsedLines = iter.Filter(parsedLines, func (p *MyStr) bool { // another iter operator
return p != nil // ignore the items where there was a parsing error
})
for parsed := range parsedLines {
// process the parsed line
}
if err := errf(); err != nil {
// handle error
} Of course having the operators as methods of the sequence it becomes even nicer 😄 |
It seems like an |
In discussion during the implementation of #61405 we changed range-over-func to require mentioning iteration variables. The idea is that if we do end up with idioms like:
Then we want to diagnose:
as an error. However, this is inconsistent with other range loops and also not what the #61405 text said. Probably we should change it. Starting a separate proposal for that discussion.
The text was updated successfully, but these errors were encountered: