Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: spec: allow range-over-func to omit iteration variables #65236

Open
rsc opened this issue Jan 23, 2024 · 40 comments
Open

proposal: spec: allow range-over-func to omit iteration variables #65236

rsc opened this issue Jan 23, 2024 · 40 comments
Labels
Milestone

Comments

@rsc
Copy link
Contributor

rsc commented Jan 23, 2024

In discussion during the implementation of #61405 we changed range-over-func to require mentioning iteration variables. The idea is that if we do end up with idioms like:

for line, err := range FileLines("/etc/passwd") {
	...
}

Then we want to diagnose:

for line := range FileLines("/etc/passwd") {
	...
}

as an error. However, this is inconsistent with other range loops and also not what the #61405 text said. Probably we should change it. Starting a separate proposal for that discussion.

@earthboundkid
Copy link
Contributor

What if it's just a vet check that looks for errors not silenced with , _? Then you could still write for key := range mapish.All().

@timothy-king
Copy link
Contributor

I don't think I understand what is being proposed. Doesn't the spec text of #61405 already "allow range-over-func to omit iteration variables"?

The example :

For example if f has type func(yield func(T1, T2) bool) bool any of these are valid:

for x, y := range f { ... }
for x, _ := range f { ... }
for _, y := range f { ... }
for x := range f { ... }
for range f { ... }

The proposed text:

As with range over other types, it is permitted to declare fewer iteration variables
than there are iteration values.

Anyways I am confused about what is being proposed here.

for line, err := range FileLines("/etc/passwd") {

If we think there is a subset of functions like FileSet that need to not ignore one of the values in RangeStmts, this seems like it could fits in well with the unusedresult checker in vet. I am not sure it makes sense to enforce usage at the language level.

@jimmyfrasche
Copy link
Member

I don't see how this buys anything other than ceremony.

@jba
Copy link
Contributor

jba commented Jan 23, 2024

We know that the design of bufio.Scanner, which requires a call to the Err method after iteration, is flawed. We have empirical evidence that people forget the call.

We can guess that range-over-func will become the dominant way to iterate over all sorts of sequences where failure is possible: file lines, DB rows, RPCs that list resources, and so on.

I'm not saying that we should require the iteration variables, but I am saying that if we don't, we need a good story for how to avoid introducing many bugs into Go code. Like a vet check for when the second return type is error that is enabled for testing.

@meling

This comment was marked as outdated.

@jimmyfrasche
Copy link
Member

I agree that not handling the error is a real problem. I do not think that iterators yielding errors solves that problem very well, especially if that means having to immediately buttress that in a way that forces everyone to use range-over-func differently than native iterators. Even if I'm wrong and yielding errors is a boon, then static analysis is more than capable of linting it only in the case when an error is iterated over.

@jba here's https://pkg.go.dev/bufio#example-Scanner.Bytes rewritten to use a hypothetical iterator that returns ([]byte, error): https://go.dev/play/p/G-Wv80AbfqF I don't think having the error handling in the loop is an improvement. One of the alternatives I saw posted was for the iterator factory to take a *error. I didn't much like the idea initially but it has grown on me: https://go.dev/play/p/DdqILPDt9B7

@meling
Copy link

meling commented Jan 24, 2024

The only benefit of passing an *error to the iterator as mentioned @jimmyfrasche is that you must declare and pass an error to the iterator, but you can still forget to check it after the loop. More problematic though is that the pattern prevents you from logging/handling individual errors.

I see a few possible patterns related to errors:

for line, err := range FileLines("/etc/passwd") {
    if err != nil {
        // log or handle error
	break
    }
    ...
}
for line, err := range FileLines("/etc/passwd") {
    if err != nil {
	return err
    }
    ...
}

Both of these are IMO better solutions than handling the error after the loop as with the bufio.Scanner example, as the error handling happens immediately after the call producing the error.

The other possible pattern (if-err-continue) is:

for line, err := range FileLines("/etc/passwd") {
    if err != nil {
        // ignore errors
	continue
    }
    ...
}

If the intent of the developer is to ignore errors, then the above could instead be written as (same semantic meaning):

for line, _ := range FileLines("/etc/passwd") {
    ...
}

The if-err-continue pattern above could be detected by a tool (gopls) and a rewrite to the simple form could be suggested (similar to some other patterns).

However, I would advice against allowing the following pattern if the iterator returns an error, because the _ is a more explicit signal that ignoring the error is intentional:

for line := range FileLines("/etc/passwd") {
    ...
}

If you want to handle/log individual errors and continue, you would still need to use this pattern:

for line, err := range FileLines("/etc/passwd") {
    if err != nil {
        // handle or log error
	continue
    }
    ...
}

Handling and logging individual errors is not possible with the handling after the loop approach.

@earthboundkid
Copy link
Contributor

We have empirical evidence that people forget the call.

Looks like they finally fixed it, but for months the marketing page for Copilot showed code with a missing call to .Err().

@jimmyfrasche
Copy link
Member

@meling To clarify:

I fully support yielding (T, error)—when the error is per iteration.

I oppose an error that signals an abnormal end to (or failure to start) the iteration being handled within the loop.

In other words, I think if you can continue on err != nil it's fine, but if you MUST break/return it's not. (Presumably in the latter case the code would still accidentally work correctly if you continued as the iteration would stop but that's not a great argument in the pattern's favor).

@seh
Copy link
Contributor

seh commented Jan 25, 2024

What about an iterator that's using a context.Context internally to govern its fetching of additional values? If the Context is done, do you think that the iterator should yield the result of Context.Err? If so, should it respect that call to yield returning true and attempt to continue, or should it ignore yield's return value?

I've been writing such an iterator, and I've changed my mind about the answers to these questions several times now.

@AndrewHarrisSPU
Copy link

What about an iterator that's using a context.Context internally to govern its fetching of additional values?

This is a really interesting question - to me a practically useful solution has been to delay passing context.Context far enough that it isn't deeply internal. In other words, don't pass around an iterator with an internal context, pass around a func(context.Context) -> iter.Seq and a context.Context as far as possible, so that the base context.Context is in scope where it should be checked. If a computation using iterated values needs to branch, the context should be in scope.

@seh
Copy link
Contributor

seh commented Jan 26, 2024

In other words, don't pass around an iterator with an internal context, pass around a func(context.Context) -> iter.Seq and a context.Context as far as possible, so that the base context.Context is in scope where it should be checked.

That's already the case for my approach, but it doesn't obviate the question of whether the iterator should yield Context.Err when it detects that the Context is done.

@thediveo
Copy link

That's already the case for my approach, but it doesn't obviate the question of whether the iterator should yield Context.Err when it detects that the Context is done.

What would be a reason for the iterator to not return the ctx.Err()?

@Merovius
Copy link
Contributor

This seems certainly an interesting API design question, but I don't think it's something that needs to be answered here. If it doesn't yield an error, then there is no question. If it does, that doesn't seem different from any other error, as far as this issue is concerned.

Personally, I'd just leave that up to the third party package, in any case, if there is no clearly correct answer.

@seh
Copy link
Contributor

seh commented Jan 27, 2024

What would be a reason for the iterator to not return the ctx.Err()?

It could just stop the iteration upon detecting that the Context is done, and leave it to the caller to discern that may have been the cause. Supplying the Context as a governor for how long to try acquiring more items means that the caller expects that the effort may need to cease prematurely.

@stevedalton
Copy link

stevedalton commented Feb 1, 2024

Maybe pure blasphemy, but

for err, line := range FileLines("/etc/passwd") {...}

helps force the issue for error handling.

@ianlancetaylor
Copy link
Contributor

It's a moderately common mistake to write for v := range s when the author really meant for _, v := range s. This mistake is undetected at compile time if s is []int. Perhaps the mistake was in permitting range variables to be omitted at all. We can't change that for the existing types, but we don't need to repeat the mistake for the new function types.

@szabba
Copy link

szabba commented Feb 10, 2024

It probably could be changed the way loop iteration variable semantics were changed?

Different types being iterated over having different rules for omitting loop variables feels very surprising. I think if it gets addressed, it should be addressed for all types that can be iterated over in for ... range.

@isgj
Copy link

isgj commented Feb 11, 2024

We can't change that for the existing types, but we don't need to repeat the mistake for the new function types.

Then it was just re-introduced with range-over-int

for range 5 {
        ...
}

(errors are simple values, nothing special)

+1 for the vet rule though

@jba
Copy link
Contributor

jba commented Feb 20, 2024

Here is an experience report that bears on this issue.

Previously, I advocated that iterators that can fail should return iter.Seq2[T, error], so loops would look like

for t, err := range seq {...}

Over the weekend I converted a hobby project to use the experimental range-over-func feature. Part of the project is a persistent map: Map[K, V any]. Since items are read from disk, there is always the chance of an error.

I started by defining

func (*Map[K, V]) Keys() iter.Seq2[K, error]
func (*Map[K, V]) Values() iter.Seq2[V, error]
func (*Map[K, V]) Items() iter.Seq2[Item[K, V], error]

As I anticipated, to iterate over keys-value pairs I would need a type for those pairs, since there is no 3-variable range statement. That was always going to be a little awkward. In my mind I was imagining loops like

for item, err := range productsByID.Items() {
    if err != nil {...}
    fmt.Printf("id=%d, product=%+v\n", item.ID, item.Product)
}

where I could at least use good names for the keys and values. But these maps are generic, which means Item[K, V] is generic, which means the fields are never going to be any better than Key and Value. At this point we've gone from

for id, product := productsByID  // normal map iteration

to

for item, err := range productsByID.Items() {
   if err ...
   ... item.Key ... item.Value
}

And that started to really feel ugly. So I started looking for another way. First I used pointer-to-error arguments, as mentioned above:

func (*Map[K, V]) Items(errp *error) iter.Seq2[K, V]

That's an improvement, but it still made me uncomfortable. For one thing, although you do have to pass an argument, the compiler doesn't help you check it—it considers passing the pointer to be using the variable, so it doesn't fail with "err declared and not used" if you don't check err.

Another problem is that you might check the error too soon:

var err error
seq := productsByID.Items(&err)
for id, product := range seq {
   if err != nil {...}
  ...
}

That's definitely a mistake—err will always be nil inside the loop—but nothing catches it.

Where I ended up was something that is obvious in hindsight:

func (Map[K, V]) Items() (iter.Seq2[K, V], func() error)

You use it like this:

seq, errf := productsByID.Items()
for id, product := range seq {
    ...
}
if err := errf(); err != nil {
   return err
}

(Like the pointer-to-error approach, this assumes that the first error ends the loop.)

This might look like it still has the problems of bufio.Scanner. But it doesn't, not quite. It's unfortunate that the error check is far away from the source. But unlike Scanner.Err, which doesn't appear in your code unless you write it, here we have errf, which the compiler will complain about if you don't use it. (Two caveats to that: if this is your second loop in a function and you've already defined errf, you won't get an error if you don't use the second one; and the compiler will only insist that you use errf, not call it.)

Having the compiler complain about an unused errf is a fix for the first problem of error pointers: no notice if you don't check them. And we can fix the second problem, checking the error too early: since errf is a function, we can make it panic if it's called before the end of the loop.

Why is this obvious in hindsight? Remember that iter.Seq is a function type. Just as we move from sequences:

itemSlice := m.ItemsSlice() // returns a slice (a sequence)

to functions over sequences:

itemSeq := m.Items() //  returns a function over a sequence

so we move from errors:

itemSlice, err := m.ItemsSlice()

to functions over errors:

itemSeq, errf := m.Items()

A Haskell programmer would say that we've "lifted" our values into the space of functions. (I think.)

For every utility function that "collapses" a sequence to a value like a slice, we can have a variant that takes an error function:

func ToSlice[T any](iter.Seq[T]) []T
func ToSliceErrf[T any](iter.Seq[T], func() error) ([]T, error)

So simple uses of iterators, ones where you just want the get the slice (or map or whatever), still can be simple:

itemSlice, err := ToSliceErrf(productsByID.Items())

For many other functions on iterators, like Map, Filter and Reduce, getting the error out of the iter.Seq means that we don't need error-aware variants of those. If there is an error, they will just observe a shorter sequence and stop early.

Unfortunately functions that take multiple sequences will need special attention. For example, consider the function that Python calls Chain, which takes a list of sequences and returns their concatenation. If the first sequence fails, Chain will just move on to the next. In general, there is no safe time to call the error function for the initial sequence until the whole chain finishes. I haven't thought carefully about the best approach for those kinds of functions. But this problem isn't severe enough to make me want to give up on error functions.

Writing an iterator that returns an error function involves some boilerplate that we can encapsulate in a type:

type ErrorState ...

func (es *ErrorState) Done() // indicates the iteration is done
func (es *ErrorState) Set(err error)  // sets the error if err != nil
func (es *ErrorState) Func() func() error // returns an error function

Here is how you would use it to write an iterator over the lines of a file:

func Lines(filename string) (iter.Seq[string], func() error) {
	var es ErrorState
	return func(yield func(string) bool) {
		defer es.Done()
		f, err := os.Open(filename)
		if err != nil {
			es.Set(err)
			return
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		for s.Scan() {
			if !yield(s.Text()) {
				break
			}
		}
		es.Set(s.Err())
	}, es.Func()
}

@Merovius
Copy link
Contributor

@jba IMO that is still problematic. Note that an iter.Seq can be called (iterated over) multiple times. To me, your "error-function return" doesn't really tell the user how errf calls correspond to iterations. That is, if I iterate over the returned iter.Seq twice, is the errf re-used? I'll note that it doesn't get reset in your ErrorState abstraction, though that can be fixed by adding a es.Set(nil) at the top of the function-literal.

Not that this isn't also a problem with taking a pointer (which I also don't like, FWIW). I just don't feel the same level of "this solves everything" optimism I read in your post.

@meling
Copy link

meling commented Feb 20, 2024

@jba Thank you for the report. Comparing this:

for item, err := range productsByID.Items() {
   if err != nil {
      // handle error or return or continue
   }
   id, product := item.Key, item.Value
   ...
}

to this:

seq, errf := productsByID.Items()
for id, product := range seq {
    ...
}
if err := errf(); err != nil {
   return err
}

There are no extra lines. For the drawbacks of the latter, I'm still very much in favor of the former approach; it is simple and easy to understand.

Regarding the functional part of your post; the boilerplate and complexity doesn't seem worth it. I think a simpler approach is necessary, if I'm going to like it. But I confess that I have not thought carefully about how iterator functions can be used and combined when there are errors involved.

edit: added missing range keyword in first example.

@jba
Copy link
Contributor

jba commented Feb 20, 2024

@Merovius, good point about using the sequence multiple times. The ErrorState should be reset at the start of the iteration function, as you suggest. I don't feel that "this solves everything," it just seems to offer better trade-offs than the other approaches.

Regarding the functional part of your post; the boilerplate and complexity doesn't seem worth it.

@meling, I'm not sure what this refers to. What boilerplate?

@atdiar
Copy link

atdiar commented Feb 20, 2024

I am a bit confused by the examples.
Are there missing "range" keywords?

Isn't this an issue of nested iterations?

I thought the error returning function could perhaps be a good idea until I saw how it would be used and it made me think.

I guess my question is whether the error is about the iterator failing (?) , which should not be an error but a fault/panic probably, or the value returned by the iterator being wrong?
(the latter which is similar in spirit to the traditional comma, ok idiom with maps retrieval)

Said otherwise, is it possible to consider that iterators never fail while it is possible that some of them just return erroneous values (the error value being used as a sentinel)?

For some errors, as soon one such error is returned, while designing the iterator, one might decide that this error is always returned so as to indicate that the object of the iteration is in an error state, or one might decide to make the iterator fault-tolerant?

In which case the initial error returning code would be fine? Just check for the error within the loop and either continue or break?

re. Item

Just thinking that maybe, you might want to compose your generic constructors to have specific maps with specific types that implement your desired interface.
If Item[K, V] is too generic, maybe that it should be a type parameter that implements a given interface instead?

Just an idea, not sure it works.

@earthboundkid
Copy link
Contributor

Interesting. Let me be the first to say, welcome to errf. 😆

@szabba
Copy link

szabba commented Feb 20, 2024

@jba making an iterator function that can be safely called multiple times in parallel with error functions would be a pain (this follows up on @Merovius 's concern about multiple calls - it's a nastier instance of the same problem). It might be that these two things will rarely occur at once, but when they will - the error function approach will not work out.

AFAIU a key concern of range-over-func is to reduce the mental tax on programmers reading code that iterates over smth else than a slice or map. If iteration-related errors need to be reported differently in different situations, that might still be a win. But it's a smaller one than if everyone uses the same way to report errors.

@meling
Copy link

meling commented Feb 20, 2024

@meling, I'm not sure what this refers to. What boilerplate?

@jba I was referring to the three-method ErrorState interface and the complexity of using it.

@timothy-king
Copy link
Contributor

As I anticipated, to iterate over keys-value pairs I would need a type for those pairs, since there is no 3-variable range statement.

@jba I think this is more pointing to a language problem of restricting range to only support functions with 1 or 2 parameters to range over. 3 parameters would handle <K,V,error> neatly. This is doable, but with some difficulty (Hint: *ast.RangeStmt.{Key,Value}). Also would there be an iter.Seq3 in the standard library?

@jba
Copy link
Contributor

jba commented Feb 20, 2024

@atdiar:

am a bit confused by the examples. Are there missing "range" keywords?

There was only one missing range (I think). I edited my comment to add it.

I guess my question is whether the error is about the iterator failing (?) , which should not be an error but a fault/panic probably, or the value returned by the iterator being wrong?
(the latter which is similar in spirit to the traditional comma, ok idiom with maps retrieval)

Said otherwise, is it possible to consider that iterators never fail while it is possible that some of them just return erroneous values (the error value being used as a sentinel)?

I think that some iterators might enumerate all values, even "bad" ones, and never actually fail. In that case I'd expect the error to be stored in the value somehow. The iteration would look normal, error-free; you'd only look at the error if you cared. Example: you parse the code of a Go module and iterate over the packages. The type is iter.Seq[*Package], and there is an Error field of Package.

That's not the case I'm concerned about, and it's not common. The usual case is that something goes wrong while the iterator is doing its thing, like reading from a file or network. I guess you could call that the iterator failing, but I don't know why you'd want to panic there any more than you would if you didn't use an iterator and were just reading data in an ordinary loop. You want to access the error, and almost always, you want to stop iterating.

For some errors, as soon one such error is returned, while designing the iterator, one might decide that this error is always returned so as to indicate that the object of the iteration is in an error state, or one might decide to make the iterator fault-tolerant?

If you're using the iter.Seq2[T, error] approach, then you could do a few different things: keep yielding the same error, yield the error once and then terminate, or yield the error but then forget about it and do the operation again the next time around the loop. I think the last choice is probably a bad idea and probably not going to get you anywhere; if the implementor of the iterator thought it could keep making progress after an error, I think they should instead model that as I described above, where the error gets put inside the value being returned.

So that leaves the first two cases: yield an error and stop, or keep yielding the same error. The first seems simpler and cleaner. That is what the errf idiom does, except that after the iteration stops you have to call errf to see if there was actually an error.

@isgj
Copy link

isgj commented Feb 20, 2024

I think we should focus on this proposal: allow range-over-func to omit iteration variables and not discuss what functions/methods that return sequences should return.

No matter the outcome of this proposal there could be (also Seq2[T,V] versions):

  • SomeF() Seq[T]
  • SomeF() (Seq[T], err), maybe here the error tells the sequence couldn't be initialized (if err != nil before for ...range)
  • ... whatever the language allows to return

If we focus on error it depends when the error could happen:

  • on initialization of the sequence like above
  • on iteration
    • error on this iteration, but the sequence will continue yielding values where future errors could be nil or something
    • no more values will be yield
  • on cleanup

In my opinion a Seq2[T, err] can cover nicely all kind of errors.

@jba
Copy link
Contributor

jba commented Feb 20, 2024

@szabba, I agree that calling an iterator concurrently will interact badly with error functions. But that seems like a bad idea, or maybe a useless one? When would you want to iterate over the same sequence concurrently? One case is where you're dealing with something simple an immutable, like a sequence of integers, but there are no errors there. Errors usually come into play when you're iterating over something stateful, or slow, or both: files, networks, databases, etc. Why would I want to make the same sequence of calls to my cloud service's List RPC concurrently? That seems slow and expensive. I don't think I've ever written that sort of thing.

What I have done often is to loop over something once, then farm out the individual items to different goroutines. But errors in those goroutines can be handled by existing mechanisms, like golang.org/x/sync/errgroup.

I would like to see real examples where it's useful to concurrently iterate over a sequence that can have errors.

@jba
Copy link
Contributor

jba commented Feb 20, 2024

@timothy-king, a 3-value range would be nice, but I think it's the least likely outcome.

@jba
Copy link
Contributor

jba commented Feb 20, 2024

I think we should focus on this proposal: #65236 (comment) and not discuss what functions/methods that return sequences should return.

They're related, because I think the only good reason to disallow omitting range variables is when the second variable is an error. My point is that if that is going to happen very rarely, then it would be okay to keep the language as it is and avoid making a special case for range-over-func.

  • on initialization of the sequence like above

I would recommend that the sequence-creating function returns error as a second return value, especially if the sequence itself can't have errors.

  • error on this iteration, but the sequence will continue yielding values where future errors could be nil or something

I would rather see the error incorporated into the returned value. This case seems very rare to me. It would be nice to have some concrete examples to discuss.

  • no more values will be yield
  • on cleanup

Both of these are handled well by both iter.Seq[T, error] and errf.

@atdiar
Copy link

atdiar commented Feb 20, 2024

I see. Makes sense. Still a bit unsure. Seems that it is easy to forget checking the iteration state after it is done.
Perhaps it can be made semi-mandatory somehow reusing unused variable checks.

I was leaning toward the alternative of simply pausing the iteration on error (and yielding the same error unless the standard loop constructs are used, i.e. break, continue etc.).
That means that when the iteration can fail (signified by the presence of the error return), the code has to handle the failure mode appropriately or it loops infinitely.
Might be easier to check for mistakes if things go obviously wrong?

That should allow for retries, fault-tolerance, etc. depending on the error type.

Basically the client would remain in control of the iteration lifetime in this case, if I'm not mistaken.

@meling
Copy link

meling commented Feb 20, 2024

@szabba, I agree that calling an iterator concurrently will interact badly with error functions. But that seems like a bad idea, or maybe a useless one? When would you want to iterate over the same sequence concurrently? One case is where you're dealing with something simple an immutable, like a sequence of integers, but there are no errors there. Errors usually come into play when you're iterating over something stateful, or slow, or both: files, networks, databases, etc. Why would I want to make the same sequence of calls to my cloud service's List RPC concurrently? That seems slow and expensive. I don't think I've ever written that sort of thing.

What I have done often is to loop over something once, then farm out the individual items to different goroutines. But errors in those goroutines can be handled by existing mechanisms, like golang.org/x/sync/errgroup.

I would like to see real examples where it's useful to concurrently iterate over a sequence that can have errors.

If you are invoking a set of state machine replicas (e.g., Paxos or Raft) that should receive the same sequence of RPC calls, but some may return an error. Here you want to invoke the replicas concurrently, because doing it sequentially is too slow. You want to wait for a quorum to reply, but for liveness, you don't want to wait for all. Some of the details can be viewed here. I've experimented with a similar idea using range-over-func for another project.

I've also done scanning of many different prefix-ranges over a leveldb database, which is substantially faster when done concurrently. Of course, I'm using the leveldb API iterators for this, but hopefully future DB APIs should be able to use range functions.

@jba
Copy link
Contributor

jba commented Feb 20, 2024

If you are invoking a set of state machine replicas (e.g., Paxos or Raft) that should receive the same sequence of RPC calls, but some may return an error. Here you want to invoke the replicas concurrently, because doing it sequentially is too slow.

This sounds like you're looping over the list of replicas and maybe the list of RPCs, both of which are in memory, so there is no error in the iteration. The replica calls happen concurrently, each in its own goroutine, but that is not iteration, just the usual pattern that errgroup works well for.

It sounds like you want this function, which I have implemented as part of my hobby project:

// ConcurrentMapError calls f on each element of seq concurrently,
// returning a sequence of f's results.
//
// The concurrency argument controls the number of goroutines that can run concurrently.
// 0 is equivalent to [runtime.NumCPUs].
//
// If any invocation of f returns a non-nil error, the returned sequence ends.
// f may still be called on some, possibly all, subsequent elements of seq.
// After the returned sequence ends, errf can be called to obtain the error.
// If errf is called before the sequence ends, it panics.
func ConcurrentMapError[From, To any](
        ctx context.Context,
        concurrency int,
        seq iter.Seq[From],
        f func(From) (To, error),
) (_ iter.Seq[To], errf func() error) 

I've also done scanning of many different prefix-ranges over a leveldb database, which is substantially faster when done concurrently.

Each prefix-range is different, so the iterators are different. We are looking for cases where the same iterator is invoked concurrently:

seq := MyIterator(args)
for i := range 10 {
   go func() {
        for x := range seq {...}
    }()
}

@jba
Copy link
Contributor

jba commented Feb 20, 2024

Perhaps it can be made semi-mandatory somehow reusing unused variable checks.

Yes, that is a feature of the errf idea.

That means that when the iteration can fail (signified by the presence of the error return), the code has to handle the failure mode appropriately or it loops infinitely...That should allow for retries, fault-tolerance, etc. depending on the error type.

There is no way for the client—the loop body—to talk back to the iterator, so I don't see how it could ask for a retry.

Actually there is a way, if the things you're iterating over contain the data but also methods that interact with the iterator function. But that that point you might as well put the error on those things too.

@DmitriyMV
Copy link
Contributor

IMHO the main problem with errf functions, is that you have to return it all the way to the level where iterator is actually used. Consider, for example, a Seq created on storage level, but ranged over on http handler level. You will have to pass errf function all the way to the handler level.

If other levels try to wrap Seq with other fallible operations you will have to return their errf functions too.

@isgj
Copy link

isgj commented Feb 24, 2024

  • error on this iteration, but the sequence will continue yielding values where future errors could be nil or something

I would rather see the error incorporated into the returned value. This case seems very rare to me. It would be nice to have some concrete examples to discuss.

One scenario is parsing a log file where each line contains a structured log. But it happens that the file will get polluted with other lines that do not have structured logs, and most of the times these lines are ignored.

Having err as part of the sequence it might look something like this.

type MyStr struct

func parseLine(line string) (*MyStr, error)

var lines Seq2[string, error] = FileLines("/var/logs")  // reuse a file reading function

parsedLines := iter.Map2(lines, func (line string, err error) (*MyStr, error) {  // some sequence operator we might have in iter package
        if err != nil {
                return nil, err // error from FileLines, no more items will be yield
        }
        p, perr := parseLine(line)
        // this error will not stop the iteration, maybe report it but continue, more items will be yield
        return p, nil
})

parsedLines = iter.Filter2(parsedLines, func (p *MyStr, err error) bool {  // another iter operator
        return !(p == nil && err == nil) // ignore the items where there was a parsing error
})

for parsed, err := range parsedLines {
        if err != nil {
                // the error when reading file, do something, return
        }
        // process the parsed line
}

I wonder what it might be with errf?

Maybe (which I kinda even like 🤔):

type MyStr struct

func parseLine(line string) (*MyStr, error)

lines, errf  = FileLines("/var/logs")  // lines is Seq[string]

parsedLines := iter.Map(lines, func (line string) *MyStr {  // some sequence operator we might have in iter package
        p, perr := parseLine(line)
        // this error will not stop the iteration, maybe report it but continue, more items will be yield
        return p
})

parsedLines = iter.Filter(parsedLines, func (p *MyStr) bool {  // another iter operator
        return p != nil   // ignore the items where there was a parsing error
})

for parsed := range parsedLines {
        // process the parsed line
}

if err := errf(); err != nil {
        // handle error
}

Of course having the operators as methods of the sequence it becomes even nicer 😄

@AndrewHarrisSPU
Copy link

It seems like an errf could be a more circumscribed alternative to context or errgroup - it can decide to block internally, like context.Done(), to be more idempotent with respect to non-determinism in scheduling. It looks promising.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

No branches or pull requests