Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: x/exp/slices: Find function to select first element matching a condition #52006

Closed
tmaxmax opened this issue Mar 29, 2022 · 18 comments
Closed

Comments

@tmaxmax
Copy link

tmaxmax commented Mar 29, 2022

When we want to retrieve an element that matches a given condition from a slice, with the current design of the slices package we use IndexFunc:

i := slices.IndexFunc(slice, func (value T) bool {
  // a condition
})
if i == -1 {
  return
}

elem := slice[i] // or use it as slice[i]

The issue with this solution is that many times we do not need the index of the value. In those cases, we either end up with a variable declaration and an otherwise unused index variable that both hamper readability, or we name the returned index suggestively, which results in lengthy names and clumsy access to the element. For example:

var input []string

indexInputWithFooPrefix := slices.IndexFunc(input, func (value string) bool {
  return strings.HasPrefix(value, "foo")
}
if indexInputWithFooPrefix == -1 {
  // not found
}

processInput(input[indexInputWithFooPrefix])

Both patterns can get really ugly when having to find elements that match other conditions, too. Either reuse an index variable, or name all of them accordingly. Besides, the i == -1 (or i >= 0, if preferred) check adds unnecessary cognitive load.

To solve these readability and usability concerns, I propose adding a Find function to the slices package, with the following declaration:

func Find[E any](s []E, f func(T) bool) (T, bool)

The examples above become much more easier to reason about:

var input []string

inputWithFooPrefix, found := slices.Find(input, func (value string) bool {
  return strings.HasPrefix(value, "foo")
}
if !found {
  // not found
}

// do whatever

The found variable can safely be reused, as demonstrated by our usage of the "comma, ok" idiom. It could also be named ok - this way, Find would fit together with map lookups and type assertions, which would make the language more uniform.

Here are some examples from open-source repositories showing how Find could simplify code:

Before:

func (r *router) findRoute(name string) *route {
	for _, route := range r.getRoutes() {
		if route.name == name {
			return route
		}
	}

	return nil
}

After:

func (r *router) findRoute(name string) *route {
	route, _ := slices.Find(r.getRoutes(), func (rt *route) bool {
		return rt.name == name
	})
	return route
}

Before:

func GetDebuggerPath() string {
	// --snipped--
	for _, debugger := range list {
		if strings.Contains(debugger.Url, "spotify") {
			return debugger.WebSocketDebuggerUrl
		}
	}

	return ""
}

After:

func GetDebuggerPath() string {
	// --snipped--
	debugger, _ := slices.Find(list, func (d debugger) bool {
		return strings.Contains(d.Url, "spotify")
	})
	return debugger.WebSocketDebuggerUrl
}

Before:

func dbGetArticle(id string) (*Article, error) {
	for _, a := range articles {
		if a.ID == id {
			return a, nil
		}
	}
	return nil, errors.New("article not found.")
}

After:

func dbGetArticle(id string) (*Article, error) {
	article, found := slices.Find(articles, func (a *Article) bool {
		return a.ID == id
	})
	if !found {
		return nil, errors.New("article not found.")
	}
	return article, nil
}

In this case, I would have written the function like this:

func dbGetArticle(id string) (*Article, bool) {
	return slices.Find(articles, func (a *article) bool {
		return a.ID == id
	})
}

and create the error message at the call site. This is actually very similar to the use case that made me write this proposal: in our codebase we have a Store interface with many FindBy* methods, which return the type and a boolean indicated whether the value was found or not.

Before:

func (c *Cron) Entry(id EntryID) Entry {
	for _, entry := range c.Entries() {
		if id == entry.ID {
			return entry
		}
	}
	return Entry{}
}

After:

func (c *Cron) Entry(id EntryID) Entry {
	entry, _ := slices.Find(c.Entries(), func (e Entry) bool {
		return e.ID == id
	})
	return entry
}

We see that in most cases the zero value is actually helpful, which results in the found check being completely elided, in the same way we don't always check the ok value when using maps. IndexFunc does not give us this benefit - indexing with -1 is an obvious panic, so we always have to check the index. Having the Find function could even potentially eliminate all these helper methods shown here - using Find directly in code is pleasant and concise.

Adding this function would bring us in line with languages like Scala, Rust and Javascript, too.

Note: This is not similar to #50340, as that proposal modifies the existing BinarySearch functions.

@gopherbot gopherbot added this to the Proposal milestone Mar 29, 2022
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Mar 29, 2022
@earthboundkid
Copy link
Contributor

There could be two functions, Find and FindSorted, where FindSorted expects a presorted slice and the function returns -1, 0, 1 like a typical cmp function so that the lookup is O(log N).

@ianlancetaylor
Copy link
Contributor

FindSorted is covered by #50340.

@tmaxmax
Copy link
Author

tmaxmax commented Mar 29, 2022

@carlmjohnson That could prove useful, I imagine that in some cases the element at the index returned by BinarySearchFunc is needed, not the index itself. I wonder how often these cases occur in practice, though, as other languages don't seem to have bothered implementing this - their binary search counterparts only return indexes. Could you maybe provide an example where you'd use it?

@ianlancetaylor Compared to #50340, the FindSorted potentially proposed here would return the element itself similarly to Find. BinarySearchFunc would be to FindSorted what IndexFunc is to Find.

@rsc
Copy link
Contributor

rsc commented Mar 30, 2022

Given that sort.Find is going to return an index, this function returning the value at an index should not be called Find.
Perhaps First?

I'm still not sure we've established this is common enough.

@rsc rsc moved this from Incoming to Active in Proposals (old) Mar 30, 2022
@rsc
Copy link
Contributor

rsc commented Mar 30, 2022

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@mdlayher
Copy link
Member

I had intended to open this proposal myself today, nice timing! slices.First seems like a reasonable name to me, so here are two more data points in code I have written with 1.18.

  • Finding the first element of a slice where a single struct field matches the expected value

Before:

vars := p.Variables // []zedhook.Variable
i := slices.IndexFunc(vars, func(v zedhook.Variable) bool {
    return v.Key == "HOME"
})
if i == -1 {
    t.Fatal("HOME was not found in variables")
}
val := vars[i]

After:

vars := p.Variables // []zedhook.Variable
val, ok := slices.First(vars, func(v zedhook.Variable) bool {
    return v.Key == "HOME"
})
if !ok {
    t.Fatal("HOME was not found in variables")
}
  • Picking the first element out of an interface slice which matches a concrete type implementing that interface

Usage:

opt, ok := pickFirst[*ndp.PrefixInformation](options)

Before:

func pickFirst[T ndp.Option](options []ndp.Option) (T, bool) {
    for _, o := range options {
        if t, ok := o.(T); ok {
            return t, true
        }
    }

    return *new(T), false
}

After:

func pickFirst[T ndp.Option](options []ndp.Option) (T, bool) {
    return slices.First(options, func(o ndp.Option) bool {
        _, ok := o.(T)
        return ok
    })
}

Overall I am still unsure where to draw the line between writing a manual for loop or trying to experiment with new generic code. But I figured it was worth sharing my experiences if nothing else.

@tmaxmax
Copy link
Author

tmaxmax commented Mar 30, 2022

@mdlayher Thank you for the additional examples and shared sentiment!

@rsc First could be a suitable name! My only concern is that it may be difficult for newcomers to find it in the library, as its counterparts in other languages are all called Find. It may also cause confusion related to its functionality, as First could also mean the first element in the slice.

Alternatives could be to:

  • name this function something else, like FindValue or FindFirst (this variant would make adding a possible FindLast easier)
  • rename sort.Find to sort.FindIndex or similar (altough I'm not sure if this is possible, now that sort: add Find #50340 is accepted)

What I like about First is that it's short, just like Find. I still see Find as the optimal name, it provides the best ergonomics and conveys the functionality correctly.

A question we could ask ourselves is what would be more confusing: having both sort.Find and slices.Find, or looking for slices.Find when it is actually slices.First? I see slices as the more popular package, which means that by the moment you're using sort.Find you already know what slices.Find is. This is subjective, though, so I'm waiting on everyone's input.

@earthboundkid
Copy link
Contributor

I hadn't read through #50340 before. Seeing it now makes me think, the binary search case is handled by that issue and slices.First is a good name for this. I think finding the first struct that meets some predicate is a fairly common operation.

@elversatile
Copy link

I like the proposal! However, I don't like the name First -- it is very ambiguous. I'd personally prefer Find. If Find is not an option, I'd prefer if the function was called FindFirst. This way, we could also add FindLast or FindAll (returning []T).

@earthboundkid
Copy link
Contributor

earthboundkid commented Mar 31, 2022

FindAll is traditionally called “filter”. The name “Last” works about as well as “First”.

@rsc
Copy link
Contributor

rsc commented Apr 6, 2022

FindAnything is not an option. sort.Find returns an index. This function returns a value. It can't be called FindSomething, because that would make people think it returns an index.

It seems like we are looking at the difference between:

i := slices.IndexFunc(x, f)
if i < 0 { ... }
use x[i]

and

v, ok := slices.TBD(x, f)
if !ok { ... }
use v

That is, there is still a required result check, and it still is fundamentally a statement. The win is entirely not repeating x[ ] around the result.

It just doesn't seem like it comes up enough to be worth the complexity of adding new functions to the package. (And it's easily done in third-party packages.)

@brandonbloom
Copy link

brandonbloom commented Apr 10, 2022

Not sure if this would make the cut, but I went looking for a slices.Last function, since this is something I find myself doing not-infrequently:

func Last[T any](xs T[]) T {
  return xs[len(xs) - 1]
}

EDIT: Apologies, I put this comment on the wrong thread.

@tmaxmax
Copy link
Author

tmaxmax commented Apr 11, 2022

@rsc Thank you for your input! I find it doubtful naming it "Find" or Find* would make people confuse it with sort.Find, as I've argumented above. There is a counterpart for this function in every other programming language I can think of, so this name would be the most familiar. Naming aside, this function also provides the convenience of using the default value of the type, when it is useful, if no element satisfying the condition is found in the slice, eliding the found check completely. This is not possible with IndexFunc. Carrying index variables around is also not ideal, in my opinion. I could work on collecting more examples of improvements slices.Find could bring.
It is true that external packages could easily provide this utility, but given its general purpose usage and straight-forward implementation it fits better in the standard library - having to pull another library just for a function or rewrite it in every project are signs that show this aspect has been overlooked.

@brandonbloom This would work as a separate proposal, as the intended functions, possibly named First or Last, would return the first/last element that match a condition. This could also prove that these names are a poor fit, as they are already causing confusion. Either way, to retrieve the first/last element of a slice is a great addition to the standard library - they may require a length check before, to actually be useful.

@earthboundkid
Copy link
Contributor

Either way, to retrieve the first/last element of a slice is a great addition to the standard library - they may require a length check before, to actually be useful.

To me, First/Last element are only useful if they don't have a length check. There are frequently cases where you are happy to use the zero value if there is no first/last element, so jumping through hoops to get it is unhelpful.

I think the point though of slices is to get a collection of functions that are either frequently used and/or somewhat tedious to write your own version of. This is drifting away from that ideal. There are a number of fiddly details, and the easiest way of getting little details right ends up being to just write it yourself as needed.

@tmaxmax
Copy link
Author

tmaxmax commented Apr 11, 2022

@carlmjohnson I've unclearly expressed myself, as I agree with you. What I meant was to have that check inside First/Last and return a default value if the slice is empty. I concur with your second point, too - these are trivial to implement. Let's not drift away and keep this proposal focused on Find!

@rsc
Copy link
Contributor

rsc commented Apr 13, 2022

Repeating what I said in #52006 (comment), it seems like we don't have a good name and it doesn't seem to save significant amounts of code. In particular it does not serve as an expression, and we already have code that works just as well as a statement.

@rsc
Copy link
Contributor

rsc commented Apr 13, 2022

Based on the discussion above, this proposal seems like a likely decline.
— rsc for the proposal review group

@rsc rsc moved this from Active to Likely Decline in Proposals (old) Apr 13, 2022
@rsc rsc moved this from Likely Decline to Declined in Proposals (old) May 4, 2022
@rsc
Copy link
Contributor

rsc commented May 4, 2022

No change in consensus, so declined.
— rsc for the proposal review group

@rsc rsc closed this as completed May 4, 2022
@golang golang locked and limited conversation to collaborators May 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

8 participants