x/tools/gopls: express completion scoring in terms of down-ranking factors #63282
Labels: gopls/completion, gopls, Tools
While investigating #62665, I noticed that there were lots of opportunities to optimize deep completion (see also #63263). However, one obstacle to optimization is the fact that completion candidate score adjustment occurs in multiple places, and is a combination of multiplication by numbers >1 and <1, explicit setting of score, and subtraction.
It's clear that a lot of thought and experimentation went into the current scoring, and it works well overall. However, the way scoring is expressed makes it hard to reason about and hard to refactor (because operations don't commute), and therefore hard to improve.
Additionally, it means that we can never determine that the score is already too low and stop doing work. For example, in one of our kubernetes benchmarks we spend approximately 80ms in type checking and "normal" completion combined, and then another 100-200ms doing deep completion. All of that extra work produces at most 3 additional candidates. It is likely that by doing cheaper scoring operations first, we could immediately invalidate most of the candidates without having to do more expensive operations such as fuzzy matching and type inference.
I therefore believe that a first step toward improving completions should be to enforce the following rule: scores only ever go down, and only by multiplication with down-ranking factors in (0, 1].
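Concretely, the invariant might look something like this sketch (the `candidate` type, `downRank`, and the cutoff value are hypothetical illustrations, not gopls's actual API):

```go
package main

import "fmt"

// cutoff below which a candidate can never appear in the results,
// so further (expensive) scoring work can be skipped.
const cutoff = 0.05

type candidate struct {
	name  string
	score float64 // starts at a base value and only ever decreases
}

// downRank applies a penalty factor in (0, 1]. Because every
// adjustment is a multiplication by a factor <= 1, the operations
// commute and the score is monotonically non-increasing.
func (c *candidate) downRank(factor float64) {
	c.score *= factor
}

// viable reports whether the candidate can still make the cut.
// Callers would check this after cheap penalties, before running
// expensive steps such as fuzzy matching or type inference.
func (c *candidate) viable() bool {
	return c.score >= cutoff
}

func main() {
	c := candidate{name: "fooBar", score: 1.0}
	c.downRank(0.9)  // cheap penalty, e.g. an unexported identifier
	c.downRank(0.01) // heavy penalty, e.g. deep completion depth
	if !c.viable() {
		fmt.Printf("%s pruned at score %.4f\n", c.name, c.score)
	}
}
```

The key property is the early exit: once the product of the factors applied so far falls below the cutoff, no later factor can raise it again, so the candidate can be discarded without further work.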
I think we can make this change within the existing codebase without altering its overall behavior: each scoring operation just needs to be recalibrated to express its outcome as a down-ranking factor.
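For instance (with purely illustrative numbers, not gopls's actual constants), an additive penalty such as `score -= 0.1` applied at a typical score of 1.0 corresponds to multiplying by 0.9, and an operation that today assigns the score outright would instead become a factor relative to the score it replaces:

```go
package main

import "fmt"

func main() {
	const base = 1.0

	// Old style: a mix of subtraction and explicit assignment.
	// The result depends on the order in which these run.
	oldScore := base
	oldScore -= 0.1 // additive penalty
	oldScore = 0.3  // explicit override elsewhere in the code

	// Recalibrated: the same outcome expressed as factors <= 1,
	// so the adjustments commute and the score only decreases.
	newScore := base
	newScore *= 0.9       // was: score -= 0.1 (at score == 1.0)
	newScore *= 0.3 / 0.9 // was: score = 0.3, now relative to 0.9

	fmt.Printf("old=%.2f new=%.2f\n", oldScore, newScore)
}
```

The recalibration is where the embedded tuning knowledge lives: each factor has to be chosen so that the relative ordering of candidates produced today is preserved.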
Unfortunately, a lot of knowledge is embedded in the current scoring, so making this change naively is likely to result in inferior completions in the short term. @pjweinb has been looking at some large-scale testing that may be useful for calibrating completion results, and may help us make this change without regression (or, perhaps, while simultaneously improving results).
CC @adonovan @muirdm @heschi, who have experience with the completion code. WDYT?