Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/text/secure/precis: Comparison API #17386

Closed
SamWhited opened this issue Oct 8, 2016 · 13 comments
Closed

x/text/secure/precis: Comparison API #17386

SamWhited opened this issue Oct 8, 2016 · 13 comments
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@SamWhited
Copy link
Member

SamWhited commented Oct 8, 2016

Currently, the PRECIS comparison function is implemented as a method on a profile:

func (p *Profile) Compare(a, b string) bool {}

This is a pretty straight forward API, but it means we have to perform enforcement (with any additional canonical form mappings) every time we want to compare two strings. If we want to compare, say, a nickname against three other nicknames, we have to do 6 transformations (the first nickname three times and the three we're comparing it against once each). It would be useful to have an API to do comparison that required us to transform each string we want to compare only once (4 transformations in the prevoius scenario).

I see a few options:

  1. Memoize calls to enforce internally or perform some other caching (eg. cache all steps up to where enforcement and comparison diverge, then when the user calls the comparison method only execute the last few steps, possibly caching that output too)
  2. A function that returns a new transformer that creates canonical strings for comparison (and possibly other related functions to actually do the transformation on strings and bytes, mirroring the existing API for enforcement)
  3. A function that returns a new canonical version of the entire profile

Internally caching enforcement operations would potentially make the package less useful on resource constrained environments (not sure if we care or not), and is potentially a lot of work. Since this package isn't for a specific use case, it may be hard to create an effective caching strategy.

Adding some functions that mirror the enforcement API make sense:

func (p *Profile) CanonicalString(s string) (string, error) {}
func (p *Profile) CanonicalBytes(b []byte) ([]byte, error)  {}
func (p *Profile) CanonicalTransformer() *Transformer {}

but this may be confusing (since the user will have to manually remember what transformation to apply in each situation, or whether or not they'd already mapped something to its canonical form before they do comparison, etc.)

Allowing them to keep a single, separate "comparison" profile may help this, but I'm not sure if it's all that much better to have to think about two separate profiles and when to use them:

func (p *Profile) CanonicalProfile() *Profile {}

Other suggestions appreciated; or maybe this is a non-issue that doesn't really matter given how fast the transformations in the x/text packages are?

/cc @mpvl @nigeltao

@quentinmit quentinmit added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Oct 9, 2016
@quentinmit quentinmit added this to the Unreleased milestone Oct 9, 2016
@nigeltao
Copy link
Contributor

A function that returns a new canonical version of the entire profile

I'm confused about how this would work. Can you elaborate on how this solves the "I have four nicknames" problem?

Similarly, I'm confused about what a CanonicalTransformer would do.

Is it because preparation and enforcement are separable steps? (Sorry, I'm not very familiar with PRECIS)?

since the user will have to manually remember... whether or not they'd already mapped something to its canonical form before they do comparison

I don't have a complete answer, but one API possibility is to have

type CanonicalString string

func (p *Profile) CanonicalString(s string) (CanonicalString, error) {}

so that you can't compare a regular and a canonical string via ==.

or maybe this is a non-issue that doesn't really matter given how fast the transformations in the x/text packages are?

I don't have any experience here, but it may be a non-issue. Has performance been a concern in practice or only in theory? If it does turn out to be a problem in practice, we can always add the CanonicalXxx methods (and types?) later.

@SamWhited
Copy link
Member Author

SamWhited commented Oct 12, 2016

A function that returns a new canonical version of the entire profile

I'm confused about how this would work. Can you elaborate on how this solves the "I have four nicknames" problem?

Similarly, I'm confused about what a CanonicalTransformer would do.
Is it because preparation and enforcement are separable steps?

Not preparation and enforcement (we got rid of preparation, and there's been a little talk about getting rid of it in a future version of the RFC too), but enforcement and comparison (the form you have to transform the string into to see if two strings are equivalant). Because there is a specific order of operations to the steps we can't just tack any extra comparison steps (eg. to lowercase before a case insensitive comparison) on to the already enforced text, we have to start all over from the original input and execute the transformations over again. Right now the only output from our comparison function is true or false, I'm proposing giving us a way to access the data from the comparison transformation:

//  Creates output useful for comparison
t := precis.Nickname.NewCompareTransformer()
a := "Compare me"
b := "Compare Me"
a = t.String(a) // We only have to do this transformation once, then we can compare with == as many times as we want against as many strings as we want without having to transform "a" again
b = t.String(b)
// a == b == "compare me"

// Create output useful for enforcement
t = precis.Nickname.NewTransformer()
a = "Compare me"
a = t.String(a)
// a == "Compare me"

type CanonicalString string

That's an interesting idea; have a special type just for comparison, keying on in maps, etc. I wonder if it would end up being confusing? I'm really not sure. Does it buy us anything over effectively just duplicating the current APIs and having separate methods for enforcement and comparison? Definitely worth thinking about.

@mpvl
Copy link
Contributor

mpvl commented Oct 12, 2016

In the collate package there is a similar problem. I'm not saying we copy the solution there, as I'm not entirely happy with it, but we could copy the terminology, where it is called Key.
So something like CompareKey or Fold, rather than canonical would make more sense.
at the very least, I would not go overboard with growing the API surface too much. Maybe just having a AppendCompareKey() or the like is sufficient. But haven't given it too much thought.

@SamWhited
Copy link
Member Author

SamWhited commented Oct 12, 2016

In the collate package there is a similar problem. I'm not saying we copy the solution there, as I'm not entirely happy with it, but we could copy the terminology, where it is called Key.

"Key" is definitely simpler and makes more sense than "canonical/non-canonical" to me.

It probably makes sense to match the collate package's API at first; I don't think that would stop us from adding on a transformer like method later if we wanted, and we'll probably want the simple "just do this operation on bytes or a string" functions regardless of what other APIs eventually get added (if any).

@gopherbot
Copy link

CL https://golang.org/cl/30952 mentions this issue.

@nigeltao
Copy link
Contributor

I'm not saying that it's a good idea, but before I forget, another API possibility is one or both of:

func (p *Profile) CompareAll(s string, candidates ...string) bool
func (p *Profile) CompareAny(s string, candidates ...string) bool

depending on whether users need && or || semantics. (Again, I'm not overly familiar with PRECIS).

@DanielOaks
Copy link

I'm planning to use this to create the casefolded (or key or 'canonical') version of strings because I use the casefolded version of their name as an identifier. Another point which I can't shirk is that I need to compare one user's name with hundreds/thousands of other names very often (at the moment accomplished by looking it up as part of a map).

Because of this, I'd very much enjoy being able to get the canonical/key/casefolded version of a given string somehow, and this would make me very happy indeed. Thanks for your work on this cool language :)

@SamWhited
Copy link
Member Author

Because of this, I'd very much enjoy being able to get the canonical/key/casefolded version of a given string somehow

Yah, I think whatever the final API looks like, it needs to have a way to actually get the key. However, the CompareAll and CompareAny functions also make good sense, and might replace the existing Compare function. I'll submit another CL for that so that we can play around with it and see if it's an API we like.

Thanks for your input all.

@gopherbot
Copy link

CL https://golang.org/cl/30981 mentions this issue.

@mpvl
Copy link
Contributor

mpvl commented Oct 18, 2016

I'm not a fan of the CompareAll/CompareAny API.

  • It adds the API surface with something that can easily be done in a simple loop
  • It doesn't really do much better w.r.t. allocations/performance, especially not when repeatedly comparing against the same set (which supposedly is the goal here).

@SamWhited
Copy link
Member Author

SamWhited commented Oct 18, 2016

It adds the API surface with something that can easily be done in a simple loop

That's a fair point; maybe it's worth getting rid of Compare the method entirely then and only having the APIs that return keys for comparison?

It doesn't really do much better w.r.t. allocations/performance, especially not when repeatedly comparing against the same set (which supposedly is the goal here).

I was more thinking about comparing the same initial value against lots of things, not the same set multiple times.

@SamWhited
Copy link
Member Author

For the sake of keeping history and discussion here, I wanted to comment that I've changed the API in one of my CLs at @mpvl's suggestion:

func (p *Profile) AppendCompareKey(dst, src []byte) ([]byte, error) {}
func (p *Profile) CompareKey(s string) (string, error) {}

@mpvl
Copy link
Contributor

mpvl commented Oct 18, 2016

@SamWhited: I think it is fine to keep Compare. It is the easiest to use if you don't care about performance.

@golang golang locked and limited conversation to collaborators Oct 25, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

6 participants