x/text/secure/precis: Comparison API #17386

SamWhited · 2016-10-08T15:40:38Z

Currently, the PRECIS comparison function is implemented as a method on a profile:

func (p *Profile) Compare(a, b string) bool {}

This is a pretty straight forward API, but it means we have to perform enforcement (with any additional canonical form mappings) every time we want to compare two strings. If we want to compare, say, a nickname against three other nicknames, we have to do 6 transformations (the first nickname three times and the three we're comparing it against once each). It would be useful to have an API to do comparison that required us to transform each string we want to compare only once (4 transformations in the prevoius scenario).

I see a few options:

Memoize calls to enforce internally or perform some other caching (eg. cache all steps up to where enforcement and comparison diverge, then when the user calls the comparison method only execute the last few steps, possibly caching that output too)
A function that returns a new transformer that creates canonical strings for comparison (and possibly other related functions to actually do the transformation on strings and bytes, mirroring the existing API for enforcement)
A function that returns a new canonical version of the entire profile

Internally caching enforcement operations would potentially make the package less useful on resource constrained environments (not sure if we care or not), and is potentially a lot of work. Since this package isn't for a specific use case, it may be hard to create an effective caching strategy.

Adding some functions that mirror the enforcement API make sense:

func (p *Profile) CanonicalString(s string) (string, error) {}
func (p *Profile) CanonicalBytes(b []byte) ([]byte, error)  {}
func (p *Profile) CanonicalTransformer() *Transformer {}

but this may be confusing (since the user will have to manually remember what transformation to apply in each situation, or whether or not they'd already mapped something to its canonical form before they do comparison, etc.)

Allowing them to keep a single, separate "comparison" profile may help this, but I'm not sure if it's all that much better to have to think about two separate profiles and when to use them:

func (p *Profile) CanonicalProfile() *Profile {}

Other suggestions appreciated; or maybe this is a non-issue that doesn't really matter given how fast the transformations in the x/text packages are?

/cc @mpvl @nigeltao

The text was updated successfully, but these errors were encountered:

nigeltao · 2016-10-12T01:19:32Z

A function that returns a new canonical version of the entire profile

I'm confused about how this would work. Can you elaborate on how this solves the "I have four nicknames" problem?

Similarly, I'm confused about what a CanonicalTransformer would do.

Is it because preparation and enforcement are separable steps? (Sorry, I'm not very familiar with PRECIS)?

since the user will have to manually remember... whether or not they'd already mapped something to its canonical form before they do comparison

I don't have a complete answer, but one API possibility is to have

type CanonicalString string

func (p *Profile) CanonicalString(s string) (CanonicalString, error) {}

so that you can't compare a regular and a canonical string via ==.

or maybe this is a non-issue that doesn't really matter given how fast the transformations in the x/text packages are?

I don't have any experience here, but it may be a non-issue. Has performance been a concern in practice or only in theory? If it does turn out to be a problem in practice, we can always add the CanonicalXxx methods (and types?) later.

SamWhited · 2016-10-12T01:42:33Z

A function that returns a new canonical version of the entire profile

I'm confused about how this would work. Can you elaborate on how this solves the "I have four nicknames" problem?

Similarly, I'm confused about what a CanonicalTransformer would do.
Is it because preparation and enforcement are separable steps?

Not preparation and enforcement (we got rid of preparation, and there's been a little talk about getting rid of it in a future version of the RFC too), but enforcement and comparison (the form you have to transform the string into to see if two strings are equivalant). Because there is a specific order of operations to the steps we can't just tack any extra comparison steps (eg. to lowercase before a case insensitive comparison) on to the already enforced text, we have to start all over from the original input and execute the transformations over again. Right now the only output from our comparison function is true or false, I'm proposing giving us a way to access the data from the comparison transformation:

//  Creates output useful for comparison
t := precis.Nickname.NewCompareTransformer()
a := "Compare me"
b := "Compare Me"
a = t.String(a) // We only have to do this transformation once, then we can compare with == as many times as we want against as many strings as we want without having to transform "a" again
b = t.String(b)
// a == b == "compare me"

// Create output useful for enforcement
t = precis.Nickname.NewTransformer()
a = "Compare me"
a = t.String(a)
// a == "Compare me"

type CanonicalString string

That's an interesting idea; have a special type just for comparison, keying on in maps, etc. I wonder if it would end up being confusing? I'm really not sure. Does it buy us anything over effectively just duplicating the current APIs and having separate methods for enforcement and comparison? Definitely worth thinking about.

mpvl · 2016-10-12T09:56:55Z

In the collate package there is a similar problem. I'm not saying we copy the solution there, as I'm not entirely happy with it, but we could copy the terminology, where it is called Key.
So something like CompareKey or Fold, rather than canonical would make more sense.
at the very least, I would not go overboard with growing the API surface too much. Maybe just having a AppendCompareKey() or the like is sufficient. But haven't given it too much thought.

SamWhited · 2016-10-12T13:59:11Z

In the collate package there is a similar problem. I'm not saying we copy the solution there, as I'm not entirely happy with it, but we could copy the terminology, where it is called Key.

"Key" is definitely simpler and makes more sense than "canonical/non-canonical" to me.

It probably makes sense to match the collate package's API at first; I don't think that would stop us from adding on a transformer like method later if we wanted, and we'll probably want the simple "just do this operation on bytes or a string" functions regardless of what other APIs eventually get added (if any).

gopherbot · 2016-10-12T14:40:37Z

CL https://golang.org/cl/30952 mentions this issue.

nigeltao · 2016-10-13T06:38:31Z

I'm not saying that it's a good idea, but before I forget, another API possibility is one or both of:

func (p *Profile) CompareAll(s string, candidates ...string) bool
func (p *Profile) CompareAny(s string, candidates ...string) bool

depending on whether users need && or || semantics. (Again, I'm not overly familiar with PRECIS).

DanielOaks · 2016-10-13T14:08:39Z

I'm planning to use this to create the casefolded (or key or 'canonical') version of strings because I use the casefolded version of their name as an identifier. Another point which I can't shirk is that I need to compare one user's name with hundreds/thousands of other names very often (at the moment accomplished by looking it up as part of a map).

Because of this, I'd very much enjoy being able to get the canonical/key/casefolded version of a given string somehow, and this would make me very happy indeed. Thanks for your work on this cool language :)

SamWhited · 2016-10-13T14:12:30Z

Because of this, I'd very much enjoy being able to get the canonical/key/casefolded version of a given string somehow

Yah, I think whatever the final API looks like, it needs to have a way to actually get the key. However, the CompareAll and CompareAny functions also make good sense, and might replace the existing Compare function. I'll submit another CL for that so that we can play around with it and see if it's an API we like.

Thanks for your input all.

gopherbot · 2016-10-13T14:25:38Z

CL https://golang.org/cl/30981 mentions this issue.

mpvl · 2016-10-18T16:03:53Z

I'm not a fan of the CompareAll/CompareAny API.

It adds the API surface with something that can easily be done in a simple loop
It doesn't really do much better w.r.t. allocations/performance, especially not when repeatedly comparing against the same set (which supposedly is the goal here).

SamWhited · 2016-10-18T16:06:43Z

It adds the API surface with something that can easily be done in a simple loop

That's a fair point; maybe it's worth getting rid of Compare the method entirely then and only having the APIs that return keys for comparison?

It doesn't really do much better w.r.t. allocations/performance, especially not when repeatedly comparing against the same set (which supposedly is the goal here).

I was more thinking about comparing the same initial value against lots of things, not the same set multiple times.

SamWhited · 2016-10-18T18:50:02Z

For the sake of keeping history and discussion here, I wanted to comment that I've changed the API in one of my CLs at @mpvl's suggestion:

func (p *Profile) AppendCompareKey(dst, src []byte) ([]byte, error) {}
func (p *Profile) CompareKey(s string) (string, error) {}

mpvl · 2016-10-18T19:02:23Z

@SamWhited: I think it is fine to keep Compare. It is the easiest to use if you don't care about performance.

quentinmit added the NeedsDecision label Oct 9, 2016

quentinmit added this to the Unreleased milestone Oct 9, 2016

gopherbot closed this as completed in golang/text@16e1d1f Oct 25, 2016

DanielOaks mentioned this issue Jan 10, 2017

Build failures :/ ergochat/ergo#80

Closed

golang locked and limited conversation to collaborators Oct 25, 2017

gopherbot added the FrozenDueToAge label Oct 25, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

x/text/secure/precis: Comparison API #17386

x/text/secure/precis: Comparison API #17386

SamWhited commented Oct 8, 2016 •

edited

Loading

nigeltao commented Oct 12, 2016

SamWhited commented Oct 12, 2016 •

edited

Loading

mpvl commented Oct 12, 2016

SamWhited commented Oct 12, 2016 •

edited

Loading

gopherbot commented Oct 12, 2016

nigeltao commented Oct 13, 2016

DanielOaks commented Oct 13, 2016

SamWhited commented Oct 13, 2016

gopherbot commented Oct 13, 2016

mpvl commented Oct 18, 2016

SamWhited commented Oct 18, 2016 •

edited

Loading

SamWhited commented Oct 18, 2016

mpvl commented Oct 18, 2016

x/text/secure/precis: Comparison API #17386

x/text/secure/precis: Comparison API #17386

Comments

SamWhited commented Oct 8, 2016 • edited Loading

nigeltao commented Oct 12, 2016

SamWhited commented Oct 12, 2016 • edited Loading

mpvl commented Oct 12, 2016

SamWhited commented Oct 12, 2016 • edited Loading

gopherbot commented Oct 12, 2016

nigeltao commented Oct 13, 2016

DanielOaks commented Oct 13, 2016

SamWhited commented Oct 13, 2016

gopherbot commented Oct 13, 2016

mpvl commented Oct 18, 2016

SamWhited commented Oct 18, 2016 • edited Loading

SamWhited commented Oct 18, 2016

mpvl commented Oct 18, 2016

SamWhited commented Oct 8, 2016 •

edited

Loading

SamWhited commented Oct 12, 2016 •

edited

Loading

SamWhited commented Oct 12, 2016 •

edited

Loading

SamWhited commented Oct 18, 2016 •

edited

Loading