Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/text: add precis implementation #13336

Closed
SamWhited opened this issue Nov 20, 2015 · 53 comments
Closed

x/text: add precis implementation #13336

SamWhited opened this issue Nov 20, 2015 · 53 comments
Labels
FrozenDueToAge Proposal Testing An issue that has been verified to require only test changes, not just a test failure.

Comments

@SamWhited
Copy link
Member

I would like to see the golang.org/x/text packages contain a PRECIS implementation.

PRECIS is a replacement for stringprep which allows for the classification of characters based on Unicode properties. For more information on PRECIS see RFC 7564 (txt)

I am filing this issue in accordance with the contributing document, and have started implementing a package which might be used for a base here: https://bitbucket.org/mellium/golang-x-text. I will be happy to keep iterating on this if it does look like something that would warrant inclusion in x/text. Feedback on the approach would be appreciated (though perhaps the mailing list is a better place for that? The guidelines are a bit vague when it comes to the x packages and how their development is handled).

@bradfitz
Copy link
Contributor

/cc @mpvl for thoughts

@bradfitz bradfitz changed the title Package proposal: x/text/unicode/precis proposal: add PRECIS package x/text/unicode/precis Nov 20, 2015
@mpvl
Copy link
Contributor

mpvl commented Nov 20, 2015

Thanks for the proposal. I'm fully in favor of adding this functionality to the text repo.

That said, I have a few remarks and suggestions on the current design:

  • Usage of this package should be encouraged, and putting it under the unicode directory might be hiding it too much.
  • Similarly, the name precis may be too obscure. Perhaps "golang.org/x/text/secure" (secure.Identifier, secure.FreeForm would be of type secure.Profile)?. Open to suggestions, though.
  • The operations prescribed by RFC 7564 are inherently string-based, instead of rune-based:
    • Exposing this as Transforms (with the usual helper methods) would be more appropriate.
    • Tries seem a more natural data structure here. RangeTables can be used for now, but I would not expose them in the API.
    • I would suggest not having any rune-based API for now (except for in an internal package, if needed).
  • I'm a bit worried that this will possibly result in a third way to compare strings (collate, cases). OTOH, this clearly serves a purpose and is more appropriate than using the inherently low-level cases/width etc.
  • The Class type should not be an integer?:
    • Different classes may import different tables, but this design will pull all in unconditionally.
    • We should allow for user profiles, as the standard suggests. I would make this the same type.
  • This package may potentially pull in a huge amount of data (unicode/norm, cases, width, bidi (TODO), etc.) . The API should be designed to drop tables if not needed. (very doable.)

@mpvl
Copy link
Contributor

mpvl commented Nov 20, 2015

FWIW, my first hunch at an API that is in line with the rest of the text repo would be the below. A bit elaborate, but it allows for extensive optimization as most used transformers are stateless.
Usage examples:

  • s, err := secure.Identifier.String(str)
  • if err := secure.Identifier.Valid(b); err != nil {
  • p, err := secure.NewIdentifier(secure.Norm(norm.NFKD), secure.Width(width.Fold), secure.Disallow(runes.In(unicode.Mn)))
    if p.Equal(a, b) { … }

This is a quick sketch, so by no means saying this is how it should be.

package secure

type Profile struct{}

func NewIdentifier(opts …Option) (_Profile, error)
func NewFreeForm(opts …Option) (_Profile, error)
// Creation of a profile could be expensive. Basing new profiles on old ones is explicitly not allowed, as per the RFC.

var Identifier *Profile
var FreeForm *Profile

// principle methods
func (p *Profile) Allowed() runes.Set
func (p *Profile) NewTransformer() *Transformer

// wrapper methods for Transformer equivalents, allowing optimization and simpler API when transformer is stateless (most cases).
func (p *Profile) String(s string) (string, error)
func (p *Profile) Bytes(b []byte) (string, error)
func (p *Profile) Valid(b []byte) error
func (p *Profile) ValidString(s string) error
func (p *Profile) Equal(a, b []byte) bool
func (p *Profile) EqualString(a, b string) bool

type Option func(*options)

func Norm(f norm.Form) Option
func Transform(f …func() transform.Transform) Option
// func Cases(func() transform.Transformer) Option
func FoldCase(t language.Tag) Option
func Width(w *width.Transformer) Option
func Disallow(set runes.Set) Option
var BidiRule Option
...

type Transformer
// Transformer API plus equivalent of wrapper methods above.

@SamWhited
Copy link
Member Author

@mpvl Thanks for the feedback.

Usage of this package should be encouraged, and putting it under the unicode directory might be hiding it too much.

Fair enough; I'm not sure that anyone who doesn't know what this is would use it / apply it properly anyways though, and if they know what it is they'll (probably?) look for it under this name.

Similarly, the name precis may be too obscure. Perhaps "golang.org/x/text/secure" (secure.Identifier, secure.FreeForm would be of type secure.Profile)?. Open to suggestions, though.

That's not bad, although that might give people a false impression about what the package should be used for; some classes (FreeForm) are much more about expressiveness than security. I'd hate people to get the wrong impression from the package name (or maybe I'm overthinking it). Either name is fine by me, however; I don't have any alternative suggestions.

At least while I'm developing it separately I'll keep it as precis, we can change it if we merge it in later.

The operations prescribed by RFC 7564 are inherently string-based, instead of rune-based:
Exposing this as Transforms (with the usual helper methods) would be more appropriate.
...
I would suggest not having any rune-based API for now (except for in an internal package, if needed).

Absolutely; that's actually on my radar. The current work has all been into generating the tables, and the Valid() methods were really just so that I could write some tests and have an easy way to test a few runes. I think I have an open issue right now for "define an actual API".

Tries seem a more natural data structure here. RangeTables can be used for now, but I would not expose them in the API.

Will do; I wouldn't have wanted to use an internal package unless I was working on this with the aim of contributing it to the text package, but now that that's the goal I'll look into it.

I'm a bit worried that this will possibly result in a third way to compare strings (collate, cases). OTOH, this clearly serves a purpose and is more appropriate than using the inherently low-level cases/width etc.

The Class type should not be an integer?:
Different classes may import different tables, but this design will pull all in unconditionally.
We should allow for user profiles, as the standard suggests. I would make this the same type.
This package may potentially pull in a huge amount of data (unicode/norm, cases, width, bidi (TODO), > etc.) . The API should be designed to drop tables if not needed. (very doable.)

Agreed.

Thanks for all your feedback. I'll start working on a proper API for the package this weekend.

@SamWhited
Copy link
Member Author

Out of curiosity, where is the appropriate place to discuss the package / design? Do you all tend to use the proposal, or the mailling list?

@SamWhited SamWhited changed the title proposal: add PRECIS package x/text/unicode/precis proposal: add PRECIS package x/text/ Nov 20, 2015
@SamWhited SamWhited changed the title proposal: add PRECIS package x/text/ proposal: add PRECIS package Nov 21, 2015
@SamWhited
Copy link
Member Author

Update: I've switched the implementation around to using the x/text/internal/gen and x/text/internal/triegen packages in the text-package branch. Further work will continue here.

Also, that API makes good sense to me and I've started implementing it. I'm not sure if it makes more sense to use the methods that @mpvl mentioned, which are consistent with a lot of names in other x/text packages, or if it makes more sense to name them as they're named in the RFC. Eg. I'm not sure which of these makes the most sense:

func (p *Profile) Enforce(s string) (string, error) {}
func (p *Profile) String(s string) (string, error) {}

func (p *Profile) PrepareString(s string) bool {}
func (p *Profile) ValidString(s string) bool {} // This was error above, but chances are we just want to check it. Maybe a separate API to get a reason? I'll think about it.

@mpvl
Copy link
Contributor

mpvl commented Nov 21, 2015

Neat. Note that you can also use x/text/internal/ucd to further simplify the code. It parses UCD files for you.

I think It makes more sense for you to submit directly in golang.org/x/text directly so we can make use of the Gerrit code review etc. Just specify in the package comments that the package is under full development and that the API may change.

As for the package name: the text repo has mostly adopted the convention of naming packages after functionality instead of standards. Noticeable exceptions are the packages for handling data formats (cldr, ucd). I agree secure.FreeForm is a bit confusing/misleading. Let's stick with precis for now and decide before the "under construction label" is removed. (Also precis is actually a word as well, related to text nonetheless, so this may be even more confusing).

As for naming, I find Enforce and PrepareString somewhat misleading names myself. Also, it is more likely that the reader of code understands Go convention than PRECIS convention. It should be clear in the documentation that the Go names are related to respective PRECIS names.

Also, I'm not entirely agreeing with the standard. Valid/Prepare should not just return false if a string contains characters disallowed by the underlying Class, but also, in this case, for characters disallowed by a profile or even a transformation associated with a profile (normalization, case/width folding, etc. always succeed, but enforcing single script might not). So not following PRECIS convention with naming has the additional benefit that a PRECIS expert will read the documentation to see what the method does exactly. So using Go names is a double win. :)

As for Valid returning a bool: I think the error would be useful, but it is indeed more consistent to get a bool. Note that the user could always get the error from the Transform, although this would not be as efficient.

@SamWhited
Copy link
Member Author

That convinces me; thanks.

I think It makes more sense for you to submit directly in golang.org/x/text directly

Wilco.

@SamWhited
Copy link
Member Author

I subtree merged my changes into the x/text package, but didn't want to bother learning the gerrit tool that Go appears to use. I'll figure out how to submit to Gerrit / add the "Under construction" label later when I'm being less lazy (or just git mail it and let someone else deal with Gerrit).

For now, I've moved my repo to https://bitbucket.org/SamWhited/golang-x-text (which is now a full mirror of the text package) and will try to remember to keep it updated in case anyone wants to see the progress.

@mpvl
Copy link
Contributor

mpvl commented Nov 23, 2015

Using Gerrit will allow me to give feedback on the code early. It will help catch issues early and avoids rewrites. But either way.

@SamWhited
Copy link
Member Author

@mpvl Are there instructions for how patches should be submitted online anywhere that don't require you to download some extra tool? Should I just mail a patch to golang-codereviews manually?

@ianlancetaylor
Copy link
Contributor

@SamWhited Sorry, we can only accept patches through the Gerrit interface. Please do not send a manual patch. Docs for contributing via Gerrit are at https://golang.org/doc/contribute.html .

@SamWhited
Copy link
Member Author

@ianlancetaylor So there's no way to do that w/o installing git-codereview? I couldn't find anything in the Gerrit interface about it.

@bradfitz
Copy link
Contributor

You don't "download" some extra tool. You "go get" it. That should be a low bar for anybody interested in contributing to Go.

@SamWhited
Copy link
Member Author

Ah, okay, sorry, not interested in contributing then. I'll keep hacking on my package as part of the x/text package and juse use a local version, and you're welcome to take it or leave it if you want. Thanks anyways.

@mpvl
Copy link
Contributor

mpvl commented Dec 16, 2015

I think we ultimately need to include this. It will have to go through Gerrit. Even if this may not happen now, we will need this functionality in the future. Reopening so we can keep track of this.

@SamWhited
Copy link
Member Author

Sorry, I should have followed up: Someone else informed me that it IS possible to submit to Gerrit manually without installing the extra dev tools. I will finish the package / do this at some point. Right now I'm trying to stop being lazy and rewrite the API (the options API didn't work out, because I was forgetting that the steps might be applied in different orders / at different stages, preparation, enforcement, etc. for different profiles).

Perhapse this weekend I'll rework it a bit and finish it up.

@bradfitz bradfitz reopened this Dec 16, 2015
@SamWhited
Copy link
Member Author

Went ahead and pushed the initial changes as requested, even though the API probably needs a rethink: https://go-review.googlesource.com/17933

@SamWhited
Copy link
Member Author

Thanks for the review!

I've fixed a number of things from the review (still haven't redesigned the API, so most of the things that weren't fixed were things that would be removed anyways) and want to update; what's the process for updating on Gerrit? The relavant section in the docs doesn't appear to have normal Git workflow instructions. If I push up a new detatched patch-set will it create a new review, or is it smart enough to update my old one?

This process is insane; I'd forgotten why I stopped using Gerrit years ago, they seem to have no clue how Git is supposed to work.

EDIT: Nevermind, figured out that that's what the change ID is for and updated the old commit. This is still insane, why in the world would Gerrit do this instead of just using a git ref and letting you update that (a branch or a tag)?

I've now added an explicit class struct that implements the runes.Set interface. This gives us a Contains(rune) method for the classes. Will work on the profile API next. I have submitted an errata to the RFC for the language that got me confused and made me think the current API would work in the first place.

@mpvl
Copy link
Contributor

mpvl commented Dec 21, 2015

On Sat, Dec 19, 2015 at 11:02 PM, Sam Whited notifications@github.com
wrote:

I've fixed a number of things from the review (still haven't redesigned
the API, so most of the things that weren't fixed were things that would be
removed anyways) and want to update; what's the process for updating on
Gerrit? If I push up a new detatched patch-set will it create a new review,
or is it smart enough to update my old one?

It will update the old one.

This process is insane; I'd forgotten why I stopped using this damn tool
years ago, they seem to have no clue how Git is supposed to work.

This process works well for us.


Reply to this email directly or view it on GitHub
#13336 (comment).

@SamWhited
Copy link
Member Author

This process works well for us.

Someone told me that about using CVS recently too. If you're going to use Git, why not actually use Git? Anyways, didn't mean to start a flame war, I know I can't change it (and, sadly, don't have a better recommendation anyways), but it's still ridiculous.

@SamWhited
Copy link
Member Author

I don't seem to be able to put followup comments on Gerrit, so I'll bring this back here again:

RE the API for defining what steps need to be performed in prepare:

It looks like we won't be able to just always perform width mapping as part of prepare (this is not standards compliant, and I have identified cases where it will break compatibility with PRECIS implementations in other languages, which rather defeats the purpose of using a framework like PRECIS if your clients/servers/etc. can't interop). So, I see a few options for the API that might be okay:

Option 1: Define a pre-prepare option as a function

PrePrepare(f func(s string) (string, error)) Option

Option 2: Define a pre-prepare option as a list of transformers

(this is what I was doing before)

PrePrepare(t ...transform.Transformer) Option

@SamWhited
Copy link
Member Author

For now, I've also implemented the Enforce method as far as it can go, and fixed the Allowed method to show why it's different from the class level Contains method. I'm thinking that maybe allowed and disallowed should be removed in favor of just having profile's be Set's too. Next thing on my radar is more tests; a few other implementors and I are currently discussing writing a standard set of test vectors and submitting them as an RFC as well; if anyone is interested, see the PRECIS WG mailing list.

This is probably ready for another round of review.

EDIT: Yah, not sure why I did Allowed and Disallowed for profile's in the first place. Replaced them with a Contains method.

EDIT 2: I'm also starting to wonder why we have an Enforce method. Might just make profile implement Transformer and use that for enforcement? I feel like there was a reason I didn't do that originally though which I'm not thinking of.

@mpvl
Copy link
Contributor

mpvl commented Dec 23, 2015

Allowed/Disallowed: IIUC, the set of allowed runes is not inverse of the set of disallowed runes. That's why probably.

@rsc
Copy link
Contributor

rsc commented Dec 28, 2015

It sounds like the proposal - the idea of adding precis support - has been accepted. Now this issue seems to be serving as code review. @SamWhited, why not reply on Gerrit?

@rsc
Copy link
Contributor

rsc commented Dec 28, 2015

Re your comment 4 days ago, @SamWhited as long as you put a Commit-Id line in each commit, then you can push as many commits as you want to Gerrit. They all turn into separate reviews. Honestly the simplest way to do this is 'go get golang.org/x/review/git-codereview' and then run 'git-codereview hooks' in your git repo. That will install a commit-msg hook that adds the Commit-Id line for you. You don't need to use any of the rest of the workflow if you want - raw git commands are just fine - but I imagine you'd get tired of making up random Commit-Id lines at some point.

@rsc rsc changed the title proposal: add PRECIS package x/text: add precis implementation Dec 28, 2015
@rsc rsc added this to the Unreleased milestone Dec 28, 2015
@SamWhited
Copy link
Member Author

@rsc I'll double check, I tried to push several commits (with commit ID's) and I think it just told me that I had to squash first. If not, that would be far better! (although I still don't understand the need for commit ID's when Git already has commit ID's, and patch sets [aka branches/tags/any other ref], etc.)

Using the pre-commit hook sounds good; thanks. I'll start making separate reviews after the major body of the work is reviewed / merged.

RE: replying on Gerrit, I think I did reply to everything, no? I just wanted to leave updates here to keep this issue up to date; if these issues are just meant for proposals and not to track the work though, I can stop. I'm still figuring out the workflow.

EDIT: Ah, just saw your thread on go-dev talking about this and saying you'd update the contribution guidelines. Many thanks!

@SamWhited
Copy link
Member Author

@rsc just tried the pre-commit hooks you mentioned. With them, it won't even let me commit on my own branch.

git-codereview: cannot commit on precis branch

@rsc
Copy link
Contributor

rsc commented Dec 29, 2015

@SamWhited, depending on your setup that check may not be appropriate. (It is for the typical workflow we use.) If you zero the file .git/hooks/pre-commit then that check won't run anymore.

@SamWhited
Copy link
Member Author

Ah, many thanks; I guess I should have gone and actually looked at what the tool installed; didn't realize they were separate hooks (though in retrospect I suppose they'd have to be).

@SamWhited
Copy link
Member Author

@mpvl ping; any chance for another review soon (I know you're probably busy what with the upcomming 1.6 release, just wanted to keep this issue on your radar)?

@SamWhited
Copy link
Member Author

Fixed the issues from the last review, but was having some trouble with Gerrit's single-commit / commit-id bullshit; if anything looks off please let me know and I'll close that CL and open a new one.

@mpvl
Copy link
Contributor

mpvl commented Jan 25, 2016

Sorry for the delay.

It is a bit tricky with that many patch sets, but doable. So no need to do
another CL.

Note that it is perfectly fine to locally accumulate commits (e.g. using
--amend or rebasing later) and then mail them as a single patch update.

I have a few minor comments left.

On Fri, Jan 15, 2016 at 5:49 PM, Sam Whited notifications@github.com
wrote:

Fixed the issues from the last review, but was having some trouble with
Gerrit's single-commit / commit-id bullshit; if anything looks off please
let me know and I'll close that CL and open a new one.


Reply to this email directly or view it on GitHub
#13336 (comment).

@SamWhited
Copy link
Member Author

I have a few minor comments left.

Many thanks, those are now fixed. Once this is merged I'll add a new CL for the profiles that we took out and try to keep future CL's a bit smaller.

@SamWhited
Copy link
Member Author

Merged: https://godoc.org/golang.org/x/text/unicode/precis

EDIT: I don't appear to be able to update the milestone, if someone could do that I'd appreciate it. /cc @mpvl @rsc

@noway
Copy link

noway commented Feb 13, 2016

As for someone unfamiliar with the release schedule and such, when it will be available? Right now the link https://godoc.org/golang.org/x/text/unicode/precis says not found.

@mpvl
Copy link
Contributor

mpvl commented Feb 13, 2016

The package is available, but has been moved to x/text/secure/precis.

On Sat, Feb 13, 2016 at 2:39 PM, Way, No notifications@github.com wrote:

As for someone unfamiliar with the release schedule and such, when it will
be available? Right now the link
https://godoc.org/golang.org/x/text/unicode/precis says not found.


Reply to this email directly or view it on GitHub
#13336 (comment).

@SamWhited
Copy link
Member Author

As for someone unfamiliar with the release schedule and such, when it will be available? Right now the link https://godoc.org/golang.org/x/text/unicode/precis says not found.

As mpvl said, this has been moved to x/text/secure/precis. Keep in mind that the API may still be changed or moved in other ways (eg. profiles moved out into the parent package).

@mpvl: Could you update the milestone on this issue? Thanks.

@mpvl mpvl added Testing An issue that has been verified to require only test changes, not just a test failure. and removed Proposal-Accepted labels Feb 13, 2016
@mpvl mpvl modified the milestones: Go1.7, Unreleased Feb 13, 2016
@mpvl
Copy link
Contributor

mpvl commented Feb 13, 2016

I don't know if any of the go core milestones make sense here. I've cleared the field to avoid confusion.

@noway
Copy link

noway commented Feb 14, 2016

Cool, thanks! Looking forward for BIDI rules.

@mpvl
Copy link
Contributor

mpvl commented Jun 10, 2016

Update: context rules and bidi rule are now implemented. Performance has also been greatly enhanced. The package should be conform the spec now. It it is not, please let me know.

@SamWhited
Copy link
Member Author

I am currently writing a draft for the working group of PRECIS test vectors; I'll check the package against what I have so far and report back if anything is broken.

@golang golang locked and limited conversation to collaborators Jun 10, 2017
AlexanderYastrebov pushed a commit to AlexanderYastrebov/go that referenced this issue Oct 3, 2021
Adds an implementation of the PRECIS framework (RFC 7564) and related
profiles (RFC 7613, RFC 7700).

Fixes golang#13336

Change-Id: I5db5601911c45c43fff9f44ca8d8142f795ad49d
Reviewed-on: https://go-review.googlesource.com/17933
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge Proposal Testing An issue that has been verified to require only test changes, not just a test failure.
Projects
None yet
Development

No branches or pull requests

7 participants