x/net/html: add Escape/Unescape transformers #22585

Open
SamWhited opened this issue Nov 5, 2017 · 6 comments

Comments

@SamWhited
Member

SamWhited commented Nov 5, 2017

The golang.org/x/net/html package contains two functions similar to the ones in the html package for escaping and unescaping HTML entities:

func EscapeString(string) string
func UnescapeString(string) string

Unfortunately, these require loading the entire document into memory and converting it to a string before it can be escaped.

It would be nice if there were also an implementation of the Transformer (and SpanningTransformer) interface from golang.org/x/text/transform that could escape and unescape long byte streams without buffering the entire stream in memory.

This could either be two functions which return transformers:

// Escaper returns a transformer that escapes special characters.
// See EscapeString for more information.
func Escaper() transform.SpanningTransformer

// Unescaper returns a transformer that unescapes special characters.
// See UnescapeString for more information.
func Unescaper() transform.SpanningTransformer

or a Transformer type which provides the various helper methods (String, Bytes, etc.) that transformer-based packages in the text tree have.
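To make the first option concrete, here is a minimal sketch of what the escaping direction's Transform method might look like. It is not the x/net/html API: `escaper`, `errShortDst`, `replacement`, and `escapeAll` are hypothetical names, and `errShortDst` stands in for `transform.ErrShortDst` so that the example runs with only the standard library. The method signatures mirror `transform.Transformer`; the `Span` method required by `SpanningTransformer` is omitted. The replacements are the five that `html.EscapeString` produces.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortDst is a local stand-in (assumption) for transform.ErrShortDst.
var errShortDst = errors.New("short destination buffer")

// escaper is a hypothetical type whose methods have the same shape as
// those of transform.Transformer in golang.org/x/text/transform.
type escaper struct{}

// Reset is a no-op because escaping keeps no state between calls.
func (escaper) Reset() {}

// replacement returns the escaped form of c, or "" if c is copied as-is.
func replacement(c byte) string {
	switch c {
	case '&':
		return "&amp;"
	case '<':
		return "&lt;"
	case '>':
		return "&gt;"
	case '"':
		return "&#34;"
	case '\'':
		return "&#39;"
	}
	return ""
}

// Transform escapes as much of src as fits into dst and reports how many
// bytes were written and consumed. It never splits a replacement across
// two calls; if one does not fit, it returns errShortDst instead.
func (escaper) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
	for nSrc < len(src) {
		c := src[nSrc]
		rep := replacement(c)
		if rep == "" {
			if nDst == len(dst) {
				return nDst, nSrc, errShortDst
			}
			dst[nDst] = c
			nDst++
		} else {
			if len(dst)-nDst < len(rep) {
				return nDst, nSrc, errShortDst
			}
			nDst += copy(dst[nDst:], rep)
		}
		nSrc++
	}
	return nDst, nSrc, nil
}

// escapeAll drives Transform with a fixed-size destination buffer to show
// that the input is processed piecewise. dstSize must be at least 5, the
// length of the longest replacement, or no progress can be made.
func escapeAll(t escaper, src []byte, dstSize int) string {
	var out []byte
	dst := make([]byte, dstSize)
	for {
		nDst, nSrc, err := t.Transform(dst, src, true)
		out = append(out, dst[:nDst]...)
		src = src[nSrc:]
		if err == nil {
			return string(out)
		}
		// err is errShortDst: drain dst and continue with the rest of src.
	}
}

func main() {
	fmt.Println(escapeAll(escaper{}, []byte(`<a href="x">T&C</a>`), 8))
}
```

Because each of the five escaped characters is a single ASCII byte, an escaper like this needs no state between calls and never has to wait for more source bytes. A real implementation that returned `transform.ErrShortDst` could be wrapped with `transform.NewReader` to escape an `io.Reader` incrementally.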

/cc @mpvl

@SamWhited SamWhited added this to the Proposal milestone Nov 5, 2017
@rsc
Contributor

rsc commented Nov 6, 2017

/cc @nigeltao

@nigeltao
Contributor

Sounds reasonable to me.

Sorry for the late reply. I also don't have much spare time to work on this myself.

@rsc rsc modified the milestones: Proposal, Unreleased Jan 29, 2018
@rsc rsc changed the title proposal: golang.org/x/net/html: add Escape/Unescape transformers golang.org/x/net/html: add Escape/Unescape transformers Jan 29, 2018
@mpvl
Contributor

mpvl commented Feb 21, 2018

Sounds reasonable.
Most packages in x/text use a wrapper type with convenience methods. I found this works best for most users.
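As a rough illustration of the wrapper-type option, the sketch below shows only the convenience methods. `Transformer`, `Escaper`, and `Unescaper` are hypothetical here (the latter two names come from the proposal above), and the methods simply delegate to the standard library's `html.EscapeString` and `html.UnescapeString`. A real type would additionally implement `Transform`, `Span`, and `Reset` so that it could be used for streaming.

```go
package main

import (
	"fmt"
	"html"
)

// Transformer is a hypothetical wrapper type, loosely modeled on the
// String and Bytes convenience methods found on types in x/text.
type Transformer struct {
	unescape bool
}

// Escaper returns a Transformer that escapes special characters.
func Escaper() Transformer { return Transformer{} }

// Unescaper returns a Transformer that unescapes entities.
func Unescaper() Transformer { return Transformer{unescape: true} }

// String returns s with the transformation applied.
func (t Transformer) String(s string) string {
	if t.unescape {
		return html.UnescapeString(s)
	}
	return html.EscapeString(s)
}

// Bytes returns a new byte slice with the transformation applied to b.
func (t Transformer) Bytes(b []byte) []byte {
	return []byte(t.String(string(b)))
}

func main() {
	fmt.Println(Escaper().String(`x < y && y > "z"`))
	fmt.Println(Unescaper().String("x &lt; y"))
}
```

With this shape, callers that already hold a whole string keep a one-line call, while the same value could be passed wherever a transformer is expected once the streaming methods exist.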

@bradfitz bradfitz changed the title golang.org/x/net/html: add Escape/Unescape transformers x/net/html: add Escape/Unescape transformers Feb 21, 2018
@SamWhited
Member Author

@mpvl @nigeltao do you have a preferred API? I'd forgotten about this, and have since come up with a workaround for the project I needed it for, but I will attempt to find the time to submit a PR.

@mpvl
Contributor

mpvl commented Feb 21, 2018

Not at the moment, but something along the lines of width.Transformer or cases.Caser (both in x/text) makes sense. Using the encoding.Encoding approach seems like overkill, but may be worth it if you have a variety of matching (un)escapers.

The trickiness is in the implementation. NewTransformer(FromFunc) in https://godoc.org/github.com/mpvl/textutil could help get you started. It has an example escaping and unescaping implementation. Not as efficient as implementing from scratch, but not too bad either and definitely handy for prototyping.
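One source of that trickiness is unescaping: an entity such as `&amp;` can be split across two chunks of input, so a streaming unescaper has to hold back an unfinished entity until more bytes arrive. The sketch below illustrates only that idea. It does not use textutil or x/text; it reads from an `io.Reader` in small chunks and delegates the actual decoding to the standard library's `html.UnescapeString`. The names `splitSafe`, `unescapeStream`, `unescapeChunked`, and the `maxPending` bound are assumptions made for this example, and a very long zero-padded numeric reference that exceeded `maxPending` would not be handled correctly.

```go
package main

import (
	"bytes"
	"fmt"
	"html"
	"io"
	"strings"
)

// maxPending is an assumed upper bound on the length of a single entity
// reference. A longer run after the last '&' is treated as plain text.
const maxPending = 64

// splitSafe returns how many leading bytes of buf can be unescaped now.
// If the last '&' has no ';' after it and the remainder is short enough
// to be an unfinished entity, the cut is placed just before that '&'.
func splitSafe(buf []byte) int {
	i := bytes.LastIndexByte(buf, '&')
	if i < 0 {
		return len(buf)
	}
	tail := buf[i:]
	if bytes.IndexByte(tail, ';') >= 0 || len(tail) > maxPending {
		return len(buf)
	}
	return i
}

// unescapeStream copies r to w, unescaping entities chunk by chunk and
// carrying any unfinished entity over to the next read.
func unescapeStream(w io.Writer, r io.Reader, chunkSize int) error {
	chunk := make([]byte, chunkSize)
	var pending []byte
	for {
		n, err := r.Read(chunk)
		pending = append(pending, chunk[:n]...)
		cut := splitSafe(pending)
		if err == io.EOF {
			cut = len(pending) // nothing more is coming, so flush it all
		}
		if _, werr := io.WriteString(w, html.UnescapeString(string(pending[:cut]))); werr != nil {
			return werr
		}
		pending = append(pending[:0], pending[cut:]...)
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// unescapeChunked is a small helper that runs unescapeStream over a string.
func unescapeChunked(s string, chunkSize int) (string, error) {
	var sb strings.Builder
	err := unescapeStream(&sb, strings.NewReader(s), chunkSize)
	return sb.String(), err
}

func main() {
	out, err := unescapeChunked("a &lt; b &amp;&amp; c &gt; d", 3)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```

Cutting just before an `&` is safe because an entity reference always begins at one, so no reference can straddle the cut. Memory use is bounded by the chunk size plus `maxPending` rather than by the size of the document.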

@SamWhited
Member Author

SamWhited commented Feb 21, 2018

> The trickiness is in the implementation. NewTransformer(FromFunc) in https://godoc.org/github.com/mpvl/textutil could help get you started. It has an example escaping and unescaping implementation. Not as efficient as implementing from scratch, but not too bad either and definitely handy for prototyping.

Sounds good; I've actually started implementing this in the past but had trouble getting the tests to pass, so if I give it another shot maybe I'll start with that. For me the speed wasn't as important as not buffering very large documents in memory. I've done some much simpler escaping and unescaping in my XMPP address package, and that was quite a pain to get right from the ground up, so hopefully this helps.
