html: Duplicate input provided by Tokenizer.Raw when tokenization errors occur #7029

Closed
CSEMike opened this issue Dec 30, 2013 · 1 comment

Comments

@CSEMike

CSEMike commented Dec 30, 2013

Tokenizer.Raw is intended to provide the unmodified text of the current token --
http://godoc.org/code.google.com/p/go.net/html#Tokenizer.Raw

But, when a tokenization error occurs, the raw text may be duplicated. For example:

    z := NewTokenizer(strings.NewReader("foo  bar"))
    tt := z.Next()
    fmt.Printf("%v: %q\n", tt, string(z.Raw()))
    tt = z.Next()
    fmt.Printf("%v: %q\n", tt, string(z.Raw()))

duplicates "foo  bar" in both a text and error token:

    Text: "foo  bar"
    Error: "foo  bar"

The concatenated results of z.Raw() should reproduce the original input without
duplication.
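
The round-trip property can be checked mechanically. The following is a minimal sketch, not code from go.net/html: `rawSource`, `replay`, and `concatRaw` are hypothetical names, and `Next` here returns a bool where the real Tokenizer returns a token type. The fake tokenizer simply replays recorded Raw values so that the check can be run without the package.

```go
package main

import (
	"bytes"
	"fmt"
)

// rawSource is a hypothetical stand-in for the two Tokenizer methods used in
// this report. Unlike the real API, Next reports false on the error token.
type rawSource interface {
	Next() bool
	Raw() []byte
}

// replay is a fake tokenizer that serves pre-recorded Raw values, one per
// call to Next. The last entry is what Raw reports on the error token.
// Raw must only be called after Next.
type replay struct {
	raws [][]byte
	pos  int
}

func (r *replay) Next() bool {
	r.pos++
	return r.pos < len(r.raws)
}

func (r *replay) Raw() []byte { return r.raws[r.pos-1] }

// concatRaw joins the Raw text of every token, including the error token.
func concatRaw(s rawSource) []byte {
	var buf bytes.Buffer
	for {
		more := s.Next()
		buf.Write(s.Raw())
		if !more {
			return buf.Bytes()
		}
	}
}

func main() {
	input := []byte("foo  bar")
	buggy := &replay{raws: [][]byte{input, input}} // behaviour reported above
	fixed := &replay{raws: [][]byte{input, nil}}   // expected behaviour

	fmt.Println(bytes.Equal(concatRaw(buggy), input)) // false
	fmt.Println(bytes.Equal(concatRaw(fixed), input)) // true
}
```

The same loop could be pointed at the real Tokenizer by looping until Next returns ErrorToken.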

The fix is to adjust the way Next updates the raw and data spans. I'll send a change for
this shortly.
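
To illustrate the kind of span update meant here, below is a rough sketch using a toy tokenizer. Everything in it (`toyTokenizer`, `span`, the `resetFirst` switch) is hypothetical and heavily simplified: it tracks only a raw span, emits the whole input as a single text token, and is not the go.net/html implementation. The actual change may differ. The idea is that Next collapses the span to the current end position before it checks for an error, so an error token no longer inherits the previous token's text.

```go
package main

import (
	"fmt"
	"io"
)

// span is a half-open [start, end) range into the tokenizer's buffer.
type span struct{ start, end int }

// toyTokenizer is a hypothetical tokenizer that emits the whole input as one
// text token followed by an error token.
type toyTokenizer struct {
	buf        []byte
	raw        span
	err        error
	resetFirst bool // true: collapse the raw span before the error check
}

// Next advances to the next token and reports false on the error token.
func (z *toyTokenizer) Next() bool {
	if z.resetFirst {
		// Assumed fix: drop the previous token's text up front.
		z.raw.start = z.raw.end
	}
	if z.raw.end == len(z.buf) {
		// Input exhausted. Without the reset above, raw still covers
		// the previous token, which is the duplication in this report.
		z.err = io.EOF
		return false
	}
	z.raw.start = z.raw.end
	z.raw.end = len(z.buf)
	return true
}

// Raw returns the unmodified text of the current token.
func (z *toyTokenizer) Raw() []byte { return z.buf[z.raw.start:z.raw.end] }

// Err returns the error behind the most recent error token, if any.
func (z *toyTokenizer) Err() error { return z.err }

func main() {
	for _, reset := range []bool{false, true} {
		z := &toyTokenizer{buf: []byte("foo  bar"), resetFirst: reset}
		z.Next()
		text := string(z.Raw())
		z.Next()
		fmt.Printf("reset=%v text=%q error=%q\n", reset, text, z.Raw())
	}
	// reset=false text="foo  bar" error="foo  bar"
	// reset=true text="foo  bar" error=""
}
```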
@bradfitz
Contributor

bradfitz commented Jan 2, 2014

Comment 1:

This issue was closed by revision golang/net@480e7b0.

Status changed to Fixed.

@CSEMike CSEMike added the fixed label Jan 2, 2014
@mikioh mikioh added repo-net and removed repo-net labels Dec 23, 2014
@mikioh mikioh changed the title from "go.net/html: Duplicate input provided by Tokenizer.Raw when tokenization errors occur" to "html: Duplicate input provided by Tokenizer.Raw when tokenization errors occur" Jan 4, 2015
@golang golang locked and limited conversation to collaborators Jun 25, 2016
This issue was closed.