You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Tokenizer.Raw is intended to provide the unmodified text of the current token --
http://godoc.org/code.google.com/p/go.net/html#Tokenizer.Raw
But, when a tokenization error occurs, the raw text may be duplicated. For example:
z := NewTokenizer(strings.NewReader("foo bar"))
tt := z.Next()
fmt.Printf("%v: %q\n", tt, string(z.Raw()))
tt = z.Next()
fmt.Printf("%v: %q\n", tt, string(z.Raw()))
duplicates "foo bar" in both a text and error token:
Text: "foo bar"
Error: "foo bar"
The concatenated results of z.Raw() should reproduce the original input without
duplication.
The fix is adjust the way Next updates the raw and data spans. I'll send a change for
this shortly.
The text was updated successfully, but these errors were encountered:
mikioh
changed the title
go.net/html: Duplicate input provided by Tokenizer.Raw when tokenization errors occur
html: Duplicate input provided by Tokenizer.Raw when tokenization errors occur
Jan 4, 2015
The text was updated successfully, but these errors were encountered: