x/net/html: tokenizer error #34281
Comments
/cc @namusyaka @nigeltao
Yeah, it's probably a bug in the HTML tokenizer. It's been a while since I've looked at https://www.w3.org/TR/html52/syntax.html#tokenization. Somebody would need to figure out how it maps back to the token.go code and therefore what the spec-compliant fix is. I don't have a lot of spare time right now. Sorry.
Thanks for the feedback. Neither do I but I'll see if I can find some time to work on it.
Change https://golang.org/cl/196620 mentions this issue:
@nigeltao Since the tokenizer implementation doesn't fully conform to the spec yet, I've reviewed the CL and suggested a quick fix based on the WHATWG spec. It is a best-effort fix only.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
I was extracting titles and meta tags from webpages and found that, if a title tag contains a < at the end, the tokenizer cannot tell where the text ends and the closing tag starts.
https://play.golang.org/p/KO2-PEfpccQ
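To make the expected behaviour concrete, here is a minimal stdlib-only sketch of how I read the WHATWG rules for title content (RCDATA): a < only starts the closing tag when it is followed by /title and then whitespace, / or >. Any other < stays part of the text. splitRCDATA is a hypothetical helper written for this illustration. It is not part of x/net/html and is not the fix proposed in the CL.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRCDATA is a hypothetical helper, not part of x/net/html.
// Given the input that follows an opening tag such as <title>, it returns
// the raw text content and the remainder starting at the matching end tag.
// A '<' that does not begin "</tag" followed by whitespace, '/' or '>'
// is kept as ordinary text.
func splitRCDATA(s, tag string) (text, rest string, found bool) {
	for i := 0; i < len(s); i++ {
		if s[i] != '<' {
			continue
		}
		// A candidate end tag needs "</" + tag + one terminator byte.
		end := i + 2 + len(tag)
		if end >= len(s) || s[i+1] != '/' {
			continue
		}
		// Tag names are matched case-insensitively.
		if !strings.EqualFold(s[i+2:end], tag) {
			continue
		}
		switch s[end] {
		case ' ', '\t', '\n', '\f', '\r', '/', '>':
			return s[:i], s[i:], true
		}
	}
	// No matching end tag: everything is text.
	return s, "", false
}

func main() {
	text, rest, found := splitRCDATA("Hello <</title>", "title")
	fmt.Printf("%q %q %v\n", text, rest, found)
}
```

With the input above, the trailing < ends up in the text and the remainder starts at </title>, which is what I would expect the tokenizer to report as a text token followed by an end tag token.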
What did you expect to see?
What did you see instead?