x/net/html: readRawOrRCDATA() parsing issue when text ends with '<' #20741

Open
sprungknoedl opened this issue Jun 21, 2017 · 0 comments

@sprungknoedl

What version of Go are you using (go version)?

go version go1.8.3
golang.org/x/net/html: 057a25b06247e0c51ba15d8ae475feb2fcb72164

What operating system and processor architecture are you using (go env)?

linux/amd64

What did you do?

Tried to parse HTML containing a textarea whose content ends with '<'. Minimal example below:

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	// The textarea's RCDATA content ends with a bare '<'.
	example := `<textarea>ends with <</textarea>`
	z := html.NewTokenizer(strings.NewReader(example))

	for {
		tt := z.Next()
		switch tt {
		case html.ErrorToken:
			return

		case html.TextToken:
			txt := z.Text()
			fmt.Printf("$%s$ ", txt)

		case html.StartTagToken:
			tn, _ := z.TagName()
			fmt.Printf("<%s> ", tn)

		case html.EndTagToken:
			tn, _ := z.TagName()
			fmt.Printf("</%s> ", tn)
		}
	}
}

What did you expect to see?

<textarea> $ends with <$ </textarea>

What did you see instead?

<textarea> $ends with <</textarea>$

Possible solution

This diff fixes the problem and passes all existing tests, but I am not sure it is correct in all cases or conforms to the HTML spec:

diff --git a/html/token.go b/html/token.go
index 893e272..41cb76f 100644
--- a/html/token.go
+++ b/html/token.go
@@ -347,6 +347,7 @@ loop:
                        break loop
                }
                if c != '/' {
+                       z.raw.end--
                        continue loop
                }
                if z.readRawEndTag() || z.err != nil {
diff --git a/html/token_test.go b/html/token_test.go
index 20221c3..f8e3fdf 100644
--- a/html/token_test.go
+++ b/html/token_test.go
@@ -254,6 +254,11 @@ var tokenTests = []tokenTest{
                "<textarea>$&lt;div&gt;$</textarea>",
        },
        {
+               "textarea ends with '<'",
+               "<textarea><</textarea>",
+               "<textarea>$&lt;$</textarea>",
+       },
+       {
                "title with tag and entity",
                "<title><b>K&amp;R C</b></title>",
                "<title>$&lt;b&gt;K&amp;R C&lt;/b&gt;$</title>",
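To illustrate why the one-line change helps, here is a simplified, stdlib-only model of the scanning loop. It is a sketch, not the actual x/net/html code; the function `scanRCDATA` and its `unread` flag are hypothetical. In the current loop, the byte read after a '<' is consumed even when it is not '/'. For `ends with <</textarea>`, that byte is the '<' that starts `</textarea>`, so the end tag is never recognized and the rest of the input becomes text. Decrementing `z.raw.end` pushes that byte back so the next iteration examines it as a fresh '<'.

```go
package main

import (
	"fmt"
	"strings"
)

// scanRCDATA is a hypothetical, simplified model of the loop in
// readRawOrRCDATA. It returns the text that precedes the closing </tag>.
// When unread is true, the byte read after a '<' is given back if it is
// not '/', mirroring the proposed z.raw.end-- change.
func scanRCDATA(src, tag string, unread bool) string {
	end := 0 // index of the next byte to read, playing the role of z.raw.end
	for end < len(src) {
		c := src[end]
		end++
		if c != '<' {
			continue
		}
		if end >= len(src) {
			break
		}
		c = src[end]
		end++
		if c != '/' {
			if unread {
				end-- // push the byte back so the loop re-examines it
			}
			continue
		}
		// Simplified end-tag check: a case-sensitive match of "tag>".
		// (The real tokenizer's end-tag matching is more lenient.)
		if strings.HasPrefix(src[end:], tag+">") {
			return src[:end-2] // text before "</"
		}
	}
	return src[:end] // no end tag found: everything is text
}

func main() {
	src := "ends with <</textarea>" // bytes following <textarea>
	fmt.Printf("without unread: $%s$\n", scanRCDATA(src, "textarea", false))
	fmt.Printf("with unread:    $%s$\n", scanRCDATA(src, "textarea", true))
}
```

Running it prints `$ends with <</textarea>$` for the current behaviour and `$ends with <$` with the byte pushed back, matching the "saw" and "expected" outputs above.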
gopherbot added this to the Unreleased milestone on Jun 21, 2017