Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/html: Weird behavior with comment parsing #37771

Closed
empijei opened this issue Mar 10, 2020 · 2 comments
Closed

x/net/html: Weird behavior with comment parsing #37771

empijei opened this issue Mar 10, 2020 · 2 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@empijei
Copy link
Contributor

empijei commented Mar 10, 2020

This bug was found by fuzzing. See master issue #27848

What version of Go are you using (go version)?

1.14

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

Not relevant

What did you do?

https://play.golang.org/p/kSpNN8dsXkk

func main() {
	Render([]byte("<html><!--!--> <head></head><body></body></html>"))
	fmt.Println()
	Render([]byte("<html><!--! --><head></head><body></body></html>"))
}

func Render(b []byte) {
	nodes, err := html.ParseFragment(bytes.NewReader(b), nil)
	if err != nil {
		panic(err)
	}
	var sb strings.Builder
	for _, n := range nodes {
		if err := html.Render(&sb, n); err != nil {
			fmt.Println(err)
			panic("cannot render parsed HTML")
		}
	}
	fmt.Println(sb.String())
}

What did you expect to see?

<html><!--!--> <head></head><body></body></html>

<html><!--! --><head></head><body></body></html>

What did you see instead?

<html><!--!--> <head></head><body></body></html>--><head></head><body></body></html>

<html><!--! --><head></head><body></body></html>

Comment

I find this to be a rather odd behavior. It looks like the bang after the open comment tag confuses the parser into believing the entire rest of the string is still inside the comment. I don't know if this is intended but I don't think it should be.

Modern browsers seem to agree with my assessment: https://www.w3schools.com/code/tryit.asp?filename=GCOZ59QJSL6W

Edit: this happens also with "<" and other symbols.

/cc @katiehockman @FiloSottile @LMMilewski

@gopherbot gopherbot added this to the Unreleased milestone Mar 10, 2020
@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 10, 2020
@ewjoachim
Copy link

ewjoachim commented Oct 6, 2022

Here is a simpler version that triggers the same bug with the tokenizer, so the issue is with the tokenization and not the parsing:

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func main() {

	text := "<!--!--><h1>Hey</h1>"

	tkn := html.NewTokenizer(strings.NewReader(text))

	for {

		tt := tkn.Next()
		if tt == html.ErrorToken {
			return
		}
		fmt.Println(tt)
		t := tkn.Token()
		fmt.Println("  ", t.Data)

	}
}

Expected output:

Comment
   !
StartTag
   h1
Text
   Hey
EndTag
   h1

Actual output:

Comment
   !--><h1>Hey</h1>

@gopherbot
Copy link

Change https://go.dev/cl/442496 mentions this issue: html: properly handle exclamation marks in comments

@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 13, 2022
WeiminShang added a commit to WeiminShang/net that referenced this issue Nov 16, 2022
Properly handle the case where HTML comments begin with exclamation
marks and have no other content, i.e. "<!--!-->". Previously these
comments would cause the tokenizer to consider everything following to
also be considered part of the comment.

Fixes golang/go#37771

Change-Id: I78ea310debc3846f145d62cba017055abc7fa4e0
Reviewed-on: https://go-review.googlesource.com/c/net/+/442496
Run-TryBot: Roland Shoemaker <roland@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Damien Neil <dneil@google.com>
@golang golang locked and limited conversation to collaborators Oct 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants