Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/html: incorrect handling nodes in the <head> section #23064

Open
jdeng opened this issue Dec 8, 2017 · 6 comments
Open

x/net/html: incorrect handling nodes in the <head> section #23064

jdeng opened this issue Dec 8, 2017 · 6 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@jdeng
Copy link

jdeng commented Dec 8, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.9.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"

What did you do?

package main

import (
	"bytes"
	"fmt"
	"golang.org/x/net/html"
	"strings"
)

func main() {
	s := "<html><head><meta/><a/><style></style></head><body></body></html>"
	doc, _ := html.Parse(strings.NewReader(s))

	var buf bytes.Buffer
	if err := html.Render(&buf, doc); err == nil {
		fmt.Println(buf.String())
	}
}

What did you expect to see?

Same structure as the input

 <html><head><meta/><a/><style></style></head><body></body></html> 

What did you see instead?

Nodes from are moved into the part

 <html><head><meta/></head><body><a><style></style></a></body></html> 
@gopherbot gopherbot added this to the Unreleased milestone Dec 8, 2017
@jdeng
Copy link
Author

jdeng commented Dec 9, 2017

Adding the default handling code in parser.go/inHeadIM helped to solve this but I am not sure whether there are side effects.

    case StartTagToken:
        switch p.tok.DataAtom {
  ...
        case a.Head:
            // Ignore the token.
            return true
        default:
            // Ignore the token.
            return true
        }

@titanous titanous added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 9, 2017
@namusyaka
Copy link
Member

namusyaka commented Dec 9, 2017

The a element is not metadata content. So I think the behavior is not kindness but not wrong, because it is not treated as a child of the head element.
I think there are two approaches to this problem.

  1. The first is to return a parse error for such illegal html, it's the ease to understand, but it will largely destroy backward-compatibility.
  2. The second is to make the document about Parse more kind and prompt warning.

@jdeng
Copy link
Author

jdeng commented Dec 9, 2017

Thanks. I think it is wrong because it misplaces the following <style> node silently.
Rejecting the HTML because of unrecognized nodes in the head section seems a little too strict and it will be a problem for quite some Microsoft office generated HTML docs.

@namusyaka
Copy link
Member

Certainly, it is a problem that the style element is unintentionally placed. But that is due to the fact that the style element is handled as a child of the a element.
I think your expectation value looks incorrect. Any another ideas?

The current implementation begins with parsing the head element when html and head elements are omitted. If we deal with this problem properly we have to be able to properly parse the HTML as follows.

"<head><a/><style></style>" and "<a/><style></style>"

@namusyaka
Copy link
Member

I'll visit again after resolveing #23071.

@namusyaka
Copy link
Member

Sorry for the delay and my confusing comment.
While investigating this issue, I found a fact that the behavior is the same with chrome's one.

So please let me change my opinion. This is an expected behavior, I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants