Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/html: invalid parsing of html fragment with self-closing iframe #22298

Closed
urandom opened this issue Oct 16, 2017 · 2 comments
Closed

x/net/html: invalid parsing of html fragment with self-closing iframe #22298

urandom opened this issue Oct 16, 2017 · 2 comments

Comments

@urandom
Copy link

urandom commented Oct 16, 2017

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go 1.9.1

Does this issue reproduce with the latest release?

Yes, including the up-to-date x/net/html dependency

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/urandom/.go"
GORACE=""
GOROOT="/usr/lib/go"
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build316910682=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

play link:

https://play.golang.org/p/DViZItBoV5

The above example illustrates the problem. There is one self-closing iframe within the snippet. After parsing and then subsequent rendering, the self-closing tag is replaced by an open one, and a closing tag is placed in an incorrect place. In the above example, the following output is produced:

<html><head></head><body><div xmlns="http://www.w3.org/1999/xhtml"><p>If you’re running Wayland and have a touchpad capable of multi-touch, Builder (Nightly) now lets you do fun stuff like the following video demonstrates.</p>
<p>Just three-finger-swipe left or right to move the document. Content is sticky-to-fingers, which is my expectation when using gestures.</p>
<p>It might also work on a touchscreen, but I haven’t tried.</p>
<p><iframe allowfullscreen="allowfullscreen" frameborder="0" height="495" src="https://www.youtube.com/embed/gRRfjcGFcbw?feature=oembed" width="660"></p></div>
</iframe></p></div></body></html>
@tombergan
Copy link
Contributor

tombergan commented Nov 22, 2017

I think this is actually correct. <iframe> is not a self-closing tag. If I copy your original input into https://validator.w3.org/#validate_by_input, nested into <html><body>...</body></html>, I get the following error:

end tag for "IFRAME" omitted, but its declaration does not permit this

Wading through the HTML spec, I see:

https://html.spec.whatwg.org/#acknowledge-self-closing-flag

When a start tag token is emitted with its self-closing flag set, if the flag is not acknowledged when it is processed by the tree construction stage, that is a non-void-html-element-start-tag-with-trailing-solidus parse error.

https://html.spec.whatwg.org/#parse-error-non-void-html-element-start-tag-with-trailing-solidus

This error occurs if the parser encounters a start tag for an element that is not in the list of void elements or is not a part of foreign content (i.e., not an SVG or MathML element) that has a U+002F (/) code point right before the closing U+003E (>) code point. The parser behaves as if the U+002F (/) is not present.

The parser should treat everything after <iframe/> as if the element was not self-closing. According to https://html.spec.whatwg.org/#parsing-main-inbody, the tokens after <iframe> should be parsed as text, which I believe is https://html.spec.whatwg.org/#parsing-main-incdata. If you look at the structure of the tree, what actually happens is that the text after the <iframe/> gets parsed into the following TextNode:

html.Node{
  Parent:(*html.Node)(0xc4200ea690),
  FirstChild:(*html.Node)(nil),
  LastChild:(*html.Node)(nil),
  PrevSibling:(*html.Node)(nil), NextSibling:(*html.Node)(nil),
  Type:0x1,  // TextNode
  DataAtom:0x0,
  Data:"</p></div>\n\t\t",
  Namespace:"",
  Attr:[]html.Attribute(nil),
}

That is, everything after the <iframe/> gets parsed as raw text. This explains the duplicate </p></div> in the output. For comparison, I loaded your original HTML in Chrome and called document.body.outerHTML in the console. I get almost exactly the same thing:

"<body><div xmlns="http://www.w3.org/1999/xhtml"><p>If you’re running Wayland and have a touchpad capable of multi-touch, Builder (Nightly) now lets you do fun stuff like the following video demonstrates.</p>
<p>Just three-finger-swipe left or right to move the document. Content is sticky-to-fingers, which is my expectation when using gestures.</p>
<p>It might also work on a touchscreen, but I haven’t tried.</p>
<p><iframe allowfullscreen="allowfullscreen" frameborder="0" height="495" src="https://www.youtube.com/embed/gRRfjcGFcbw?feature=oembed" width="660">&lt;/p&gt;&lt;/div&gt;
</iframe></p></div></body>"

The only difference is that Chrome escapes the text inside the <iframe>. I don't know if that's necessary (there could be a bug in rendering), but it seems to me that the parser is correct.

@tombergan
Copy link
Contributor

The only difference is that Chrome escapes the text inside the <iframe>. I don't know if that's necessary (there could be a bug in rendering), but it seems to me that the parser is correct.

I don't think this is a bug. ParseFrament(originalHTML)) produces the same tree as Parse(Render(ParseFrament(originalHTML)), so it seems like the <iframe> text content does not need to be escaped. Closing because everything appears to work as intended.

@golang golang locked and limited conversation to collaborators Nov 22, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants