-
Notifications
You must be signed in to change notification settings - Fork 18k
x/net/html: Parse handles custom self-closing tags differently to all major browsers #30961
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
An additional observation. If you swap the illegal <!doctype html>
<html>
<head></head>
<body>
<tag1>
<div />
<p>one</p>
</tag1>
<div>two</div>
</body>
</html> Then Chrome/Safari/Firefox all exhibit a similar behaviour to Go (though note the 'two' I don't know the spec well enough to understand why that would be the case and why all the browsers agree with one another on these two use cases, but Go disagrees on one. |
One final example, with the original example but now with the illegal <!doctype html>
<html>
<head></head>
<body>
<div>
<tag2 />
<p>one</p>
</div>
<div>two</div>
</body>
</html> In this instance Go parses it correctly: <!doctype html>
<html>
<head></head>
<body>
<div>
<tag2>
<p>one</p>
</tag2>
</div>
<div>two</div>
</body>
</html> So the erroneous handling seems to come about only when there is an non-standard tag wrapping a non-standard self-closing tag. |
The existing implementation behaves differently to all major browsers, for the instance where a self-closing element of an unknown tag type is the child of another element of an unknown tag type. Unclear if the spec is specific here. Issue golang/go#30961 has several scenarios, of which Go diverges in behaviour on one (added as a test in this commit). Fixes golang/go#30961
Change https://golang.org/cl/168638 mentions this issue: |
I've submitted a PR which fixes this issue whilst passing all the existing go and webkit unit tests. From my examples above it seems the However, I don't have a detailed understanding of the spec, so this likely needs some tweaks / overhauling. I'll keep reading the spec and update this if I get a clearer understanding. |
After some more reading, I believe I have pinpointed this better. I believe the issue is the func (p *parser) inBodyEndTagOther(tagAtom a.Atom) {
for i := len(p.oe) - 1; i >= 0; i-- {
if p.oe[i].DataAtom == tagAtom {
p.oe = p.oe[:i]
break
}
if isSpecialElement(p.oe[i]) {
break
}
}
} I need to have a think about how best to update my PR. |
I dropped some comments on https://go-review.googlesource.com/c/net/+/168638/5#message-4ac4bc26800e097755388cd7530f897286537176 |
I've responded in Gerrit and updated the PR. Thanks! :) |
The existing implementation behaves differently to all major browsers, for the instance where a self-closing element of an unknown tag type is the child of another element of an unknown tag type. Unclear if the spec is specific here. Issue golang/go#30961 has several scenarios, of which Go diverges in behaviour on one (added as a test in this commit). Fixes golang/go#30961
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I ran this program:
https://play.golang.org/p/JfyssBdfzFH
Which parses this snippet of html then re-stringifies it:
What did you expect to see?
I'd expect the custom
tag2
tag to wrap thep
tag (as a custom tag cannot be self-closing according to the spec - this is discussed in issue #26566), but everything at a higher level in the tree (includingtag1
) to remain in their respective locations:What did you see instead?
The
p
tag is wrapped as we would expect, but thediv
tag that was a sibling oftag1
is now a child oftag1
:This doesn't make sense to me - the self-closing
tag2
is in violation of the spec, but has been turned into a properly closed tag following sensible assumptions about what should become children of it (all its siblings). It makes no sense that after being closed there are further alterations to subsequent parts of the tree.The Go output is also in disagreement to how all the major browsers handle the same HTML code:
Chrome:
Firefox:
Safari:
Go script output:
The text was updated successfully, but these errors were encountered: