-
Notifications
You must be signed in to change notification settings - Fork 18k
x/net/html: 'noscript' inner HTML parsing issue #16318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Same here Mac OS 10.11.5
|
Here is the fix but I need someone who completely understand how parser and tokenizer works as I'm not sure if there is no side effects of my changes. Also I'm not sure if it works as it should in edge case tests like |
According to W3C, |
@bearburger the test changes for that seem wrong: I'm pretty sure that
should be parsed as
|
@riking you are right - iframe can't include other tags. |
I meant that the
|
Is there a workaround I can use until this is resolved? Should I recursively tokenize |
|
That is such a bad solution that I am tempted to file an issue on the project linked about it. It will not capture every case (uppercase, attributes, whitespace, etc.) and it will capture false positives. A solution must exist that does not attempt to transform input outside the scope of x/net/html. |
I have a package affected by this issue as well. The simplest solution is to just feed the content back into the parser: https://github.com/nathan-osman/go-sechat/blob/563300db9224ed01797dfe1f89f489e8eb9dd5ba/auth.go#L65 Assuming this bug is eventually fixed, I'll need to come up with a way to determine whether the recursive parsing is necessary at runtime. |
Is this accepted as a valid bug? I just found that noscript children are not parsed and I've found this issue. |
Discovered this yesterday working with goquery. While I could parse the noscript tag contents (get |
I stumbled upon this as well, everything under in my document is returned as one text node. |
It's odd that 12.2.6.4.5 isn't implemented in parse.go Ah, upon further inspection, parser assumes scripting is always enabled. There is no support for scripting = false (that I can see). Therefore, this behavior (to treat noscript contents as raw text) is correct. https://html.spec.whatwg.org/multipage/scripting.html#the-noscript-element The ask, then, is that parse.go supports scripting disabled mode. |
Even if the parser does assume scripting is enabled the behaviour is still wrong. The spec says "it represents nothing if scripting is enabled", and it's being represented as a TextNode by the parser. The documentation for the package also doesn't state anywhere that it acts as if scripting is enabled (which is the opposite of what most people would for assume for a Go HTML parser given the lack of javascript engines written in Go.) |
Change https://golang.org/cl/172557 mentions this issue: |
In the spec 12.2.6.4.5, the "in head noscript" insertion mode is defined. However, this package and its parser doesn't have the insertion mode, because the scripting=false case is not considered currently. This commit adds a test and a support for the "in head noscript" insertion mode. This change has no effect on the actual behavior. Updates golang/go#16318 Change-Id: I9314c3342bea27fa2acf2fa7d980a127ee0fbf91 Reviewed-on: https://go-review.googlesource.com/c/net/+/172557 Reviewed-by: Nigel Tao <nigeltao@golang.org>
This behavior can be considered correct given that scripting is always true. |
Isn't scripting being true itself a deviation from the spec? |
Scripting enabled/disabled should be configurable in order to support emulating different browser contexts. https://html.spec.whatwg.org/multipage/webappapis.html#enabling-and-disabling-scripting |
Yes, I'm sorry if my previous comment made you confusing, but I was thinking about how to make the |
Yeah, we should have some way to configure scripting. I'd also like to keep API compatibility, and room for future configuration options. Perhaps some new functions like
Call sites would look like:
The "options are functions" pattern here is the same as e.g. https://github.com/golang/text/blob/master/internal/ucd/ucd.go#L134 |
Change https://golang.org/cl/174157 mentions this issue: |
@nigeltao I just ran into this issue and investigated it and if I see it right, the fix is not complete yet. It just fixes parsing According to the specs the content of
I may create a pull request to fix this, but have to work through the Contribution Guide first next week. |
What version of Go are you using (
go version
)?What operating system and processor architecture are you using (
go env
)?What did you do?
What did you expect to see?
What did you see instead?
The text was updated successfully, but these errors were encountered: