x/net/html: nested <a> parsing issue #18865
Comments
I can't remember if that original HTML is malformed because there are nested <a> elements. It's not totally obvious, but start in
In particular, the second one gives this example:
which sounds similar. In any case, the parse result (the DOM) is also what the Chrome browser yields. See the attached screenshot with Chrome's "inspect element mode" on:
@nigeltao No. Maybe you have an old version of Chrome, because the newest one (56.0.2924.87 (64-bit) on OS X) doesn't insert that
@nigeltao I've also just noticed that the original site is not HTML5 but XHTML 1.0, so I've changed my test code to:

package main

import (
	"fmt"
	"os"
	"strings"

	"golang.org/x/net/html"
)

const data = `
<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>
<a class="myclass" href="/foo/bar">
...here is some stuff...s
<div><a href="/baz/baz">some stuff</a></div>
...here is also some stuff...
</a>
</body></html>
`

func main() {
	// Parse the document into a node tree.
	root, e := html.Parse(strings.NewReader(data))
	if e != nil {
		fmt.Println(e)
		return
	}
	// Write the parsed tree back out as HTML so the result can be inspected.
	if e := html.Render(os.Stdout, root); e != nil {
		fmt.Println(e)
	}
}

and it's still bugged (duplicated <a>). Here is what Chrome (56.0.2924.87 (64-bit) on OS X) shows:
This is clearly a bug on your side; as you can see, even Chrome renders it properly without duplicating. So please reopen this issue, @nigeltao.
Huh, that's weird. My Chrome version is also "56.0.2924.87 (Official Build) (64-bit)". Mine is on Linux, not OS X, but I'd be very surprised if that made a significant difference. How are you loading that HTML file into Chrome? Are you loading a file:///foo/bar URL, or are you serving it over HTTP (with some HTTP headers)?
Textual version:
It's a mystery why your Chrome's DOM tree is showing something different. Can you attach (maybe as a .zip file?) the exact HTML file you're loading in your browser?
OK, I've found out that there was an additional "" in my test HTML; now Chrome renders it like that too. Thanks for your support. I'll just have to put my results into a set rather than a list to get rid of the duplicates.
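The set-based workaround mentioned above could look roughly like the sketch below. It is only an illustration: the Link type and the uniqueLinks helper are hypothetical names, not part of x/net/html or goquery, and it assumes that the href value alone is enough to tell two links apart.

```go
package main

import "fmt"

// Link is a hypothetical record for one matched <a> element.
type Link struct {
	Href string
	Text string
}

// uniqueLinks returns links with duplicates removed. It keeps the first
// occurrence of each Href and preserves the original order.
// Assumption: two entries with the same Href are the same link.
func uniqueLinks(links []Link) []Link {
	seen := make(map[string]struct{}, len(links))
	out := make([]Link, 0, len(links))
	for _, l := range links {
		if _, ok := seen[l.Href]; ok {
			continue // already collected this href
		}
		seen[l.Href] = struct{}{}
		out = append(out, l)
	}
	return out
}

func main() {
	// Sample input imitating a selection in which the outer <a> shows up twice.
	links := []Link{
		{Href: "/foo/bar", Text: "...here is some stuff..."},
		{Href: "/baz/baz", Text: "some stuff"},
		{Href: "/foo/bar", Text: "...here is also some stuff..."},
	}
	for _, l := range uniqueLinks(links) {
		fmt.Println(l.Href)
	}
}
```

Using a map as the set while appending to a slice keeps the links in document order, which a bare map would not. Note that the text of the later fragments is dropped; if it matters, it would have to be merged into the first entry instead.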
What version of Go are you using (go version)?
go version go1.7.4 darwin/amd64
What operating system and processor architecture are you using (go env)?
OS X 10.11.6 (amd64)
What did you do?
I was using a library for selecting HTML nodes (based on a jQuery selector string), and it always returned duplicated <a> nodes when they contained a <div> with another <a> (this is how the webpage I'm scanning is built; I cannot change it). The creator of that library told me (PuerkitoBio/goquery#150) that this bug occurs because x/net/html acts weirdly on my input, and that I should report it here.
My code: https://play.golang.org/p/MJ33wgyDjG
What did you expect to see?
What did you see instead?
Conclusion
As you can see, your parser duplicates my <a class="myclass" href="/foo/bar"> node while parsing.