x/net/html: html.Parse skips nodes #26972

Closed
empijei opened this issue Aug 13, 2018 · 1 comment

@empijei
Contributor

empijei commented Aug 13, 2018

What version of Go are you using (go version)?

go1.9.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/clap/go"
GORACE=""
GOROOT="/usr/lib/go-1.9"
GOTOOLDIR="/usr/lib/go-1.9/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build659916345=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Run the following code:

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	n, err := html.Parse(strings.NewReader(`<html><head></head><body><tr><td><pre>some text</pre></td></tr></body></html>`))
	if err != nil {
		panic(err)
	}
	fmt.Println(n.FirstChild.Data) // html
	fmt.Println(n.FirstChild.FirstChild.Data) // head
	fmt.Println(n.FirstChild.FirstChild.NextSibling.Data) // body
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.Data) // "pre" but expected tr
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.Data) // "some text" but expected td
}

What did you expect to see?

html
head
body
tr
td

What did you see instead?

html
head
body
pre
some text

gopherbot added this to the Unreleased milestone Aug 13, 2018
@empijei
Contributor Author

empijei commented Aug 13, 2018

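Found out that the rows need an enclosing <table> (and, optionally, <tbody>): html.Parse implements the HTML5 parsing algorithm, which ignores <tr> and <td> tags that appear in the body outside a table while keeping their content. The <tbody> can be omitted, because the parser inserts it automatically when a <tr> appears directly inside a <table>.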

For future reference, the following snippet behaves as expected:

package main

import (
	"fmt"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	n, err := html.Parse(strings.NewReader(`<html><head></head><body><table><tbody><tr><td><pre>some text</pre></td></tr></tbody></table></body></html>`))
	if err != nil {
		panic(err)
	}
	// html
	fmt.Println(n.FirstChild.Data)
	// head
	fmt.Println(n.FirstChild.FirstChild.Data)
	// body
	fmt.Println(n.FirstChild.FirstChild.NextSibling.Data)
	// table
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.Data)
	// tbody
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.Data)
	// tr
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild.Data)
	// td
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild.FirstChild.Data)
	// pre
	fmt.Println(n.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild.FirstChild.FirstChild.Data)
}

empijei closed this as completed Aug 13, 2018
golang locked and limited conversation to collaborators Aug 13, 2019