x/net/html: do not parse the blank line and line break as a TextNode #37466
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (`go env`)?
What did you do?
I tried to parse the golang.org page with the x/net/html package and print the contents of every TextNode, but many blank lines appeared in the output. After testing, I found that all the blank lines and line breaks in the page source are parsed as TextNodes. They exist only to format the markup and are not information the page needs to display, yet there is no proper way to distinguish them from the real content.
My program:
`package main

import (
	"fmt"
	"os"

	"golang.org/x/net/html"
)

func main() {
	doc, err := html.Parse(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "content print: %v\n", err)
		os.Exit(1)
	}
	contentPrint(doc)
}

// contentPrint prints the first child of each element when that child is a
// text node (script and style elements are skipped), then recurses into
// all children.
func contentPrint(n *html.Node) {
	if n.Type == html.ElementNode && n.Data != "script" && n.Data != "style" {
		if n.FirstChild != nil && n.FirstChild.Type == html.TextNode {
			fmt.Printf("content:%s\n", n.FirstChild.Data)
		}
	}
	for c := n.FirstChild; c != nil; c = c.NextSibling {
		contentPrint(c)
	}
}`
What did you expect to see?
`content:The Go Programming Language
content:Documents
content:Packages
content:The Project
content:Help
content:Blog
content:Play
content:Search
content:simple
content:reliable
content:efficient
content:Try Go
content:Open in Playground
content:// You can edit this code!
// Click here and start typing.
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
content:Hello, World!
content:Conway's Game of Life
content:Fibonacci Closure
content:Peano Integers
content:Concurrent pi
content:Concurrent Prime Sieve
content:Peg Solitaire Solver
content:Tree Comparison
content:Run
content:Share
content:Tour
content:Featured articles
content:Go 1.14 is released
content:download page
content:Published 25 February 2020
content:Next steps for pkg.go.dev
content:go.dev
content:Published 31 January 2020
content:Read more >
content:Featured video
content:Copyright
content:Terms of Service
content:Privacy Policy
content:Report a website issue
content:Supported by Google`
What did you see instead?
`The Go Programming Language
Documents
Packages
The Project
Help
Blog
Play
Search
simple
reliable
efficient
Try Go
Open in Playground
// You can edit this code!
// Click here and start typing.
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
Hello, World!
Conway's Game of Life
Fibonacci Closure
Peano Integers
Concurrent pi
Concurrent Prime Sieve
Peg Solitaire Solver
Tree Comparison
Run
Share
Tour
Featured articles
Go 1.14 is released
download page
Published 25 February 2020
Next steps for pkg.go.dev
go.dev
Published 31 January 2020
Read more >
Featured video
Copyright
Terms of Service
Privacy Policy
Report a website issue
Supported by Google
~/go/src/go_bible/ch5/5.3 go build
~/go/src/go_bible/ch5/5.3 ./5.3 <golang_org.htm
^C
✘ ~/go/src/go_bible/ch5/5.3 go build
~/go/src/go_bible/ch5/5.3 ./5.3 <golang_org.htm
content:The Go Programming Language
content:
content:
content:
content:
content:
content:Documents
content:Packages
content:The Project
content:Help
content:Blog
content:Play
content:
content:
content:
content:Search
content:
content:
content:
content:
content:
Go is an open source programming language that makes it easy to build
content:simple
content:reliable
content:efficient
content:
content:
Binary distributions available for
content:
content:
content:Try Go
content:Open in Playground
content:
content:// You can edit this code!
// Click here and start typing.
package main
import "fmt"
func main() {
fmt.Println("Hello, 世界")
}
content:
content:
content:Hello, World!
content:Conway's Game of Life
content:Fibonacci Closure
content:Peano Integers
content:Concurrent pi
content:Concurrent Prime Sieve
content:Peg Solitaire Solver
content:Tree Comparison
content:
content:Run
content:
content:Share
content:Tour
content:
content:Featured articles
content:Go 1.14 is released
content:Today the Go team is very happy to announce the release of Go 1.14. You can get it from the
content:download page
content:Published 25 February 2020
content:Next steps for pkg.go.dev
content:In 2019, we launched
content:go.dev
content:Published 31 January 2020
content:Read more >
content:
content:Featured video
content:
content:
content:
content:
content:Copyright
content:Terms of Service
content:Privacy Policy
content:Report a website issue
content:Supported by Google`