x/net/html: incorrect handling nodes in the <head> section #42882
Comments
Can you reproduce this with a much smaller file instead? There probably isn't a need to include all the JS here if you can still see the problem with a very short HTML example. Ideally, a single complete file that reproduces it would be best. This problem doesn't reproduce on play.golang.org for me using the code given above: https://play.golang.org/p/oKEquK8zNeo. Nor does it reproduce on Go 1.15.5 on darwin/amd64 or Go 1.16 on darwin/arm64 with the latest x/net/html. Are you able to see it on another machine? What happens if you import the latest golang.org/x/net/html? Are you sure you're on the latest version of it?
Unfortunately, the error doesn't always reproduce. I wasn't careful: the error from the previous example doesn't reproduce, but it does reproduce with the example below:

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"

	"golang.org/x/net/html"
)

func main() {
	res, err := http.Get("https://analytics.demo.1c.ru/analytics")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != 200 {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}
	doc, err := html.Parse(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	var buf bytes.Buffer
	if err := html.Render(&buf, doc); err == nil {
		fmt.Println(buf.String())
	}
}
```

The HTML returned by https://analytics.demo.1c.ru/analytics is the same as in the original example.
You shouldn't need network calls to reproduce this, and they complicate things. Try to make one Go file with some hard-coded HTML that reproduces the problem. For example, try something like this locally (adding all your HTML if you want to start with that): https://play.golang.org/p/3O9vs3cdUC5
Then run it locally to see what you get. If you can reproduce with static HTML, post the HTML that does so here. If you can't reproduce with static HTML, the problem is likely elsewhere (perhaps the HTML is being mangled on the way to you somehow).
If it were that easy, I wouldn't be posting here. The behavior is very strange, and I can't figure out what it depends on. For example, I call the website and get the HTML as a string. When I print it to the console, it has the same value as the hard-coded HTML in the original example. Here is the code I use:

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"strings"

	"golang.org/x/net/html"
)

func main() {
	res, err := http.Get("https://analytics.demo.1c.ru/analytics")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != 200 {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}
	bytesHTML, err := ioutil.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	testHTML := string(bytesHTML)
	fmt.Println(testHTML)
	//testHTML = Test_html()
	fmt.Println("------------------------------------------------------------")
	doc, _ := html.Parse(strings.NewReader(testHTML))
	var buf bytes.Buffer
	if err := html.Render(&buf, doc); err == nil {
		fmt.Println(buf.String())
	}
}
```

If I uncomment the code `//testHTML = Test_html()`
I don't know whether it will help, but I printed the parsed document:

```go
doc, _ := html.Parse(strings.NewReader(testHTML))
pp.Println(doc)
```

Good
Bad
As you can see, the structures differ after
/cc @namusyaka @nigeltao
Will take a look at this in a few days.
@LazarenkoA It looks like a UTF-8 BOM is prepended to the HTML document you are getting over the network. You should get the result you want by removing the BOM in advance, for example:

```go
package main

import (
	"bytes"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"

	"golang.org/x/net/html"
	"golang.org/x/text/encoding/unicode"
	"golang.org/x/text/transform"
)

var utf8BOM = []byte{0xef, 0xbb, 0xbf}

func main() {
	res, err := http.Get("https://analytics.demo.1c.ru/analytics")
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	if res.StatusCode != 200 {
		log.Fatalf("status code error: %d %s", res.StatusCode, res.Status)
	}
	b, err := ioutil.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	// The UTF8BOM decoder strips a leading byte order mark.
	if bytes.HasPrefix(b, utf8BOM) {
		r := transform.NewReader(bytes.NewReader(b), unicode.UTF8BOM.NewDecoder())
		b, err = ioutil.ReadAll(r)
		if err != nil {
			log.Fatal(err)
		}
	}
	doc, err := html.Parse(bytes.NewReader(b))
	if err != nil {
		log.Fatal(err)
	}
	var buf bytes.Buffer
	if err := html.Render(&buf, doc); err == nil {
		fmt.Println(buf.String())
	}
}
```

Also, according to the description in https://pkg.go.dev/golang.org/x/net/html:
I guess we could mention the UTF-8 BOM in our documentation, but at least we don't need to handle the BOM in our implementation.
Thank you very much!
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (`go env`)?
`go env` Output
What did you do?
Test_html()
What did you expect to see?
What did you see instead?
PS
Possibly related to issue #23064