x/net/html: Data should be unescaped #71324

Closed
pjebs opened this issue Jan 19, 2025 · 6 comments

@pjebs
Contributor

pjebs commented Jan 19, 2025

Go version

go 1.23

Output of go env in your module/workspace:

Using the Go playground: https://go.dev/play/p/JgbH2_H7MyF

What did you do?

See demo code: https://go.dev/play/p/JgbH2_H7MyF

According to documentation: https://pkg.go.dev/golang.org/x/net/html#Node

Data is unescaped, so that it looks like "a<b" rather than "a&lt;b"

In my code sample, you can see that the output is escaped.
I am replacing every non-whitespace character with &nbsp; (non-breaking space).

What did you see happen?

The output is escaped, although it should not be: the ampersands got escaped.

What did you expect to see?

The output contains many &amp;nbsp; sequences. Each should be &nbsp; instead:

<h1>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; <strong>&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;</strong></h1>
@seankhliao seankhliao changed the title from "golang.org/x/net/html: Data should be unescaped" to "x/net/html: Data should be unescaped" Jan 19, 2025
@gopherbot gopherbot added this to the Unreleased milestone Jan 19, 2025
@gabyhelp

Related Issues


@seankhliao
Member

&nbsp; is the escaped form. You should use the literal non-breaking space rune \u00a0.
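
A minimal sketch of that distinction, using the standard library's html package as a stand-in for the escaping that x/net/html applies to text when rendering. The substitution is an assumption made for illustration, and the helper names escapeText and unescapeText are hypothetical, not part of either package:

```go
package main

import (
	"fmt"
	"html"
)

// escapeText is a hypothetical helper standing in for the escaping a
// renderer applies to text data. It uses the standard library's
// html.EscapeString, which escapes only <, >, &, ' and ".
func escapeText(data string) string {
	return html.EscapeString(data)
}

// unescapeText is a hypothetical helper standing in for what a parser
// does to text: entities such as &nbsp; are decoded into runes.
func unescapeText(src string) string {
	return html.UnescapeString(src)
}

func main() {
	entityText := "&nbsp;"  // six characters: '&', 'n', 'b', 's', 'p', ';'
	literalRune := "\u00a0" // a single rune: NO-BREAK SPACE

	// The six-character string contains an ampersand, so it gets escaped.
	fmt.Printf("%q\n", escapeText(entityText))

	// The literal rune is not among the escaped characters and passes through.
	fmt.Printf("%q\n", escapeText(literalRune))

	// Decoding the entity yields the same single rune.
	fmt.Println(unescapeText(entityText) == literalRune)
}
```

Under these assumptions, the string "&nbsp;" comes out as "&amp;nbsp;", while the rune U+00A0 is emitted unchanged.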

Unlike many projects, the Go project does not use GitHub Issues for general discussion or asking questions. GitHub Issues are used for tracking bugs and proposals only.

For questions please refer to https://github.com/golang/go/wiki/Questions

@seankhliao seankhliao closed this as not planned Jan 19, 2025
@pjebs
Contributor Author

pjebs commented Jan 19, 2025

@seankhliao I want it to be EXACTLY &nbsp; (literally)
The results of the function will be going elsewhere for further processing.

The documentation even states:

Data is unescaped, so that it looks like "a<b" rather than "a&lt;b"

@gabyhelp gabyhelp added the BugReport label (issues describing a possible bug in the Go implementation) Jan 19, 2025
@seankhliao
Member

x/net/html only escapes what is necessary. nbsp is not in that list, so you'll never get a literal &nbsp; entity out of rendering or the escape functions.

@pjebs
Contributor Author

pjebs commented Jan 19, 2025

How is that not a bug? How can I properly deal with, say, <pre>&nbsp;</pre>? That should be treated literally.

@ianlancetaylor
Member

When you call Parse you get Node values that, as the documentation says, contain unescaped data. Your program is rewriting the data to include a string &nbsp;. That is considered to be unescaped data. When you render it as HTML, that data is escaped. That is what your program shows.
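
The sequence described here (unescaped data after parsing, rewritten by the program, escaped again on rendering) can be sketched with the standard library's html package standing in for x/net/html's Parse and Render. That substitution, the helper names maskText, renderEscaped and renderWithEntity, and the post-processing step are all assumptions made for illustration, not something proposed in this thread:

```go
package main

import (
	"fmt"
	"html"
	"strings"
	"unicode"
)

// maskText (hypothetical) replaces every non-whitespace rune in data
// with repl and leaves whitespace untouched.
func maskText(data, repl string) string {
	var b strings.Builder
	for _, r := range data {
		if unicode.IsSpace(r) {
			b.WriteRune(r)
		} else {
			b.WriteString(repl)
		}
	}
	return b.String()
}

// renderEscaped (hypothetical) stands in for rendering a text node:
// the data is treated as unescaped and is escaped on output.
func renderEscaped(data string) string {
	return html.EscapeString(data)
}

// renderWithEntity (hypothetical) escapes the data and then rewrites
// each U+00A0 rune as the literal entity text. Note that it also
// rewrites any U+00A0 that was already present in the input.
func renderWithEntity(data string) string {
	return strings.ReplaceAll(html.EscapeString(data), "\u00a0", "&nbsp;")
}

func main() {
	// "Parsing": the entity in the source becomes a plain '<' in the data.
	data := html.UnescapeString("a&lt;b cd")

	// Writing the six-character string into the data reproduces the
	// &amp;nbsp; output from the report once the data is escaped.
	fmt.Println(renderEscaped(maskText(data, "&nbsp;")))

	// Writing the literal rune and converting it after escaping yields
	// the entity text in the final output.
	fmt.Println(renderWithEntity(maskText(data, "\u00a0")))
}
```

The design choice in this sketch is to keep the in-memory data unescaped throughout and to convert U+00A0 to entity text only as a final step on the already escaped output.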

I don't understand what you are trying to do. If you want help using the x/net/html package, please use a forum, not the issue tracker. See https://go.dev/wiki/Questions. Thanks.
