html: UnescapeString unescapes HTML character references without a final semicolon in an attribute #40320

Open
elan-sg opened this issue Jul 20, 2020 · 3 comments
@elan-sg

elan-sg commented Jul 20, 2020

What version of Go are you using (go version)?

$ go version
go version go1.12.5 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

$ go env
GOHOSTARCH="amd64"
GOHOSTOS="linux"

What did you do?

This is related to #21563.
https://play.golang.com/p/Fh08ftsK9YQ

Pass the string "<a href=example.com?param=value&timestamp=123>link" to html.UnescapeString.

What did you expect to see?

According to the HTML parsing spec (https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state), a named character reference inside an attribute value that lacks the final semicolon and is followed by "=" or an ASCII alphanumeric character is not decoded, so the string should remain intact.

It seems an attempt was made to handle this, but attribute is a constant?
https://golang.org/src/html/escape.go?s=1296:1319#L57
https://golang.org/src/html/escape.go?s=3112:3194#L142

I would expect the same string to come back.
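
The attribute rule from the spec can be sketched as a small predicate. This is only an illustration of the rule as I read it, not code from package html: keepLiteral and isASCIIAlnum are hypothetical names, and a real fix would have to thread the attribute context through the unescaping code.

```go
package main

import "fmt"

// isASCIIAlnum reports whether c is an ASCII letter or digit.
func isASCIIAlnum(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9'
}

// keepLiteral is a hypothetical helper (not part of package html).
// It reports whether a matched named character reference should be left
// undecoded: the reference is inside an attribute value, it did not end
// with ';', and the next input byte is '=' or an ASCII alphanumeric.
// Pass next = 0 when the reference is at the end of the input.
func keepLiteral(inAttribute, endsWithSemicolon bool, next byte) bool {
	return inAttribute && !endsWithSemicolon && (next == '=' || isASCIIAlnum(next))
}

func main() {
	// "&times" matched inside "&timestamp": no semicolon, next byte is 't'.
	fmt.Println(keepLiteral(true, false, 't'))  // attribute value: keep literal
	fmt.Println(keepLiteral(false, false, 't')) // text content: decode
	fmt.Println(keepLiteral(true, true, 't'))   // "&times;" in attribute: decode
}
```

Under this reading, the "&times" in the href above would be kept as-is because it is followed by "t", while the same text outside an attribute would still be decoded.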

What did you see instead?

The &times prefix of &timestamp is decoded to ×, so the result contains ×tamp=123 instead of &timestamp=123.

@toothrot toothrot added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 21, 2020
@toothrot toothrot added this to the Backlog milestone Jul 21, 2020
@toothrot
Contributor

/cc @mikesamuel @empijei

@empijei
Contributor

empijei commented Aug 4, 2020

This is interesting, and it does look like a bug. Thanks for reporting.

My only concern with fixing it is that browsers tend to adjust invalid encodings and HTML so they can display something, even if the result doesn't match the markup they received. As a consequence, if we leave incomplete encodings in the decoded output, we risk introducing mutation-based XSS.

I need to investigate this a bit more to see whether addressing it carries security risks.

@elan-sg
Author

elan-sg commented Aug 4, 2020

Thanks for looking!
