html: UnescapeString unescapes HTML character references without a final semicolon in an attribute #40320

elan-sg · 2020-07-20T21:27:51Z

What version of Go are you using (`go version`)?

$ go version
go version go1.12.5 linux/amd64

Does this issue reproduce with the latest release?

yes

What operating system and processor architecture are you using (`go env`)?

$ go env
GOHOSTARCH="amd64"
GOHOSTOS="linux"

What did you do?

This is related to #21563.
Playground: https://play.golang.com/p/Fh08ftsK9YQ

Pass the string `<a href=example.com?param=value&timestamp=123>link` to `html.UnescapeString`.

What did you expect to see?

According to https://html.spec.whatwg.org/multipage/parsing.html#character-reference-state, a named character reference inside an attribute value is not decoded when it lacks the final semicolon and is followed by `=` or an ASCII alphanumeric, so the string should remain intact.

It seems an attempt was made to implement this, but `attribute` is a constant?
https://golang.org/src/html/escape.go?s=1296:1319#L57
https://golang.org/src/html/escape.go?s=3112:3194#L142

I would expect the same string to come back.
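
The attribute rule in the spec can be sketched as a small predicate. This is only an illustration of the rule as I read it, not code from package `html`; the helper names `isASCIIAlnum` and `leaveLiteralInAttribute` are made up for this example.

```go
package main

import "fmt"

// isASCIIAlnum reports whether c is an ASCII letter or digit.
func isASCIIAlnum(c byte) bool {
	return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// leaveLiteralInAttribute is a hypothetical helper (not part of package html).
// It reports whether a matched named reference should be left undecoded when
// it appears inside an attribute value. matched is the reference text as
// matched (for example "&times" or "&times;") and rest is the input after it.
func leaveLiteralInAttribute(matched, rest string) bool {
	if matched == "" || matched[len(matched)-1] == ';' {
		return false // a semicolon-terminated reference is always decoded
	}
	if rest == "" {
		return false // nothing follows, so the exception does not apply
	}
	next := rest[0]
	return next == '=' || isASCIIAlnum(next)
}

func main() {
	// "&times" followed by "tamp=123": no semicolon, next byte is a letter.
	fmt.Println(leaveLiteralInAttribute("&times", "tamp=123"))
	// Terminated with a semicolon, so it would be decoded.
	fmt.Println(leaveLiteralInAttribute("&times;", "tamp=123"))
	// No semicolon, but followed by a space, so it would be decoded.
	fmt.Println(leaveLiteralInAttribute("&amp", " b"))
}
```

Under this reading, `&times` in `&timestamp=123` falls under the exception because the next character is `t`, which is why the href should come back unchanged.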

What did you see instead?

The `&times` at the start of `&timestamp` is changed to `×`.
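
A minimal program along these lines reproduces it. It is a sketch and not necessarily identical to the playground link above; the wrapper name `unescape` is only there to make the call easy to check.

```go
package main

import (
	"fmt"
	"html"
)

// input is the string from this report: an unquoted href whose query string
// contains "&timestamp", which begins with the entity name "times".
const input = "<a href=example.com?param=value&timestamp=123>link"

// unescape is a thin wrapper (hypothetical name) around html.UnescapeString.
func unescape(s string) string {
	return html.UnescapeString(s)
}

func main() {
	out := unescape(input)
	fmt.Println(out)
	// Reports whether the string came back intact, which is what I expected.
	fmt.Println("unchanged:", out == input)
}
```

Note that `html.UnescapeString` only receives a plain string, so it has no way to know that the `&` sits inside an attribute value.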


toothrot · 2020-07-21T19:11:49Z

/cc @mikesamuel @empijei

empijei · 2020-08-04T14:13:29Z

This is interesting and it indeed looks like a bug, thanks for reporting.

My only concern with fixing it is that browsers tend to adjust invalid encodings/HTML to try to display something, even if it doesn't match the markup they received. As a consequence, if we leave incomplete encodings in the decoded output, we risk introducing mutation-based XSS.

I need to investigate this a bit more to see whether there are security risks in addressing it.

elan-sg · 2020-08-04T17:47:22Z

thanks for looking!

toothrot added the NeedsInvestigation label Jul 21, 2020

toothrot added this to the Backlog milestone Jul 21, 2020
