Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html/template: escapes + to + #42506

Open
Dynom opened this issue Nov 11, 2020 · 11 comments
Open

html/template: escapes + to + #42506

Dynom opened this issue Nov 11, 2020 · 11 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@Dynom
Copy link

Dynom commented Nov 11, 2020

As far as I know, the + character has no special meaning when used as: <p> This is a plus "+" sign</p>. However html/template's escaping analysis thinks otherwise. What strikes me as odd is that template.HTMLEscaper("+") doesn't provide the same behaviour, even when skipping the allocation optimisation check.

What version of Go are you using (go version)?

$ go version
go1.15.4 darwin/amd64

Does this issue reproduce with the latest release?

It reproduces with 1.15.4, haven't tried tip.

What did you do?

package main

import (
	"html/template"
	"fmt"
	"strings"
)

func main() {
	var buf strings.Builder

	tpl := template.Must(template.New("foo").Parse(`{{ "+" }}`))
	tpl.Execute(&buf, nil)
	
	fmt.Printf("HTML Escaper's result %q\n", template.HTMLEscaper("+"))
	fmt.Printf("html/template         %q\n", buf.String())
}

Playground with the problem I'm having: https://play.golang.org/p/6rbaYLi9_Bt

(I've also tried variations of html elements to create a different escaping context (e.g.: surround the action with e.g. <li> tags), which I thought would be the problem initially. That produced the same result however.)

What did you expect to see?

The same output as if the value ran through template.HTMLEscaper()

HTML Escaper's result "+"
html/template         "+"

What did you see instead?

HTML Escaper's result "+"
html/template         "&#43;"
@Dynom Dynom changed the title html/template escapes + to &#43; html/template: escapes + to &#43; Nov 11, 2020
@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 30, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Nov 30, 2020
@cagedmantis
Copy link
Contributor

/cc @empijei

@empijei
Copy link
Contributor

empijei commented Dec 16, 2020

Hi, do you mind expanding on what is the issue? The browser should unescape the "+" in all HTML contexts so it everything should still work, right?

Is this an issue on the efficiency of the escaper?

@empijei
Copy link
Contributor

empijei commented Dec 16, 2020

The reason that gets escaped is that it is in the following map and the two defined below it:

// htmlReplacementTable contains the runes that need to be escaped
// inside a quoted attribute value or in a text node.
var htmlReplacementTable = []string{
// https://www.w3.org/TR/html5/syntax.html#attribute-value-(unquoted)-state
// U+0000 NULL Parse error. Append a U+FFFD REPLACEMENT
// CHARACTER character to the current attribute's value.
// "
// and similarly
// https://www.w3.org/TR/html5/syntax.html#before-attribute-value-state
0: "\uFFFD",
'"': "&#34;",
'&': "&amp;",
'\'': "&#39;",
'+': "&#43;",
'<': "&lt;",
'>': "&gt;",
}

According to the doc the escape set should be a composition of special chars in these states:

None of these includes a "+" sign.

I don't honestly know why it was there in the first place and, to my knowledge, it should not be.

But before we stop escaping it and we introduce XSS and break tests in the community I would like to do some additional checks on this.

@tmthrgd
Copy link
Contributor

tmthrgd commented Dec 16, 2020

Digging through git blame shows it was first introduced 9 years ago in CL 4968058. This is the justification given at the time:

// The set of runes escaped is the union of the HTML specials and
// those determined by running the JS below in browsers:
// <div id=d></div>
// <script>(function () {
// var a = [], d = document.getElementById("d"), i, c, s;
// for (i = 0; i < 0x10000; ++i) {
// c = String.fromCharCode(i);
// d.innerHTML = "<span title=" + c + "lt" + c + "></span>"
// s = d.getElementsByTagName("SPAN")[0];
// if (!s || s.title !== c + "lt" + c) { a.push(i.toString(16)); }
// }
// document.write(a.join(", "));
// })()</script>

@empijei
Copy link
Contributor

empijei commented Dec 16, 2020

This looks like it is an extra-cautious escaping to take into account URLs and such in html attributes and other contexts.

I would be inclined to leave it as it is unless you think this is causing some misbehavior.

@Dynom
Copy link
Author

Dynom commented Dec 16, 2020

Thanks for your time!

Given a <script> context, it makes sense. I would expect the JS escape analysis to do something with it. But with risk of stating the obvious, JS escaping probably shouldn't apply to HTML.

I ran into an issue when generating HTML with timestamps and where this character got escaped.

  • html/template.HTMLEscaper() does escape < to &lt;, but not +
  • html/template.JSEscaper() does escape < to \u003C, but not +

I'm not entirely sure about the behaviour similarities between the public and private variants, but it feels like these should at least be aligned. I understand that risking up XSS is undesirable. My work-around and expectations were that, if my test data would be wrapped by HTMLEscaper() that it would result in exactly the same result, so that I can rely on that in my tests, e.g.:

patternsToMatch := []string{
    data.DateCreated.String(),
}
for _, p := range patternsToMatch {
    if !strings.Contains(tplResult, template.HTMLEscaper(p)) {
        t.Errorf("expected %q to occur in the template, but it didn't", p)
    }
}

The work-around now is to

if !strings.Contains(tplResult, func(p string) string {
	return strings.ReplaceAll(template.HTMLEscaper(p), "+", "&#43;")
}(p)) {
	t.Errorf("expected %q to occur in the template, but it didn't", p)
}

So I suppose it's one of:

  1. Bring HTMLEscaper() in line with htmlEscaper()
  2. Improve the escape analysis so that + is only escaped within a JS context (e.g.: <script />), although I suspect this is nearly impossible to do perfectly if obfuscation is considered as well.
  3. Leave as is

@empijei empijei added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 16, 2020
@empijei
Copy link
Contributor

empijei commented Dec 16, 2020

I agree and I am leaning towards 1 and 3.

Asking @kele for a second opinion.

@kele
Copy link

kele commented Dec 21, 2020

I'm leaning towards option 3 (leave as it is), because I think that the benefit of doing 1 (aligning HTMLEscaper with htmlEscape) is rather small, whereas making the change might break folks' tests.

If we had it documented somewhere that HTMLEscaper will always behave exactly the same as template.Template, I'd lean stronger towards 1.

@prattmic
Copy link
Member

prattmic commented Mar 2, 2021

More generally than just html/template's HTMLEscaper vs template execution escaping, there are really a bunch of different HTML escapers in the (extended) standard library.

Grouped by the same underlying implementation, I've found:

html/template template execution: https://cs.opensource.google/go/go/+/master:src/html/template/html.go;l=42;drc=2cd2ff6f564dce5be0c4fb7f06338ff7af3fc9a9

{text,html}/template.{HTMLEscape,HTMLEscaper,HTMLEscapeString}: https://cs.opensource.google/go/go/+/master:src/text/template/funcs.go;l=603;drc=2b50ab2aee75d3c361fcd1eb39e830e2e73056b6

html.EscapeString: https://cs.opensource.google/go/go/+/master:src/html/escape.go;l=178;drc=52c4488471ed52085a29e173226b3cbd2bf22b20

x/net/html.EscapeString: https://cs.opensource.google/go/x/net/+/master:html/escape.go;l=237;drc=3d87fd621ca9a824c5cff17216ce44769456cb3f

Every single one of these implementations has a slightly different set of escaped characters. As a user, particularly one unfamiliar with HTML safety, it is not at all clear which of these is "correct" or "safest" to use, or why they are different in the first place.

cc @katiehockman @empijei

@bprosnitz
Copy link
Contributor

This is an old issue, but I also recently ran into this. I had to construct a helper function to get a test working that would match the html/templates behavior. It would be good if the format was standardized across Go escaping functions.

@bstpierre
Copy link

I can open a separate issue if needed, but this seems to be basically the same problem of multiple inconsistent implementations that @prattmic described above.

I hit this in the context of URL escaping. This makes test code challenging because the (private) template escaping functions generate different content than the publicly available escaping functions -- namely lowercase vs uppercase.

Rendering with html/template converts = to %3d (lowercase) but using template.URLQueryEscaper results in %3D (uppercase).

Playground example: https://go.dev/play/p/jLK2bRuO8cL?v=gotip

html/template lowercase conversion: https://cs.opensource.google/go/go/+/master:src/html/template/url.go;drc=07b19bf5ab1160814ffedd448ce65c0eb6e9643a;l=134

{html,text}/template.URLQueryEscaper calls url.URLQueryEscaper: https://cs.opensource.google/go/go/+/master:src/text/template/funcs.go;l=749

url.URLQueryEscaper uppercase conversion: https://cs.opensource.google/go/go/+/master:src/net/url/url.go

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

8 participants