html/template: escapes + to &#43; #42506

Dynom · 2020-11-11T10:38:51Z

As far as I know, the + character has no special meaning in HTML text, e.g. <p>This is a plus "+" sign</p>. However, html/template's escaping analysis thinks otherwise. What strikes me as odd is that template.HTMLEscaper("+") doesn't behave the same way, even when skipping the allocation-optimisation check.

What version of Go are you using (`go version`)?

$ go version
go1.15.4 darwin/amd64

Does this issue reproduce with the latest release?

It reproduces with 1.15.4, haven't tried tip.

What did you do?

package main

import (
	"fmt"
	"html/template"
	"strings"
)

func main() {
	var buf strings.Builder

	tpl := template.Must(template.New("foo").Parse(`{{ "+" }}`))
	if err := tpl.Execute(&buf, nil); err != nil {
		panic(err)
	}

	fmt.Printf("HTML Escaper's result %q\n", template.HTMLEscaper("+"))
	fmt.Printf("html/template         %q\n", buf.String())
}

Playground with the problem I'm having: https://play.golang.org/p/6rbaYLi9_Bt

(I've also tried variations of HTML elements to create a different escaping context, e.g. surrounding the action with <li> tags, which I initially thought was the problem. That produced the same result, however.)

What did you expect to see?

The same output as if the value had been run through template.HTMLEscaper():

HTML Escaper's result "+"
html/template         "+"

What did you see instead?

HTML Escaper's result "+"
html/template         "&#43;"


cagedmantis · 2020-11-30T23:06:46Z

/cc @empijei

empijei · 2020-12-16T10:59:44Z

Hi, could you expand on what the issue is? The browser should unescape the "&#43;" in all HTML contexts, so everything should still work, right?

Is this an issue on the efficiency of the escaper?
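A small sketch of the point above, using html.UnescapeString from the standard library as a stand-in for a browser's character-reference decoding. This is an approximation (html.UnescapeString is not a full HTML parser), and the helper name decodedEqual is hypothetical.

```go
package main

import (
	"fmt"
	"html"
)

// decodedEqual is a hypothetical helper: it reports whether two escaped
// strings decode to the same text once character references are resolved.
func decodedEqual(a, b string) bool {
	return html.UnescapeString(a) == html.UnescapeString(b)
}

func main() {
	// "+" is what HTMLEscaper returned, "&#43;" is what html/template emitted.
	fmt.Println(decodedEqual("+", "&#43;")) // true
}
```

In other words, the two outputs differ as bytes but should render as the same visible text.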

empijei · 2020-12-16T11:39:42Z

The reason it gets escaped is that + is in the following table and in the two defined below it:

go/src/html/template/html.go, lines 50 to 67 in 75e16f5:

// htmlReplacementTable contains the runes that need to be escaped
// inside a quoted attribute value or in a text node.
var htmlReplacementTable = []string{
	// https://www.w3.org/TR/html5/syntax.html#attribute-value-(unquoted)-state
	// U+0000 NULL Parse error. Append a U+FFFD REPLACEMENT
	// CHARACTER character to the current attribute's value.
	// "
	// and similarly
	// https://www.w3.org/TR/html5/syntax.html#before-attribute-value-state
	0:    "\uFFFD",
	'"':  "&#34;",
	'&':  "&amp;",
	'\'': "&#39;",
	'+':  "&#43;",
	'<':  "&lt;",
	'>':  "&gt;",
}
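To show how a rune-indexed table like the one above drives escaping, here is a simplified sketch. It is not html/template's actual htmlReplacer function; the names replacementTable and escapeWithTable are hypothetical, and the table simply copies the entries quoted above.

```go
package main

import (
	"fmt"
	"strings"
)

// replacementTable is a hypothetical copy of the htmlReplacementTable
// entries: a slice indexed by rune, where a non-empty entry is the
// replacement text for that rune.
var replacementTable = []string{
	0:    "\uFFFD",
	'"':  "&#34;",
	'&':  "&amp;",
	'\'': "&#39;",
	'+':  "&#43;",
	'<':  "&lt;",
	'>':  "&gt;",
}

// escapeWithTable replaces every rune that has a non-empty entry in table
// and copies all other runes unchanged.
func escapeWithTable(s string, table []string) string {
	var b strings.Builder
	for _, r := range s {
		if int(r) < len(table) && table[r] != "" {
			b.WriteString(table[r])
			continue
		}
		b.WriteRune(r)
	}
	return b.String()
}

func main() {
	fmt.Println(escapeWithTable(`a+b<"c"`, replacementTable)) // a&#43;b&lt;&#34;c&#34;
}
```

Because '+' has an entry in the table, any value routed through this kind of escaper gets &#43; regardless of whether the surrounding context actually needs it.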

According to the doc the escape set should be a composition of special chars in these states:

None of these includes a "+" sign.

Honestly, I don't know why it was there in the first place and, to my knowledge, it should not be.

But before we stop escaping it, and risk introducing XSS and breaking tests in the community, I would like to do some additional checks on this.

tmthrgd · 2020-12-16T12:14:15Z

Digging through git blame shows it was first introduced 9 years ago in CL 4968058. This is the justification given at the time:

go/src/pkg/exp/template/html/html.go, lines 16 to 29 in 4670d9e:

// The set of runes escaped is the union of the HTML specials and
// those determined by running the JS below in browsers:
// <div id=d></div>
// <script>(function () {
// var a = [], d = document.getElementById("d"), i, c, s;
// for (i = 0; i < 0x10000; ++i) {
//   c = String.fromCharCode(i);
//   d.innerHTML = "<span title=" + c + "lt" + c + "></span>"
//   s = d.getElementsByTagName("SPAN")[0];
//   if (!s || s.title !== c + "lt" + c) { a.push(i.toString(16)); }
// }
// document.write(a.join(", "));
// })()</script>

empijei · 2020-12-16T13:09:50Z

This looks like extra-cautious escaping to account for URLs and such in HTML attributes and other contexts.

I would be inclined to leave it as it is unless you think this is causing some misbehavior.

Dynom · 2020-12-16T13:27:54Z

Thanks for your time!

Given a <script> context, it makes sense, and I would expect the JS escape analysis to handle it there. But, at the risk of stating the obvious, JS escaping probably shouldn't apply to HTML.

I ran into this when generating HTML containing timestamps, where this character got escaped.

html/template.HTMLEscaper() does escape < to &lt;, but not +
html/template.JSEscaper() does escape < to \u003C, but not +

I'm not entirely sure how similar the public and private variants are meant to be, but it feels like they should at least be aligned. I understand that risking XSS is undesirable. My expectation was that wrapping my test data with HTMLEscaper() would produce exactly the same result as the template, so that I could rely on it in my tests, e.g.:

patternsToMatch := []string{
    data.DateCreated.String(),
}
for _, p := range patternsToMatch {
    if !strings.Contains(tplResult, template.HTMLEscaper(p)) {
        t.Errorf("expected %q to occur in the template, but it didn't", p)
    }
}

The work-around now is:

want := strings.ReplaceAll(template.HTMLEscaper(p), "+", "&#43;")
if !strings.Contains(tplResult, want) {
	t.Errorf("expected %q to occur in the template, but it didn't", p)
}
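An alternative work-around, sketched here under the assumption that the value under test is rendered in an HTML text node (as with {{ "+" }} in the original report): escape the expected value with html/template itself instead of HTMLEscaper, so the test cannot drift from the template's private escaper. The helper escapeLikeTemplate is hypothetical, and it does not cover attribute, URL, or JS contexts, which escape differently.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// textNodeTmpl renders its data in an HTML text-node context.
var textNodeTmpl = template.Must(template.New("escape").Parse(`{{.}}`))

// escapeLikeTemplate is a hypothetical test helper that returns s exactly
// as html/template emits it in a text node.
func escapeLikeTemplate(s string) (string, error) {
	var b strings.Builder
	if err := textNodeTmpl.Execute(&b, s); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	got, err := escapeLikeTemplate("2020-11-11 10:38:51 +0000 UTC")
	if err != nil {
		panic(err)
	}
	fmt.Println(got) // 2020-11-11 10:38:51 &#43;0000 UTC
}
```

In a test, the result of escapeLikeTemplate(p) would replace template.HTMLEscaper(p) in the strings.Contains check.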

So I suppose it's one of:

1. Bring HTMLEscaper() in line with htmlEscaper().
2. Improve the escape analysis so that + is only escaped within a JS context (e.g. <script>), although I suspect this is nearly impossible to do perfectly if obfuscation is considered as well.
3. Leave as is.

empijei · 2020-12-16T13:31:39Z

I agree and I am leaning towards 1 and 3.

Asking @kele for a second opinion.

kele · 2020-12-21T19:36:42Z

I'm leaning towards option 3 (leave it as is), because I think the benefit of option 1 (aligning HTMLEscaper with htmlEscaper) is rather small, whereas making the change might break folks' tests.

If it were documented somewhere that HTMLEscaper always behaves exactly the same as template.Template, I'd lean more strongly towards option 1.

prattmic · 2021-03-02T22:25:16Z

More generally than just html/template's HTMLEscaper vs template execution escaping, there are really a bunch of different HTML escapers in the (extended) standard library.

Grouped by the same underlying implementation, I've found:

html/template template execution: https://cs.opensource.google/go/go/+/master:src/html/template/html.go;l=42;drc=2cd2ff6f564dce5be0c4fb7f06338ff7af3fc9a9

{text,html}/template.{HTMLEscape,HTMLEscaper,HTMLEscapeString}: https://cs.opensource.google/go/go/+/master:src/text/template/funcs.go;l=603;drc=2b50ab2aee75d3c361fcd1eb39e830e2e73056b6

html.EscapeString: https://cs.opensource.google/go/go/+/master:src/html/escape.go;l=178;drc=52c4488471ed52085a29e173226b3cbd2bf22b20

x/net/html.EscapeString: https://cs.opensource.google/go/x/net/+/master:html/escape.go;l=237;drc=3d87fd621ca9a824c5cff17216ce44769456cb3f

Every single one of these implementations has a slightly different set of escaped characters. As a user, particularly one unfamiliar with HTML safety, it is not at all clear which of these is "correct" or "safest" to use, or why they are different in the first place.

cc @katiehockman @empijei

bprosnitz · 2022-11-07T09:20:02Z

This is an old issue, but I also recently ran into it. I had to write a helper function to get a test working that would match html/template's behavior. It would be good if the output format were standardized across Go's escaping functions.

bstpierre · 2022-11-27T17:28:41Z

I can open a separate issue if needed, but this seems to be basically the same problem of multiple inconsistent implementations that @prattmic described above.

I hit this in the context of URL escaping. It makes test code challenging because the (private) template escaping functions generate different output than the publicly available escaping functions, namely lowercase vs. uppercase hex digits in percent-encodings.

Rendering with html/template converts = to %3d (lowercase) but using template.URLQueryEscaper results in %3D (uppercase).

Playground example: https://go.dev/play/p/jLK2bRuO8cL?v=gotip

html/template lowercase conversion: https://cs.opensource.google/go/go/+/master:src/html/template/url.go;drc=07b19bf5ab1160814ffedd448ce65c0eb6e9643a;l=134

{html,text}/template.URLQueryEscaper calls url.QueryEscape: https://cs.opensource.google/go/go/+/master:src/text/template/funcs.go;l=749

url.QueryEscape uppercase conversion: https://cs.opensource.google/go/go/+/master:src/net/url/url.go

Dynom changed the title ~~html/template escapes + to &#43;~~ html/template: escapes + to &#43; Nov 11, 2020

cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Nov 30, 2020

cagedmantis added this to the Backlog milestone Nov 30, 2020

empijei added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Dec 16, 2020

empijei mentioned this issue Dec 16, 2020

proposal: html/template: escape unquoted attributes by first quoting them #43224 (open)

NHDaly mentioned this issue Mar 12, 2022

allow rendering big flame graphs by avoiding stack overflow in JS parser google/pprof#684 (merged)
