Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html/template: refuse to parse complex JS template literals #60055

Open
rolandshoemaker opened this issue May 8, 2023 · 17 comments
Open

html/template: refuse to parse complex JS template literals #60055

rolandshoemaker opened this issue May 8, 2023 · 17 comments
Labels
Proposal Proposal-Accepted Proposal-Security PUBLIC issues for security fixes, per https://go.dev/security/policy
Milestone

Comments

@rolandshoemaker
Copy link
Member

Forking from #9200.

Our minimal JS state machine inside of the template parser is limited enough that we cannot reasonably understand the semantics of JS template literal syntax. Rather than attempting to bodge a interpreter for nested literals, i.e. as suggested in https://go.dev/cl/484075, we propose instead to simply refuse to parse complex literals, with the justification that we cannot reasonably determine the safety of these literals.

In terms of defining "complex", we would bail if we saw any special character (`,",',$,{,}) after encountering `${. For example this would prevent parsing templates containing `${"asd"}`, `${`${...}`}`, `${let a = {};}`, etc, while still supporting the majority of JS template use cases (including multi-line strings, and basic template actions, etc).

cc @golang/security

@rolandshoemaker rolandshoemaker added Proposal Proposal-Security PUBLIC issues for security fixes, per https://go.dev/security/policy labels May 8, 2023
@gopherbot gopherbot added this to the Proposal milestone May 8, 2023
@jupenur
Copy link

jupenur commented May 8, 2023

Presumably the idea is that this would allow detecting the end of a ${ ... } placeholder inside a template literal simply by looking for the next } character after ${? If so, surely the suggested set of characters to disallow (`,",',$,{,}) is insufficient?

Consider these:

let a = `${
  // this is evil };-)
  expression + goes + here
}`;

let b = `${ /* this is equally evil };-) */ foo + 3 }`;

let c = `${/[^}]*/.exec(input)[0]}`;

Preventing misinterpretation in these cases would require blocklisting /, which would then also stop simple use cases like `${foo/2}` from working.

@rolandshoemaker
Copy link
Member Author

I think the suggested implementation would be more complex than simply looking for the next }.

We would transition from stateJSBqStr to the stateJS state when encountering ${, and set a bit that we are actually inside of a string interpolation expression (this could, probably, be a completely separate state). The parser would then (mostly) properly handle javascript comments, regexp, strings, etc. If we attempted to transition to another state, other than reentering stateJSBqStr, we would consider that an error.

We would probably also need a new state variable, similar to jsCtx (which is used for what the next / means), which indicates what the next } means, either closing an object or the string interpolation, based on what characters the parser encounters.

For instance, an incredibly basic interpretation of the parser state would be:

`${ let a = {...} }`
  ▲         ▲   ▲ ▲
  │         │   │ │
  │         │   │ exit stateJS (unset interp bit)
  │         │   │
  │         │   unset jsCtxObject
  │         │
  │         set jsCtxObject
  │
  enter stateJS (set interp bit)

@jupenur
Copy link

jupenur commented May 9, 2023

I'm not sure I understand. First you mention bailing out when encountering one of a specific set of special characters (`,",',$,{,}), because these are the "complex" cases. And you list an example of when a bailout would happen: `${let a = {};}`. This sounds like a good idea in general.

But now you're saying you'd still do "proper" handling of JavaScript inside placeholders when no bailout happens, and then use `${ let a = {...} }` as an example of how parser state would evolve during parsing. This seems like a contradiction to the previous.

Why would the parser bail out on `${let a = {};}` but not on `${ let a = {...} }`? The only difference is the semicolon, which is not in your original list of disallowed characters. And if the goal is not to simplify how placeholder endings are detected, what is the goal?

@rolandshoemaker
Copy link
Member Author

Oh bah, sorry, I managed to copy everything from my draft except the last paragraph which actually explained what I meant 🤦.

Most things we already consider "complex" have their own states, quoted strings, regexp, etc, so we can immediately fail on those transitions (perhaps allowing the comment states?). Javascript objects (and any other brace delimited syntax) on the other hand don't, so we'd need to look for when we hit a character than would cause a change into jsCtxObject, and fail (a more ambitious project would be to continue parsing here, and attempt to actually properly support nesting literals, complex template syntax, etc. It is perhaps worthwhile at least exploring how feasible this is, but I suspect it's a hole that appears shallow, but immediately drops a thousand feet).

In the above example then when we hit the first { inside of the interpolation context, we'd stop parsing and return some error, cannot safely parse complex javascript template literal, or such.

@rsc
Copy link
Contributor

rsc commented May 10, 2023

This definition of complex seems fine. I do think it's important to keep the simple cases working (passing through), even if we can't substitute into them ourselves.

@rsc
Copy link
Contributor

rsc commented May 10, 2023

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 17, 2023

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc
Copy link
Contributor

rsc commented May 24, 2023

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: html/template: refuse to parse complex JS template literals html/template: refuse to parse complex JS template literals May 24, 2023
@rsc rsc modified the milestones: Proposal, Backlog May 24, 2023
@dwrz
Copy link

dwrz commented Jun 8, 2023

While I understand the importance of this change, it has introduced a limitation that doesn't appear to have an easy work around.

If I want to assign a multi-line template to a JavaScript variable, is my only remaining option to create a function to transform newlines to ones compatible with '' and "" JavaScript strings (i.e, replace '\n' with '+' and '')?

I relied on template literals to implement dynamic forms with Go templates, but am struggling to find an alternative solution.

{{ define "date-row" }}
<td>{{ .Start }}</td>
<td>{{ .End }}
{{ end }}
<table>
  {{ range .Dates }}
    {{ template "date-row" . }}
  {{ end }}
</table>
<script>
  const dateRow = `{{ template "date-row" . }}`
  const addDateRange = () => {
    const row = document.createElement('tr');
    row.innerHTML = dateRow;
    
	const dates = document.getElementById('dates');
    dates.appendChild(row);
  };
</script>

@rolandshoemaker
Copy link
Member Author

Revisiting this, I spent some time prototyping a slightly more complex state tracking change in http://go.dev/cl/507995, this is essentially the "more ambitious project" described in #60055 (comment).

This prototype implements string interpolation expression depth tracking, and switches between the template string state and top-level JS state when entering and exiting template expressions (i.e. `string state ${ JS state }`). This would allow us to continue allowing complex JS template literal expressions (i.e. including ${}, while still disallowing Go template actions within the string portion of the template (i.e. allowing `example ${ {{. }} }`, and disallowing `{{. }}`).

Unless someone spots a particularly egregious error I've made in the implementation, I think this presents us with three possible paths to go down:

  1. Continue with the proposal as written here, disabling template actions when the expression depth > 0.
  2. Switch to the approach suggested in CL 507995, disabling template actions within the string portion of template literals, but allowing them within the expression portion (and inherently allowing unlimited nesting).
  3. Support actions anywhere within JS template literals, but escaping actions in the string portion, such that an action within the string portion cannot insert a expression. Actions within expressions would be treated the same as actions anywhere else in the top-level JS context. (This is the most permissive approach, which reverts some of the decisions we've made so far, but would support some use cases users have brought up.)

@jupenur
Copy link

jupenur commented Jul 18, 2023

With http://go.dev/cl/507995 this template parses incorrectly:

<script>
        var foo = `x` + "${ {{.}}";
</script>

Passing in ; alert(1); // as the input produces this:

<script>
        var foo = `x` + "${ "; alert(1); //"";
</script>

@dwrz
Copy link

dwrz commented Jul 18, 2023

I really appreciate the efforts to improve the implementation. Does the Go team have any recommendations for the use case where a template with actions needs to be stored as a multi-line JavaScript string?

@jupenur
Copy link

jupenur commented Jul 18, 2023

@dwrz you could use a script data block:

<script type="application/x-template" id="date-row-template">
  {{ template "date-row" . }}
</script>

<script>
  const dateRow = document.getElementById('date-row-template').innerHTML;
  const addDateRange = () => {
    const row = document.createElement('tr');
    row.innerHTML = dateRow;
    
	const dates = document.getElementById('dates');
    dates.appendChild(row);
  };
</script>

@rolandshoemaker
Copy link
Member Author

@jupenur 🤦 there was a silly bug where state wasn't properly reset, should be working as intended now, let me know if you can see any other edge cases.

@jupenur
Copy link

jupenur commented Jul 18, 2023

@rolandshoemaker d01c3bf and this template:

<script>
        var foo = `${ (_ => { return "x" })() + "${ {{.}}" }`;
</script>

...with the input + alert(1) + produces this:

<script>
        var foo = `${ (_ => { return "x" })() + "${ " + alert(1) + "" }`;
</script>

@rolandshoemaker
Copy link
Member Author

Ah right, we need to track brace depth across JS contexts. I feel like this is slowly draining my life force...

@ianlancetaylor
Copy link
Contributor

Alternative in #61619.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Proposal Proposal-Accepted Proposal-Security PUBLIC issues for security fixes, per https://go.dev/security/policy
Projects
Status: Accepted
Development

No branches or pull requests

6 participants