Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text/template: allow newlines within actions and pipelines #29770

Closed
bep opened this issue Jan 16, 2019 · 18 comments
Closed

text/template: allow newlines within actions and pipelines #29770

bep opened this issue Jan 16, 2019 · 18 comments

Comments

@bep
Copy link
Contributor

bep commented Jan 16, 2019

This issue resurfaced in gohugoio/hugo#5604 -- but has been mentioned enough times to have proven its value.

In Hugo we have many variadic template functions. One relevant example would be the dict func that accepts a list of key/value pairs.

A current example would look like this:

{{ dict "country" "Norway" "population" "5 millions" "language" "Norwegian" "languageCode" "nb" "weather" "freezing cold" "capitol" "Oslo" "largest_city" "Oslo" "currency"  "Norwegian krone" "dialing_code" "+47" }}

The above would obviously be easier to read if it could be written something like this:

{{ dict 
	"country" "Norway" 
	"population" "5 millions"
	"language" "Norwegian"
	"language_code" "nb"
	"weather" "freezing cold"
	"capitol" "Oslo"
	"largest_city" "Oslo"
	"currency"  "Norwegian krone"
	"dialing_code" "+47" 
}}

The above creates a parser error, and is a common problem when you try to use one of the "HTML beautifiers" out there on Go templates.

As a person who have spent some thousand hours inside Go templates I would say that this issue is really, really worth it. And if you don't come from that background, I think it helps to be reminded that many Go applications needs a user interface, and Go templates is a natural choice. No one questions the value of gofmt and pretty Go code, not sure why the UI code should be treated differently. I know @natefinch tinkered with a "Go template formatter" some time ago. A solution to this partucular issue would make that task more enjoyable, me thinks.

/cc @regisphilibert

@mvdan
Copy link
Member

mvdan commented Jan 16, 2019

This seems reasonable to me. I'm a bit confused by the wording of the title, however. I assume what you want is for newlines to be allowed whithin pipelines - I'm not sure I understand how delimiters play a part here.

I also wonder if there's any disadvantage in allowing newlines in those cases. I can't think of any right now, but then again I haven't written complex pipelines in Go templates.

@mvdan mvdan added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 16, 2019
@mvdan mvdan added this to the Go1.13 milestone Jan 16, 2019
@mvdan
Copy link
Member

mvdan commented Jan 16, 2019

Here's a quote from the package godoc:

Except for raw strings, actions may not span newlines, although comments can.

@bep have you given any thought to whether all actions would be allowed to span many newlines? Or would this be restricted to just pipelines, or just commands?

/cc @robpike

@bep
Copy link
Contributor Author

bep commented Jan 16, 2019

I'm a bit confused by the wording of the title, however. I assume what you want is for newlines to be allowed whithin pipelines - I'm not sure I understand how delimiters play a part here.

I'm not sure how "what I want" translates to "Go template terms", and I don't remember the difference between commands and pipe. But currently the items between {{ and }} are separated by spaces. If you also allow newline as a separator, it would make many people happy.

@mvdan
Copy link
Member

mvdan commented Jan 16, 2019

Thanks, I understand what you meant now. So I think what you're suggesting is to allow newlines within all actions, which includes pipelines as well as if, range, and everything else that can go between {{ and }}.

@mvdan mvdan changed the title text/template: allow newlines as command delimiter text/template: allow newlines within actions and pipelines Jan 16, 2019
@natefinch
Copy link
Contributor

strong +1 for this. I can't think of any reason this would cause problems, and I've often wanted to have a template node span more than one line.

As for the template formatter..... I tried and eventually gave up. The stdlib template parser is extremely lossy, so in order to use it for formatting, I was having to dive deep into modifying the parser to be lossless, and that turned into too heavy a lift for the time I had.

@bep
Copy link
Contributor Author

bep commented Jan 16, 2019

@mvdan yes, that is correct. So, my primary "pain case" would be the example below. But I suspect that the changes in the template lexer/parser, and its documentation, will not be simpler if you try to somehow narrow the scope.

{{ $mymap := someTemplateFunc 
	"country" "Norway" 
	"population" "5 millions"
	"language" "Norwegian"
	"language_code" "nb"
	"weather" "freezing cold"
	"capitol" "Oslo"
	"largest_city" "Oslo"
	"currency"  "Norwegian krone"
	"dialing_code" "+47" 
}}

@regisphilibert
Copy link

regisphilibert commented Jan 16, 2019

Mind you, if this simplifies things, this could be limited to certain user defined actions: in the line of {{-, a character attached to the opening curlies could indicate the content of this action accepts newlines:

{{> $mymap := someTemplateFunc 
	"country" "Norway" 
	"population" "5 millions"
	"language" "Norwegian"
}}

@pjeby
Copy link

pjeby commented May 17, 2020

So, I put together a preliminary patch to implement this, but quickly ran into a limitation of the feature as currently specified. No sooner did I have the ability to create multiline actions, than I tried writing code like this:

{{-
    partial "card" (slice $ (dict
        "title"     $section_page.Title
        "url"       $url       
        "highlight" (gt (len $child_pages) 0)   /* highlight sections with children */
        "content"   ($section_page.Summary | safeHTML)
        "actions"   (slice (dict "label" "Learn More" "url" $url))
    ))
-}}

Looks pretty good, right? Only, it doesn't parse, because just enabling multi-line actions doesn't do anything about needing literal ASCII spaces on trim markers, or the part where comments have to be separate actions from non-comment actions.

So I think the scope of the feature needs to be expanded to include allowing trim markers to be denoted using any kind of whitespace (such as a linefeed), and supporting inline comments. The former avoids the need to place invisible trailing spaces in an action formatted like the one above, while the latter is rather self-explanatory.

Reading between the lines of the current text/template/parse implementation, it appears that the reason for requiring an ASCII space and the reason for having a separate comment action type are actually the same: limitations of the existing lexer, such as its limited lookahead and tight coupling between lexing state and direct manipulation of the scanning state.

So I threw out my previous attempt at a patch and started over, refactoring the lexer to separate the lexing state machine from the scanning state (i.e. character matching, line tracking, etc.). To make a long story short, I was able to implement multiline actions, "any whitespace is good" trim markers, and inline comments, while actually significantly speeding up the lexer (at least according to the parsing benchmark) and expanding the test suite to cover scanning state independent of lexing state.

So I can see why this wasn't done before; it was a lot of work, after all, and the limitations weren't really an issue for the original use cases that text/template was created for. It's only tools like hugo, dockerize, etc. that use templates as an extension/scripting language that really need this kind of thing. (Though when you want it, you really want it.)

Anyway, I've written an implementation already, just about ready to submit a changeset for (docs, tests, and all), but I thought I'd check and see if anybody sees any other problems with this as an approach to resolving this feature request. (In case, for example, there was some other reason besides lexer limitations for not allowing this previously!)

pjeby added a commit to pjeby/go that referenced this issue May 17, 2020
Fixes golang#29770

To facilitate use of templates as a scripting or extension
mechanism (e.g. in Hugo, dockerize, etc.) where complex logic
or expressions are required, this change allows the use of
freeform whitespace and comments within template actions.

Trim markers still require whitespace, but now any of the
standard four whitespace characters are allowed, including
newlines.  Rather than having a dedicated "comment action"
syntax, an action that is empty of anything but whitespace
or comments is simply elided from the token stream received
by the parser.  (While still processing trim markers.)

To ensure correctness and improve maintainability, all
lexer code dealing directly with the scanning state (e.g.
pos, input, etc.) was extracted along with the state
itself to an embedded struct implemented in scanner.go.

The lexer now concerns itself only with the language
being recognized, while the scanner fully encapsulates
the scan state, and is ignorant of the language details.

(These operations were all needed for the new functionality,
so it seemed prudent to extract them as inline-able methods
rather than simply duplicate the manual pointer arithmetic
in new code.  Once this was done, the scanner was refactored
to eliminate performance regression on the package benchmark,
and a slight improvement beyond that.)

As a further side benefit, the extracted scannner struct
now has its own, additional unit tests, distinct from
the lexer's acceptance tests.
pjeby added a commit to pjeby/go that referenced this issue May 17, 2020
Fixes golang#29770

To facilitate use of templates as a scripting or extension
mechanism (e.g. in Hugo, dockerize, etc.) where complex logic or
expressions are required, this change allows the use of freeform
whitespace and comments within template actions.

Trim markers still require whitespace, but now any of the
standard four whitespace characters are allowed, including
newlines.  Rather than having a dedicated "comment action" syntax
an action that is empty of anything but whitespace or comments
is simply elided from the token stream received by the parser.
(While still processing trim markers.)

To ensure correctness and improve maintainability, all lexer code
dealing directly with the scanning state (e.g. pos, input, etc.)
was extracted along with the state itself to an embedded struct
implemented in scanner.go.

The lexer now concerns itself only with the language being
recognized, and the scanner fully encapsulates the scan state,
while remaining fully ignorant of the syntax being lexed.

This extraction and refactoring was done because the new syntax
required improved lookahead, backup, and item capturing
functionality, so it seemed prudent to extract those new methods
and other pointer arithmetic from lex.go to a file of their own.

Once this was done, the scanner was further refactored to improve
performance on the package's parsing benchmark. (To eliminate the
slight performance decrease caused by the new syntax, and to add
what further speed improvements could be made without using more
memory during parsing or changing the parse package's API.)

The net result is a roughly 8% improvement in the lexing benchmark
(BenchmarkParseLarge) over the previous version. In addition, the
extracted scannner type now has its own, additional unit tests,
distinct from the lexer's acceptance tests.
@robpike
Copy link
Contributor

robpike commented May 19, 2020

I am warming to this idea, but find it hard to believe the lexer needs a rewrite. I admit it's not a fast lexer, but it is interesting code with a pedagogical pedigree, being derived from the lexer in the talk Lexical Scanning in Go, https://www.youtube.com/watch?v=HxaD_trXwRE. It's important to me to keep the correspondence between the talk and this package, which is mentioned in the talk.

The original motivation for single-line actions was by analogy with single-line strings: it is a common mistake to leave out the closing token. But for template actions, the opening and closing tokens are distinct, and it's perhaps less likely that a missing close could lead to profound misparsing. A little work on the error handling, such as complaining when an open token is seen while looking for a closing one, would probably cover the gap well.

So I disagree on two fronts with your between-the-lines reading, and am considering making this change in the next cycle. I will assign it to myself and discover whether the lexer is truly the issue here.

But I am also marking it as a proposal, because although I believe the code change is modest, it could have significant downstream effects.

@robpike robpike self-assigned this May 19, 2020
@robpike robpike changed the title text/template: allow newlines within actions and pipelines proposal: text/template: allow newlines within actions and pipelines May 19, 2020
@gopherbot gopherbot removed the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label May 19, 2020
@pjeby
Copy link

pjeby commented May 19, 2020

The lexer actually changed very little, outside next/peek/accept* methods, and the few states that needed to directly change for the feature.

The refactoring was taking the parts of the lexer that dealt with scan state, pulling them to a separate scanner.go file, and then banishing all direct manipulation of .input, .pos, etc. from lex.go so that changing how the scanner part worked wouldn't require changes in lex.go. That was really only done because when I tried to add whitespace agnosticism and comment handling in the existing lexer, I kept bumping against lookahead issues with e.g. peek() making it impossible to backup().

So one of the changes was making it so backup() required a width saved by the caller of next(), rather than saving it in the scan state, so that the lexer can back up as much as is necessary. This was needed because when processing e.g. the left-hand trim marker, it's not sufficient to match a prefix of "- "; you have to go forward and match a character class, then back up again if it's not a match. In order to do that without adding more magic-number pointer arithmetic to the lexer, I made it so the lexer could back up in an encapsulated way.

In essence, the only slides from the talk my changes would've affected would be the ones on next/peek/backup/accept, and there only in the fine details. But I made those changes because I assumed that adding more .input/.pos hacking to the lexer would get my patch rejected as ugly. And then I made some changes to speed it up because doing it the not-ugly way took some performance off. 😉

But I don't think anything I did would be considered a rewrite of the lexer or invalidate the video. It's more of a devil-in-the-details thing, that I'm sure you'll find your own ways around. I expect that if I had written the lexer myself, I might have been more confident doing straight-up scan state manipulation instead of improving the scanning abstraction.

There are a lot of places where the lexer currently does lots of math to figure out how to advance, and if it was in a place that needed to change because of the syntax change (such as the delimiter states), it was easier (for me at least) to convert the math to higher-level abstractions relating to what was being matched or captured, then change that, rather than try to figure out different math, given that the assumptions on which the math was based were changing. (E.g. the left and right delimiter states and utility functions all being built with the assumption that trim delimiters are fixed strings with a space character, rather than fixed strings plus "any whitespace character").

Anyway, you'll see all that if/when you get in there, I'm sure. I assume I should not bother submitting the PR?

(tl;dr: lexer state machine is fine, doesn't need to change much outside affected states, but new syntax needs more forward/back movement, requiring either more pointer arithmetic or better movement abstractions, so I picked better movement abstractions, but other choices could certainly be made.)

@bep
Copy link
Contributor Author

bep commented May 19, 2020

It's important to me to keep the correspondence between the talk and this package, which is mentioned in the talk.

@robpike I have watched that talk several times myself and learned a lot from it. But this GitHub issue comes from real pain from people using Go templates to make real applications. We cannot freeze development of a package because it's used as an example implementation in a talk on YouTube. I used the pronoun we to underline that this an open source project.

@pjeby
Copy link

pjeby commented May 19, 2020

We cannot freeze development of a package

Whoa, nobody's saying that. As I said, someone more familiar with the lexer could probably work around the lack of lookahead/lookback by doing more pointer arithmetic of the variety that's already there in other lexer state functions. I just didn't do it myself because I expected a patch adding lots of funky pointer arithmetic would be rejected for being sketchy and fragile-looking, and adding to code smell instead of relieving it. (Not to mention actually being fragile.)

Rob's already assigned himself the issue and I'm sure will have his own way of implementing the feature, so I hardly think that qualifies as freezing development.

@rsc rsc added this to Incoming in Proposals (old) Jun 10, 2020
@robmccoll
Copy link

robmccoll commented Jun 26, 2020

This would be great to have! I think I must be missing something, but from what I can tell, the code change doesn't have to be huge. Mostly splitting this case into two. Probably combine that with the isSpace() case and treat [ \t\n\r] as white space in lexSpace. Add alternatives for -[ \t\n\r] to trim white space while still handling numbers {{-3}} the same way.

Actually, here's a small patch that works without rewriting the lexer (doesn't do anything for comments, but that would be a simple extra case for lexInsideAction). Not sure if it would be considered "ugly"...

--- a/src/text/template/parse/lex.go
+++ b/src/text/template/parse/lex.go
@@ -97,9 +97,11 @@ const eof = -1
 // space does the job.
 const (
        spaceChars      = " \t\r\n" // These are the space characters defined by Go itself.
-       leftTrimMarker  = "- "      // Attached to left delimiter, trims trailing spaces from preceding text.
-       rightTrimMarker = " -"      // Attached to right delimiter, trims leading spaces from following text.
-       trimMarkerLen   = Pos(len(leftTrimMarker))
+       trimMarkerLen   = Pos(len("- "))
+)
+var (
+       leftTrimMarkers = []string{"- ", "-\n", "-\r", "-\t"} // Alternate left trim markers to account for accepted whitespace
+       rightTrimMarkers = []string{" -", "\n-", "\r-", "\t-"} // Alternate right trim markers to account for accepted whitespace
 )

 // stateFn represents the state of the scanner as a function that returns the next state.
@@ -111,7 +113,7 @@ type lexer struct {
        input           string    // the string being scanned
        leftDelim       string    // start of action
        rightDelim      string    // end of action
-       trimRightDelim string    // end of action with trim marker
+       trimRightDelims []string  // end of action with trim marker
        pos             Pos       // current position in the input
        start           Pos       // start position of this item
        width           Pos       // width of last rune read from input
@@ -215,11 +217,14 @@ func lex(name, input, left, right string) *lexer {
                input:           input,
                leftDelim:       left,
                rightDelim:      right,
-               trimRightDelim: rightTrimMarker + right,
+               trimRightDelims: make([]string, len(rightTrimMarkers)),
                items:           make(chan item),
                line:            1,
                startLine:       1,
        }
+       for i, marker := range rightTrimMarkers {
+               l.trimRightDelims[i] =  marker + right
+       }
        go l.run()
        return l
 }
@@ -248,7 +253,7 @@ func lexText(l *lexer) stateFn {
                ldn := Pos(len(l.leftDelim))
                l.pos += Pos(x)
                trimLength := Pos(0)
-               if strings.HasPrefix(l.input[l.pos+ldn:], leftTrimMarker) {
+               if shouldTrimSpace(l.input[l.pos+ldn:], leftTrimMarkers) {
                        trimLength = rightTrimLength(l.input[l.start:l.pos])
                }
                l.pos -= trimLength
@@ -277,7 +282,7 @@ func rightTrimLength(s string) Pos {

 // atRightDelim reports whether the lexer is at a right delimiter, possibly preceded by a trim marker.
 func (l *lexer) atRightDelim() (delim, trimSpaces bool) {
-       if strings.HasPrefix(l.input[l.pos:], l.trimRightDelim) { // With trim marker.
+       if shouldTrimSpace(l.input[l.pos:], l.trimRightDelims) { // With trim marker.
                return true, true
        }
        if strings.HasPrefix(l.input[l.pos:], l.rightDelim) { // Without trim marker.
@@ -291,10 +296,19 @@ func leftTrimLength(s string) Pos {
        return Pos(len(s) - len(strings.TrimLeft(s, spaceChars)))
 }

+func shouldTrimSpace(str string, trimMarkers []string) bool {
+       for _, marker := range trimMarkers {
+               if strings.HasPrefix(str, marker) {
+                       return true
+               }
+       }
+       return false
+}
+
 // lexLeftDelim scans the left delimiter, which is known to be present, possibly with a trim marker.
 func lexLeftDelim(l *lexer) stateFn {
        l.pos += Pos(len(l.leftDelim))
-       trimSpace := strings.HasPrefix(l.input[l.pos:], leftTrimMarker)
+       trimSpace := shouldTrimSpace(l.input[l.pos:], leftTrimMarkers)
        afterMarker := Pos(0)
        if trimSpace {
                afterMarker = trimMarkerLen
@@ -336,7 +350,7 @@ func lexComment(l *lexer) stateFn {

 // lexRightDelim scans the right delimiter, which is known to be present, possibly with a trim marker.
 func lexRightDelim(l *lexer) stateFn {
-       trimSpace := strings.HasPrefix(l.input[l.pos:], rightTrimMarker)
+       trimSpace := shouldTrimSpace(l.input[l.pos:], rightTrimMarkers)
        if trimSpace {
                l.pos += trimMarkerLen
                l.ignore()
@@ -363,10 +377,10 @@ func lexInsideAction(l *lexer) stateFn {
                return l.errorf("unclosed left paren")
        }
        switch r := l.next(); {
-       case r == eof || isEndOfLine(r):
+       case r == eof:
                return l.errorf("unclosed action")
-       case isSpace(r):
-               l.backup() // Put space back in case we have " -}}".
+       case isSpace(r) || isEndOfLine(r):
+               l.backup() // Put whitespace back in case we have " -}}".
                return lexSpace
        case r == '=':
                l.emit(itemAssign)
@@ -425,7 +439,7 @@ func lexSpace(l *lexer) stateFn {
        var numSpaces int
        for {
                r = l.peek()
-               if !isSpace(r) {
+               if !isSpace(r) && !isEndOfLine(r) {
                        break
                }
                l.next()
@@ -433,7 +447,7 @@ func lexSpace(l *lexer) stateFn {
        }
        // Be careful about a trim-marked closing delimiter, which has a minus
        // after a space. We know there is a space, so check for the '-' that might follow.
-       if strings.HasPrefix(l.input[l.pos-1:], l.trimRightDelim) {
+       if shouldTrimSpace(l.input[l.pos-1:], l.trimRightDelims) {
                l.backup() // Before the space.
                if numSpaces == 1 {
                        return lexRightDelim // On the delim, so go right to that.

Edit: I don't love the string slices for the trim markers. Maybe shouldTrimSpace() could iterate the runes of a marker string and if the rune is space call accept() with the spaceChars otherwise just check that the next rune is equal? It would cut down on the amount of change overall.

@pjeby
Copy link

pjeby commented Jun 26, 2020

When I took a roughly similar approach, the parser benchmark performance dropped significantly, because the more complex closing trim-delimiter check runs in what's effectively the hottest part of the lexer (lexInsideAction) and can't be inlined due to size of the functions involved. It was this performance regression that got me started looking for possible refactorings. (Well, that and initially trying to make the lexer elide empty actions, rather than just filtering them out in the parser.)

I haven't tested your exact code, of course, but I'd suggest running the benchmark with and without it. (Comment processing won't likely affect the benchmark because it doesn't include any, and it wouldn't be in the hottest part of lexInsideAction in any event.)

@robmccoll
Copy link

For me, the difference seems pretty slight, but I was seeing variance between runs for both. Maybe I'm not handling something correctly that your approach covered?

Original:

$go test -benchtime 5s -bench .
goos: linux
goarch: amd64
pkg: text/template/parse
BenchmarkParseLarge-3                199          34254968 ns/op
BenchmarkVariableString-3       37936720               160 ns/op              72 B/op          3 allocs/op
BenchmarkListString-3            1458942              4366 ns/op            1608 B/op         31 allocs/op
PASS
ok      text/template/parse     26.613s
$go test -benchtime 30s -bench .
goos: linux
goarch: amd64
pkg: text/template/parse
BenchmarkParseLarge-3               1142          35654330 ns/op
BenchmarkVariableString-3       225743618              160 ns/op              72 B/op          3 allocs/op
BenchmarkListString-3            8640402              4162 ns/op            1608 B/op         31 allocs/op
PASS
ok      text/template/parse     136.237s

With the changes above (and removing the test case for "{{\n}}"):

$go test -benchtime 5s -bench .
goos: linux
goarch: amd64
pkg: text/template/parse
BenchmarkParseLarge-3                163          36138100 ns/op
BenchmarkVariableString-3       35149789               158 ns/op              72 B/op          3 allocs/op
BenchmarkListString-3            1452826              4144 ns/op            1608 B/op         31 allocs/op
PASS
ok      text/template/parse     25.548s
$go test -benchtime 30s -bench .
goos: linux
goarch: amd64
pkg: text/template/parse
BenchmarkParseLarge-3               1204          35624717 ns/op
BenchmarkVariableString-3       227129716              158 ns/op              72 B/op          3 allocs/op
BenchmarkListString-3            8561077              4215 ns/op            1608 B/op         31 allocs/op
PASS
ok      text/template/parse     138.243s

@pjeby
Copy link

pjeby commented Jun 27, 2020

Interesting. I will note that when I ran the benchmark, I usually had to run it three or four times for the time to settle out to stable timing, but I was using the default benchtime. I am surprised that your version has such a small drop in worst-case scenarios, since it does four times as many prefix checks as the master branch per action token. But I suppose it could also be a hardware difference from the machine I was doing testing on, as my best speed for the master branch was 53.9 ms/op, which is way slower than your machine's ~34.2 ms/op despite being the exact same code. (Same base arch, too: linux + amd64.)

In any case, this is all moot until/unless the language change is approved. (Also, as I said above, it's more than possible that somebody could come up with a better way to do it than me, I'm just still surprised that the perf drop is so small in your version since it's doing so much more work.)

@gopherbot
Copy link

Change https://golang.org/cl/254257 mentions this issue: text/template: allow newlines inside action delimiters

@dmitshur dmitshur modified the milestones: Backlog, Go1.16 Dec 1, 2020
@dmitshur
Copy link
Contributor

dmitshur commented Dec 1, 2020

@rsc If I understand correctly, this proposal was accepted, the issue state and milestone just weren't updated accordingly. I'll fix that now.

If I misunderstood, then CL 254257 should be reverted before Go 1.16 is released.

@dmitshur dmitshur changed the title proposal: text/template: allow newlines within actions and pipelines text/template: allow newlines within actions and pipelines Dec 1, 2020
@dmitshur dmitshur moved this from Incoming to Accepted in Proposals (old) Dec 1, 2020
@golang golang locked and limited conversation to collaborators Dec 1, 2021
@rsc rsc unassigned robpike Jun 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests