text/template: allow newlines within actions and pipelines #29770
Comments
This seems reasonable to me. I'm a bit confused by the wording of the title, however. I assume what you want is for newlines to be allowed within pipelines - I'm not sure I understand how delimiters play a part here. I also wonder if there's any disadvantage to allowing newlines in those cases. I can't think of any right now, but then again I haven't written complex pipelines in Go templates.
I'm not sure how "what I want" translates to "Go template terms", and I don't remember the difference between commands and pipe. But currently the items between |
Thanks, I understand what you meant now. So I think what you're suggesting is to allow newlines within all actions, which includes pipelines as well as |
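To make the terminology concrete, here is a small sketch (assuming Go 1.16 or later, where this change eventually landed): the pipeline is the expression inside the `{{ }}` action, and with the change it can span several lines.

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// render parses a template whose single action spans three lines,
// which only parses on Go 1.16+ (where this proposal landed).
func render() string {
	const src = `{{ printf "%s"
	.Name |
	len }}`
	t := template.Must(template.New("demo").Parse(src))
	var buf bytes.Buffer
	if err := t.Execute(&buf, struct{ Name string }{"gopher"}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Println(render()) // the pipeline yields len("gopher"), i.e. 6
}
```

On Go versions before 1.16, `template.Must` panics here with an "unclosed action" parse error, since the newline terminated the action.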
strong +1 for this. I can't think of any reason this would cause problems, and I've often wanted to have a template node span more than one line. As for the template formatter... I tried and eventually gave up. The stdlib template parser is extremely lossy, so in order to use it for formatting, I was having to dive deep into modifying the parser to be lossless, and that turned into too heavy a lift for the time I had.
@mvdan yes, that is correct. So, my primary "pain case" would be the example below. But I suspect that the changes in the template lexer/parser, and its documentation, will not be simpler if you try to somehow narrow the scope.
Mind you, if this simplifies things, this could be limited to certain user-defined actions, along the lines of
So, I put together a preliminary patch to implement this, but quickly ran into a limitation of the feature as currently specified. No sooner did I have the ability to create multiline actions, than I tried writing code like this:
Looks pretty good, right? Only, it doesn't parse, because just enabling multi-line actions does nothing about the need for literal ASCII spaces in trim markers, or about comments having to be separate actions from non-comment actions. So I think the scope of the feature needs to be expanded to include allowing trim markers to be denoted by any kind of whitespace (such as a linefeed), and supporting inline comments. The former avoids the need to place invisible trailing spaces in an action formatted like the one above, while the latter is rather self-explanatory. Reading between the lines of the current

So I threw out my previous attempt at a patch and started over, refactoring the lexer to separate the lexing state machine from the scanning state (i.e. character matching, line tracking, etc.). To make a long story short, I was able to implement multiline actions, "any whitespace is good" trim markers, and inline comments, while significantly speeding up the lexer (at least according to the parsing benchmark) and expanding the test suite to cover scanning state independently of lexing state. So I can see why this wasn't done before; it was a lot of work, and the limitations weren't really an issue for the original use cases that

Anyway, I've written an implementation already, just about ready to submit a changeset for (docs, tests, and all), but I thought I'd check and see if anybody sees any other problems with this as an approach to resolving this feature request. (In case, for example, there was some other reason besides lexer limitations for not allowing this previously!)
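For illustration, here is a sketch of the "trim markers denoted by any whitespace" behavior described above, as it works on Go 1.16 and later, where a newline after `{{-` (or before `-}}`) satisfies the trim marker's whitespace requirement, so no invisible trailing space is needed:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// trimmed parses a template whose trim markers are followed/preceded by
// a newline rather than a literal space. Requires Go 1.16+.
func trimmed() string {
	const src = `before {{-
printf " %s " "mid"
-}} after`
	t := template.Must(template.New("trim").Parse(src))
	var buf bytes.Buffer
	if err := t.Execute(&buf, nil); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// Left marker trims the space after "before", right marker trims the
	// space before "after"; the action itself emits " mid ".
	fmt.Printf("%q\n", trimmed())
}
```

(The inline-comments half of the proposal is a separate matter; this sketch only covers the whitespace-agnostic trim markers.)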
Fixes golang#29770

To facilitate use of templates as a scripting or extension mechanism (e.g. in Hugo, dockerize, etc.) where complex logic or expressions are required, this change allows the use of freeform whitespace and comments within template actions. Trim markers still require whitespace, but now any of the standard four whitespace characters are allowed, including newlines. Rather than having a dedicated "comment action" syntax, an action that contains nothing but whitespace or comments is simply elided from the token stream received by the parser (while still processing trim markers).

To ensure correctness and improve maintainability, all lexer code dealing directly with the scanning state (e.g. pos, input, etc.) was extracted, along with the state itself, to an embedded struct implemented in scanner.go. The lexer now concerns itself only with the language being recognized, while the scanner fully encapsulates the scan state and is ignorant of the language details. (These operations were all needed for the new functionality, so it seemed prudent to extract them as inline-able methods rather than simply duplicate the manual pointer arithmetic in new code. Once this was done, the scanner was refactored to eliminate the performance regression on the package benchmark, and to gain a slight improvement beyond that.)

As a further side benefit, the extracted scanner struct now has its own, additional unit tests, distinct from the lexer's acceptance tests.
Fixes golang#29770

To facilitate use of templates as a scripting or extension mechanism (e.g. in Hugo, dockerize, etc.) where complex logic or expressions are required, this change allows the use of freeform whitespace and comments within template actions. Trim markers still require whitespace, but now any of the standard four whitespace characters are allowed, including newlines. Rather than having a dedicated "comment action" syntax, an action that contains nothing but whitespace or comments is simply elided from the token stream received by the parser (while still processing trim markers).

To ensure correctness and improve maintainability, all lexer code dealing directly with the scanning state (e.g. pos, input, etc.) was extracted, along with the state itself, to an embedded struct implemented in scanner.go. The lexer now concerns itself only with the language being recognized, and the scanner fully encapsulates the scan state while remaining fully ignorant of the syntax being lexed. This extraction and refactoring was done because the new syntax required improved lookahead, backup, and item-capturing functionality, so it seemed prudent to extract those new methods and other pointer arithmetic from lex.go to a file of their own. Once this was done, the scanner was further refactored to improve performance on the package's parsing benchmark: first to eliminate the slight performance decrease caused by the new syntax, and then to add what further speed improvements could be made without using more memory during parsing or changing the parse package's API. The net result is a roughly 8% improvement in the lexing benchmark (BenchmarkParseLarge) over the previous version.

In addition, the extracted scanner type now has its own, additional unit tests, distinct from the lexer's acceptance tests.
I am warming to this idea, but find it hard to believe the lexer needs a rewrite. I admit it's not a fast lexer, but it is interesting code with a pedagogical pedigree, being derived from the lexer in the talk Lexical Scanning in Go, https://www.youtube.com/watch?v=HxaD_trXwRE. It's important to me to keep the correspondence between the talk and this package, which is mentioned in the talk.

The original motivation for single-line actions was by analogy with single-line strings: it is a common mistake to leave out the closing token. But for template actions, the opening and closing tokens are distinct, and it's perhaps less likely that a missing close could lead to profound misparsing. A little work on the error handling, such as complaining when an open token is seen while looking for a closing one, would probably cover the gap well.

So I disagree on two fronts with your between-the-lines reading, and am considering making this change in the next cycle. I will assign it to myself and discover whether the lexer is truly the issue here. But I am also marking it as a proposal, because although I believe the code change is modest, it could have significant downstream effects.
The lexer actually changed very little, outside the next/peek/accept* methods and the few states that needed to change directly for the feature. The refactoring consisted of taking the parts of the lexer that dealt with scan state, pulling them into a separate scanner.go file, and then banishing all direct manipulation of .input, .pos, etc. from lex.go, so that changing how the scanner worked wouldn't require changes in lex.go.

That was really only done because when I tried to add whitespace agnosticism and comment handling in the existing lexer, I kept bumping against lookahead issues, with e.g. peek() making it impossible to backup(). So one of the changes was making backup() require a width saved by the caller of next(), rather than saving it in the scan state, so that the lexer can back up as much as is necessary. This was needed because when processing e.g. the left-hand trim marker, it's not sufficient to match a prefix of

In essence, the only slides from the talk my changes would've affected would be the ones on next/peek/backup/accept, and there only in the fine details. But I made those changes because I assumed that adding more .input/.pos hacking to the lexer would get my patch rejected as ugly. And then I made some changes to speed it up, because doing it the not-ugly way took some performance off. 😉

But I don't think anything I did would be considered a rewrite of the lexer, or would invalidate the video. It's more of a devil-in-the-details thing, and I'm sure you'll find your own ways around it. I expect that if I had written the lexer myself, I might have been more confident doing straight-up scan state manipulation instead of improving the scanning abstraction.
There are a lot of places where the lexer currently does lots of math to figure out how to advance, and where that math sat in a place that needed to change because of the syntax change (such as the delimiter states), it was easier (for me at least) to convert the math to higher-level abstractions describing what was being matched or captured, and then change that, rather than try to work out different math whose underlying assumptions were changing. (E.g. the left and right delimiter states and utility functions were all built on the assumption that trim delimiters are fixed strings containing a space character, rather than fixed strings plus "any whitespace character".) Anyway, you'll see all that if/when you get in there, I'm sure. I assume I should not bother submitting the PR?

(tl;dr: the lexer state machine is fine and doesn't need to change much outside the affected states, but the new syntax needs more forward/back movement, requiring either more pointer arithmetic or better movement abstractions; I picked better movement abstractions, but other choices could certainly be made.)
@robpike I have watched that talk several times myself and learned a lot from it. But this GitHub issue comes from real pain from people using Go templates to build real applications. We cannot freeze development of a package because it's used as an example implementation in a talk on YouTube. I used the pronoun "we" to underline that this is an open source project.
Whoa, nobody's saying that. As I said, someone more familiar with the lexer could probably work around the lack of lookahead/lookback by doing more pointer arithmetic of the variety that's already there in other lexer state functions. I just didn't do it myself because I expected a patch adding lots of funky pointer arithmetic would be rejected for being sketchy and fragile-looking, and for adding to code smell instead of relieving it. (Not to mention actually being fragile.) Rob's already assigned himself the issue and I'm sure will have his own way of implementing the feature, so I hardly think that qualifies as freezing development.
This would be great to have! I think I must be missing something, but from what I can tell, the code change doesn't have to be huge. Mostly splitting this case into two. Probably combine that with the isSpace() case and treat

Actually, here's a small patch that works without rewriting the lexer (it doesn't do anything for comments, but that would be a simple extra case for lexInsideAction). Not sure if it would be considered "ugly"...
Edit: I don't love the string slices for the trim markers. Maybe |
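For concreteness, here is the shape of the relaxed check being discussed. This is not the stdlib's actual code, just an illustrative sketch in which a right trim marker is any whitespace rune followed by a dash and the closing delimiter, rather than the fixed string `" -}}"`:

```go
package main

import "fmt"

// isSpace reports whether r is one of the four standard whitespace
// characters, mirroring the definition the proposal calls for.
func isSpace(r rune) bool {
	return r == ' ' || r == '\t' || r == '\r' || r == '\n'
}

// hasRightTrimMarker is an illustrative sketch (not the stdlib's code):
// it reports whether s begins with a right trim marker, i.e. any
// whitespace, then '-', then the right delimiter, so "\n-}}" matches
// just as " -}}" does.
func hasRightTrimMarker(s, rightDelim string) bool {
	if len(s) < len(rightDelim)+2 {
		return false
	}
	if !isSpace(rune(s[0])) || s[1] != '-' {
		return false
	}
	return s[2:2+len(rightDelim)] == rightDelim
}

func main() {
	fmt.Println(hasRightTrimMarker("\n-}} tail", "}}")) // true: newline counts
	fmt.Println(hasRightTrimMarker("-}}", "}}"))        // false: no leading whitespace
}
```

Because the whitespace characters involved are all ASCII, the single-byte check in this sketch is safe; the real lexer works in terms of runes for other reasons.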
When I took a roughly similar approach, the parser benchmark performance dropped significantly, because the more complex closing trim-delimiter check runs in what's effectively the hottest part of the lexer (lexInsideAction) and can't be inlined due to the size of the functions involved. It was this performance regression that got me started looking at possible refactorings. (Well, that and initially trying to make the lexer elide empty actions, rather than just filtering them out in the parser.) I haven't tested your exact code, of course, but I'd suggest running the benchmark with and without it. (Comment processing won't likely affect the benchmark, because it doesn't include any comments, and it wouldn't be in the hottest part of lexInsideAction in any event.)
For me, the difference seems pretty slight, but I was seeing variance between runs for both. Maybe I'm not handling something correctly that your approach covered? Original:
With the changes above (and removing the test case for "{{\n}}"):
Interesting. I will note that when I ran the benchmark, I usually had to run it three or four times for the timing to settle out to something stable, but I was using the default benchtime. I am surprised that your version has such a small drop in worst-case scenarios, since it does four times as many prefix checks as the master branch per action token. But I suppose it could also be a hardware difference from the machine I was testing on, as my best speed for the master branch was 53.9 ms/op, which is way slower than your machine's ~34.2 ms/op despite being the exact same code. (Same base arch, too: linux + amd64.) In any case, this is all moot until/unless the language change is approved. (Also, as I said above, it's entirely possible that somebody could come up with a better way to do it than me; I'm just still surprised that the perf drop is so small in your version, since it's doing so much more work.)
Change https://golang.org/cl/254257 mentions this issue:
@rsc If I understand correctly, this proposal was accepted, the issue state and milestone just weren't updated accordingly. I'll fix that now. If I misunderstood, then CL 254257 should be reverted before Go 1.16 is released. |
This issue resurfaced in gohugoio/hugo#5604 -- but has been mentioned enough times to have proven its value.
In Hugo we have many variadic template functions. One relevant example would be the `dict` func, which accepts a list of key/value pairs. A current example would look like this:
The above would obviously be easier to read if it could be written something like this:
The above creates a parser error, and is a common problem when you try to use one of the "HTML beautifiers" out there on Go templates.
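Since the original snippets aren't shown above, here is an illustrative standalone version of the multiline form being asked for. The `dict` func below is a hypothetical minimal stand-in, not Hugo's implementation, and the template parses only on Go 1.16+:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// dict is a minimal stand-in for Hugo's variadic dict func (a sketch,
// not Hugo's code): it pairs up its arguments into a map.
func dict(pairs ...any) (map[string]any, error) {
	if len(pairs)%2 != 0 {
		return nil, fmt.Errorf("dict: odd number of arguments")
	}
	m := make(map[string]any, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		key, ok := pairs[i].(string)
		if !ok {
			return nil, fmt.Errorf("dict: key %v is not a string", pairs[i])
		}
		m[key] = pairs[i+1]
	}
	return m, nil
}

// render puts one key/value pair per line inside the action — the form
// that used to be a parse error before Go 1.16.
func render() string {
	const src = `{{ $m := dict
	"title" "Home"
	"weight" 10
}}{{ $m.title }}/{{ $m.weight }}`
	t := template.Must(template.New("d").Funcs(template.FuncMap{"dict": dict}).Parse(src))
	var buf bytes.Buffer
	if err := t.Execute(&buf, nil); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	fmt.Println(render()) // Home/10
}
```

Before the change, the same call had to be written as one long line (`{{ $m := dict "title" "Home" "weight" 10 }}`), which is exactly what the "HTML beautifiers" mentioned above break by reflowing.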
As a person who has spent some thousands of hours inside Go templates, I would say that this issue is really, really worth it. And if you don't come from that background, I think it helps to be reminded that many Go applications need a user interface, and Go templates are a natural choice. No one questions the value of `gofmt` and pretty Go code; not sure why the UI code should be treated differently. I know @natefinch tinkered with a "Go template formatter" some time ago. A solution to this particular issue would make that task more enjoyable, methinks.

/cc @regisphilibert