Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text/template: confusing overlay template behavior when it starts with a BOM #28482

Open
bep opened this issue Oct 30, 2018 · 14 comments
Open

text/template: confusing overlay template behavior when it starts with a BOM #28482

bep opened this issue Oct 30, 2018 · 14 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@bep
Copy link
Contributor

bep commented Oct 30, 2018

The program below prints Master: overlay once, the second prints nothing. And no error. The program is a variation of this example: https://golang.org/pkg/html/template/#example_Template_block -- which is how we do template "inheritance" in Hugo. And judging by the issue reports in this area, adding a BOM marker seems to be a fairly common practice among text editors. I can work around this on my end, but this behavior is very surprising.

https://play.golang.org/p/MQRb5HUk6eH

package main

import (
	"fmt"
	"log"
	"os"
	"text/template"
)

func main() {
	for _, prefix := range []string{"", "\ufeff"} {
		var (
			master  = `Master: {{block "list" .}}block{{end}}`
			overlay = prefix + `{{define "list"}}overlay{{end}}`
		)

		masterTmpl, err := template.New("master").Parse(master)
		if err != nil {
			log.Fatal(err)
		}
		overlayTmpl, err := template.Must(masterTmpl.Clone()).Parse(overlay)
		if err != nil {
			log.Fatal(err)
		}

		if err := overlayTmpl.Execute(os.Stdout, ""); err != nil {
			log.Fatal(err)
		}

		fmt.Println()
	}
}

gohugoio/hugo#4895

@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

It seems like any non-empty prefix makes that prefix be printed, instead of using the master template with the overlay. You're just unlucky that the BOM is non-printable, so it seems like a no-op. Is that behavior wrong? I'm not used at all to blocks and overlays.

If the behavior is wrong, what would your suggestion be - skip BOM characters at the beginning of the input when parsing templates?

@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

As a useful point of reference, reading Go source files does skip the first character if it's a BOM. This happens in both go/scanner and cmd/compile. So it would probably follow that the same should happen here.

@gopherbot
Copy link

Change https://golang.org/cl/145837 mentions this issue: text/template/parse: skip leading BOM character

@bep
Copy link
Contributor Author

bep commented Oct 30, 2018

If the behavior is wrong, what would your suggestion be - skip BOM characters at the beginning of the input when parsing templates?

See https://play.golang.org/p/237zLtI9WpR

I have adjusted my example to be more clear.

I understand that you have to make a choice between the "define and the rest", so the "foo" case I understand.

But I would expect the \n and BOM case to behave the same.

@ianlancetaylor
Copy link
Contributor

If your input data contains a BOM character, and you don't want it in the output, you should remove it before passing the contents to text/template. text/template should not be in the business of doing anything other than passing along the non-interpreted characters that it sees.

Go source files are a different matter. There is no program layer interposing between the Go source file and the compiler, so there is nothing to remove a BOM. Valid Go source code can not contain a BOM by definition, but there is no similar restriction on text passed to text/template. Skipping the BOM is documented in the Go spec as an implementation restriction (https://golang.org/ref/spec#Source_code_representation) and there is no such documentation for text/template.

@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

Ian is right - I had not realised that the Go spec does document the behavior of BOMs, so that's not a precedent for text/template.

I'm unclear if there's anything to do here, then. I don't think ParseFiles is a good place to skip BOMs, as more often than not complex programs use Parse directly. It would be confusing if ParseFiles and Parse behaved differently in the presence of BOMs.

@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

But I would expect the \n and BOM case to behave the same.

That is an interesting point. I presume that the distinction here is whitespace - will dig into the code later.

@bep
Copy link
Contributor Author

bep commented Oct 30, 2018

This issue passes my "is this a bug?" bar with flying colors, and you should fix it.

But I have made my point clear here. If my last playground example does not convince you, nothing will.

@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

Note that Ian is objecting to a change in BOM handling in the lexer. That is not the only option we have - for example, I'd like to dig into the the last playground link you pasted, because I'm not sure what decides whether "the rest of the template text" is ignored or not.

In other words, bear with us while we figure out exactly what a fix would be like, if there is one.

@mvdan mvdan changed the title text/template: Fails silently when template starts with BOM text/template: confusing overlay template behavior when it starts with a BOM Oct 30, 2018
@mvdan mvdan added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Oct 30, 2018
@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

I've found the source of the whitespace behavior:

Templates can be redefined in successive calls to Parse. A template definition with a body containing only white space and comments is considered empty and will not replace an existing template's body.

You can see the implementation at https://golang.org/pkg/text/template/parse/#IsEmptyTree. So, as far as I can see, the current implementation behaves exactly as documented, even if that's a bit confusing to users.

The simplest change that comes to mind would be for subsequent Parse calls to also include a leading BOM, or even any non-printable characters anywhere, when deciding whether the new body should not replace the existing body.

However, I imagine that would be a backwards incompatible change. That depends on whether text/template is designed to work with non-printable characters. If it is, then I presume we can't make this change as we could break existing templates.

@bep
Copy link
Contributor Author

bep commented Oct 30, 2018

That depends on whether text/template is designed to work with non-printable characters.

I wouldn't expect it to, but that is just a hunch ... That said, I would expect text/template to be designed to work out of the box with most common text editors. And some editors insert "U+FEFF BYTE ORDER MARK (BOM), whose appearance as a magic number at the start of a text stream" (Wikipedia)". So if text/template trims any leading (first rune) \ufeff in Parse, I cannot in my wildest dreams see anything but good.

But note that since I now know about it, this isn't something that I really need to have fixed, but I would not be surprised if others also hit this, so fixing it in the source would be a good thing, overall.

@mvdan
Copy link
Member

mvdan commented Oct 30, 2018

Thanks - I agree that in practice, this would likely be a good thing to do. But since text/template doesn't explicitly say that it is only about templating printable text, I'm not convinced we can pull the trigger on a version of that fix, even if it's just in ParseFiles.

/cc @robpike @rogpeppe for a decision

@mvdan mvdan added NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Oct 30, 2018
@ianlancetaylor
Copy link
Contributor

There are all kinds of packages that read text files. We are not going to go through all of them and change each one individually to discard a leading BOM character. That just doesn't make sense.

@mvdan
Copy link
Member

mvdan commented Nov 14, 2018

A perhaps less invasive change would be for https://golang.org/pkg/text/template/parse/#IsEmptyTree to also return true if a tree has a leading BOM character.

I still don't know what the best approach would be, but I'd hope there's something we can do in text/template, given that this is a common issue for users.

@seankhliao seankhliao added this to the Unplanned milestone Aug 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

5 participants