proposal: Go 2: language: raw string literals that contain backquotes #32190

Closed
donatj opened this issue May 22, 2019 · 72 comments
Labels
FrozenDueToAge LanguageChange Proposal v2 A language change or incompatible library change

@donatj

donatj commented May 22, 2019

I would like to propose that Go add support for a heredoc syntax to make it easier to write literals for particularly precarious strings.

A common syntax in many programming languages is <<< followed by a boundary word to open the string, and a line containing just that boundary to close it.

I would propose something along the lines of:

sql := <<< SQL
SELECT `foo` FROM `bar` WHERE `baz` = "qux"
SQL

My personal motivation is MySQL queries.

My company and I work with MySQL a great deal. Backticks are used to quote table and field names in MySQL. Our queries often contain numerous double quotes and backticks, particularly queries generated by tooling.

There is no way to escape a backtick inside a backtick (raw) string in Go, so we end up either using a double-quoted string and escaping every double quote within it, or using backticks and breaking out of the string at each backtick (à la `x` + "`" + `y`).

Currently we end up with something like

sql := "SELECT `foo` FROM `bar` WHERE `baz` = \"qux\""

or, in cases with far more double quotes than backticks, I'll do something like

sql := `SELECT foo FROM bar WHERE `+"`baz`"+` = "qux"`

These examples are obviously toys, but this becomes much more of an issue with large, 30+ line report queries, and, more importantly, it makes copying queries out of the code and into a MySQL client a real pain.
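For illustration, the two workarounds above can be compared in a small runnable Go program. The function names are hypothetical. It only shows that both spellings produce the same query text, and that neither can be pasted back into a MySQL client unchanged.

```go
package main

import "fmt"

// interpretedForm uses an interpreted ("...") literal: backticks are allowed,
// but every double quote has to be escaped.
func interpretedForm() string {
	return "SELECT `foo` FROM `bar` WHERE `baz` = \"qux\""
}

// concatenatedForm uses raw (`...`) literals: double quotes are allowed,
// but every backtick forces a break-out into an interpreted literal joined with +.
func concatenatedForm() string {
	return `SELECT ` + "`foo`" + ` FROM ` + "`bar`" + ` WHERE ` + "`baz`" + ` = "qux"`
}

func main() {
	// Prints true: both spellings yield identical text.
	fmt.Println(interpretedForm() == concatenatedForm())
	// Prints: SELECT `foo` FROM `bar` WHERE `baz` = "qux"
	fmt.Println(interpretedForm())
}
```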

@ianlancetaylor ianlancetaylor changed the title "Feature request: heredoc" to "proposal: Go 2: language: here documents" May 23, 2019
@gopherbot gopherbot added this to the Proposal milestone May 23, 2019
@ianlancetaylor ianlancetaylor added v2 A language change or incompatible library change LanguageChange Proposal and removed Documentation Proposal labels May 23, 2019
@ianlancetaylor
Contributor

One possible approach that doesn't require changing the language is something like

sql := strings.ReplaceAll(`string with doubled quotes where ""bar"" == ""quuz""`, `""`, "`")
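Here is a minimal runnable sketch of this workaround. The function name withBackticks is only illustrative. The substitution happens at run time, and any text that legitimately contains "" would be rewritten as well.

```go
package main

import (
	"fmt"
	"strings"
)

// withBackticks expands the doubled-quote placeholder convention:
// every "" in the raw literal becomes a backtick at run time.
func withBackticks() string {
	return strings.ReplaceAll(`string with doubled quotes where ""bar"" == ""quuz""`, `""`, "`")
}

func main() {
	// Prints: string with doubled quotes where `bar` == `quuz`
	fmt.Println(withBackticks())
}
```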

@davecheney
Contributor

This is not a compelling use case; one should not be constructing SQL query strings by hand. Dragons (er, vulnerabilities) lurk there.

@cespare
Contributor

cespare commented May 23, 2019

@davecheney writing the query with bind parameters doesn't mean there aren't backticks, right?

SELECT `foo` FROM `bar` WHERE `baz` = ?

Or do you literally mean that developers shouldn't write SQL at all and should construct queries via ORMs or other tools? (In which case that sounds quite subjective.)

@davecheney
Contributor

Number one of the top ten OWASP security vulnerabilities is SQL injection. I think the OP's case would be strengthened by choosing a different example.

@cespare
Contributor

cespare commented May 23, 2019

@davecheney I don't believe that addresses my questions. Avoiding SQL injections doesn't imply that you never write SQL text in Go code. OWASP's own SQL injection cheat sheet has plenty of SQL text including column names.

@cespare
Contributor

cespare commented May 23, 2019

This example is perhaps not ideal because foo, bar, and baz don't actually need any backtick quoting in MySQL as far as I know.

But I am sympathetic to the problem that the OP has text they would like to copy-paste into Go code and cannot simply do it without some kind of O(n) text transformation. I have run into this in the past when copy-pasting text into a Go program (I believe it was Go source code that included backticks). The copy-paste problem isn't helped much by @ianlancetaylor's workaround, either.

@beoran

beoran commented May 23, 2019

Perhaps, as in some other languages, double backticks could become an escape for a backtick? This would likely be a backwards-compatible approach that doesn't require a new string quoting type. E.g.:

const query = `SELECT ``foo`` FROM ``bar`` WHERE ``baz`` = "qux"` 

@ki4ka

ki4ka commented May 23, 2019

Maybe use double back quotes?

sql := `` SELECT `foo` FROM `bar` WHERE `baz` = "qux" ``

@donatj
Author

donatj commented May 23, 2019

> But I am sympathetic to the problem that the OP has text they would like to copy-paste into Go code and cannot simply do it without some kind of O(n) text transformation

This, to me, is the crux of the problem. I can’t paste from my tooling into Go, and likewise I can’t paste from Go into my tooling. Suggested solutions that still involve some form of escaping don’t solve that.

Further, I don’t think the SQL injection argument is relevant here. My example does not inject any user data into the string. There are plenty of reasons one would want literals in their queries.

@mvdan
Member

mvdan commented May 23, 2019

As someone who has dealt with Shell syntax for years, please don't add heredoc strings to Go. They are a can of worms.

For example, a number of questions start popping up:

  • What happens to any tabs indenting the heredoc body? Are they included as part of the string?
  • What about other whitespace, like leading or trailing spaces?
  • Can leading or trailing whitespace be used in the line that finishes the heredoc?
  • What if the heredoc is never finished?
  • What is a valid heredoc delimiter? Any valid identifier? What if the identifier is already declared in the scope?

@donatj
Author

donatj commented May 23, 2019

I don’t see any of those questions as actual problems per se with heredoc.

Some of them, like the internal whitespace, aren’t really in question at all. Others are implementation details for the language designer to decide. None are problems.

>   • What happens to any tabs indenting the heredoc body? Are they included as part of the string?
>   • What about other whitespace, like leading or trailing spaces?

All whitespace within the boundaries is part of the string. That’s not a hard question, that’s not in question. That’s the point of heredoc. It’s literally “here be the document”, WYSIWYG.

>   • Can leading or trailing whitespace be used in the line that finishes the heredoc?

It depends on the language rules: I’ve used languages that allow it, and I’ve used languages that don’t. I’d personally vote no; I think allowing leading whitespace on the closing boundary just adds trouble.

func main() {
     sql := <<< SQL
Everything between the boundaries taken
  As literal
    String
        Content 
SQL
}

>   • What if the heredoc is never finished?

The same as if you don’t close a quoted or backtick string: a syntax error. I don’t think that’s a legitimate question. Is there a language where this is not the case?

>   • What is a valid heredoc delimiter? Any valid identifier? What if the identifier is already declared in the scope?

Another implementation detail. I’d vote for any series of non-whitespace runes. Maybe limit it to Unicode letter and number runes? It’d certainly be nice for internationalization to allow non-ASCII, though.

It’s just a boundary for the string. Its scope begins and ends with the string.

@agnivade
Contributor

I am not suggesting heredoc syntax as the only solution. But sometimes the inability to escape a backtick in a raw string literal becomes very painful depending on the type of application one is writing.

As a general rule, I do not like to use concatenation to write string literals. So whenever a string has a mix of backticks and double quotes, I have to jump through some mental hoops to work out whether it has more double quotes or more backticks, and which escaping scheme will make the code more readable.

There have been past proposals, all of which were declined: #24475, #23228, #18221. The suggestion throughout seems to be to use + to join the strings. But readability-wise, I personally think it takes more mental effort to read a +-concatenated multi-line string than a single \-escaped string (or any other non-concatenating mechanism).

@MOZGIII

MOZGIII commented May 24, 2019

Better SQL example where quoting is actually required (on MySQL at least):

SELECT `group` FROM my_table;

@davecheney
Contributor

davecheney commented May 24, 2019 via email

@MOZGIII

MOZGIII commented May 24, 2019

I think it's good enough. There are a few practical cases where the lack of the ability to use both quotes and backticks in the strings is a real pain point. Working around it is currently possible, but ugly. The lack of a sound and complete way to specify string literals in the language is a real issue, ignoring it is a poor response. I don't think it requires justification, because to me it's obvious. Is it not to everyone else?

@MOZGIII

MOZGIII commented May 24, 2019

It boils down to the fact that using the following to encode a backtick really sucks:

`+"`"+`

This is what we have to use right now, and it derives naturally from the language rules.
However, it's too long to type and is extremely ugly.
Yes, it's possible to use string replacement instead, but that hurts readability too, because you still can't put actual backticks where they're supposed to be. This sucks, because not only does the runtime have to do additional work to evaluate the actual value, but humans reading and writing the code also have to do additional processing in their heads. This immediately skyrockets the difficulty of working with a code base that uses those tricks, compared to regular Go code.
There are other workarounds, but all of them have serious drawbacks if you think about them. That is why I think it's really a language flaw.

Now, the natural solution to this issue would be heredoc syntax. I don't see any argument on why you wouldn't want that to be in the language, except the "it's not justified" one. Is it hard to implement, does it violate some design constraints on the lexer? Why not just add it?

@mvdan
Member

mvdan commented May 24, 2019

First of all, I think many of us can agree that having a nicer way to have multiline string literals without worrying about quotes would be nice. All I'm saying is that I don't think heredocs is a good solution.

Can we please have a civil discussion about it without resorting to "this sucks", "not a legitimate question", or dismissals like "to me it's obvious" and "why not just add it"?

> That’s not a hard question, that’s not in question. That’s the point of heredoc.

Well, that's not how heredocs work in shell. See https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/V3_chap02.html#tag_18_07_04, in particular <<-, which has support for stripping tab indentation.

It's fine if you just want the equivalent of <<, but then clarify that in your proposal. If you borrow the language from POSIX shell, I'm going to wonder if the equivalent of <<- is also supported.

> I think allowing leading whitespace on the closing boundary just adds trouble.

Not allowing any leading indentation is a possible outcome, but it means that heredocs within indented code would look out of place. It's a tradeoff, and the proposal should be clear about what side it decides on.

> Is there a language where this is not the case?

Yes, bash, which I presume is your point of reference.

> Another implementation detail. [...] It’s just a boundary for the string. Its scope begins and ends with the string.

Sorry, but I disagree. A proposal to make such a large change to the language spec should be very clearly defined. This includes what can be a valid delimiter token/word/identifier/etc.

@MOZGIII

MOZGIII commented May 24, 2019

Please excuse me, bad day :)

@MOZGIII

MOZGIII commented May 24, 2019

I'd prefer to borrow heredoc syntax from Ruby, it has very nice properties: https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Here+Documents

@mvdan
Member

mvdan commented May 24, 2019

Also, here's an alternative from @rogpeppe almost ten years ago: https://groups.google.com/forum/#!msg/golang-nuts/IVyT2ovIljQ/KJggKkrYGCMJ

var render = Parsetemplate("| 
        |<html> 
        |<body> 
        |<h1 $header> 
        |$text $etc 
        |</body> 
        |</html> 
        ")

It wasn't implemented, and it hasn't been formally proposed since, but it has the nice properties that it supports indentation and nesting.
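That syntax was never implemented. Its indentation-friendly margin idea can, however, be roughly approximated at run time with today's raw strings. The sketch below uses a hypothetical helper, stripMargin, loosely modelled on Scala's StringOps.stripMargin. It only addresses indentation and does nothing for backticks.

```go
package main

import (
	"fmt"
	"strings"
)

// stripMargin is a hypothetical helper: on each line, leading spaces/tabs
// followed by '|' are removed; lines without a '|' margin are left unchanged.
func stripMargin(s string) string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		trimmed := strings.TrimLeft(line, " \t")
		if strings.HasPrefix(trimmed, "|") {
			lines[i] = trimmed[1:]
		}
	}
	return strings.Join(lines, "\n")
}

func main() {
	page := stripMargin(`|<html>
		|<body>
		|</body>
		|</html>`)
	fmt.Println(page)
}
```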

@donatj
Author

donatj commented Jun 11, 2019

> We are not going to adopt "here documents". That is not a Go-like syntax.

I don't personally feel

	foo := <<< title
This is my string
title

is any more out of place in the language than

title:
	for {
		break title
	}

That said, I relent.

@donatj
Author

donatj commented Jun 11, 2019

Also re: the

    ```foo```

proposal: what if the string I want to enclose starts or ends with a backtick? Ending that way is likely common in MySQL / MariaDB queries.

@ianlancetaylor
Contributor

Fair point. (Although SQL doesn't seem like a good example since it also permits double quotes.)

@MOZGIII

MOZGIII commented Jun 12, 2019

How about allowing any number of backticks greater than two, instead of just odd numbers?

Four backticks are currently a syntax error, so it's safe to introduce.

@deanveloper

deanveloper commented Jun 12, 2019

@MOZGIII I personally don't like it, because allowing even numbers of backticks introduces ambiguities:

    var str1 = ``+`` // Go 1 says this is "", but under Go 2 this might be "+"
    var str2 = ````+```` // Increase to 4 backticks, but still could be "" or "+"
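For reference, the Go 1 half of this ambiguity can be checked today. This small program (the function name is illustrative) shows ``+`` evaluating to the empty string:

```go
package main

import "fmt"

// emptyConcat evaluates the Go 1 expression ``+`` (gofmt spaces it out):
// each `` is an empty raw string literal, so the result is "".
func emptyConcat() string {
	return `` + ``
}

func main() {
	fmt.Printf("%q\n", emptyConcat()) // prints ""
	// By contrast, ````+```` does not compile in Go 1: it lexes as two
	// adjacent empty raw literals with no operator between them.
}
```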

So Go would have two options:

  1. Break compatibility and state that str1 would be "+"
  2. Continue saying that ``+`` is "", which would make even numbers of backticks unusable because they would just be parsed as empty strings

Example of what I mean by the second option:

    var str3 = ````
    this is my raw string
    ````

There would be a syntax error on line 2. This is because str3 would be set equal to an empty string, and then the compiler would see this is my raw string, which is not a valid line of Go code, and fail compilation.

@MOZGIII
Copy link

MOZGIII commented Jun 12, 2019

@deanveloper you used two backticks in the example, but I proposed more than two. Two backticks clearly do not work, for the reasons you gave. However, with 3, 4, or any greater number of backticks there are no ambiguities: in Go 1 all of them are syntax errors.

The edge case here would probably be the ability to encode an empty string with this notation. I'd just prohibit that altogether; it'd probably be easier for the lexer that way, and it doesn't seem like a big issue to me. At least I could live with that.

What I don't like is the collision with the Markdown notation for code decoration. It may complicate using Go code in Markdown code sections. Maybe I can live with that too - but for me personally it kind of matters more than the ability to represent an empty string...

What do you think? I'm OK even with the must-be-odd number of backticks proposal; it's better than nothing, and it has its own advantages.

@deanveloper

Sorry for all of the edits to my comment which probably looks confusing post-edits.

I'm personally still a fan of my initial idea with requiring an identifier immediately before/after the respective opening/closing backtick. I don't know how parsing/lexing/compiling/etc works however, so I'm not sure of the severity about how much it would complicate the compiler. But it definitely doesn't collide with Markdown code fences 😉

I'm not upset with the odd number of backticks rule however, it just seems "odd" to only allow odd numbers.

@MOZGIII

MOZGIII commented Jun 12, 2019

Actually, there is another pretty serious downside with the backticks - and that's collision with the backticks inside the string itself:

"`test`" != ````test````

This is a serious problem for me, because it suffers from a similar issue to the one regular ` strings have: representing backticks inside the string itself.

This brings me back to how well thought out this is in Ruby. It has enough string literal forms to cover every case I can think of.

@deanveloper

deanveloper commented Jun 12, 2019

@MOZGIII Yeah, that was brought up by @donatj a few posts ago.

Also speaking of which, @ianlancetaylor, backticks and quotes are not synonymous in SQL. Backticks are used for quoting identifiers in order to make sure that you can select tables/columns named after keywords (or contain strange characters such as spaces and commas), while quotes are used for string literals.

For instance:

 -- valid (MariaDB)
SELECT * FROM `database`.`table`

 -- invalid (MariaDB)
SELECT * FROM "database"."table"

@MichaelTJones
Contributor

It would be great to have "is identical to" (or another Unicode marker) as a way to ultra-backtick ASCII text.

sql := ≡SELECT foo FROM bar WHERE baz = "qux"≡

...feels categorically simpler than a gang of backticks or any other in-band signaling. That’s the core issue, that using an ASCII delimiter for arbitrary ASCII text always has exceptions by definition, and a surprisingly high incidence of them in cases like this SQL example (and Markup and ...) where you’re quoting something that is likely to already be quoting things. Recursive quoting tempts fate because “smart people just like us” had the same ideas for quoting their thing, so when Go wants to quote it, the probability of collisions is very high.

OTOH, if Go used “mango” at each end, and SQL used “rose”, then the space would be huge and collisions rare. It would look dumb, of course, but would not have the collisions of everyone using the same four quotation delimiters.

This is why I propose adding U+2261 IDENTICAL TO as an alternative to back tick in marking raw text.

@MOZGIII

MOZGIII commented Jun 12, 2019

I don't like rarely-used single rune unicode sequences mainly for the following reasons:

  1. I don't want to copy-paste it every time, and I'm not used to typing non-ASCII Unicode symbols. This may be odd, but I feel like program text should only consist of symbols people are used to. So I'm pretty much against using any non-ASCII symbol (just because we can doesn't mean we should).
  2. The second reason, and probably a more compelling one, is that simply swapping backticks for some other character will retain all the fundamental flaws of backticks: for now it will reduce the probability of collision with other languages, but not with Go code itself, and it does not make the coding more flexible in principle. I'd rather have a solution that brings significantly more flexibility to the table than one that just works around the backtick collision.

@alanfo

alanfo commented Jun 12, 2019

A further possibility which I don't think has been mentioned so far is to introduce a new single-character escape \` which would only be valid within raw string literals.

This would be analogous to the existing escapes \' (only valid in rune literals) and \" (only valid in 'ordinary' string literals).

The new escape wouldn't be ideal from a 'cut and paste' perspective, as you'd need to go through and prefix each back-tick with a backslash. However, this would be easier than having to split each back-tick out into an ordinary string literal and (at least to my eye) would stand out more than simply doubling each back-tick, as well as being a rarer combination of symbols.

Compared to solutions which involve using an odd number of back-ticks as a delimiter, it also has the advantage that leading and trailing back-ticks are easier to read.

@mibk
Contributor

mibk commented Jun 12, 2019

@alanfo Then you would have to escape \ as well, which is not backwards compatible.

@mvdan
Member

mvdan commented Jun 12, 2019

Perhaps it would be best to continue the discussion elsewhere, as it's drifting further apart from the original heredoc proposal. It's always simple to file another, separate proposal once another idea has fully formed.

@alanfo

alanfo commented Jun 12, 2019

@mibk I don't follow why you would have to escape \ as well. Unless it was followed by a back-tick, a backslash would be treated literally, as it is now.

Also it would be backwards compatible as, at present, a raw string literal can't include a back-tick at all.

@mibk
Contributor

mibk commented Jun 12, 2019

@alanfo Consider this example:

`\`

Is it an unterminated raw string with an escaped backslash, or a raw string containing a single backslash?

@alanfo

alanfo commented Jun 12, 2019

It's a raw string containing a single backslash as it is now.

I'll admit it's an awkward case for the parser to deal with but `\`` would be fine (a raw string containing a single back-tick) whereas ```` might be problematic.
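For context on the compatibility question, here is a small program showing how Go 1 treats a backslash in raw strings today. The function names are illustrative. The comment on twoLiterals describes what a hypothetical lexer that honoured \` would do with the same source.

```go
package main

import "fmt"

// oneBackslash is the Go 1 raw literal `\`: backslashes carry no special
// meaning in raw strings, so the value is a single backslash byte.
func oneBackslash() string {
	return `\`
}

// twoLiterals holds two separate raw literals on one line. A lexer that
// treated \` as an escape (hypothetical) would instead read `\`, ` as a
// single literal and then start a new raw literal right after x,
// changing the meaning of everything that follows.
func twoLiterals() (string, string) {
	return `\`, `x`
}

func main() {
	a, b := twoLiterals()
	fmt.Println(len(oneBackslash()), a, b)
}
```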

@alanfo

alanfo commented Jun 12, 2019

@mvdan
Given that @ianlancetaylor said we're not going to adopt "here documents" but didn't close the issue and indeed came up with a suggestion on how else to deal with the same problem, I don't see why we shouldn't continue the discussion here. Otherwise the same points will have to be made all over again and there doesn't seem to be a consensus on an alternative proposal in any case.

@cespare
Contributor

cespare commented Jun 12, 2019

@mvdan what @alanfo said, and also note that @ianlancetaylor retitled the issue to be more generic.

Of the ideas listed here, @deanveloper's original idea (now unfortunately hidden in the fold) seems by far the best to me.

All of @MichaelTJones's Unicode suggestions don't really work for quoting Go code itself, and are awkward for many of us to type.

The "more backticks" ideas discussed by @ianlancetaylor and others have the problem that they do not work for text that begins or ends with a backtick.

@deanveloper's idea doesn't have these issues. Really, the only one I see is the one pointed out by @jimmyfrasche: it adds a certain complexity to lexing that's different from anything in the language today. But I think that might be fundamental to any syntax which allows quoting arbitrary text.

@deanveloper

I personally think that these syntaxes are quite a bit different from the original proposal which was asking for a feature that other languages implement, while the current discussion is simply about improving raw strings rather than implementing HEREDOC. I'll start a new proposal, which will include a lot of the discussion from this post.

@ianlancetaylor
Contributor

@deanveloper It seems to me that most of these suggestions are things that other languages implement, or similar to them. Most current languages have some form of raw string literal these days.

My only concern with

delimiter`raw string with ` characters`delimiter

is that it doesn't lead with the fact that it is a string. C++ (R"delim( string )delim"), Rust (r#" string "#), and Swift (#" string "#) make it clearer where a string starts.

@deanveloper

deanveloper commented Jun 13, 2019

> My only concern with [...] is that it doesn't lead with the fact that it is a string.

That's a valid concern, it's a bit hard to see where the string starts and ends with long delimiters. However, with short delimiters, it seems to be much less of a problem:

// keep it with short delimiters
var x = raw`this is a string with ` characters`raw

// or all-capital letters? not previously seen as convention anywhere else in Go?
// this would make it much easier to see that it is representing a raw string.
var y = RAW`this is a string with ` characters`RAW

Perhaps establishing some sort of convention to use brief delimiters, maybe all-capital as well (such as SQL and RAW), is a good idea. Maybe golint should enforce something like this? I'm not 100% sure it's a good idea to enforce it with golint, but I do think that a convention of short and possibly capitalized delimiters would help with that aspect significantly.

I brought up the idea of this convention in this comment: #32190 (comment) although it was for a different reason.
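To make the lexing question concrete, here is a rough sketch of how a lexer might read the delim`...`delim form discussed above. The function scanDelimitedRaw is hypothetical and not part of any Go toolchain. It assumes the literal ends at the first backtick immediately followed by the delimiter.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// scanDelimitedRaw is a hypothetical lexer step for the proposed
// delim`...`delim literal. src must start at the delimiter identifier.
// The literal ends at the first backtick immediately followed by delim.
// It returns the literal's value and the source text after the closing delimiter.
func scanDelimitedRaw(src string) (value, rest string, ok bool) {
	open := strings.IndexByte(src, '`')
	if open <= 0 {
		return "", src, false // no delimiter before the opening backtick
	}
	delim := src[:open]
	for i, r := range delim {
		// Require a Go-style identifier: letter or '_' first, then letters, digits, '_'.
		if !(r == '_' || unicode.IsLetter(r) || (i > 0 && unicode.IsDigit(r))) {
			return "", src, false
		}
	}
	body := src[open+1:]
	end := strings.Index(body, "`"+delim)
	if end < 0 {
		return "", src, false // unterminated literal
	}
	return body[:end], body[end+1+len(delim):], true
}

func main() {
	v, rest, ok := scanDelimitedRaw("RAW`this is a string with ` characters`RAW + x")
	fmt.Printf("%q %q %v\n", v, rest, ok)
}
```

A real spec would still have to settle details this naive rule ignores. For example, a body containing `RAWX would close the literal early and leave X as stray source. Requiring that the closing delimiter not be followed by an identifier character is one possible fix.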

@ianlancetaylor
Contributor

By the way I think I can partially revive my broken earlier suggestion by saying that writing N backquotes (N >= 2) followed by a double quote is a raw string literal that is terminated by a double quote followed by N backquotes.

s := ``"this is a `raw` "string" literal "``
fmt.Print(s)

prints

this is a `raw` "string" literal

It doesn't collapse nicely to the current raw string literals, but it does have the advantage of sticking to existing string quotation characters. Unless I've missed something again.
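As a sanity check on this rule, here is a rough sketch of a scanner for it. The function scanQuotedRaw is hypothetical and only illustrates the rule as stated: N backquotes (N >= 2) plus a double quote open the literal, and the first double quote followed by N backquotes closes it. Applied to the example above, it yields the same text (including the trailing space before the closer).

```go
package main

import (
	"fmt"
	"strings"
)

// scanQuotedRaw is a hypothetical lexer step: N backquotes (N >= 2)
// followed by a double quote open a raw literal, and the first double
// quote followed by N backquotes closes it.
func scanQuotedRaw(src string) (value, rest string, ok bool) {
	n := 0
	for n < len(src) && src[n] == '`' {
		n++
	}
	if n < 2 || n >= len(src) || src[n] != '"' {
		return "", src, false // not an opener of this form
	}
	closer := `"` + strings.Repeat("`", n)
	body := src[n+1:]
	end := strings.Index(body, closer)
	if end < 0 {
		return "", src, false // unterminated literal
	}
	return body[:end], body[end+len(closer):], true
}

func main() {
	v, _, ok := scanQuotedRaw("``\"this is a `raw` \"string\" literal \"``")
	fmt.Println(ok, v)
}
```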

@deanveloper

deanveloper commented Jun 13, 2019

I actually like that idea. My only real issue with the original N backticks idea was that the "N is odd-only" restriction made it seem very inconsistent. It also fixes the issues with how badly the other syntax played with Markdown. I'll make sure to bring up this one in the proposal that I am working on (along with others that were in this thread).

I think the only real concern is that it would make current raw strings that start or end with quotes (ie `"my string"`) confusing to look at for future learners of Go who do not know the history of raw strings.

@agnivade
Contributor

I just bumped on a non-SQL use case regarding this, which I wanted to add as a datapoint.

I have an html template which I am storing as a backtick-quoted string. Now in that template, I have <script> tags and javascript inside those. I was going to use a console.log() with the ${} notation for variables inside a <script> tag, and then immediately realized that I am stuck.

What would have been a nice

console.log(`time taken: ${performance.now()-start}`)

became

console.log(` + "`time taken: ${performance.now()-start}`);" +
`

@ianlancetaylor
Contributor

@deanveloper did a nice job of summarizing the general issue over in #32590. We aren't going to adopt a here-document syntax, for the reasons discussed above. Closing this issue in favor of #32590.

Writing for @golang/proposal-review .

@golang golang locked and limited conversation to collaborators Jul 8, 2020