proposal: Go 2: language: raw string literals that contain backquotes #32190
Comments
One possible approach that doesn't require changing the language is something like
|
This is not a compelling use case; one should not be constructing SQL query strings by hand. |
@davecheney writing the query with bind parameters doesn't mean there aren't backticks, right?
Or do you literally mean that developers shouldn't write SQL at all and should construct queries via ORMs or other tools? (In which case that sounds quite subjective.) |
Number one of the top ten OWASP security vulnerabilities is SQL injection. I think the OPs case would be strengthened by choosing a different example. |
@davecheney I don't believe that addresses my questions. Avoiding SQL injections doesn't imply that you never write SQL text in Go code. OWASP's own SQL injection cheat sheet has plenty of SQL text including column names. |
This example is perhaps not ideal because But I am sympathetic to the problem that the OP has text they would like to copy-paste into Go code and cannot simply do it without some kind of O(n) text transformation. I have run into this in the past when copy-pasting text into a Go program (I believe it was Go source code that included backticks). The copy-paste problem isn't helped much by @ianlancetaylor's workaround, either. |
Perhaps, as in some other languages, double backticks could become an escape for a backtick? This would likely be a backwards compatible approach that doesn't require a new string quoting type. Eg: const query = `SELECT ``foo`` FROM ``bar`` WHERE ``baz`` = "qux"` |
Maybe use double back quotes?
|
This to me is the crux of the problem. I can’t paste from my tooling into Go, and I subsequently can’t paste from Go into my tooling. Suggested solutions that still involve some form of escaping don’t solve that. Further, I don’t think the SQL injection argument is relevant here. My example is not injecting any user data into the string. There are plenty of reasons one would want literals in their queries. |
As someone who has dealt with Shell syntax for years, please don't add heredoc strings to Go. They are a can of worms. For example, a number of questions start popping up:
|
I don’t see any of those questions as actual problems per se with heredoc. Some of them, like the internal white space, aren’t really in question at all. Some are up to the language designer as implementation details. None are problems.
All white space within the boundaries is part of the string. That’s not a hard question, that’s not in question. That’s the point of heredoc. It’s a literal “here be the document”, WYSIWYG.
Depends on the language rules, I’ve used languages that allow it, I’ve used languages that don’t. I’d personally vote no, I think allowing leading whitespace on the closing boundary just adds trouble.
Same as if you don’t close a quoted or backtick string? Syntax error. I don’t think that’s a legitimate question. Is there a language where this is not the case?
Another implementation detail. I’d vote for any series of runes that contains no whitespace. Maybe limit it to Unicode letter and number runes? It’d be nice for internationalization to allow non-ASCII, for sure. It’s just a boundary for the string. Its scope begins and ends with the string. |
I am not suggesting heredoc syntax as the only solution. But sometimes the inability to escape a backtick in a raw string literal becomes very painful, depending on the type of application one is writing. As a general rule, I do not like to use concatenation to write string literals. So whenever a string has a mix of backticks and double quotes, I have to jump through some mental hoops working out whether it has more double quotes or more backticks, and which escaping scheme makes the code more readable. Past proposals have all been declined: #24475, #23228, #18221. The suggestion throughout seems to be to use + to join the strings. But readability-wise, I personally think it takes more mental effort to read a |
Better SQL example where quoting is actually required (on MySQL at least): SELECT `group` FROM my_table; |
Y’all need to find a better justification for this change.
https://dev.mysql.com/doc/refman/8.0/en/string-literals.html
|
I think it's good enough. There are a few practical cases where the lack of the ability to use both quotes and backticks in the strings is a real pain point. Working around it is currently possible, but ugly. The lack of a sound and complete way to specify string literals in the language is a real issue, ignoring it is a poor response. I don't think it requires justification, because to me it's obvious. Is it not to everyone else? |
It boils down to the fact that using the following for coding backtick really sucks:
This is what we have to use right now, and it derives naturally from the language rules. Now, the natural solution to this issue would be heredoc syntax. I don't see any argument on why you wouldn't want that to be in the language, except the "it's not justified" one. Is it hard to implement, does it violate some design constraints on the lexer? Why not just add it? |
First of all, I think many of us can agree that having a nicer way to have multiline string literals without worrying about quotes would be nice. All I'm saying is that I don't think heredocs is a good solution. Can we please have a civil discussion about it without resorting to "this sucks", "not a legitimate question", or dismissals like "to me it's obvious" and "why not just add it"?
Well, that's not how heredocs work in shell. See https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/utilities/V3_chap02.html#tag_18_07_04, in particular It's fine if you just want the equivalent of
Not allowing any leading indentation is a possible outcome, but it means that heredocs within indented code would look out of place. It's a tradeoff, and the proposal should be clear about what side it decides on.
Yes, bash, which I presume is your point of reference.
Sorry, but I disagree. A proposal to make such a large change to the language spec should be very clearly defined. This includes what can be a valid delimiter token/word/identifier/etc. |
Please excuse me, bad day :) |
I'd prefer to borrow heredoc syntax from Ruby, it has very nice properties: https://ruby-doc.org/core-2.5.0/doc/syntax/literals_rdoc.html#label-Here+Documents |
Also, here's an alternative from @rogpeppe almost ten years ago: https://groups.google.com/forum/#!msg/golang-nuts/IVyT2ovIljQ/KJggKkrYGCMJ
It wasn't implemented, and it hasn't been formally proposed since, but it has the nice properties that it supports indentation and nesting. |
I don't personally feel
is any more out of place in the language than title:
for {
break title
} That said, I relent. |
Also re: the
proposal, what if the string I want to encapsulate starts or ends with a backtick? Ending as such is likely common in |
Fair point. (Although SQL doesn't seem like a good example since it also permits double quotes.) |
How about > 2 backticks instead of just odd number? Four backticks are syntax error currently, so it's safe to introduce. |
@MOZGIII i don't like it personally, because allowing any even numbers of backticks introduces ambiguities:
So Go would have two options:
Example of what I mean by the second option:
There would be a syntax error on line 2. This is because |
@deanveloper you used two backticks in the example, but I proposed more than two. Two backticks clearly do not fit, for the reasons you gave. However, with 3, 4 and any greater number of backticks there are no ambiguities - in Go 1 all of them are syntax errors. The edge case here would probably be the ability to encode an empty string with this notation. I'd just prohibit that altogether - it'd probably be easier for the lexer that way, and it doesn't seem like a big issue to me. At least I could live with that. What I don't like is the collision with the Markdown notation for code decoration. It may complicate using Go code in Markdown code sections. Maybe I can live with that too - but for me personally it kind of matters more than the ability to represent an empty string... What do you think? I'm ok even with the must-be-odd number of backticks proposal - it's better than nothing, and it has its own advantages. |
Sorry for all of the edits to my comment which probably looks confusing post-edits. I'm personally still a fan of my initial idea with requiring an identifier immediately before/after the respective opening/closing backtick. I don't know how parsing/lexing/compiling/etc works however, so I'm not sure of the severity about how much it would complicate the compiler. But it definitely doesn't collide with Markdown code fences 😉 I'm not upset with the odd number of backticks rule however, it just seems "odd" to only allow odd numbers. |
Actually, there is another pretty serious downside with the backticks - and that's collision with the backticks inside the string itself:
This is a serious problem for me, because it suffers from much the same issue that regular ` delimiters have: no support for representing backticks inside the string itself. This brings me back to how well thought out this is in Ruby. It has enough string literal forms to cover every case I can think of. |
@MOZGIII Yeah, that was brought up by @donatj a few posts ago. Also speaking of which, @ianlancetaylor, backticks and quotes are not synonymous in SQL. Backticks are used for quoting identifiers in order to make sure that you can select tables/columns named after keywords (or contain strange characters such as spaces and commas), while quotes are used for string literals. For instance:
|
It would be great to have "is identical to" (or another Unicode marker) as a way to ultra-backtick ASCII text. sql := ≡SELECT ...feels categorically simpler than a gang of backticks or any other in-band signaling. That’s the core issue, that using an ASCII delimiter for arbitrary ASCII text always has exceptions by definition, and a surprisingly high incidence of them in cases like this SQL example (and Markup and ...) where you’re quoting something that is likely to already be quoting things. Recursive quoting tempts fate because “smart people just like us” had the same ideas for quoting their thing, so when Go wants to quote it, the probability of collisions is very high. OTOH, if Go used “mango” at each end, and SQL used “rose”, then the space would be huge and collisions rare. It would look dumb, of course, but would not have the collisions of everyone using the same four quotation delimiters. This is why I propose adding U+2261 IDENTICAL TO as an alternative to back tick in marking raw text. |
I don't like rarely-used single rune unicode sequences mainly for the following reasons:
|
A further possibility which I don't think has been mentioned so far is to introduce a new single-character escape \` which would only be valid within raw string literals. This would be analogous to the existing escapes \' (only valid in rune literals) and \" (only valid in 'ordinary' string literals). The new escape wouldn't be ideal from a 'cut and paste' perspective as you'd need to go through and prepend each back-tick with a slash. However, this would be easier than having to split each back-tick out into an ordinary string literal and (at least to my eye) would stand out more than simply doubling each back-tick as well as being a rarer combination of symbols. Compared to solutions which involve using an odd number of back-ticks as a delimiter, it also has the advantage that leading and trailing back-ticks are easier to read. |
@alanfo Then you would have to escape |
Perhaps it would be best to continue the discussion elsewhere, as it's drifting further apart from the original heredoc proposal. It's always simple to file another, separate proposal once another idea has fully formed. |
@mibk I don't follow why you would have to escape Also it would be backwards compatible as, at present, a raw string literal can't include a back-tick at all. |
@alanfo Consider this example:
Is it an unterminated raw string with an escaped backslash, or a raw string containing a single backslash? |
It's a raw string containing a single backslash as it is now. I'll admit it's an awkward case for the parser to deal with but `\`` would be fine (a raw string containing a single back-tick) whereas ```` might be problematic. |
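To make the current behaviour concrete, here is a minimal sketch (the constant name `rawBackslash` is only illustrative). In today's Go a backslash has no special meaning inside a raw string literal, so the literal below holds exactly one character, which is why a new escape in that position would need careful handling in the lexer.

```go
package main

import "fmt"

// rawBackslash is a raw string literal holding one backslash.
// Backslashes are not escape characters inside raw string literals today.
const rawBackslash = `\`

func main() {
	// Compare against the interpreted-literal spelling of a single backslash.
	fmt.Println(len(rawBackslash), rawBackslash == "\\")
}
```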
@mvdan |
@mvdan what @alanfo said, and also note that @ianlancetaylor retitled the issue to be more generic. Of the ideas listed here, @deanveloper's original idea (now unfortunately hidden in the fold) seems by far the best to me. All of @MichaelTJones's Unicode suggestions don't really work for quoting Go code itself, and are awkward for many of us to type. The "more backticks" ideas discussed by @ianlancetaylor and others have the problem that they do not work for text that begins or ends with a backtick. @deanveloper's idea doesn't have these issues. Really, the only one I see is the one pointed out by @jimmyfrasche: it adds a certain complexity to lexing that's different from anything in the language today. But I think that might be fundamental to any syntax which allows quoting arbitrary text. |
I personally think that these syntaxes are quite a bit different from the original proposal which was asking for a feature that other languages implement, while the current discussion is simply about improving raw strings rather than implementing HEREDOC. I'll start a new proposal, which will include a lot of the discussion from this post. |
@deanveloper It seems to me that most of these suggestions are things that other languages implement, or similar to them. Most current languages have some form of raw string literal these days. My only concern with
is that it doesn't lead with the fact that it is a string. C++ ( |
That's a valid concern, it's a bit hard to see where the string starts and ends with long delimiters. However, with short delimiters, it seems to be much less of a problem:
Perhaps establishing some sort of convention to use brief delimiters, maybe all-capital as well (such as I brought up the idea of this convention in this comment: #32190 (comment) although it was for a different reason. |
By the way I think I can partially revive my broken earlier suggestion by saying that writing N backquotes (N >= 2) followed by a double quote is a raw string literal that is terminated by a double quote followed by N backquotes.
prints
It doesn't collapse nicely to the current raw string literals, but it does have the advantage of sticking to existing string quotation characters. Unless I've missed something again. |
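For readers trying to picture the lexing rule in this suggestion, here is a rough sketch in today's Go of how such a literal could be scanned. It is not a proposal for the real scanner: the function name `scanFenced` is hypothetical, and it assumes the first occurrence of a double quote followed by N backquotes ends the literal.

```go
package main

import (
	"fmt"
	"strings"
)

// scanFenced reads a literal of the suggested form from the start of src:
// N backquotes (N >= 2), a double quote, the body, a double quote, N backquotes.
// It returns the body and N, or ok == false if src does not start with a
// complete literal. Hypothetical helper, for illustration only.
func scanFenced(src string) (body string, n int, ok bool) {
	// Count the leading backquotes.
	for n < len(src) && src[n] == '`' {
		n++
	}
	// Require at least two backquotes followed by an opening double quote.
	if n < 2 || n >= len(src) || src[n] != '"' {
		return "", 0, false
	}
	// The closer is a double quote followed by the same number of backquotes.
	closer := `"` + strings.Repeat("`", n)
	rest := src[n+1:]
	end := strings.Index(rest, closer)
	if end < 0 {
		return "", 0, false // unterminated literal
	}
	return rest[:end], n, true
}

func main() {
	// The input is written as an interpreted literal so it can hold backquotes.
	src := "``\"say `hi` and \"bye\"\"``"
	body, n, ok := scanFenced(src)
	fmt.Println(body, n, ok)
}
```

Under these assumptions the body may contain both single backquotes and double quotes, as long as it never contains a double quote immediately followed by N backquotes.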
I actually like that idea. My only real issue with the original N backticks idea was that the "N is odd-only" restriction made it seem very inconsistent. It also fixes the issues with how badly the other syntax played with Markdown. I'll make sure to bring up this one in the proposal that I am working on (along with others that were in this thread). I think the only real concern is that it would make current raw strings that start or end with quotes (ie |
I just bumped on a non-SQL use case regarding this, which I wanted to add as a datapoint. I have an html template which I am storing as a backtick-quoted string. Now in that template, I have <script> tags and javascript inside those. I was going to use a console.log() with the What would have been a nice
became
|
@deanveloper did a nice job of summarizing the general issue over in #32590. We aren't going to adopt a here-document syntax, for the reasons discussed above. Closing this issue in favor of #32590. Writing for @golang/proposal-review . |
I would like to propose Go add support for a HEREDOC syntax to make adding literals of particular precarious strings easier.
A common syntax in many programming languages is
<<< (boundary)
to open and a line containing just said boundary to close. I would propose something along the lines of:
My personal reasoning is for MySQL queries.
My company and I work with MySQL a great deal. Backticks are used to quote tables and fields in MySQL. Our queries will often contain both numerous quotes and backticks - particularly queries generated by tooling.
There is no way to escape a backtick in a backtick string in Go, so we end up either using a double-quoted string and escaping all the quotes within, or using backticks and breaking out of the string on each backtick (a la
`x` + "`" + `y`
). Currently we end up with something like
or in cases with massively more quotes than backticks I'll do something like
These examples are toys obviously, but this becomes much more of an issue on large 30+ line report queries - and more importantly makes copying queries out of code and into a MySQL client a real pain.
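As a minimal sketch of the two workarounds described above, assuming a made-up query (the table, column, and value names here are only illustrative, not the original examples), the same text can be spelled either way in today's Go:

```go
package main

import "fmt"

// queryEscaped uses an interpreted (double-quoted) literal: backticks can be
// written directly, but every double quote has to be escaped.
const queryEscaped = "SELECT `group` FROM `my_table` WHERE `name` = \"qux\""

// queryConcat uses raw literals and breaks out into a double-quoted literal
// for each backtick-quoted identifier.
const queryConcat = `SELECT ` + "`group`" + ` FROM ` + "`my_table`" + ` WHERE ` + "`name`" + ` = "qux"`

func main() {
	fmt.Println(queryEscaped)
	// Both spellings produce the same string.
	fmt.Println(queryEscaped == queryConcat)
}
```

Neither form can be pasted into a MySQL client as written, which is the copy-paste cost the proposal is about.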