Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

strconv: add QuotedPrefix #45033

Closed
dsnet opened this issue Mar 15, 2021 · 19 comments
Closed

strconv: add QuotedPrefix #45033

dsnet opened this issue Mar 15, 2021 · 19 comments

Comments

@dsnet
Copy link
Member

dsnet commented Mar 15, 2021

The existing Unquote function unescapes a Go string assuming that the entirety of input string is the escaped the string. However, in many parsing applications, we have a quoted string followed by some amount of unconsumed input, the presence of arbitrary characters after the quoted string currently breaks the Unquote function.

I propose adding:

// UnquoteLeading interprets the start of s as a single-quoted, double-quoted, or backquoted Go string literal.
// Any subsequent characters after the terminating quote are returned as rem.
func UnquoteLeading(s string) (out, rem string, error)

This functionality may simplify a number of standard packages that implement their own logic to determine the end of a quoted string so that they can pass the correctly sized string to strconv.Unquote:

@gopherbot gopherbot added this to the Proposal milestone Mar 15, 2021
@ianlancetaylor ianlancetaylor added this to Incoming in Proposals (old) Mar 15, 2021
@mvdan
Copy link
Member

mvdan commented Mar 17, 2021

I think this proposal makes sense; having to find the end of the quote manually means reimplementing a basic version of Unquote.

I also wonder if the input should be a []byte; I realise the package is called strconv, but the use case is precisely about reading and buffering, where one almost always has a byte slice. The fmt and reflect examples you show have to explicitly convert to string before calling strconv, for example. I think the result is less important, as it's possible to construct the result with a strings.Builder.

@rsc
Copy link
Contributor

rsc commented Mar 24, 2021

It may be that the operation to expose is "find me the end of this string" and not "also Unquote it".
I was discussing recently with Austin about adding a similar operation to regexp/parse, but I was thinking it would be separate from actually parsing. You don't always want to parse it, but you always need to identify a boundary.

@mvdan
Copy link
Member

mvdan commented Mar 24, 2021

That sounds more flexible, with the downside that one would end up looping over the input bytes twice. That already happens today, so perhaps the cost isn't significant enough to matter.

@dsnet
Copy link
Member Author

dsnet commented Mar 24, 2021

Given that many strings do not have any escape characters, the operation could report whether the string had any escape characters. If not, the user could trivially truncate off the leading and trailing quote characters and avoid a second pass.

@rsc
Copy link
Contributor

rsc commented Apr 7, 2021

This proposal has been added to the active column of the proposals project
and will now be reviewed at the weekly proposal review meetings.
— rsc for the proposal review group

@rsc rsc moved this from Incoming to Active in Proposals (old) Apr 7, 2021
@rsc
Copy link
Contributor

rsc commented Apr 14, 2021

So if we do the string-end-finder, I guess the API would be:

package strconv

// QuotedPrefix returns the quoted string (as defined by Quote and Unquote) at the prefix of s.
// If s does not start with a valid quoted string, QuotedPrefix returns an error.
func QuotedPrefix(s string) (string, error)

?

Reporting the presence of absence of escape characters seems like premature optimization.

@dsnet
Copy link
Member Author

dsnet commented Apr 14, 2021

Wouldn't you need to know where the prefixed quoted string ended so that the caller can start parsing what comes afterwards?

Reporting the presence of absence of escape characters seems like premature optimization.

I agree. In my all my use-cases, performance was not paramount. Also, this loop does not seem hard to write in the extremely performance sensitive cases.

@ianlancetaylor
Copy link
Contributor

Wouldn't you need to know where the prefixed quoted string ended so that the caller can start parsing what comes afterwards?

You can call the len function on the result.

@dsnet
Copy link
Member Author

dsnet commented Apr 14, 2021

Ah yes, I apologize. I missed the part where it "returns the quoted string".

@dsnet
Copy link
Member Author

dsnet commented Apr 14, 2021

I feel like an API that simply returns the length of a valid quoted string (and the entire length otherwise) is simpler to use.

Suppose we had:

// QuotedPrefixLen returns the length of the quoted string (as defined by Unquote) at the prefix of s.
// If s does not start with a valid quoted string, QuotedPrefix returns len(s).
func QuotedPrefixLen(s string) int

For example, if I wanted to parse the following text:

"string1" "string2" "string3"

the code would look like:

in = strings.TrimSpace(in)
for len(in) > 0 {
    n := strconv.QuotedPrefixLen(in)
    s, err := strconv.Unquote(in[:n])
    if err != nil {
        ...
    }
    ... // make use of s
    in = strings.TrimSpace(in[n:])
}

Alternatively, with the QuotedPrefix API, you have to do an additional error check:

in = strings.TrimLeft(in, " ")
for len(in) > 0 {
    s1, err := strconv.QuotedPrefix(in)
    if err != nil {
        ...
    }
    s2, err := strconv.Unquote(s1)
    if err != nil {
        ... // technically this will never happen, but it's not obvious to reviewers
    }
    ... // make use of s2
    in = strings.TrimLeft(in[len(s1):], " ")
}

That said, I'm still okay with QuotedPrefix as its better than nothing.

@rogpeppe
Copy link
Contributor

I feel like an API that simply returns the length of a valid quoted string (and the entire length otherwise) is simpler to use.

Suppose we had:

// QuotedPrefixLen returns the length of the quoted string (as defined by Unquote) at the prefix of s.
// If s does not start with a valid quoted string, QuotedPrefix returns len(s).
func QuotedPrefixLen(s string) int

For example, if I wanted to parse the following text:

"string1" "string2" "string3"

the code would look like:

in = strings.TrimSpace(in)
for len(in) > 0 {
    n := strconv.QuotedPrefixLen(in)

Would QuotedPrefixLen return 0 if there's an escaping error inside the quoted prefix?

I guess if that's the case, you could still call Unquote on the full string and it'll produce the expected error, so it'll work out OK.

@dsnet
Copy link
Member Author

dsnet commented Apr 15, 2021

I struggled between having it return 0 or len(s) on error situations. The advantage of returning len(s) is that naive slicing of the input would pass an invalid string to strconv.Unquote, which would then produce a good error message. The downside is that you can't distinguish between a valid string that happens to be the full length or an invalid one.

@mvdan
Copy link
Member

mvdan commented Apr 15, 2021

Have it return (int, bool)? You could just ignore the boolean result if you don't need it.

@rsc
Copy link
Contributor

rsc commented Apr 21, 2021

You can always wrap QuotedPrefix to get the single-result.
Returning the error lets you get specific information about what went wrong,
which the sentinel integer does not.

@rsc rsc changed the title proposal: strconv: add UnquoteLeading proposal: strconv: add QuotedPrefix Apr 21, 2021
@rsc rsc moved this from Active to Likely Accept in Proposals (old) Apr 21, 2021
@rsc
Copy link
Contributor

rsc commented Apr 21, 2021

Based on the discussion above, this proposal seems like a likely accept.
— rsc for the proposal review group

@rsc rsc moved this from Likely Accept to Accepted in Proposals (old) Apr 28, 2021
@rsc
Copy link
Contributor

rsc commented Apr 28, 2021

No change in consensus, so accepted. 🎉
This issue now tracks the work of implementing the proposal.
— rsc for the proposal review group

@rsc rsc changed the title proposal: strconv: add QuotedPrefix strconv: add QuotedPrefix Apr 28, 2021
@rsc rsc modified the milestones: Proposal, Backlog Apr 28, 2021
@dsnet
Copy link
Member Author

dsnet commented Apr 28, 2021

@rsc, To be clear, we're going with the API you proposed in #45033 (comment), right?

@dsnet
Copy link
Member Author

dsnet commented Apr 28, 2021

Returning the error lets you get specific information about what went wrong,

I just took a look at the implementation of strconv.Unquote and it always returns strconv.ErrSyntax no matter what. I'm not sure how much the rationale of "getting specific information" applies. I would expect strconv.Unquote and strconv.UnquotePrefix to return the exact same error for some invalid input. It's not clear to me that we can (due to Hyrum's law) nor should (due to performance implications) make strconv.Unquote give more informative errors.

@gopherbot
Copy link

Change https://golang.org/cl/314775 mentions this issue: strconv: add QuotedPrefix

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
No open projects
Development

No branches or pull requests

7 participants