regexp: no way to replace submatches with a function #5690

Open
gopherbot opened this issue Jun 12, 2013 · 20 comments
@gopherbot

by denys.seguret:

ReplaceAllStringFunc is useful when you need to process the match to compute the
replacement, but sometimes you need to match a bigger string than the one you want to
replace. A similar function able to replace submatch(es) seems necessary.

Let's say you have strings like

    input := `bla b:foo="hop" blabla b:bar="hu?"`

and you want to replace the part between quotes in b:foo="hop" and
b:bar="hu?" using a function.

It's easy to build a regular expression to get the match and submatch, for example

    r := regexp.MustCompile(`\bb:\w+="([^"]+)"`)

but when you use ReplaceAllStringFunc, the callback is only provided the whole match,
not the submatch, and must return the whole string. Practically this means you need to
execute the regexp (or another one) in the callback, for example:

        input := `bla bla b:foo="hop" blablabla b:bar="hu?"`
        r := regexp.MustCompile(`(\bb:\w+=")([^"]+)`)
        fmt.Println(r.ReplaceAllStringFunc(input, func(m string) string {
                parts := r.FindStringSubmatch(m)
                return parts[1] + complexFunc(parts[2])
        }))

I think a function ReplaceAllStringSubmatchFunc would be useful and would avoid the
second pass. The callback would receive the submatch and return the replacement of the
submatch. The last example would be rewritten as

        input := `bla bla b:foo="hop" blablabla b:bar="hu?"`
        r := regexp.MustCompile(`\bb:\w+="([^"]+)"`)
        fmt.Println(r.ReplaceAllStringSubmatchFunc(input, complexFunc))
        
A similar function (ReplaceAllStringSubmatchSliceFunc ?) could be designed to give the
callback an array of strings that the callback would change. In fact it could be decided
that only this last function is really necessary.

Links :

 - "How-to" question on Stack-Overflow : http://stackoverflow.com/q/17065465/263525
 - Playground link : http://play.golang.org/p/I6Pg8OUeTj
@robpike
Contributor

robpike commented Jun 12, 2013

Comment 1:

Labels changed: added priority-later, packagechange, removed priority-triage.

Owner changed to @rsc.

Status changed to Accepted.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 3:

Labels changed: added go1.3maybe.

@robpike
Contributor

robpike commented Aug 20, 2013

Comment 4:

Labels changed: removed go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 5:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 6:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 7:

Labels changed: added repo-main.

@gopherbot
Author

Comment 8:

CL https://golang.org/cl/106360043 mentions this issue.

@gopherbot
Author

Comment 9 by denys.seguret:

Small comment: the whole thing could be cleaner than what I initially proposed by
accepting a callback with submatches passed as variadic arguments instead of an explicit array.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@victorhooi

I just hit this issue as well. Does "Unplanned" mean this is unlikely to get worked on?

I'm also including some information on my use-case, in case that helps.

I'm trying to transform log lines containing key-value pairs so that any string values are redacted. For example:

name: "Joe", last_name: "Bloggs", age: 5, nickname: "Jogs" }

might become:

name: "SOME_HASH", last_name: "SOME_HASH", age: 5, $comment: "do not redact me", nickname: "SOME_HASH" }

I only want to target quoted strings that are followed by either , (comma) or } (closing curly-braces), and I also want to ignore any $comment fields.

I know that Go's regexp doesn't have lookaheads/lookbehinds, which means I can't use those to check for the above. That restricts me somewhat. However, I figured I'd just capture everything using a regex like this:

quotedStringRegex := regexp.MustCompile(`(\$comment: )?"([^"]*)"[, }]`)

and then check the actual subgroups to see if $comment was there, and also grab out the comma or curly-brace, and put that back on at the end.

However, I'm using ReplaceAllStringFunc, which only gives you the entire match, so it seems like I either need to do a second regex inside my callback function, or I need to do a bunch of contains/splits/ends-with etc.

(Obviously, if I've missed something obvious that is available in Go, please feel free to correct the above).
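One way to do this in a single pass with what the package already offers is FindAllStringSubmatchIndex, which reports where each group matched (or -1 if it did not). The sketch below makes several assumptions: the pattern is a slightly adjusted variant of the one above (it allows optional whitespace before the comma or brace), and redactQuoted, shortHash and the truncated SHA-256 digest are illustrative choices, not anything prescribed here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"regexp"
	"strings"
)

// Group 1: optional "$comment: " prefix. Group 2: the quoted value.
// The match must be followed by optional whitespace and "," or "}".
var quotedRe = regexp.MustCompile(`(\$comment: )?"([^"]*)"\s*[,}]`)

// redactQuoted (hypothetical helper) replaces every quoted value with
// redact(value), except values that directly follow "$comment: ".
func redactQuoted(line string, redact func(string) string) string {
	var b strings.Builder
	last := 0
	for _, loc := range quotedRe.FindAllStringSubmatchIndex(line, -1) {
		if loc[2] >= 0 {
			continue // "$comment: " prefix present: leave this value alone
		}
		b.WriteString(line[last:loc[4]])           // everything up to the value
		b.WriteString(redact(line[loc[4]:loc[5]])) // the redacted value
		last = loc[5]
	}
	b.WriteString(line[last:])
	return b.String()
}

// shortHash is an illustrative redaction: the first 4 bytes of SHA-256 in hex.
func shortHash(s string) string {
	sum := sha256.Sum256([]byte(s))
	return fmt.Sprintf("%x", sum[:4])
}

func main() {
	line := `{ name: "Joe", age: 5, $comment: "do not redact me", nickname: "Jogs" }`
	fmt.Println(redactQuoted(line, shortHash))
}
```

Because only the span of group 2 is rewritten, the quotes and the trailing comma or brace never need to be put back by hand.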

@josharian
Contributor

Does "Unplanned" mean this is unlikely to get worked on?

Unplanned just means that this won't potentially block a release. I know that @michaelmatloob has been looking at regexp stuff recently; perhaps he is interested.

@crenz

crenz commented Oct 26, 2016

Just wanted to add that I hit the very same issue today. I was trying to implement a simple tag replacement, e.g.

Name: {name}
First name: {firstname}

becomes

Name: Doe
First name: Jon

I'm coming from a Perl background; my first intuition was to use a regexp like /{([^}]+)}/. Note the submatch in parentheses: in Perl, it would be possible to use replace (and call a function on the submatch) or use split (and get the submatches returned). In Go, Split never returns the part that matches, and ReplaceAllStringFunc passes the callback the complete match instead of just the submatch.
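For this particular case the delimiters have a fixed width, so the callback can slice them off the whole match without running the regexp again. A minimal sketch, assuming a hypothetical expandTags helper and a map of values; unknown tags are left untouched, which is a design choice of this example only.

```go
package main

import (
	"fmt"
	"regexp"
)

var tagRe = regexp.MustCompile(`\{([^}]+)\}`)

// expandTags (hypothetical helper) replaces each {key} in tmpl with
// values[key]. Tags without a value are kept as they are.
func expandTags(tmpl string, values map[string]string) string {
	return tagRe.ReplaceAllStringFunc(tmpl, func(m string) string {
		key := m[1 : len(m)-1] // strip the surrounding "{" and "}"
		if v, ok := values[key]; ok {
			return v
		}
		return m
	})
}

func main() {
	tmpl := "Name: {name}\nFirst name: {firstname}"
	fmt.Println(expandTags(tmpl, map[string]string{
		"name":      "Doe",
		"firstname": "Jon",
	}))
	// prints:
	// Name: Doe
	// First name: Jon
}
```

This shortcut only works because the non-captured parts of the match have a known length; with variable-width context around the group, the submatch positions are needed.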

@matloob
Contributor

matloob commented Oct 26, 2016

I'm not planning on working on this. If you're interested in contributing this, feel free to do so, but note that the freeze will start in a few days.

@AlekSi
Contributor

AlekSi commented Sep 27, 2017

Is this issue solved by Regexp.Expand and Regexp.ExpandString?

@ghost

ghost commented Sep 27, 2017

@AlekSi
I guess not, at least not in a straightforward way. The number of variables in the expand template is limited, whereas the number of matches in a string isn't.

@srackham

srackham commented Jan 8, 2018

I came across this post by Elliot Chance, which solved a JavaScript to Go porting problem I was having (for consistency, it would be nice if it were incorporated as a new method in the Go regexp package):

http://elliot.land/post/go-replace-string-with-regular-expression-callback

Gist here: https://gist.github.com/elliotchance/d419395aa776d632d897

@golang golang deleted a comment from c9s Aug 9, 2018
@alisonatwork

Thanks for the link @srackham - I hit exactly the same problem with trying to port something from JavaScript to Go. It would definitely be nice to see this functionality inside the standard regexp package.

I also found another project which appears to implement similar functionality in perhaps a cleaner way because it replaces the default regexp: https://github.com/agext/regexp

This gives some idea of how the solution could look: https://github.com/agext/regexp/blob/master/agext.go#L105

@slimsag

slimsag commented Jan 23, 2020

Here is a snippet for anyone else looking for a way to replace submatches with a function using bytes (not strings) and without having to deal with intermediate (non-captured) data: https://gist.github.com/slimsag/14c66b88633bd52b7fa710349e4c6749

@inliquid

I have the same problem.

  1. There are text posts which may include specific links to files stored in a directory structure
  2. I need to parse these posts, find links to files, and then
  3. Move these files to different directory structure,
  4. Manipulate the original path, and
  5. Return new path as a replacement (and replace at the same time if possible).

I would use ReplaceAllStringFunc, but I also need the submatches, which leads to making an additional call to the same regexp within the repl function with FindAllStringSubmatch.

@rsc rsc removed their assignment Jun 22, 2022
@entonio

entonio commented Nov 19, 2023

I've met this issue today. I'm sure I've met it before, but I've probably used some tedious, bug-prone, workaround.

@volodymyrprokopyuk

Hi,

The workaround I use runs the regexp twice, once for Replace and once more for Find, which is inefficient:

package main

import (
  "fmt"
  "regexp"
  "strings"
)

func main() {
  str := "a: b, c: d"
  re := regexp.MustCompile(`(\w+): (\w+)`)
  transformString := func(s string) string {
    m := re.FindStringSubmatch(s) // inefficiency: match again
    k, v := m[1], m[2]
    return fmt.Sprintf("%v: %v", strings.ToUpper(v), strings.ToUpper(k))
  }
  rpl := re.ReplaceAllStringFunc(str, transformString) // first match
  fmt.Println(rpl) // B: A, D: C
}

The function ReplaceAllStringSubmatchFunc() is missing from the regexp package. With this function the code would look like:

func main() {
  str := "a: b, c: d"
  re := regexp.MustCompile(`(\w+): (\w+)`)
  transformSubmatch := func(m []string) string {
    k, v := m[1], m[2]
    return fmt.Sprintf("%v: %v", strings.ToUpper(v), strings.ToUpper(k))
  }
  rpl := re.ReplaceAllStringSubmatchFunc(str, transformSubmatch) // new function
  fmt.Println(rpl) // B: A, D: C
}

I'm looking forward to ReplaceAllStringSubmatchFunc() being included in the regexp package, as this situation comes up quite often.

Thank you!
