regexp: backreference to capturing group breaks if followed by underscore #39594

Open
ghost opened this issue Jun 15, 2020 · 3 comments
Labels: NeedsInvestigation (Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.)
Milestone: Unplanned

ghost commented Jun 15, 2020

What version of Go are you using (go version)?

$ go version
1.14.4

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/a/.cache/go-build"
GOENV="/home/a/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/a/.local/share/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build462357482=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Minimal case: Play
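The original Playground snippet is not reproduced here. The following minimal reproduction is written along the same lines, but its pattern and input (the `pair` variable and the string "foo,bar") are assumptions, not the reporter's code. It contrasts the unbraced and braced templates:

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical pattern and input; the original Playground snippet is not shown here.
var pair = regexp.MustCompile(`(\w+),(\w+)`)

func main() {
	src := "foo,bar"

	// "$2_" is parsed as a reference to a group named "2_". No such group
	// exists, so it expands to the empty string; only "$1" adds text.
	fmt.Println(pair.ReplaceAllString(src, "$2_$1")) // foo

	// Bracing the number ends the name before the underscore.
	fmt.Println(pair.ReplaceAllString(src, "${2}_$1")) // bar_foo
}
```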

What did you expect to see?

I expect the template "$2_$1" to expand to group 2, an underscore, and group 1, as it does in Python and other languages, without having to write "${2}_$1".

ghost changed the title from "Backreference to capturing group breaks if followed by underscore" to "Regexp: backreference to capturing group breaks if followed by underscore" Jun 15, 2020
andybons changed the title from "Regexp: backreference to capturing group breaks if followed by underscore" to "regexp: backreference to capturing group breaks if followed by underscore" Jun 15, 2020

andybons (Member) commented:

@rsc

andybons added the NeedsInvestigation label Jun 15, 2020
andybons added this to the Unplanned milestone Jun 15, 2020
antong (Contributor) commented Jun 15, 2020

This may be counter-intuitive, but if I interpret the documentation correctly, I think this is the way it is supposed to work:

In the template, a variable is denoted by a substring of the form $name or ${name}, where name is a non-empty sequence of letters, digits, and underscores.
...
In the $name form, name is taken to be as long as possible: $1x is equivalent to ${1x}, not ${1}x, and, $10 is equivalent to ${10}, not ${1}0.

So, the template in the example "$2_$1" is the same as "${2_}${1}", not "${2}_${1}".
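The longest-name rule can be observed directly by giving a group the name "2_". This is a contrived sketch for illustration only: the `named` pattern is an assumption, and it relies on Go's regexp syntax accepting `(?P<name>...)` capture names that begin with a digit.

```go
package main

import (
	"fmt"
	"regexp"
)

// The second group is deliberately named "2_" so that "$2_" has
// something to refer to. It is still group number 2.
var named = regexp.MustCompile(`(\w+),(?P<2_>\w+)`)

func main() {
	// "$2_" looks up the group named "2_" ("bar"); "$1" is group 1 ("foo").
	// The underscore is consumed as part of the name, so none is emitted.
	fmt.Println(named.ReplaceAllString("foo,bar", "$2_$1")) // barfoo

	// With braces, "${2}" is group number 2 and "_" is literal text.
	fmt.Println(named.ReplaceAllString("foo,bar", "${2}_${1}")) // bar_foo
}
```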

mattn (Member) commented Jun 16, 2020

JavaScript

console.log('foo,bar'.replace(/(\w+),(\w+)/, '$2_$1'));

Result is bar_foo

Perl

my $a = 'foo,bar';
$a =~ s/(\w+),(\w+)/\2_\1/;
warn $a;

Result is bar_foo

Ruby

puts 'foo,bar'.sub(/(\w+),(\w+)/, '\2_\1')

Result is bar_foo

So I propose changing Go to behave the same way:

diff --git a/src/regexp/all_test.go b/src/regexp/all_test.go
index be7a2e7111..7d944d4844 100644
--- a/src/regexp/all_test.go
+++ b/src/regexp/all_test.go
@@ -227,6 +227,7 @@ var replaceTests = []ReplaceTest{
 	{"(a)(((b))){0}c", ".$1.", "xacxacx", "x.a.x.a.x"},
 	{"((a(b){0}){3}){5}(h)", "y caramb$2", "say aaaaaaaaaaaaaaaah", "say ay caramba"},
 	{"((a(b){0}){3}){5}h", "y caramb$2", "say aaaaaaaaaaaaaaaah", "say ay caramba"},
+	{"(Hello)_(World)", "$2_$1", "Hello_World!", "World_Hello!"},
 }
 
 var replaceLiteralTests = []ReplaceTest{
diff --git a/src/regexp/regexp.go b/src/regexp/regexp.go
index b547a2ab97..7bab7a5d81 100644
--- a/src/regexp/regexp.go
+++ b/src/regexp/regexp.go
@@ -981,12 +981,24 @@ func extract(str string) (name string, num int, rest string, ok bool) {
 		str = str[1:]
 	}
 	i := 0
-	for i < len(str) {
-		rune, size := utf8.DecodeRuneInString(str[i:])
-		if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) && rune != '_' {
-			break
+	b := str[0]
+	if !brace && '0' <= b && b <= '9' {
+		i++
+		for i < len(str) {
+			rune, size := utf8.DecodeRuneInString(str[i:])
+			if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) {
+				break
+			}
+			i += size
+		}
+	} else {
+		for i < len(str) {
+			rune, size := utf8.DecodeRuneInString(str[i:])
+			if !unicode.IsLetter(rune) && !unicode.IsDigit(rune) && rune != '_' {
+				break
+			}
+			i += size
 		}
-		i += size
 	}
 	if i == 0 {
 		// empty name is not okay
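The name-scanning rule in the patch can be tried in isolation. The following is a simplified sketch, not the actual regexp source: `scanName` is a hypothetical helper that folds the patch's two loops into one. It also guards the first-byte check with a length test; the patch reads `str[0]` directly, and `str` may be empty at that point (for example, for a template ending in "${"), so a guard like this seems prudent.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// scanName is a hypothetical helper, not part of package regexp. It returns
// the byte length of the group name at the start of s (the text after "$"
// or "${"), using the rule from the proposed patch:
//   - unbraced names that start with an ASCII digit stop at the first '_';
//   - all other names are the longest run of letters, digits and '_'.
func scanName(s string, brace bool) int {
	// Check the length before reading s[0]: s may be empty, for example
	// when a template ends in "${".
	digitFirst := !brace && len(s) > 0 && '0' <= s[0] && s[0] <= '9'
	i := 0
	for i < len(s) {
		r, size := utf8.DecodeRuneInString(s[i:])
		isWord := unicode.IsLetter(r) || unicode.IsDigit(r) || r == '_'
		if !isWord || (digitFirst && r == '_') {
			break
		}
		i += size
	}
	return i
}

func main() {
	cases := []struct {
		s     string
		brace bool
	}{
		{"2_$1", false}, // name "2"; "_" becomes literal text
		{"1x", false},   // letters still extend a numeric name: "1x"
		{"10", false},   // "10", not "1" followed by "0"
		{"2_}", true},   // the braced form keeps underscores: "2_"
		{"", true},      // nothing left after "${"
	}
	for _, c := range cases {
		n := scanName(c.s, c.brace)
		fmt.Printf("%q (brace=%v) -> name %q\n", c.s, c.brace, c.s[:n])
	}
}
```

Under this rule, the documented "$1x" and "$10" examples keep their current meaning; only an underscore directly after a leading digit run ends the name.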
