regexp/syntax: named capture groups don't support non-latin alphabets #64678

Open
igorzhilianin opened this issue Dec 12, 2023 · 3 comments · May be fixed by #64662
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.

@igorzhilianin
Contributor

Go version

go version go1.21.4 linux/amd64

What operating system and processor architecture are you using (go env)?

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/lib/go-1.21'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/lib/go-1.21/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.4'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/dev/null'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1924873596=/tmp/go-build -gno-record-gcc-switches'

What did you do?

Python's re and google/re2 have no issue compiling named capture groups with international characters.

Go does not support this, as you can see by running this sample: https://go.dev/play/p/d1pVihwOznE

package main

import (
	"regexp"
)

func main() {
	regexp.MustCompile(`(?P<тест>a)`)
}

What did you expect to see?

No errors.

What did you see instead?

panic: regexp: Compile(`(?P<тест>a)`): error parsing regexp: invalid named capture: `(?P<тест>`

goroutine 1 [running]:
regexp.MustCompile({0x485746, 0xf})
	/usr/lib/go-1.21/src/regexp/regexp.go:319 +0xb4
main.main()
	/root/test.go:8 +0x1f
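
For anyone who wants to detect this failure without the panic, here is a minimal sketch that uses regexp.Compile and inspects the returned error. The helper name isInvalidNamedCapture is hypothetical and only for illustration. It relies on the parse error being a *syntax.Error whose Code is syntax.ErrInvalidNamedCapture, which matches the "invalid named capture" text in the panic above. No output is shown for the Cyrillic pattern because the result depends on the Go version in use.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"regexp/syntax"
)

// isInvalidNamedCapture is a hypothetical helper: it reports whether pattern
// fails to compile specifically because of an invalid capture group name.
func isInvalidNamedCapture(pattern string) bool {
	_, err := regexp.Compile(pattern)
	var serr *syntax.Error
	// errors.As returns false when err is nil, so valid patterns yield false.
	return errors.As(err, &serr) && serr.Code == syntax.ErrInvalidNamedCapture
}

func main() {
	for _, p := range []string{`(?P<test>a)`, `(?P<тест>a)`} {
		fmt.Printf("%s: invalid named capture = %v\n", p, isInvalidNamedCapture(p))
	}
}
```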
@gopherbot

Change https://go.dev/cl/548997 mentions this issue: regexp/syntax: allow extended Unicode characters in capture names

@prattmic
Member

cc @rsc

@prattmic added this to the Backlog milestone Dec 12, 2023
@prattmic added the NeedsDecision label (Feedback is required from experts, contributors, and/or the community before a change can be made.) Dec 12, 2023
@seankhliao
Member

We rejected #60784 a while ago, though re2 allows more Unicode in capture names these days: google/re2@6a99418
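
To make the difference under discussion concrete, here is a rough sketch that contrasts an ASCII-only name check with a relaxed, Unicode-aware one. Both functions are hypothetical and written for this thread; neither is the regexp/syntax source. The ASCII rule (letters, digits, underscore) matches the behavior reported above for go1.21.4. The Unicode category set in the relaxed check is an assumption chosen for illustration and is not confirmed to be what re2 or CL 548997 implements.

```go
package main

import (
	"fmt"
	"unicode"
)

// asciiCaptureName is a hypothetical stand-in for an ASCII-only rule:
// a non-empty name made of ASCII letters, ASCII digits, and '_'.
func asciiCaptureName(name string) bool {
	if name == "" {
		return false
	}
	for _, c := range name {
		isAlnum := '0' <= c && c <= '9' || 'A' <= c && c <= 'Z' || 'a' <= c && c <= 'z'
		if c != '_' && !isAlnum {
			return false
		}
	}
	return true
}

// unicodeCaptureName is a hypothetical relaxed rule. The accepted categories
// (letters, letter numbers, marks, decimal digits, connector punctuation)
// are an assumption for illustration only.
func unicodeCaptureName(name string) bool {
	if name == "" {
		return false
	}
	for _, c := range name {
		if !unicode.In(c, unicode.L, unicode.Nl, unicode.Mn, unicode.Mc, unicode.Nd, unicode.Pc) {
			return false
		}
	}
	return true
}

func main() {
	for _, name := range []string{"test", "тест", "a-b", ""} {
		fmt.Printf("%q: ascii=%v unicode=%v\n",
			name, asciiCaptureName(name), unicodeCaptureName(name))
	}
}
```

Under the relaxed rule a name such as тест passes while a-b is still rejected, since '-' is not in any of the listed categories.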
