regexp: Documentation doesn't mention that (?<name>...) capturing group syntax isn't currently supported #64108

Open
DylanSp opened this issue Nov 14, 2023 · 10 comments · May be fixed by #64574
Labels
Documentation, help wanted, NeedsFix (The path to resolution is known, but the work has not been done.)

Comments

@DylanSp

DylanSp commented Nov 14, 2023

The documentation for the regexp package doesn't mention that the (?<name>) syntax for capturing groups isn't currently supported by Go. This syntax has been added to RE2, and there's an accepted proposal to add it to Go (#58458), but that hasn't yet been implemented in Go. The RE2 wiki page on syntax lists this syntax as valid, but the regexp package docs link to that page without mentioning that (?<name>) is unsupported in Go.

What version of Go are you using (go version)?

$ go version
go version go1.21.3 linux/amd64

Does this issue reproduce with the latest release?

Yes; the Go playground link below reproduces it in Go 1.21.4.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/codespace/.cache/go-build'
GOENV='/home/codespace/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.21.3'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/workspaces/toy-robot-test/go/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3059296184=/tmp/go-build -gno-record-gcc-switches'

What did you do?

https://go.dev/play/p/8Jtqu9zq8LZ - this is the example for regexp.Regexp's SubexpIndex method, modified to use ?<first> and ?<last> instead of ?P<first> and ?P<last>.

What did you expect to see?

true
last => 2
Turing

(the output from running the existing example)
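For reference, a minimal sketch of the kind of code involved, written with the (?P<name>...) form that Go 1.21 accepts. It is reconstructed from the description above rather than copied from the playground link, and the helper name lastName is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastName is a hypothetical helper. It returns the index of the "last"
// capture group and the text that group matched in s, or (-1, "") when
// the pattern does not match.
func lastName(s string) (int, string) {
	// (?P<name>...) is the named-group form accepted by Go 1.21 and earlier.
	re := regexp.MustCompile(`(?P<first>[a-zA-Z]+) (?P<last>[a-zA-Z]+)`)
	matches := re.FindStringSubmatch(s)
	if matches == nil {
		return -1, ""
	}
	i := re.SubexpIndex("last")
	return i, matches[i]
}

func main() {
	i, name := lastName("Alan Turing")
	fmt.Printf("last => %d\n", i)
	fmt.Println(name)
}
```

Swapping ?P<first> and ?P<last> for ?<first> and ?<last> in the pattern is the only change needed to hit the behavior reported in this issue.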

What did you see instead?

panic: regexp: Compile(`(?<first>[a-zA-Z]+) (?<last>[a-zA-Z]+)`): error parsing regexp: invalid or unsupported Perl syntax: `(?<`
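Until a release with the new syntax is available, one possible workaround is to call regexp.Compile instead of regexp.MustCompile and retry with the (?P<name>...) spelling when the first attempt fails. The sketch below assumes a hypothetical helper named compileNamed; the textual rewrite is deliberately naive and would also touch a literal "(?<" elsewhere in the pattern, so treat it as an illustration rather than a general-purpose translator.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileNamed is a hypothetical helper, not part of the regexp package.
// It tries the pattern as written, and if that fails it retries after
// rewriting every "(?<" to "(?P<". The rewrite is purely textual.
func compileNamed(pattern string) (*regexp.Regexp, error) {
	re, err := regexp.Compile(pattern)
	if err == nil {
		return re, nil
	}
	// Fallback path: use the older named-group spelling.
	return regexp.Compile(strings.ReplaceAll(pattern, "(?<", "(?P<"))
}

func main() {
	re, err := compileNamed(`(?<first>[a-zA-Z]+) (?<last>[a-zA-Z]+)`)
	if err != nil {
		// Compile returns an error instead of panicking like MustCompile.
		fmt.Println("compile failed:", err)
		return
	}
	fmt.Println(re.SubexpIndex("last"))
}
```

On a toolchain that already accepts (?<name>...), the first Compile call succeeds and the rewrite never runs.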
@rittneje

rittneje commented Nov 14, 2023

It should be noted that the new syntax does work with "Go dev branch" in the playground, which means that support is slated for 1.22.

Unfortunately, even historical versions of the regexp docs simply link to https://github.com/google/re2/wiki/Syntax, so no matter what this can cause some confusion. Kind of seems like the RE2 spec needs to be versioned so that the regexp docs (and others) can refer to a particular one.

@mauri870
Member

Support for this ?<name> syntax was added in CL 513838, which will be released with Go 1.22.

I believe there are no plans to backport this.

I think RE2 is linked for cross-reference; the supported syntax is listed by running go doc regexp/syntax, as mentioned in the docs.
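Since the accepted syntax depends on the toolchain, a small probe can report what the installed version actually parses. This is a sketch that assumes regexp/syntax.Parse with the syntax.Perl flags reflects what regexp.Compile accepts; the helper name parses is made up for illustration.

```go
package main

import (
	"fmt"
	"regexp/syntax"
	"runtime"
)

// parses is a hypothetical helper. It reports whether the regexp/syntax
// parser bundled with this toolchain accepts pattern under Perl flags.
func parses(pattern string) bool {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err == nil
}

func main() {
	fmt.Println("toolchain:", runtime.Version())
	fmt.Println("(?P<name>...) accepted:", parses(`(?P<name>a)`))
	fmt.Println("(?<name>...) accepted: ", parses(`(?<name>a)`))
}
```

The second result is the one that varies between releases, which is why the output is not shown here.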

@rittneje

@mauri870 The regexp docs currently say:

The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by RE2 and described at https://golang.org/s/re2syntax, except for \C. For an overview of the syntax, run go doc regexp/syntax

In other words, it claims that it accepts RE2 as described on that wiki page (which is no longer the case), and only suggests running go doc for an "overview".

In addition, I think the go doc suggestion is somewhat misleading. Really you just want to refer to docs for regexp/syntax.

@mauri870
Member

Yeah, I agree. This link always points to the latest version of the wiki page, which might include features that are not available in older versions of Go. The proper approach would be to just mention go doc regexp/syntax, which is bundled with the toolchain.

So this is about updating the documentation on tip. Given our policy for backports this will likely not be fixed for 1.21 and 1.20, but it is a nice to have from now on.

@DylanSp
Author

DylanSp commented Nov 14, 2023

@mauri870 Thanks for taking a look at this. Would it be possible to update the regexp docs to just link to the regexp/syntax docs (and/or mention go doc regexp/syntax), removing the link to the RE2 syntax, in a minor version of 1.21?

@mauri870
Member

mauri870 commented Nov 14, 2023

Backports are for critical issues with no workaround, and minor releases prioritize backwards compatibility as much as possible, per the wiki. If an issue doesn't seem critical and can be worked around, a backport is not needed.

You are more than welcome, though, to send a patch that updates the docs for the current tip (future 1.22).

@DylanSp
Author

DylanSp commented Nov 14, 2023

Got it; looking at the wiki page you linked, "Important documentation-only changes [...] may also be included as well, but nothing more.", and this probably doesn't qualify as important.

I'll look into sending a patch to update the docs for the current tip, though I can't commit to it.

@mauri870 added the help wanted and NeedsFix labels Nov 14, 2023
@mauri870 self-assigned this Dec 6, 2023
mauri870 added a commit to mauri870/go that referenced this issue Dec 6, 2023
The regexp documentation points to the google/re2 syntax page, which
causes confusion since users expect feature parity with RE2, which might
not be the case if you are using an older version of Go.

In regexp/syntax/doc.go there is already an autogenerated syntax based
on RE2 and we already link to that in the docs, so removing the external
link does not result in any loss of information.

Fixes golang#64108
@gopherbot

Change https://go.dev/cl/547795 mentions this issue: regexp: remove reference to external RE2 wiki from the docs

@dolmen
Contributor

dolmen commented Dec 12, 2023

I wonder if that comment in the code is still relevant:

// DO NOT EDIT. This file is generated by mksyntaxgo from the RE2 distribution.

@mauri870
Member

I wonder if that comment in the code is still relevant:

// DO NOT EDIT. This file is generated by mksyntaxgo from the RE2 distribution.

I recall running mksyntaxgo a couple days ago and it produced an up-to-date file with no changes, so I think it is still relevant.
