regexp: doc: lead with the precise syntax rather than referring to Perl, Python, etc. #39405

Open
Jarch09 opened this issue Jun 4, 2020 · 11 comments
Labels: Documentation, help wanted, NeedsFix (The path to resolution is known, but the work has not been done.)

@Jarch09

Jarch09 commented Jun 4, 2020

What version of Go are you using (go version)?

1.13.5

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

darwin/amd64

What did you do?

  • Regexp is not catching all of the whitespace values that it should
  • In particular, \s is not matching hair spaces (whereas Perl's regex does)

Here's a go playground link

What did you expect to see?

It should match the white space.

What did you see instead?

It did not recognize the hair space as valid white space.
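
Since the playground link did not survive here, the following is a minimal sketch of the behavior described, not the original snippet. The names `whitespace` and `matchesWhitespace` are made up for illustration; only `regexp.MustCompile` and `MatchString` come from the standard library.

```go
package main

import (
	"fmt"
	"regexp"
)

// whitespace is the \s class under discussion (illustrative name).
var whitespace = regexp.MustCompile(`\s`)

// matchesWhitespace reports whether s contains a character matched by \s.
func matchesWhitespace(s string) bool {
	return whitespace.MatchString(s)
}

func main() {
	fmt.Println(matchesWhitespace("a b"))      // ASCII space: true
	fmt.Println(matchesWhitespace("a\u200ab")) // U+200A HAIR SPACE: false
}
```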

If this is indeed the intended behavior, the docs should make it clearer that Go's regexp syntax does not conform to Perl's and that \s and similar classes will not match everything they match in Perl.

If the docs already say this loudly and clearly and I just missed that section, apologies for being an idiot.

@theckman
Contributor

theckman commented Jun 4, 2020

It's worth noting that Go's regular expression engine doesn't aim to be compatible with Perl's (PCRE). Our implementation is aimed towards being compatible with re2. According to re2's documentation, \s is equivalent to: [\t\n\f\r ].
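
For anyone who needs the broader match, one possible workaround (a sketch, not something proposed in this thread) is to spell out the extra characters with a Unicode class. The example below assumes that combining `\s` with the Unicode separator category `\p{Z}` is acceptable for the use case; `unicodeSpace` and `hasUnicodeSpace` are hypothetical names.

```go
package main

import (
	"fmt"
	"regexp"
)

// unicodeSpace (hypothetical name) matches RE2's \s, i.e. [\t\n\f\r ],
// plus any rune in the Unicode separator category Z (Zs, Zl, Zp).
var unicodeSpace = regexp.MustCompile(`[\s\p{Z}]`)

// hasUnicodeSpace reports whether s contains any such character.
func hasUnicodeSpace(s string) bool {
	return unicodeSpace.MatchString(s)
}

func main() {
	fmt.Println(hasUnicodeSpace("a\u200ab")) // U+200A HAIR SPACE is in category Zs
	fmt.Println(hasUnicodeSpace("a\tb"))     // tab is still covered by \s
	fmt.Println(hasUnicodeSpace("ab"))       // no whitespace at all
}
```

Whether `\p{Z}` is the right set depends on what the caller counts as whitespace; it is wider than Perl's `\s` in some respects and should be checked against the actual input.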

@Jarch09
Author

Jarch09 commented Jun 4, 2020

Okay that makes sense. Thanks for the quick response.

I think it'd be worth making that very obvious in the regexp docs (again, apologies if it already is and I missed it).

My uneducated view is that most Go users expect it to be compatible with PCRE and are surprised when it is not.

I think it'd also be beneficial to the docs to list some of the special characters and their equivalencies, or include a link to the re2 syntax page.

@ianlancetaylor
Contributor

This is documented at https://golang.org/pkg/regexp/syntax.
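
As a rough illustration of checking this from code rather than from the docs, the sketch below asks the `regexp/syntax` package which rune ranges it assigns to `\s`. It assumes the pattern parses to a single character class; `classRanges` and `inRanges` are helper names invented for this example.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// classRanges parses a pattern expected to be a single character class and
// returns its inclusive rune ranges as consecutive lo, hi pairs.
func classRanges(pattern string) ([]rune, error) {
	re, err := syntax.Parse(pattern, syntax.Perl)
	if err != nil {
		return nil, err
	}
	if re.Op != syntax.OpCharClass {
		return nil, fmt.Errorf("%q is not a single character class", pattern)
	}
	return re.Rune, nil
}

// inRanges reports whether r falls inside any lo, hi pair.
func inRanges(ranges []rune, r rune) bool {
	for i := 0; i+1 < len(ranges); i += 2 {
		if ranges[i] <= r && r <= ranges[i+1] {
			return true
		}
	}
	return false
}

func main() {
	ranges, err := classRanges(`\s`)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Print each range the parser assigns to \s.
	for i := 0; i+1 < len(ranges); i += 2 {
		fmt.Printf("%q-%q\n", ranges[i], ranges[i+1])
	}
	fmt.Println("hair space included:", inRanges(ranges, '\u200a'))
}
```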

@Jarch09
Author

Jarch09 commented Jun 4, 2020

Hey thanks for that. Below is the first paragraph in the regexp docs: https://golang.org/pkg/regexp/

Package regexp implements regular expression search.

The syntax of the regular expressions accepted is the same general syntax
used by Perl, Python, and other languages. More precisely, it is the syntax 
accepted by RE2 and described at https://golang.org/s/re2syntax, except
for \C. For an overview of the syntax, run...

Given this thread, I think this is a bit misleading. It should read:

The syntax of the regular expressions accepted is the syntax prescribed by RE2
and described at https://golang.org/s/re2syntax, except for \C.

This syntax largely conforms with the syntax used by Perl, Python, and other
languages, but there are some subtle differences, so please check the syntax docs.

@Jarch09
Author

Jarch09 commented Jun 4, 2020

@ianlancetaylor - please consider this change. I think it'd be helpful to make this a little more explicit.

@ianlancetaylor changed the title from "Regexp Bug - Whitespace" to "regexp: doc: lead with the precise syntax rather than referring to Perl, Python, etc." on Jun 4, 2020
@ianlancetaylor reopened this on Jun 4, 2020
@ianlancetaylor added the Documentation, help wanted, and NeedsFix labels on Jun 4, 2020
@ianlancetaylor added this to the Backlog milestone on Jun 4, 2020

@ianlancetaylor
Contributor

I reopened the issue. Want to send a change?

@Jarch09
Author

Jarch09 commented Jun 4, 2020

Sure will do. Thank you!

@benhoyt
Contributor

benhoyt commented Jul 8, 2020

It seems weird to me that the first thing the Go regexp documentation does is link to another project, i.e., the docs for the C++ RE2 syntax. Could we link to Go documentation -- https://golang.org/pkg/regexp/syntax/ -- instead?

@qbradq
Contributor

qbradq commented Jul 17, 2020

    This package uses RE2 syntax for regular expressions. Users of regular
    expressions in Perl, Python, and other languages and tools will find it
    familiar. For an overview of the syntax, see

        https://golang.org/pkg/regexp/syntax/

    Or run

        go doc regexp/syntax

@benhoyt Do you like this wording? Forgive me if @Jarch09 has already submitted a change.

@benhoyt
Contributor

benhoyt commented Jul 17, 2020

@qbradq I like that wording, yes.

@gopherbot

Change https://golang.org/cl/243399 mentions this issue: regexp: link to the regexp syntax documentation from go doc regexp
