Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

regexp/syntax: improve printing of flags #57950

Closed
theseion opened this issue Jan 22, 2023 · 6 comments
Closed

regexp/syntax: improve printing of flags #57950

theseion opened this issue Jan 22, 2023 · 6 comments
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@theseion
Copy link

Parsing and printing expressions with flags produces pretty bad output. Example: (?i:ab*c|d?e) -> (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E). While the output is semantically correct, it causes the resulting expression to grow in size by approximately factor 3.

I am a core developer of https://github.com/coreruleset and we use Go to generate regular expressions that can become very large (hundreds of KB). A growth of an expression by factor 3 is not acceptable for us. The issue for us is mostly about the fold case flag (i) but other flags are affected as well, e.g., ab*c.|d?e. -> ab*c(?-s:.)|d?e(?-s:.) (better: (?:ab*c|d?e)(?-s:.), or (?-s)(?:ab*c|d?e).).

I propose to improve printing of regular expressions, so that flags are only repeated if necessary.

@seankhliao
Copy link
Member

why wouldn't you just keep the original, like regexp.Regex ?

cc @rsc

@robpike
Copy link
Contributor

robpike commented Jan 22, 2023

How are you printing it? The existing code does exactly what you want: https://go.dev/play/p/R5sPE02FYKJ

package main

import (
	"fmt"
	"regexp"
)

func main() {
	fmt.Println(regexp.MustCompile(`(?i:ab*c|d?e)`))
}

@theseion
Copy link
Author

theseion commented Jan 23, 2023

We generate a single regular expression out of multiple smaller expressions. Here's an example:

\s+__\$
command1
command2

Each line is treated as an alternation. The result would be \s+__\$|command1|command2. We do more complicated things too, such as substitutions and concatenation. In the end, we always end up with a list of alternations as above. We then feed these to a library (https://github.com/itchyny/rassemble-go) for final assembly and optimisation.

This process is also recursive in some cases, so the output from rassemble-go may become one of the alternations.

rassemble-go uses regex/syntax to parse and process the expression:

import "regexp/syntax"

r, _ := syntax.Parse("(?i:ab*c|d?e)", syntax.ClassNL|syntax.PerlX)
println(r.String())

The result is (?i:A)(?i:B)*(?i:C)|(?i:D)?(?i:E).

To finalize:

  1. the "original" expression may be embedded in another expression and I don't know the position in the string where I would need to insert the flags
  2. rassemble-go uses regexp/syntax to parse and manipulate the regular expression, hence the output differs from the one produced by regexp.Regexp

@rsc rsc changed the title proposal: regexp/syntax: improve printing of flags regexp/syntax: improve printing of flags Jun 28, 2023
@rsc rsc removed the Proposal label Jun 28, 2023
@gopherbot
Copy link

Change https://go.dev/cl/507015 mentions this issue: regexp/syntax: use more compact Regexp.String output

@rsc
Copy link
Contributor

rsc commented Jun 28, 2023

I think this is a simple enough change that it doesn't need a proposal. I sent https://go-review.googlesource.com/c/go/+/507015. It will probably be in Go 1.22 (too late for the upcoming Go 1.21). You can copy the code if you want to use it ahead of then.

@cagedmantis cagedmantis added the NeedsFix The path to resolution is known, but the work has not been done. label Jun 29, 2023
@cagedmantis cagedmantis modified the milestones: Proposal, Go1.22 Jun 29, 2023
@theseion
Copy link
Author

Wow, that's amazing @rsc! I wouldn't call that a "small change" but I'm not complaining 😄.

theseion added a commit to theseion/rassemble-go that referenced this issue Feb 17, 2024
1.22 improves printing of regular expressions.

See golang/go#57950
See https://go-review.googlesource.com/c/go/+/507015
theseion added a commit to coreruleset/crs-toolchain that referenced this issue Feb 17, 2024
theseion added a commit to coreruleset/crs-toolchain that referenced this issue Feb 17, 2024
Go 1.22 includes improved printing of regular expression flags in
regexp/syntax.

See golang/go#57950
See https://go-review.googlesource.com/c/go/+/507015
theseion added a commit to coreruleset/crs-toolchain that referenced this issue Feb 17, 2024
Go 1.22 includes improved printing of regular expression flags in
regexp/syntax.

See golang/go#57950
See https://go-review.googlesource.com/c/go/+/507015
theseion added a commit to coreruleset/crs-toolchain that referenced this issue Feb 17, 2024
Go 1.22 includes improved printing of regular expression flags in
regexp/syntax.

See golang/go#57950
See https://go-review.googlesource.com/c/go/+/507015
theseion added a commit to coreruleset/crs-toolchain that referenced this issue Feb 18, 2024
Go 1.22 includes improved printing of regular expression flags in
regexp/syntax.

See golang/go#57950
See https://go-review.googlesource.com/c/go/+/507015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

6 participants