Issue 4538123: code review 4538123: exp/regexp/syntax: syntax data structures, parser

Issue 4538123: code review 4538123: exp/regexp/syntax: syntax data structures, parser (Closed)

Can't Edit
Can't Publish+Mail
Start Review

Created:
13 years, 9 months ago by rsc

Modified:
13 years, 9 months ago

Reviewers:

CC:
r, Sam, kevlar, eds2, golang-dev

Visibility:
Public.

Description

exp/regexp/syntax: syntax data structures, parser Parser is a work in progress but can populate most of the interesting parts of the data structure, so a good checkpoint. All the complicated Perl syntax is missing, as are various important optimizations made during parsing to the syntax tree. The plan is that exp/regexp's API will mimic regexp, and exp/regexp/syntax provides the parser directly for programs that need it (and for implementing exp/regexp). Once finished, exp/regexp will replace regexp.

Patch Set 1 #

Patch Set 2 : diff -r bece9e7a644b https://go.googlecode.com/hg/ #

Patch Set 3 : diff -r bece9e7a644b https://go.googlecode.com/hg/ #

Total comments: 45

Patch Set 4 : diff -r 69f12bae0f09 https://go.googlecode.com/hg #

Patch Set 5 : diff -r 69f12bae0f09 https://go.googlecode.com/hg #

Patch Set 6 : diff -r 69f12bae0f09 https://go.googlecode.com/hg #

Total comments: 6

Patch Set 7 : diff -r d1e5d7548fee https://go.googlecode.com/hg #

Total comments: 11

Patch Set 8 : diff -r 881a0fc6528d https://go.googlecode.com/hg #

Created: 13 years, 9 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+1050 lines, -0 lines)			Patch
M	src/pkg/Makefile	View	1 2 3 4 5 6 7	1 chunk	+1 line, -0 lines	0 comments	Download
A	src/pkg/exp/regexp/syntax/Makefile	View	1	1 chunk	+12 lines, -0 lines	0 comments	Download
A	src/pkg/exp/regexp/syntax/parse.go	View	1 2 3 4 5 6 7	1 chunk	+561 lines, -0 lines	0 comments	Download
A	src/pkg/exp/regexp/syntax/parse_test.go	View	1 2 3 4 5 6 7	1 chunk	+266 lines, -0 lines	0 comments	Download
A	src/pkg/exp/regexp/syntax/regexp.go	View	1 2 3 4 5 6 7	1 chunk	+210 lines, -0 lines	0 comments	Download

Messages

Total messages: 18

Expand All Messages | Collapse All Messages

rsc

Hello r, sam (cc: golang-dev@googlegroups.com), I'd like you to review this change to https://go.googlecode.com/hg/

13 years, 9 months ago (2011-06-08 19:12:41 UTC) #1

kevlar

Very nice. Just a few things I noticed while reading through and trying to understand ...

13 years, 9 months ago (2011-06-08 20:24:39 UTC) #2

rsc

On Wed, Jun 8, 2011 at 16:24, <kevlar@google.com> wrote: > Very nice. Just a few ...

13 years, 9 months ago (2011-06-08 20:26:52 UTC) #3

eds2

I'm excited to see some work on an improved regexp package. Thanks! I noticed a ...

13 years, 9 months ago (2011-06-08 22:30:25 UTC) #4

rsc

> http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/regexp.go#newcode24 > src/pkg/exp/regexp/syntax/regexp.go:24: Sub0 [1]*Regexp // storage > for short Sub > It seems ...

13 years, 9 months ago (2011-06-08 22:43:06 UTC) #5

Sam

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go#newcode30 src/pkg/exp/regexp/syntax/parse.go:30: ErrInternalError ErrorCode = "regexp/syntax: internal error" you only need ...

13 years, 9 months ago (2011-06-09 02:15:49 UTC) #6

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go#newcode41 src/pkg/exp/regexp/syntax/parse.go:41: ErrMissingBracket = "missing closing ]" why does the ErrorCode ...

13 years, 9 months ago (2011-06-09 02:56:03 UTC) #7

rsc

13 years, 9 months ago (2011-06-09 14:05:13 UTC) #8

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
File src/pkg/exp/regexp/syntax/parse.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:30: ErrInternalError ErrorCode =
"regexp/syntax: internal error"
On 2011/06/09 02:15:49, Sam wrote:
> you only need ErrorCode here, not below.

No, since there are initializers it's needed on each line.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:41: ErrMissingBracket                  =
"missing closing ]"
On 2011/06/09 02:56:04, r wrote:
> why does the ErrorCode type stop here?

I screwed up.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:69: POSIX Flags = 0                          
              // POSIX syntax
On 2011/06/09 02:56:04, r wrote:
> how close is POSIX to egrep, that is, to the existing regexp package?

POSIX really means POSIX egrep syntax, but that's a little
more than what regexp does now.

The differences compared to the current regexp are:
  - escapes like \a \f \n \t \v \x7F \023 
  - counted repetition like x{5} x{5,} x{5,10}
  - classes like [[:space:]]

Except for those, I believe they are the same.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:113: p.push(re)
On 2011/06/09 02:56:04, r wrote:
> the whole thing could be one line if you want, but it's a matter of taste how
> much you want to break it up. i don't care.

It used to do more (and push used to not return a *Regexp).
Will fix.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:402: 
On 2011/06/09 02:15:49, Sam wrote:
> You want to look for ':' between the negate and the beginning of the loop. If
> that's the case, it's a [:alnum:] class only.

That will come.  Even after I decided to stop I had to
throw code away to make the code review a reasonable size.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:414: // TODO: Look for [:alnum:]
On 2011/06/09 02:15:49, Sam wrote:
> if t[0] == '[' {
>     // TODO: parse [:alnum:]
>     continue
> }

It will come later, I promise.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:416: // TODO: Look for Perl group.
On 2011/06/09 02:15:49, Sam wrote:
> You could replace these two lines with a call to parseClassChar.

Possibly but possibly not; parseClassChar reads a single
char that might be involved in an a-z range.  It's not
set up to return a giant class.

Let's wait until I address the TODO.
This code is following the RE2 parser fairly closely,
and that code does work.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
File src/pkg/exp/regexp/syntax/parse_test.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse_test.go:226: case OpEndText:
On 2011/06/09 02:56:04, r wrote:
> default?

Default is print nothing inside the braces,
like emp{} and bol{}.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
File src/pkg/exp/regexp/syntax/regexp.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:40: OpLiteral                      //
matches Runes sequence
On 2011/06/09 02:15:49, Sam wrote:
> this seems ambiguous; can OpLiteral match a long string, or just a single UTF8
> rune? It does both in your parser.

It matches the sequence of (one or more) Runes.

> This seems difficult to build a runner around, as it's not clear that each Op
is
> a single char (e.g. OpAnyChar) or zero chars (e.g. OpWordBoundary). The state
of
> the runner will need to be a set of (state, index_in_string).

This is just the parse tree.  A separate compilation pass
will have to turn this into a program to run.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:51: OpStar                         //
matches Sub[0] zero or more times
On 2011/06/09 02:15:49, Sam wrote:
> So in general I feel like this is an interesting tradeoff between clarity in
the
> in-memory representation versus actually parsing a regexp down to a simpler
> representation (e.g., why provide OpPlus when you can just duplicate the node,
> connect them, and use OpStar on the second).
> I suspect you have strong opinions on this (and it does allow you to render
the
> RE completely back to the user). Anyway, I feel like it adds more complexity
to
> any eventual runner.

This data structure is about parsing into a form that can be analyzed
and processed.  One of the ways to process it is to compile it into a
program to run.  But even during compilation + is important to handle on its
own.

Rewriting x+ into xx* during compilation makes a copy of x.
If you keep it as x+ you can avoid that copy.  See the diagrams
in the "Converting Regular Expressions to NFAs" section of
http://swtch.com/~rsc/regexp/regexp1.html and the instruction 
listings near the beginning of http://swtch.com/~rsc/regexp/regexp2.html

Especially if x is some long expression, it's a big win not to
make an unnecessary copy of it.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:64: b.WriteString("<invalid op>")
On 2011/06/09 02:56:04, r wrote:
> always good to indicate the value: + strconv.ItoA(re.Op)? maybe not worth it;
> not sure what this is for (no doc comment, for one thing).  generally the
error
> states in here don't say much

Thanks. I was trying not to use fmt and forgot I had strconv.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:149: b.WriteString(strconv.Itoa(re.Min))
On 2011/06/09 02:15:49, Sam wrote:
> re.Max
> (also, if re.Max < re.Min, I suspect that's "bad" from a user-point of view,
but
> I suppose it may still be valid).

Nice catch.  Untested since the parser doesn't read that yet.  :-)

rsc

Hello r@golang.org, sam.thorogood@gmail.com, kevlar@google.com, edsrzf@gmail.com (cc: golang-dev@googlegroups.com), Please take another look.

13 years, 9 months ago (2011-06-09 22:14:51 UTC) #9

rsc

PTAL http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go#newcode60 src/pkg/exp/regexp/syntax/parse.go:60: NonGreedy // repetition operators are non-greedy by default ...

13 years, 9 months ago (2011-06-09 22:17:14 UTC) #10

Sam

LGTM Although I'm hardly authoritative. http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go#newcode60 src/pkg/exp/regexp/syntax/parse.go:60: NonGreedy // repetition operators ...

13 years, 9 months ago (2011-06-10 00:14:54 UTC) #11

LGTM

Although I'm hardly authoritative.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
File src/pkg/exp/regexp/syntax/parse.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:60: NonGreedy                       //
repetition operators are non-greedy by default
On 2011/06/09 22:17:14, rsc wrote:
> On 2011/06/09 02:15:49, Sam wrote:
> > I think you mean 'are greedy by default'.
> 
> No, the NonGreedy flag makes them non-greedy by default.
> Changed to 'make repetition operators default to non-greedy'.

Yes, the 'by default' part was confusing and seemed to imply default state
without the flag. The new comment is much better :-)

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:168: for i > 0 && p.stack[i-1].Op < opPseudo
{
On 2011/06/09 02:15:49, Sam wrote:
> ditto here re: the comment searching for '|' or '('.

Actually, we should never find a | here, because it should always be swapped and
then popped before this call. Right?

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
File src/pkg/exp/regexp/syntax/regexp.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:40: OpLiteral                      //
matches Runes sequence
On 2011/06/09 14:05:13, rsc wrote:
> On 2011/06/09 02:15:49, Sam wrote:
> > this seems ambiguous; can OpLiteral match a long string, or just a single
UTF8
> > rune? It does both in your parser.
> 
> It matches the sequence of (one or more) Runes.
> 
> > This seems difficult to build a runner around, as it's not clear that each
Op
> is
> > a single char (e.g. OpAnyChar) or zero chars (e.g. OpWordBoundary). The
state
> of
> > the runner will need to be a set of (state, index_in_string).
> 
> This is just the parse tree.  A separate compilation pass
> will have to turn this into a program to run.

Ok, this resolves my concerns with this data structure. I guess I'm a tad
confused as to whether Regexp will be the canonical representation passed around
by users (cost of compiling it on every use), or whether it's better to compile
it to this program first (cost of memory).

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:51: OpStar                         //
matches Sub[0] zero or more times
> Especially if x is some long expression, it's a big win not to
> make an unnecessary copy of it.

Yeah, I get that.

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
File src/pkg/exp/regexp/syntax/parse.go (right):

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
src/pkg/exp/regexp/syntax/parse.go:63: WasDollar                       // regexp
OpEndText was $, not \z
I'm kind of confused as to why you need this in this way, and don't need a
parallel for ^ vs \a - it's clearly for use in a follow-up CL. I suspect you
know what you're doing :)

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
File src/pkg/exp/regexp/syntax/parse_test.go (right):

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
src/pkg/exp/regexp/syntax/parse_test.go:22: //	{ "ab", "str{ab}" },
won't these str{} cases work - as just an extension of lit{} to >1 char?

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
src/pkg/exp/regexp/syntax/parse_test.go:108: //	{ "(ab)*", "star{cap{str{ab}}}"
},
I think this one should work now

rsc

PTAL http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go#newcode168 src/pkg/exp/regexp/syntax/parse.go:168: for i > 0 && p.stack[i-1].Op < opPseudo ...

13 years, 9 months ago (2011-06-10 00:27:37 UTC) #12

PTAL

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
File src/pkg/exp/regexp/syntax/parse.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/par...
src/pkg/exp/regexp/syntax/parse.go:168: for i > 0 && p.stack[i-1].Op < opPseudo
{
On 2011/06/10 00:14:54, Sam wrote:
> On 2011/06/09 02:15:49, Sam wrote:
> > ditto here re: the comment searching for '|' or '('.
> 
> Actually, we should never find a | here, because it should always be swapped
and
> then popped before this call. Right?

Hey, it's not fair to ask for comments and then
complain when they're not correct.  
That's entrapment or something.  :-)

Fixed.

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
File src/pkg/exp/regexp/syntax/regexp.go (right):

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/reg...
src/pkg/exp/regexp/syntax/regexp.go:40: OpLiteral                      //
matches Runes sequence
> Ok, this resolves my concerns with this data structure. I guess I'm a tad
> confused as to whether Regexp will be the canonical representation passed
around
> by users (cost of compiling it on every use), or whether it's better to
compile
> it to this program first (cost of memory).

The plan is that the regexp package would take an input string,
call syntax.Parse to get a *syntax.Regexp, then call a compile
function to get a *regexp.prog, throw away the *syntax.Regexp,
and execute the *regexp.prog on each call.  Most packages will
have no idea that *syntax.Regexp exists, and it will just be
garbage generated during regexp.Compile.  

But sometimes it is very useful, and when you see how complex
the parser is going to get, you'll understand why we don't have
a separate parser for generating the Prog directly.  :-)
Even though the parser will get more complex, this Regexp representation
will not; the parser and the Regexp data structure serve to hide a
the complexity of the Perl syntax from the compilation phase.

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
File src/pkg/exp/regexp/syntax/parse.go (right):

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
src/pkg/exp/regexp/syntax/parse.go:63: WasDollar                       // regexp
OpEndText was $, not \z
On 2011/06/10 00:14:55, Sam wrote:
> I'm kind of confused as to why you need this in this way, and don't need a
> parallel for ^ vs \a - it's clearly for use in a follow-up CL. I suspect you
> know what you're doing :)

In Perl, there is \Z (match end of text or right before \n at end of text)
and then there is \z (end of text only).  
In Perl single-line mode $ means \Z, but we don't
support \Z, so in our single-line mode we map $ to \z.  If we every want
to run tests against PCRE to make sure we get the same answers,
it's important to know if a particular OpEndText was a $, because then
we should expect our library and PCRE to disagree once in a while
about whether something matches.

There is no equivalent for ^ and \A because Perl didn't screw up \A.

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
File src/pkg/exp/regexp/syntax/parse_test.go (right):

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
src/pkg/exp/regexp/syntax/parse_test.go:22: //	{ "ab", "str{ab}" },
On 2011/06/10 00:14:55, Sam wrote:
> won't these str{} cases work - as just an extension of lit{} to >1 char?

That's the plan, but for now they compile into the next line below:
cat{lit{a}lit{b}} instead of lit{ab}.  The alternate and concat methods
will get much more complicated before all this is done, and one
of the complications is turning cat{lit{a}lit{b}} into str{ab}.

http://codereview.appspot.com/4538123/diff/20001/src/pkg/exp/regexp/syntax/pa...
src/pkg/exp/regexp/syntax/parse_test.go:108: //	{ "(ab)*", "star{cap{str{ab}}}"
},
On 2011/06/10 00:14:55, Sam wrote:
> I think this one should work now

Same thing as above: the uncommented copy a few lines from
now is the current parse tree, because nothing generates str yet.

Sam

http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go#newcode168 src/pkg/exp/regexp/syntax/parse.go:168: for i > 0 && p.stack[i-1].Op < opPseudo { ...

13 years, 9 months ago (2011-06-10 01:03:32 UTC) #13

rsc

On Thu, Jun 9, 2011 at 21:03, <sam.thorogood@gmail.com> wrote: > > http://codereview.appspot.com/4538123/diff/4001/src/pkg/exp/regexp/syntax/parse.go > File src/pkg/exp/regexp/syntax/parse.go ...

13 years, 9 months ago (2011-06-10 01:08:18 UTC) #14

Sam

Sounds good. LGTM again. On Fri, Jun 10, 2011 at 11:08, Russ Cox <rsc@golang.org> wrote: ...

13 years, 9 months ago (2011-06-10 01:10:46 UTC) #15

LGTM for checkpoint http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/parse.go File src/pkg/exp/regexp/syntax/parse.go (right): http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/parse.go#newcode12 src/pkg/exp/regexp/syntax/parse.go:12: ) when the time comes, i'd ...

13 years, 9 months ago (2011-06-10 04:49:10 UTC) #16

rsc

> http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/parse.go#newcode12 > src/pkg/exp/regexp/syntax/parse.go:12: ) > when the time comes, i'd like a doc.go with ...

13 years, 9 months ago (2011-06-13 12:55:07 UTC) #17

>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/par...
> src/pkg/exp/regexp/syntax/parse.go:12: )
> when the time comes, i'd like a doc.go with the definition of the syntax
> and how the flags affect syntax and behavior. it can wait, but not too
> long because it helps me understand the minutiae.

http://code.google.com/p/re2/wiki/Syntax

>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/par...
> src/pkg/exp/regexp/syntax/parse.go:59: OneLine
> // treat ^ and $ as only matching at beginning and end of text
> this doesn't feel like the right name but there might not be a good one.
>  it's not about the lines, it's about ^ and $ vs inner newline
> characters.
>
> i don't have a great idea. InnerAnchors? AnchorEnds? AnchorNLs?
>
>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/par...
> src/pkg/exp/regexp/syntax/parse.go:61: PerlX
> // allow Perl extensions
> not a fan of this either, because of X.  what's wrong with PerlExt?

I agree that these names are not great.
They are what RE2 uses, though, so for now I'd like to
leave them the same.  Once the coding is done I'm
happy to rename/rearrange things.

>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/par...
> src/pkg/exp/regexp/syntax/parse_test.go:197: func dumpRegexp(b
> *bytes.Buffer, re *Regexp) {
> how/why does this differ from writeRegexp? function comments would help.
> this looks like it's a dump of the tree; write looks like it recreates
> the expression.

Yes.  Added.  dump is needed to test that you get the
syntax tree you expect (for example, a string node instead
of a concatenation of two literal nodes; not that that works yet).

>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/reg...
> src/pkg/exp/regexp/syntax/regexp.go:21: Op       Op // operator
> do you intend to keep all these fields public?

Yes, but this is the regexp/syntax package, so clients of regexp won't see them.

>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/reg...
> src/pkg/exp/regexp/syntax/regexp.go:59: const opPseudo Op = 128 // where
> pseudo-ops start
> i'd move this up into the bock of consts above

If I do that godoc will tell people there are unexported constants.

>
http://codereview.appspot.com/4538123/diff/6006/src/pkg/exp/regexp/syntax/reg...
> src/pkg/exp/regexp/syntax/regexp.go:180: const meta = `\.+*?()|[]^$`
> are {} meta?

Yes, fixed.

Russ

rsc

13 years, 9 months ago (2011-06-13 13:20:25 UTC) #18

*** Submitted as http://code.google.com/p/go/source/detail?r=5e5d55182675 ***

exp/regexp/syntax: syntax data structures, parser

Parser is a work in progress but can populate most of the
interesting parts of the data structure, so a good checkpoint.
All the complicated Perl syntax is missing, as are various
important optimizations made during parsing to the
syntax tree.

The plan is that exp/regexp's API will mimic regexp,
and exp/regexp/syntax provides the parser directly
for programs that need it (and for implementing exp/regexp).

Once finished, exp/regexp will replace regexp.

R=r, sam.thorogood, kevlar, edsrzf
CC=golang-dev
http://codereview.appspot.com/4538123

Expand All Messages | Collapse All Messages