regexp/syntax: document that \b and \B are ASCII-only #5896

Closed
knuesel opened this issue Jul 16, 2013 · 4 comments

knuesel commented Jul 16, 2013

Matching word boundaries with '\b' does not work when the first or last character in the
word is a multi-byte UTF-8 code point such as 'é'. 

Example:

http://play.golang.org/p/1to3IN9Mnf


What is the expected output?
Matching should succeed in all cases

What do you see instead?
Matching fails when the string includes "é" at the word boundary
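
The playground snippet is not reproduced in this thread. A minimal sketch of a program that shows the same behavior might look like the following. The helper name `matchWord` and the sample strings are illustrative assumptions, not taken from the linked example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchWord (hypothetical helper) reports whether word occurs in text
// with a \b assertion on each side.
func matchWord(word, text string) bool {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(word) + `\b`)
	return re.MatchString(text)
}

func main() {
	// ASCII letters on both edges: \b holds on each side.
	fmt.Println(matchWord("test", "a test here")) // true

	// 'é' on both edges: it is not an ASCII word character, so \b fails.
	fmt.Println(matchWord("été", "un été chaud")) // false

	// 'é' only on the trailing edge: the leading \b holds, the trailing one fails.
	fmt.Println(matchWord("café", "un café noir")) // false
}
```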

Which compiler are you using (5g, 6g, 8g, gccgo)?
6g


Which operating system are you using?
Debian Squeeze

Which version are you using?  (run 'go version')
go version go1.1.1 linux/amd64

rsc (Contributor) commented Jul 16, 2013

Comment 1:

This is intentional: \b and \B are ASCII-only. Making them full Unicode
would require too much lookahead/lookbehind if we ever want to make a
faster byte-at-a-time matcher. This is the same tradeoff made by RE2. I
will update the regexp/syntax package doc.
Russ
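
For readers who need Unicode-aware boundaries despite this limitation, one possible workaround (not something proposed in this thread) is to spell out the neighboring characters with explicit Unicode classes. The sketch below assumes that "word character" means a letter (`\p{L}`), a number (`\p{N}`), or an underscore. The helper name `unicodeWordRegexp` is hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// unicodeWordRegexp (hypothetical helper) builds a pattern that matches word
// only when it is not adjacent to a Unicode letter, number, or underscore.
// The word itself is captured in group 1, because the surrounding
// delimiter characters are consumed by the match.
func unicodeWordRegexp(word string) *regexp.Regexp {
	const wordChars = `\p{L}\p{N}_` // assumed definition of "word character"
	return regexp.MustCompile(
		`(?:^|[^` + wordChars + `])(` + regexp.QuoteMeta(word) + `)(?:$|[^` + wordChars + `])`)
}

func main() {
	re := unicodeWordRegexp("été")

	fmt.Println(re.MatchString("un été chaud")) // true
	fmt.Println(re.MatchString("répété"))       // false: preceded by the letter 'p'

	if m := re.FindStringSubmatch("l'été"); m != nil {
		fmt.Println(m[1]) // été
	}
}
```

Because the delimiters are part of the match, two occurrences separated by a single delimiter character would not both be found by the `FindAll` family. Text that uses combining marks (for example `e` followed by U+0301) would also need `\p{M}` handling, which this sketch leaves out.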

knuesel (Author) commented Jul 16, 2013

Comment 3:

I see. The syntax documentation on https://code.google.com/p/re2/wiki/Syntax defines 
\b as "at word boundary (\w on one side and \W, \A, or \z on the other)". Since \w is
defined as "word characters (≡ [0-9A-Za-z_])", I suppose the documentation is already
correct, but drawing attention to this behavior would probably not hurt.

rsc (Contributor) commented Jul 30, 2013

Comment 4:

Labels changed: added priority-later, go1.2, removed priority-triage.

Status changed to Accepted.

robpike (Contributor) commented Aug 8, 2013

Comment 5:

This issue was closed by revision b4f370c.

Status changed to Fixed.

@rsc rsc added this to the Go1.2 milestone Apr 14, 2015
@rsc rsc removed the go1.2 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.