Matching word boundaries with '\b' does not work when the first or last character in the
word is a multi-byte UTF-8 code point such as 'é'.
Example:
http://play.golang.org/p/1to3IN9Mnf
What is the expected output?
Matching should succeed in all cases
What do you see instead?
Matching fails when the string includes "é" at the word boundary
Which compiler are you using (5g, 6g, 8g, gccgo)?
6g
Which operating system are you using?
Debian Squeeze
Which version are you using? (run 'go version')
go version go1.1.1 linux/amd64
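The playground program linked above is not reproduced here. The following is a minimal, hypothetical reproduction along the same lines. The helper name `matchesWholeWord` and the sample words are illustrative assumptions, not taken from the original report. It wraps a literal word in `\b...\b` and tests it against itself. Because Go's `\b` treats only `[0-9A-Za-z_]` as word characters, an 'é' at either end of the word sits next to a non-word position on both sides, so no boundary is found there.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchesWholeWord (hypothetical helper) reports whether word occurs in text
// delimited by \b on both sides. In Go's regexp package \b is defined in
// terms of ASCII word characters ([0-9A-Za-z_]), so a non-ASCII letter such
// as 'é' counts as a non-word character.
func matchesWholeWord(text, word string) bool {
	re := regexp.MustCompile(`\b` + regexp.QuoteMeta(word) + `\b`)
	return re.MatchString(text)
}

func main() {
	for _, w := range []string{"cafe", "café", "école"} {
		// Prints true for "cafe" and false for "café" and "école".
		fmt.Printf("%-6s %v\n", w, matchesWholeWord(w, w))
	}
}
```

In "café" the trailing 'é' and the end of text are both non-word positions, and in "école" the start of text and the leading 'é' are, so `\b` fails at that end in each case.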
This is intentional: \b and \B are ASCII-only. Making them full Unicode
would require too much lookahead/lookbehind if we ever want to make a
faster byte-at-a-time matcher. This is the same tradeoff made by RE2. I
will update the regexp/syntax package doc.
Russ
I see. The syntax documentation on https://code.google.com/p/re2/wiki/Syntax defines
\b as "at word boundary (\w on one side and \W, \A, or \z on the other)". Since \w is
defined as "word characters (≡ [0-9A-Za-z_])", I suppose the documentation is already
correct, but drawing attention to this behavior would probably not hurt.