regexp: regex pattern matching error #52460

Closed
syinwu opened this issue Apr 21, 2022 · 5 comments

syinwu commented Apr 21, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18 darwin/arm64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="arm64"
GOBIN="/Users/bxlxx.wu/go/bin"
GOCACHE="/Users/bxlxx.wu/Library/Caches/go-build"
GOENV="/Users/bxlxx.wu/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="arm64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/bxlxx.wu/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/bxlxx.wu/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/opt/homebrew/Cellar/go/1.18/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/opt/homebrew/Cellar/go/1.18/libexec/pkg/tool/darwin_arm64"
GOVCS=""
GOVERSION="go1.18"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/g6/81m9ft5d2_q9xxrxn6x0cl0c0000gn/T/go-build4185655680=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

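package main

import (
	"fmt"
	"regexp"
)

func main() {
	re, err := regexp.Compile(`\x8A`) // raw string: the parser receives the escape \x8A
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println(re.Match([]byte("\x8A")) == true) // input is the single byte 0x8A
}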

What did you expect to see?

The test code should print true; however, the console output is false.

Contributor

robpike commented Apr 21, 2022

The output should be false, because you are using backquotes `` in the argument to Compile, which makes the argument the literal four bytes '\' 'x' '8' 'A', and that's certainly not what you mean.

However, changing `` to "" doesn't help, as you can see here: https://go.dev/play/p/mMd57XnQKL6

The argument to Compile must be valid UTF-8, although I'm not sure that's explicitly documented.
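The cases discussed above can be sketched as follows. This is a minimal illustration, not part of the original thread; `compileAndMatch` is a hypothetical helper name introduced here. It assumes the behavior of the standard `regexp` package, where an escape such as `\x8A` inside the pattern denotes the code point U+008A rather than the raw byte 0x8A.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileAndMatch (hypothetical helper) compiles pattern and reports
// whether it matches input, returning any compile error.
func compileAndMatch(pattern string, input []byte) (bool, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.Match(input), nil
}

func main() {
	raw := []byte{0x8A} // a lone byte; not valid UTF-8 on its own

	// Raw string literal: the regexp parser sees the escape \x8A,
	// which stands for the code point U+008A, not the byte 0x8A.
	m, err := compileAndMatch(`\x8A`, raw)
	fmt.Println(m, err) // false <nil>

	// Interpreted string literal: the pattern itself is the byte 0x8A,
	// which is invalid UTF-8, so Compile returns an error.
	m, err = compileAndMatch("\x8A", raw)
	fmt.Println(m, err != nil) // false true

	// The same escape does match the UTF-8 encoding of U+008A (C2 8A).
	m, err = compileAndMatch(`\x8A`, []byte("\u008A"))
	fmt.Println(m, err) // true <nil>
}
```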

Author

syinwu commented Apr 21, 2022

I think this is a bug. If I use the following code, the result is true.

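package main

import (
	"fmt"
	"regexp"
)

func main() {
	re, err := regexp.Compile(`\x30`) // raw string: the parser receives the escape \x30
	if err != nil {
		fmt.Println(err)
		return
	}

	fmt.Println(re.Match([]byte("\x30")) == true) // input is the single byte 0x30, ASCII '0'
}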

I checked the source code: the regexp.Compile call chain uses the WriteRune function. When writing a character with an ASCII value greater than 127, WriteRune writes 2 bytes to the byte slice or string, which causes the match to fail.

And if I am wrong, how should I match a character whose ASCII value is greater than 127?
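The two-byte observation can be checked directly with the standard `unicode/utf8` package. This is a small sketch, not taken from the thread; `encode` is a hypothetical helper. It shows that the code point 0x30 encodes to one byte in UTF-8 while 0x8A encodes to two, which is why the `\x30` case happens to match a single input byte and the `\x8A` case does not.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encode (hypothetical helper) returns the UTF-8 encoding of r.
func encode(r rune) []byte {
	buf := make([]byte, utf8.UTFMax)
	n := utf8.EncodeRune(buf, r)
	return buf[:n]
}

func main() {
	fmt.Printf("% X\n", encode(0x30)) // 30
	fmt.Printf("% X\n", encode(0x8A)) // C2 8A

	// A lone 0x8A byte is not a valid UTF-8 sequence.
	fmt.Println(utf8.Valid([]byte{0x8A})) // false
}
```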

@syinwu syinwu closed this as completed Apr 21, 2022
@syinwu syinwu reopened this Apr 21, 2022
Contributor

robpike commented Apr 21, 2022

Please read https://go.dev/blog/strings for background about how text works in Go as well as in UTF-8. The term "character" is too vague to be technically useful in this context.

To answer your direct question, if we assume by "character" you mean "Unicode code point", use a Unicode escape like "\u1234" or write out the bytes of its UTF-8 encoding.
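As a hedged illustration of the two options named above, the sketch below matches the code point U+1234 with the pattern written as a Go Unicode escape and as the bytes of its UTF-8 encoding. The third spelling, `\x{1234}`, is not mentioned in the thread; it is the hex escape form accepted by the regexp syntax. `matches` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches (hypothetical helper) reports whether pattern matches s.
// It panics if the pattern does not compile.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	target := "\u1234" // stored as the UTF-8 bytes E1 88 B4

	// Go Unicode escape in an interpreted string literal.
	fmt.Println(matches("\u1234", target)) // true

	// The UTF-8 encoding written out byte by byte.
	fmt.Println(matches("\xe1\x88\xb4", target)) // true

	// Escape interpreted by the regexp parser (raw string literal).
	fmt.Println(matches(`\x{1234}`, target)) // true
}
```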

Also, \x30 in backquotes is 4 bytes, not one, and none of them is the byte with value 0x30.

@syinwu syinwu closed this as completed Apr 21, 2022
Author

syinwu commented Apr 21, 2022

Thank you very much!

Contributor

robpike commented Apr 21, 2022

Ha! I'm wrong (as usual, but only a little). The ASCII zero numeral is hexadecimal 30, so my last sentence in my previous reply was incorrect. Still, the overarching point is right: Matching plain bytes won't work in general.
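For readers who do need to locate a raw byte such as 0x8A, two possible workarounds are sketched below. Neither is suggested in the thread; both are assumptions offered for illustration. The first avoids regular expressions and uses `bytes.IndexByte`. The second widens each input byte to the code point with the same value (a Latin-1 style mapping, done here by the hypothetical helper `widenLatin1`) so that the pattern `\x8A` can match. Note that match offsets in the widened string do not correspond to offsets in the original bytes.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// widenLatin1 (hypothetical helper) maps every input byte to the code
// point with the same numeric value and returns the result as a string.
func widenLatin1(b []byte) string {
	runes := make([]rune, len(b))
	for i, c := range b {
		runes[i] = rune(c)
	}
	return string(runes)
}

func main() {
	data := []byte{'a', 0x8A, 'b'}

	// Plain byte search, no regexp involved.
	fmt.Println(bytes.IndexByte(data, 0x8A)) // 1

	re := regexp.MustCompile(`\x8A`)

	// Matching the raw bytes fails: 0x8A alone is not valid UTF-8.
	fmt.Println(re.Match(data)) // false

	// After widening, byte 0x8A has become code point U+008A.
	fmt.Println(re.MatchString(widenLatin1(data))) // true
}
```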

@golang golang locked and limited conversation to collaborators Apr 21, 2023