Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unicode: unicode.Is and bytes.Buffer.WriteRune get confused by negative runes #43254

Closed
davidben opened this issue Dec 18, 2020 · 8 comments
Closed
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@davidben
Copy link
Contributor

What version of Go are you using (go version)?

$ go version
go version go1.15.6 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/davidben/Library/Caches/go-build"
GOENV="/Users/davidben/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/davidben/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/davidben/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/davidben/boringssl/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/qc/[...]/T/go-build280151090=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

https://play.golang.org/p/9ZkvjGuE1so

What did you expect to see?

unicode.Is and related functions should return false on negative values, as they do for other invalid runes. bytes.Buffer.WriteRune should write a replacement character, as it does for other invalid runes. In particular, a UTF-32 decoder could easily construct a negative rune before checking. (Looks like x/text/encoding/unicode/utf32 does that and then relies on RuneLen noticing.)

What did you see instead?

unicode.Is thinks some negative values are printable, and bytes.Buffer.WriteRune accidentally runs a single-byte fast path.

I wasn't sure at first whether this was a bug, but most functions seem to check for negative values or cast to uint32, so I think these should as well. (If they expect the rune be a real code point, that should probably be documented.)

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 22, 2020
@cagedmantis cagedmantis added this to the Backlog milestone Dec 22, 2020
@cagedmantis
Copy link
Contributor

/cc @robpike @mpvl

@gopherbot
Copy link

Change https://golang.org/cl/280493 mentions this issue: unicode: correctly handle negative runes

@gopherbot
Copy link

Change https://golang.org/cl/280492 mentions this issue: bufio, bytes, strings: handle negative runes in WriteRune

@davidben
Copy link
Contributor Author

If you all agree that the functions should handle this case, I uploaded some CLs to fix the instances I found.

gopherbot pushed a commit that referenced this issue Feb 24, 2021
Is and isExcludingLatin did not handle negative runes when dispatching
to is16. TestNegativeRune covers this along with the existing uint32
casts in IsGraphic, etc. (For tests, I picked the smallest non-Latin-1
code point in each range.)

Updates #43254

Change-Id: I17261b91f0d2b5b5125d19219411b45c480df74f
Reviewed-on: https://go-review.googlesource.com/c/go/+/280493
Run-TryBot: Rob Pike <r@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
gopherbot pushed a commit that referenced this issue Feb 24, 2021
Updates #43254

Change-Id: I7d4bf3b99cc36ca2156af5bb01a1c595419d1d3c
Reviewed-on: https://go-review.googlesource.com/c/go/+/280492
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Rob Pike <r@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Run-TryBot: Emmanuel Odeke <emmanuel@orijtech.com>
TryBot-Result: Go Bot <gobot@golang.org>
@odeke-em
Copy link
Member

Thank you @davidben for the report and for mailing the CLs, I’ve just merged both: shall we then close this issue?

@odeke-em
Copy link
Member

We might also need to add release notes. I’ll mark the CLs as requiring them.

@davidben
Copy link
Contributor Author

Thanks!

@gopherbot
Copy link

Change https://golang.org/cl/317469 mentions this issue: doc/go1.17: document fixes for negative rune handling

@dmitshur dmitshur modified the milestones: Backlog, Go1.17 May 6, 2021
@dmitshur dmitshur added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels May 6, 2021
gopherbot pushed a commit that referenced this issue May 6, 2021
CL 317273 accidentally grouped a fix for bufio, bytes, strings
packages into a single entry, but they should be separate ones.

Fix that, and document these negative rune handling fixes.

The list of fixed functions in package unicode was computed by
taking the functions covered by the new TestNegativeRunes test,
and including those that fail when tested with Go 1.16.3.

For #44513.
Updates #43254.

Change-Id: I6f387327f83ae52543526dbdcdd0bb5775c678bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/317469
Reviewed-by: David Benjamin <davidben@google.com>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Trust: Alexander Rakoczy <alex@golang.org>
Trust: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Alexander Rakoczy <alex@golang.org>
@golang golang locked and limited conversation to collaborators May 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

5 participants