x/net/html: Text() corrupts underlying tokenizer buffer #43268

Closed
Max-1892 opened this issue Dec 18, 2020 · 4 comments
Labels: FrozenDueToAge, NeedsInvestigation (someone must examine and confirm this is a valid issue and not a duplicate of an existing one)

@Max-1892
What version of Go are you using (go version)?

$ go version
go version go1.13.4 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.13.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.13.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/1c/p7qvswhs13gbxtq6j5lqybp40000gn/T/go-build012787672=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I just wanted to verify the intent of the tokenizer interface. According to the documentation for Raw,

> Raw returns the unmodified text of the current token. Calling Next, Token, Text, TagName or TagAttr may change the contents of the returned slice.

With escaped characters, as in this example, I notice that after a call to Text the underlying buffer returned by Raw is only partially updated: the unescaped character is written in, but the tail of the escape sequence is still there. For example, if the original buffer was a&lt;b, after a call to Text(), the buffer looks like a<bt;b. This behavior seems correct according to the documentation but I wanted to verify this is consistent with the intent of Raw. It seems to limit the usefulness of Raw when used with Next and Text.

What did you expect to see?

I was hoping to see the escaped character completely replaced in the underlying buffer returned by Raw() (a<b).

What did you see instead?

The underlying buffer looks like a<bt;b.
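Editor's sketch of why the leftover bytes appear, using only the standard library. The helper `unescapeLtInPlace` is hypothetical and handles only `&lt;`; it is a simplified model of an unescaper that writes its output over its input, not the actual x/net/html code.

```go
package main

import (
	"bytes"
	"fmt"
)

// unescapeLtInPlace is a hypothetical, simplified stand-in for an in-place
// unescaper. It rewrites "&lt;" to "<" by writing over the input bytes and
// returns the shortened prefix. Bytes past the returned prefix are left as-is.
func unescapeLtInPlace(b []byte) []byte {
	esc := []byte("&lt;")
	dst := 0
	for src := 0; src < len(b); {
		if bytes.HasPrefix(b[src:], esc) {
			b[dst] = '<'
			dst++
			src += len(esc)
			continue
		}
		b[dst] = b[src]
		dst++
		src++
	}
	return b[:dst]
}

func main() {
	raw := []byte("a&lt;b") // plays the role of the slice returned by Raw
	text := unescapeLtInPlace(raw)
	fmt.Printf("text=%q raw=%q\n", text, raw) // text="a<b" raw="a<bt;b"
}
```

The output is 3 bytes long and the input was 6, so the last 3 bytes of the original (`t;b`) stay where they were.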

@gopherbot added this to the Unreleased milestone Dec 18, 2020
@slrz
slrz commented Dec 19, 2020

How would that work? You'd have to update the slice's length which the example stashes away in a local variable.
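Editor's sketch of the point about the stashed length, assuming nothing beyond the Go language itself. The function `shrink` is hypothetical. A callee can overwrite the bytes a caller's slice points at, but it cannot change the length stored in the caller's own slice variable.

```go
package main

import "fmt"

// shrink is a hypothetical callee: it overwrites the start of b with a
// shorter value and returns a shorter view of the same backing array.
func shrink(b []byte) []byte {
	n := copy(b, "a<b")
	return b[:n]
}

func main() {
	raw := []byte("a&lt;b") // the caller's stashed slice, length 6
	short := shrink(raw)

	// The caller's variable still has length 6, whatever the callee returned.
	fmt.Println(len(raw), len(short)) // 6 3
	fmt.Printf("%q %q\n", raw, short) // "a<bt;b" "a<b"
}
```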

@Max-1892
Author

Would it be useful to expose what parts of Raw are 'valid' (z.raw.start and z.raw.end) via additional methods?

@toothrot added the NeedsInvestigation label Jan 5, 2021
@toothrot
Contributor

toothrot commented Jan 5, 2021

/cc @namusyaka @nigeltao

@nigeltao
Contributor

nigeltao commented Jan 9, 2021

> This behavior seems correct according to the documentation but I wanted to verify this is consistent with the intent of Raw.

This is working as intended.

If you want "a&lt;b", call Raw.

If you want "a<b", call Text.

If you want both, call Raw, copy the []byte to your own buffer, then call Text.

> Would it be useful to expose what parts of Raw are 'valid' (z.raw.start and z.raw.end) via additional methods?

I don't understand the proposal. If you want "a<b" then it's not 'raw' unmodified text. The "&lt;" escape code has been 'cooked' to be "<".

@nigeltao closed this as completed Jan 9, 2021
@golang locked and limited conversation to collaborators Jan 9, 2022