x/net/html: Subsequent Token method calls in Tokenizer don't return same result #22621

Closed
rpeshkov opened this issue Nov 7, 2017 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
rpeshkov commented Nov 7, 2017

What version of Go are you using (go version)?

go 1.9.2

Does this issue reproduce with the latest release?

Yes, including the up-to-date x/net/html dependency

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/rpeshkov"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.9.2/libexec"
GOTOOLDIR="/usr/local/Cellar/go/1.9.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/dl/d4m5fbhs6k1cx9hhz1zdk4v40000gn/T/go-build962492023=/tmp/go-build -gno-record-gcc-switches -fno-common"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"

What did you do?

Link for code: https://play.golang.org/p/7ZwQB98kHE

What did you expect to see?

<div class="hello">
<div class="hello">

What did you see instead?

<div class="hello">
<>
@gopherbot gopherbot added this to the Unreleased milestone Nov 7, 2017
Contributor

agnivade commented Nov 8, 2017

I believe your understanding of the API is incorrect.

Next scans the next token and returns its type.

Token returns the next token. So the code is doing what is expected.

Here is something more along the lines of what you are expecting:

package main
import (
  "fmt"
  "strings"
  "golang.org/x/net/html"
)

func main() {
  reader := strings.NewReader("<div class=\"hello\">SomeText</div>")
  tokenizer := html.NewTokenizer(reader)
  tokenizer.Next()
  fmt.Printf("%s\n", tokenizer.Raw())
  fmt.Printf("%s\n", tokenizer.Raw())
}

Author

rpeshkov commented Nov 8, 2017

But Raw returns a slice of bytes, while Token returns an instance of the Token struct.

It's OK that Next changes the internal state of the object, but it doesn't seem logical to me that Token changes the state as well.

Also, I think the documentation is slightly incorrect.

Next scans the next token and returns its type.

That's correct.

Token returns the next token.

However, Token doesn't return the next token. It wraps the token you're currently on into a struct and returns it to you. It doesn't advance the iterator in any way.

Basically, why do I think the current logic is incorrect? Because Token just wraps the current 'state' into a struct, it seems odd to me that the internal state is left corrupted after the wrap.
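To illustrate the kind of behavior described above, here is a toy model of an accessor that consumes internal state on read, next to one that does not. It is a hypothetical stand-in written for this illustration (`toyTokenizer`, `span`, and `newToy` are invented names) and is not the x/net/html source; it only shows one plausible way a second call can come back empty.

```go
package main

import "fmt"

// span marks a half-open byte range [start, end) within a buffer.
type span struct{ start, end int }

// toyTokenizer is a hypothetical model, not the real html.Tokenizer.
type toyTokenizer struct {
	buf  []byte
	raw  span // the whole current token; accessors never modify it
	data span // the tag name; emptied once it has been read
}

// Raw only reads the raw span, so repeated calls return the same bytes.
func (z *toyTokenizer) Raw() []byte { return z.buf[z.raw.start:z.raw.end] }

// TagName empties the data span after reading it, so a second call
// on the same token returns nil.
func (z *toyTokenizer) TagName() []byte {
	if z.data.start >= z.data.end {
		return nil
	}
	name := z.buf[z.data.start:z.data.end]
	z.data = span{z.raw.end, z.raw.end}
	return name
}

// newToy assumes tag has the form "<name>" with no attributes.
func newToy(tag string) *toyTokenizer {
	return &toyTokenizer{
		buf:  []byte(tag),
		raw:  span{0, len(tag)},
		data: span{1, len(tag) - 1},
	}
}

func main() {
	z := newToy("<div>")
	fmt.Printf("%q %q\n", z.TagName(), z.TagName()) // "div" ""
	fmt.Printf("%q %q\n", z.Raw(), z.Raw())         // "<div>" "<div>"
}
```

In this model, a wrapper built on top of TagName would inherit the consume-on-read behavior, while anything built only on Raw would stay repeatable.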

@agnivade
Contributor

@namusyaka

@namusyaka
Member

@agnivade Thanks for pinging. I would like to work on this issue after resolving some crashing issues.

@ALTree added the NeedsInvestigation label Mar 3, 2019
@rpeshkov closed this as not planned Feb 22, 2024