Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/net/html: Tokenizer could have a Position method #28343

Open
bdw opened this issue Oct 23, 2018 · 4 comments
Open

x/net/html: Tokenizer could have a Position method #28343

bdw opened this issue Oct 23, 2018 · 4 comments
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Milestone

Comments

@bdw
Copy link

bdw commented Oct 23, 2018

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go version go1.10.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

amd64/linux

What did you do?

I have a program that reads HTML streams and extracts sections of interest.
These byte slices are currently copied of from tokenizer.Raw().
But I'd rather just store the offsets and lengths into the full stream, given that I still want to hang on to the stream anyway. This would reduce the amount of copying needed and make my application more efficient.

This would be really easy if html.Tokenizer had a method Position() that would return the offset of the start of the current token (or possibly the start and end of the current token) from the start of the stream returned by its reader.

What did you expect to see?

A method Position() that tells me the offset of the start of the current token. And maybe also the end as the second return value.

What did you see instead?

There's no such method. And there's no such count either in the struct.

@gopherbot gopherbot added this to the Unreleased milestone Oct 23, 2018
@bradfitz bradfitz changed the title x/net/html.Tokenizer could have a Position() method x/net/html: Tokenizer could have a Position method Oct 23, 2018
@gopherbot
Copy link

Change https://golang.org/cl/145337 mentions this issue: html: enable copy-free usage of Tokenizer

@nigeltao
Copy link
Contributor

Do you really need a new Position or Offset method? Can't you just keep a running tally of len(tokenizer.Raw())?

@bdw
Copy link
Author

bdw commented Jan 18, 2019

@nigeltao sure we can, but the Offset method is convienient, uses a field that we already compute, and avoids creating and returns a bunch of slices that we're not interested in.

@nigeltao
Copy link
Contributor

Creating and returning a slice is cheap. It doesn't copy the underlying bytes, which IIUC was the original concern.

@ALTree ALTree added the NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made. label Jan 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NeedsDecision Feedback is required from experts, contributors, and/or the community before a change can be made.
Projects
None yet
Development

No branches or pull requests

4 participants