New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go/scanner: Scanner.Scan sometimes returns token.ILLEGAL and empty literal string #28112
Comments
The |
@cznic The offending character here is the 'b' - it if were a '.' it would be ok. This is clearly a bug in the scanner. |
Heh, used |
@cznic My apologies - I concluded a bit too quickly. It's not clear here what exactly the offending character should be. Your suggestion of an empty character might be just as valid. It's certainly problematic that the 'if' starting on scanner.go:740 has not 'else' branch and thus silently exits the big 'switch' statement. Because the result 'tok' is not set at all, it returns the initial (invalid) token. |
For reference: #28128 |
Actually, while this is a bug in the scanner (and there are others, related to this), the spec is pretty clear:
So the longest valid sequence of characters here is a dot I vaguely remember having been aware of this for a long time but never bothered to get it 100% correct in go/scanner because it doesn't matter in practice. There's no Go code containing two dots ".." that is valid, so as long as we get an error, we're mostly ok. There are probably other such cases such as invalid octal or hex numbers, say: I will try to address the specific problem here if I can find an easy fix (which doesn't slow down the scanner everywhere because we need to keep more state around) but leave the more general problem open. |
Change https://golang.org/cl/141337 mentions this issue: |
In case you're curious how I discovered this, I ran into this in issue #28082. You left a comment there containing a fenced code block marked up with "Go" syntax, but it was actually a diff of Go code. I was viewing that comment with some software that performs syntax highlighting for Go using a highlighter implemented on top of |
@dmitshur Interesting. The scanner should always return the offending character (lit) if there's an ILLEGAL token. I believe this should be the case now. |
The previous logic did not handle the situation where token.ILLEGAL was returned together with an empty literal string, and panicked because it performed an out of bounds slice operation (s[low:high], where low > high). Improve the code so that doesn't happen, and add test cases for it. The token.ILLEGAL with empty literal string possibility was a bug in the go/scanner package (see golang/go#28112). It has been fixed in https://golang.org/cl/141337, and shouldn't happen again. Vendor a copy of go/scanner with https://golang.org/cl/141337 applied. It can and should be removed after Go 1.12 is released with that fix. Clean up the code in general.
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yes it does, with Go 1.11.1.
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
I tried scanning over the following minified Go code containing illegal tokens:
With the following program:
(Playground: https://play.golang.org/p/No77Yx0D4Ki.)
What did you expect to see?
go/scanner.Scanner.Scan
documentation says:Perhaps I'm misunderstanding the documentation, but I expected that if returned token is
token.ILLEGAL
, then the literal string should not be empty. I expected to see the literal string contain the offending character or characters.What did you see instead?
A
token.ILLEGAL
token and empty literal string:/cc @griesemer
The text was updated successfully, but these errors were encountered: