Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

text/scanner: Float tokens allow trailing 'e' or 'E' and sign (but Go literals don't) #26374

Closed
benhoyt opened this issue Jul 13, 2018 · 7 comments
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@benhoyt
Copy link
Contributor

benhoyt commented Jul 13, 2018

The documentation for text/scanner says that it "recognizes all literals as defined by the Go language specification". However, it seems to break that and be a little more liberal in that it accepts floating-point literals with a trailing e or E and no digits. It also allows a trailing + or - sign after the e/E. Neither of these are valid Go floating point literals (nor as input to ParseFloat, which seems to match the Go literal spec here).

Here is a Playground link to a small program that reproduces the issue: https://play.golang.org/p/leS1zsI4Rfz

In short, scanning the string "1.5e 1.5E 1e 1E 1e+ 1e- 1.0 1 1.5z" gives the following sequence of tokens (third value on each line is whether it's valid according to Go/ParseFloat or not):

Float "1.5e" false
Float "1.5E" false
Float "1e" false
Float "1E" false
Float "1e+" false
Float "1e-" false
Float "1.0" true
Int "1" true
Float "1.5" true
Ident "z" false

However, what I'd expect according to the text/scanner docs is for the trailing e's and signs to be not included in the float, but parsed as subsequent tokens, like so:

Float "1.5" true
Ident "e" false
Float "1.5" true
Ident "E" false
Int "1" true
Ident "e" false
Int "1" true
Ident "E" false
Int "1" true
Ident "e" false
"+" "+" false
Int "1" true
Ident "e" false
"-" "-" false
Float "1.0" true
Int "1" true
Float "1.5" true
Ident "z" false

Here's the relevant scanner.go source code. scanExponent uses scanMantissa, which allows 0 or more digits instead of 1 or more digits (for this case).

I ran into this issue when using text/scanner and then passing the TokenText() return value for Float tokens to ParseFloat, and getting an unexpected error on strings like 1.5e.

Go version and OS/arch: go1.10 darwin/amd64. Though from looking at the code it's not arch- or OS-specific. Yes, the issue reproduces with the latest release.

If folks (@griesemer?) agree that this is a bug, I'd be interested in contributing a fix. However, if it's decided that users may be depending on this behavior (seem unlikely as this apparently hasn't been reported yet -- I searched for issues) and it's too late to change it, then we'll want a documentation fix.

@ianlancetaylor ianlancetaylor added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 13, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Jul 13, 2018
@benhoyt
Copy link
Contributor Author

benhoyt commented Jul 16, 2018

@ianlancetaylor What investigation does this need? Is it something I could tackle?

@ianlancetaylor
Copy link
Contributor

The "investigation" that it needs is exactly the answer to your question in the last paragraph: do people like @griesemer agree that this is a bug? @griesemer is on vacation this month and will no doubt take a look when he returns.

@griesemer griesemer added NeedsFix The path to resolution is known, but the work has not been done. and removed NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. labels Aug 7, 2018
@griesemer
Copy link
Contributor

This does look like a bug to me; and there doesn't seem to be a test expecting these erroneous constants either. If there was any intent behind this at some point, I don't remember.

It seems ok to fix this; I'd be surprised if existing code relies on this, especially because the respective constant literals wouldn't make it through the strconv routines anyway.

The fix might be as simple as:

diff --git a/src/text/scanner/scanner.go b/src/text/scanner/scanner.go
index 4e76664dc0..893a4edbaf 100644
--- a/src/text/scanner/scanner.go
+++ b/src/text/scanner/scanner.go
@@ -384,6 +384,9 @@ func (s *Scanner) scanExponent(ch rune) rune {
                if ch == '-' || ch == '+' {
                        ch = s.next()
                }
+               if !isDecimal(ch) {
+                       s.error("illegal exponent")
+               }
                ch = s.scanMantissa(ch)
        }
        return ch

(plus test cases, etc.)

@benhoyt
Copy link
Contributor Author

benhoyt commented Aug 7, 2018

Thanks, @griesemer. I'll see about writing up a CL with tests when I get back from vacation in a week or so.

@gopherbot
Copy link

Change https://golang.org/cl/129095 mentions this issue: text/scanner: don't allow Float exponents with no mantissa

@benhoyt
Copy link
Contributor Author

benhoyt commented Aug 11, 2018

@griesemer I added the fix you suggested and also tests to match: https://go-review.googlesource.com/c/go/+/129095

One of the things that was a bit unclear to me was how much data should be consumed when an error occurs. For example, parsing 0xg gives "illegal hexadecimal number" and stops just before the g, whereas parsing 0183 gives "illegal octal number" but stops at the end after the 3 (Playground link). What we have here for the exponent error is simple and greedy (it stops after the e and the +), which I guess is fine.

@griesemer
Copy link
Contributor

@benhoyt The scanner should consume characters as long as they are possibly leading to a valid token, and stop as soon as an error is detected. (I'm not saying this is what happens consistently, but at least that's the idea.)

For 0xg, 0x clearly is a valid start, and the g is clearly invalid so we don't consume it. For 0183 we only know after looking at the next character if it is valid or not (0183. is a valid floating point number), so this seems consistent.

@golang golang locked and limited conversation to collaborators Aug 20, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

4 participants