Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/scanner: scanNumber() method produces inconsistent error message #59441

Closed
hafus opened this issue Apr 4, 2023 · 3 comments
Closed

go/scanner: scanNumber() method produces inconsistent error message #59441

hafus opened this issue Apr 4, 2023 · 3 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@hafus
Copy link

hafus commented Apr 4, 2023

What version of Go are you using (go version)?

$ go version
go version go1.20 windows/amd64

Does this issue reproduce with the latest release?

Yes, it's

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=auto
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOVCS=
set GOVERSION=go1.20
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=0
set GOMOD=
set GOWORK=
set CGO_CFLAGS=-O2 -g
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-O2 -g
set CGO_FFLAGS=-O2 -g
set CGO_LDFLAGS=-O2 -g
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=

What did you do?

I ran the code snippets listed below to check the error message printed by the number parser. The test targets octal numbers, but it is valid for other number systems as well. As you can see the error message is varies depending on the invalid value being in the decimal range or not.

1- In decimal range.

func main() {
	fmt.Println(0o19)
}

2- In the hex range.

func main() {
	fmt.Println(0o1A)
}

What did you expect to see?

1- Output:

./prog.go:8:17: invalid digit '9' in octal literal

2- Output:

./prog.go:8:17: invalid digit 'A' in octal literal

What did you see instead?

1- Output:

./prog.go:8:17: invalid digit '9' in octal literal

2- Output:

./prog.go:8:17: syntax error: unexpected A in argument list; possibly missing comma or )
@hafus
Copy link
Author

hafus commented Apr 4, 2023

I think the root cause is on the func below. It should be extended to handle invalid values for other number systems.

func (s *Scanner) digits(base int, invalid *int) (digsep int) {
	if base <= 10 {
		max := rune('0' + base)
		for isDecimal(s.ch) || s.ch == '_' {
			ds := 1
			if s.ch == '_' {
				ds = 2
			} else if s.ch >= max && *invalid < 0 {
				*invalid = s.offset // record invalid rune offset
			}
			digsep |= ds
			s.next()
		}
	} else {
		for isHex(s.ch) || s.ch == '_' {
			ds := 1
			if s.ch == '_' {
				ds = 2
			}
			digsep |= ds
			s.next()
		}
	}
	return
}

@mknyszek
Copy link
Contributor

mknyszek commented Apr 5, 2023

@mknyszek mknyszek added this to the Backlog milestone Apr 5, 2023
@mknyszek mknyszek added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Apr 5, 2023
@griesemer
Copy link
Contributor

griesemer commented Apr 10, 2023

It's a bit of a judgement call: in both cases, when we see a character (9 or a) that's not a valid octal, we can decide that 1) the number is finished and the next character belongs to the next token, or 2) the number contains an invalid digit. But we should probably handle both cases the same.

Looking at this more closely, the "inconsistent" behavior is actually deliberate: the number 01 is a valid octal number, but 018 is not because the numeric value of the character 8 is equal the number base (which is 8). So we could stop accepting digits at this point and we would get the error "unexpected 8 in argument list", etc. But 018e0 is a valid floating-point number (with value 18.0). Thus, the scanner cannot stop when it sees the 8, it must accept at least the characters 8 and 9.

Per the spec:

While breaking the input into tokens, the next token is the longest sequence of characters that form a valid token.

When the scanner sees the sequence 01A it sees two valid tokens 01 and A. When it sees the sequence 018 there's no error yet (there could be an e following). Only when it sees, say an ), do we know the token boundary and then we can provide the error about 018 being incorrect. It's a bit different when we use the 0o notation (we know it cannot be a floating-point number). But it's the same when we use the '0x` notation (we could have a hex floating-point number).

In general one may have to look ahead (or step back) arbitrarily far: there can be arbitrary many digits in 09....9 before we see an e or something else.

Basically, the spec says that the lexer only ever reads a piece of source text once, in correct code. We could be more clever in incorrect code but it gets very complicated very quickly with number literals and it's not worth the effort, given that these kinds of errors are very rare.

Thus, this is all working as intended. Closing.

@golang golang locked and limited conversation to collaborators Apr 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

4 participants