fmt: confusion about Scanf patterns, newline, space #13565
Comments
Program:
"Space in pattern matches end of string" is already documented, but perhaps not centrally enough:
Plus:
func Scanf(format string, a ...interface{}) (n int, err error)
func Scan(a ...interface{}) (n int, err error)
If you think more is needed, I think it can be made explicit in the package comment.
The text you mention all seems to me to be about the way %s and %v handle spaces in input, which is fine. It does not seem to say anything about the meaning of a space in the format pattern. What I do see is:
But that would suggest that " " in the format pattern should not match "" in the input. Yet it does.
P.S. The Scanf doc comment seems like it should drop "space-separated": scanning "%d,%d" against "1,2" doesn't have anything to do with spaces.
Well, I don't know how to document the scanning part of this package (or make it work, but that's another problem). To be fair, no one else seems to know how to define it either. C's scanf is well known but not documented to any level of precision. Maybe I should have taken that route and just hinted at what it does. Suggestions welcome.
I'm happy to figure out docs if we can answer these questions:
Based entirely on my mental model:
Make sense? I wish fmt.Scanf did not exist. Its overuse and its pain wildly outrun its value.
OK, I'll try to document that.
I sent CL 17723 with documentation of the current behavior, but I think I found a serious enough problem to warrant postponing any changes to Scanf until Go 1.7. Specifically, "X\n Y" does not match "X\n Y" (the same string).
The fundamental issue (and a source of much confusion for me until I read the code) is that in the input format, any newline surrounded by zero or more space characters (that is, any section of input format matching
Less serious but still odd, "X %c" does not match "X \n" (reading '\n' into the %c argument) while "X%c" does match "X\n".
I think we should roll back the one newline-related change from the Go 1.6 cycle and revisit the whole big picture for Go 1.7. If we're going to break user code, we might as well do it just once. I sent CL 17724 for the rollback of Go 1.6's newline change.
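For anyone who wants to check the %c cases above on their own Go release, here is a minimal probe. It is only a sketch: the helper name `probe` and the use of Sscanf (rather than Scanf on standard input) are my assumptions, not something taken from the CLs. No expected output is listed for the "X %c" case, because the handling of spaces and newlines in the format is exactly what this issue proposes to change.

```go
package main

import "fmt"

// probe scans input with format into a single rune and returns
// Sscanf's results. The name and shape are illustrative only.
func probe(input, format string) (c rune, n int, err error) {
	n, err = fmt.Sscanf(input, format, &c)
	return c, n, err
}

func main() {
	cases := []struct{ input, format string }{
		{"X\n", "X%c"},   // reported above as matching, with '\n' read into c
		{"X \n", "X %c"}, // reported above as not matching; result may vary by release
	}
	for _, tc := range cases {
		c, n, err := probe(tc.input, tc.format)
		fmt.Printf("Sscanf(%q, %q, &c) = %d, %v; c = %q\n",
			tc.input, tc.format, n, err, c)
	}
}
```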
SGTM
CL https://golang.org/cl/17723 mentions this issue.
Didn't happen for Go 1.7. Kicking to 1.8Early.
CL https://golang.org/cl/30610 mentions this issue.
CL https://golang.org/cl/30611 mentions this issue.
There are no semantic changes here, just tests to establish the status quo. A followup CL will make some semantic changes, the (limited) scope of which should be clear from the number of tests that change.

For #13565.

Change-Id: I960749cf59d4dfe39c324875bcc575096654f883
Reviewed-on: https://go-review.googlesource.com/30610
Run-TryBot: Russ Cox <rsc@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rob Pike <r@golang.org>
While updating code using fmt.Scanf after CL 16165, I discovered that there were users with patterns ending in \n as a way to assert that Scanf had consumed an entire input.
The examples that follow are given as three-line stanzas printed by the program at the end of the report. The first line is the Scanf call, the second line is its return values, and the third line is the values of x and y after the call.
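A stanza of that shape can be produced with something like the sketch below. This is not the original program from the report; the helper name `stanza`, the use of Sscanf, and the sample inputs are assumptions made for illustration. The return values and the final values of x and y depend on the Go release, so no output is listed here.

```go
package main

import "fmt"

// stanza runs one Sscanf call and formats the three-line report:
// the call, its return values, and the values of x and y afterwards.
// Hypothetical helper, not taken from the issue.
func stanza(input, format string) string {
	var x, y int
	n, err := fmt.Sscanf(input, format, &x, &y)
	return fmt.Sprintf("Sscanf(%q, %q, &x, &y)\n%d, %v\n%d, %d\n",
		input, format, n, err, x, y)
}

func main() {
	// Sample inputs chosen for illustration only.
	cases := [][2]string{
		{"1 2", "%d %d"},
		{"1 2", "%d %d\n"},
		{"1 2", "%d %d "},
		{"1 2x", "%d %d\n"},
	}
	for _, c := range cases {
		fmt.Print(stanza(c[0], c[1]), "\n")
	}
}
```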
The case that CL 16165 changed intentionally is:
which became:
But there exists code using a \n at the end of the pattern to reject matching a partial input token:
The \n here is serving to reject inputs that would be accepted without it, because \n matched end-of-string in Go 1.5 and earlier. Unfortunately, in Go 1.6 the \n will reject end-of-string, so code like this needs to be adjusted. How?
It seems that changing \n to space works: space matches end-of-string still.
Of course, using space in this way still accepts spaces, which \n did not. So if \n was being used this way:
now the same code produces:
and changing \n to space in the pattern does not help reject the second input:
The reason this seems to come up is that people read a line at a time from some source and then process it with fmt.Scanf with a pattern containing \n, perhaps not realizing that the \n has been stripped off by the line reader, or perhaps just taking advantage of the fact that \n served like the regexp $ in such processing.
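The line-at-a-time pattern looks roughly like the sketch below, assuming bufio.Scanner as the line reader (the issue does not say which reader people use, and `parseLines` is a hypothetical name). The point it illustrates is that the string handed to Sscanf no longer contains the trailing newline, so a \n in the pattern can only ever be compared against end-of-string.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseLines splits input into lines with bufio.Scanner and parses two
// integers from each line. It returns the raw lines as Sscanf saw them.
func parseLines(input string) (lines []string, pairs [][2]int, err error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		line := sc.Text() // the line terminator has already been stripped
		var x, y int
		if _, err := fmt.Sscanf(line, "%d %d", &x, &y); err != nil {
			return lines, pairs, err
		}
		lines = append(lines, line)
		pairs = append(pairs, [2]int{x, y})
	}
	return lines, pairs, sc.Err()
}

func main() {
	lines, pairs, err := parseLines("1 2\n3 4\n")
	fmt.Printf("%q %v %v\n", lines, pairs, err)
}
```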
I am not sure how people should update their code. I can't find any promise in the docs that a space in the pattern matches end-of-string, and it seems odd to me that it does. But if we don't do that, then there's no way to update some common uses. And even if we do make the promise, it doesn't help reject unmatched text beginning with space, as in the last few examples. (To be fair, \n doesn't help reject unmatched text beginning with \n, but that doesn't come up in line-at-a-time processing, as seems to be common.)
If we do (3), then there is no longer any way to preserve the semantics of many existing uses, since Scanf gives no indication that it did not process the entire input string.
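One possible workaround that does not depend on how \n or space in the pattern treats end-of-string is to scan from a strings.Reader with Fscanf and then ask the reader how much is left. This is only a sketch of that idea, not something proposed in the CLs; `scanPair` is a hypothetical helper. It relies on strings.Reader implementing UnreadRune, so the scanner does not consume input past what it matched. Note that it is strict: trailing spaces also count as leftover input.

```go
package main

import (
	"fmt"
	"strings"
)

// scanPair parses two integers from line and reports how many bytes
// of line were left unread after the pattern was satisfied.
func scanPair(line string) (x, y, leftover int, err error) {
	r := strings.NewReader(line)
	_, err = fmt.Fscanf(r, "%d %d", &x, &y)
	return x, y, r.Len(), err
}

func main() {
	for _, line := range []string{"1 2", "1 2x"} {
		x, y, leftover, err := scanPair(line)
		fmt.Printf("%q: x=%d y=%d leftover=%d err=%v\n", line, x, y, leftover, err)
	}
}
```

A caller would treat `leftover != 0` as "the pattern did not cover the whole line", which is the check the trailing \n was being used for.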