fmt: inconsistent formatting of unicode with %c and %q #14569
Comments
CC @robpike
Having looked at this a bit more, I would argue the following:

Returning a badVerb error string for any integer with %q or %c is a bug in my opinion, since the documentation defines these verbs as valid for any integer. I also don't see any other case in fmt where badVerb is triggered based on the value rather than the type.

If, however, "the character represented by the corresponding Unicode code point" means that it should be an error when no character exists for the code point, then the current behavior of returning utf8.RuneError for other invalid runes below utf8.MaxRune is a bug. Not returning an error, however, is explicitly documented in the code as "// If the character is not valid Unicode, it will print '\ufffd'.". Either way, the behavior seems inconsistent with the documentation to me.

My proposed resolution would therefore be to return utf8.RuneError (escaped for %q) for any invalid rune, regardless of the integer type or whether the value is > utf8.MaxRune. This should also make it easier to check for an invalid Unicode code point: instead of checking for both an error string and utf8.RuneError, one only needs to check for the latter. The character for RuneError would be RuneError before and after the change. This behavior can also be implemented solely in the fmtC (better renamed and moved to fmt_c) and fmt_qc functions, with no range checks outside these functions.
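To make the proposed resolution concrete, here is a minimal sketch of the mapping it describes: every integer that is not a valid Unicode code point becomes utf8.RuneError. The helper name runeOrError and its int64 parameter are assumptions made for illustration; this is not the code in fmt, and the real fmtC and fmt_qc functions may be structured differently.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOrError is a hypothetical helper (not part of fmt). It maps any
// integer to a rune, substituting utf8.RuneError for values that are
// not valid Unicode code points.
func runeOrError(v int64) rune {
	// Check the range before converting: a plain rune(v) conversion
	// would silently truncate values outside the int32 range.
	if v < 0 || v > utf8.MaxRune {
		return utf8.RuneError
	}
	r := rune(v)
	// Surrogate halves (U+D800..U+DFFF) are in range but not valid.
	if !utf8.ValidRune(r) {
		return utf8.RuneError
	}
	return r
}

func main() {
	for _, v := range []int64{'A', 0xD800, -1, 0x110000, 1 << 40} {
		fmt.Printf("%d -> %U\n", v, runeOrError(v))
	}
}
```

With a mapping like this applied inside the formatting functions, a caller would only need to compare against utf8.RuneError to detect an invalid code point.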
As this came up again in the report #40175, I would like to continue the discussion here. My idea would still be that badVerb does not trigger for value ranges but only for types, and that all integers that do not map to a valid Unicode code point are printed as the RuneError ('\uFFFD') rune. This would align with the Otherwise I think it should be documented that |
Change https://golang.org/cl/248759 mentions this issue: |
Go1.6
https://play.golang.org/p/XcrX5-8-om
(Note that strconv should only be looked at within the rune/int32 range, so minInt64 and maxInt64 are not relevant here due to the explicit rune type conversion in the call to strconv in this example program.)
I would expect fmt %q to behave like strconv.QuoteRune within the rune/int32 range (including negative numbers). E.g. -1 and 1114112 print a quoted utf8.RuneError.
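The strconv side of that comparison can be checked with a short program. This is a sketch separate from the playground link above; quoteBoth is a hypothetical helper name. Both strconv functions replace an invalid rune with utf8.RuneError before quoting, so -1 and 1114112 produce the same result as quoting U+FFFD directly.

```go
package main

import (
	"fmt"
	"strconv"
)

// quoteBoth is a hypothetical helper that returns the plain and the
// ASCII-only quoted forms of r as produced by strconv.
func quoteBoth(r rune) (plain, ascii string) {
	return strconv.QuoteRune(r), strconv.QuoteRuneToASCII(r)
}

func main() {
	// -1 and 1114112 (0x110000) are outside the valid code point range.
	for _, r := range []rune{'x', -1, 1114112} {
		plain, ascii := quoteBoth(r)
		fmt.Printf("%d: %s %s\n", r, plain, ascii)
	}
}
```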
For values outside the int32 range, I would expect fmt %c and %q to behave similarly. Either both print a badVerb error string or both print a utf8.RuneError (quoted in the case of %q). If they print an error string, then %U probably should too.
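A small program along these lines can be used to compare the three verbs side by side for values inside and outside the int32 range. It is a sketch, with formatVerbs as a hypothetical helper name. The output for the invalid values depends on the Go version in use, so no expected output is shown for them.

```go
package main

import (
	"fmt"
	"math"
)

// formatVerbs is a hypothetical helper that formats v with the three
// verbs discussed in this issue.
func formatVerbs(v int64) (c, q, u string) {
	return fmt.Sprintf("%c", v), fmt.Sprintf("%q", v), fmt.Sprintf("%U", v)
}

func main() {
	values := []int64{'A', -1, 0x110000, math.MaxInt64, math.MinInt64}
	for _, v := range values {
		c, q, u := formatVerbs(v)
		fmt.Printf("%d: %%c=%s %%q=%s %%U=%s\n", v, c, q, u)
	}
}
```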
Another possibility is that fmt could reject any invalid Unicode code point with a badVerb error string for %c, %q and %U.
Can/Should fmt be changed to handle these cases more consistently?