fmt: inconsistent formatting of unicode with %c and %q #14569

martisch · 2016-02-29T15:39:55Z

Go1.6

https://play.golang.org/p/XcrX5-8-om

invalid:
string     |�|
fmt      c |�|
fmt      q |%!q(int64=1114112)|
strconv  q |'�'|
fmt      U |U+110000|
fmt U+%04X |U+110000|
fmt     #U |U+110000|

non-printable:
string     ||
fmt      c ||
fmt      q |'\x00'|
strconv  q |'\x00'|
fmt      U |U+0000|
fmt U+%04X |U+0000|
fmt     #U |U+0000|

printable:
string     |⌘|
fmt      c |⌘|
fmt      q |'⌘'|
strconv  q |'⌘'|
fmt      U |U+2318|
fmt U+%04X |U+2318|
fmt     #U |U+2318 '⌘'|

maxInt64:
string     |�|
fmt      c |�|
fmt      q |%!q(int64=9223372036854775807)|
strconv  q |'�'|
fmt      U |U+7FFFFFFFFFFFFFFF|
fmt U+%04X |U+7FFFFFFFFFFFFFFF|
fmt     #U |U+7FFFFFFFFFFFFFFF|

negative:
string     |�|
fmt      c |�|
fmt      q |%!q(int64=-1)|
strconv  q |'�'|
fmt      U |U+FFFFFFFFFFFFFFFF|
fmt U+%04X |U+-001|
fmt     #U |U+FFFFFFFFFFFFFFFF|

minInt64:
string     ||
fmt      c |�|
fmt      q |%!q(int64=-9223372036854775808)|
strconv  q |'\x00'|
fmt      U |U+8000000000000000|
fmt U+%04X |U+-8000000000000000|
fmt     #U |U+8000000000000000|

( Note that strconv should only be looked at within rune/int32 range so minInt64 and maxInt64 here are not relevant due to the explicit rune type conversion in the call to strconv in this example program).

I would expect fmt %q to be similar to strconv.QuoteRune within rune/int32 range (including negative numbers). E.g. -1 and 1114112 print a quoted utf.RuneError.

For values outside int32 range i would expect fmt %c and %q to behave similar. Either both print a badVerb error string or both print an utf8.RuneError (quoted in case of %q). If they print an error string then so should probably %U too.

Another possibility is that any invalid unicode point could be rejected by fmt formatting with a badVerb error string for %c %q %U.

Can/Should fmt be changed to handle these cases more consistently?

The text was updated successfully, but these errors were encountered:

ianlancetaylor · 2016-02-29T17:26:47Z

CC @robpike

martisch · 2016-03-06T09:35:23Z

Having looked at this a bit more i would argue the following:

Returning a badVerb error string for any integer and %q or %c is a bug in my opinion since the documentation defines these verbs are ok for any integer. I also dont see any other case in fmt where badVerb triggers bases on value and not based on type.

If however "the character represented by the corresponding Unicode code point" means if there exist no character for the Unicode code point it should be an error then the current behavior of returning utf8.RuneError for other invalid runes below utf8.MaxRune is a bug. Not returning an Error however is explicitly documented in the code as "// If the character is not valid Unicode, it will print '\ufffd'.".

Either way it seems inconsistent with the documentation to me.

My proposed resolution would therefore be to return utf8.RuneError (escaped for %q) for any invalid rune regardless of the integer type or if its > utf8.MaxRune.

This should make it also easier to check for an invalid Unicode code point since instead of checking for an error string and utf8.RuneError one can now only check for the later. The character for RuneError would be RuneError before and after the change.

Also this behavior can be implemented solely in the fmtC (better renamed and moved to fmt_c) and fmt_qc functions with no range checks outside these functions.

martisch · 2020-07-14T13:21:03Z

As this came up in the report #40175 again I would like to continue the discussion here.

My idea would still be that badVerb does not trigger for value ranges but only for types and that all integers that do not map to a valid unicode point are printed as RuneError ('\uFFFD') rune. This would align with the c verb.

Otherwise I think it should be documented that %q is only valid for integers i with 0 <= i <= utf8.MaxRune.

gopherbot · 2020-08-17T04:25:41Z

Change https://golang.org/cl/248759 mentions this issue: fmt: avoid badverb formatting for %q when used with integers

ianlancetaylor added this to the Unplanned milestone Feb 29, 2016

martisch mentioned this issue Jul 12, 2020

fmt.Sprintf: verb %q returns a surjective function #40175

Closed

gopherbot closed this as completed in 99f179f Aug 17, 2020

golang locked and limited conversation to collaborators Aug 17, 2021

gopherbot added the FrozenDueToAge label Aug 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fmt: inconsistent formatting of unicode with %c and %q #14569

fmt: inconsistent formatting of unicode with %c and %q #14569

martisch commented Feb 29, 2016

ianlancetaylor commented Feb 29, 2016

martisch commented Mar 6, 2016

martisch commented Jul 14, 2020

gopherbot commented Aug 17, 2020

fmt: inconsistent formatting of unicode with %c and %q #14569

fmt: inconsistent formatting of unicode with %c and %q #14569

Comments

martisch commented Feb 29, 2016

ianlancetaylor commented Feb 29, 2016

martisch commented Mar 6, 2016

martisch commented Jul 14, 2020

gopherbot commented Aug 17, 2020