x/net/html/charset: DetermineEncoding returns encoding.Nop and "utf-8" as name #46343
Labels
NeedsInvestigation
What version of Go are you using (`go version`)?
Does this issue reproduce with the latest release?
yes
What operating system and processor architecture are you using (`go env`)?
What did you do?
I called DetermineEncoding on content that is not valid UTF-8, where the invalid bytes appear only after the first 1024 bytes.
https://play.golang.org/p/M0LeleRdj16
What did you expect to see?
I expected the encoding returned by DetermineEncoding to decode my content into a valid UTF-8 string even when certain == false (using replacement characters where needed).
I expected unicode.UTF8 (https://pkg.go.dev/golang.org/x/text/encoding/unicode#UTF8) to be returned, with the name "utf-8".
Returning encoding.Nop seems wrong when only a prefix of the content has been checked:
https://github.com/golang/net/blob/master/html/charset/charset.go#L98
What did you see instead?
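encoding.Nop (https://pkg.go.dev/golang.org/x/text@v0.3.5/encoding#Nop) is returned with the name "utf-8", and its NewDecoder does not produce a valid UTF-8 string: Nop passes the source bytes through unchanged, so the invalid bytes after the first 1024 bytes survive decoding.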