
x/text/encoding/simplifiedchinese: missing decoding data #61165

Open
folivoramao opened this issue Jul 4, 2023 · 3 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone
Unreleased

Comments

@folivoramao

What version of Go are you using (go version)?

$ go version
go version go1.20.2 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/mjc/Library/Caches/go-build"
GOENV="/Users/mjc/Library/Application Support/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GOMODCACHE="/Users/mjc/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/mjc/go"
GOPRIVATE=""
GOPROXY="https://goproxy.cn,direct"
GOROOT="/usr/local/Cellar/go/1.20.2/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.20.2/libexec/pkg/tool/darwin_amd64"
GOVCS=""
GOVERSION="go1.20.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="cc"
CXX="c++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -arch x86_64 -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/hd/v9qhg5rj04z7bp6wb5kpdc_m0000gn/T/go-build3435336719=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

I ran into a problem with character set conversion:
when I use the simplifiedchinese package to decode a GB18030-encoded character to UTF-8, the output is wrong.
The same conversion succeeds when I use the mahonia package.
Code link: https://go.dev/play/p/NhBp0JQ2RUp

package main

import (
	"encoding/hex"
	"fmt"

	"github.com/axgle/mahonia"
	"golang.org/x/text/encoding/simplifiedchinese"
)

func main() {
	// The GB18030 input: the two bytes 0xFD 0xD2, written as hex.
	s := `FDD2`
	hd, _ := hex.DecodeString(s)

	// Decode with x/text; Bytes already returns the UTF-8 result as []byte.
	r, _ := simplifiedchinese.GB18030.NewDecoder().Bytes(hd)
	he := hex.EncodeToString(r)
	fmt.Println(he) // efbfbd

	// Decode the same bytes with mahonia for comparison.
	r2 := mahonia.NewDecoder("GB18030").ConvertString(string(hd))
	he2 := hex.EncodeToString([]byte(r2))
	fmt.Println(he2) // ee90bb
}

What did you expect to see?

ee90bb

What did you see instead?

efbfbd
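The expected bytes ee90bb are the UTF-8 encoding of U+E43B, a Private Use Area code point. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis: FD D2 lies in the GBK/GB18030 user-defined area F8A1 to FEFE, which converters commonly map linearly onto U+E234 to U+E4C5, with 94 trail bytes (A1 to FE) per lead byte. A minimal sketch of that mapping follows; the helper name `userDefinedArea2Rune` is hypothetical and is not part of x/text or mahonia.

```go
package main

import "fmt"

// userDefinedArea2Rune maps a two-byte sequence in the user-defined area
// F8A1-FEFE onto the Private Use Area starting at U+E234. It assumes the
// linear layout of 94 trail bytes (A1-FE) per lead byte. Hypothetical
// helper for illustration; ok is false for bytes outside that area.
func userDefinedArea2Rune(lead, trail byte) (r rune, ok bool) {
	if lead < 0xF8 || lead > 0xFE || trail < 0xA1 || trail > 0xFE {
		return 0, false
	}
	index := int(lead-0xF8)*94 + int(trail-0xA1)
	return rune(0xE234 + index), true
}

func main() {
	r, ok := userDefinedArea2Rune(0xFD, 0xD2)
	// %U shows the code point, % x the UTF-8 bytes of that rune.
	fmt.Printf("%U %t % x\n", r, ok, string(r)) // U+E43B true ee 90 bb
}
```

Under that assumption, FD D2 is entry 519 of the area, which lands exactly on U+E43B, the value mahonia returns.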
@robpike
Contributor

robpike commented Jul 4, 2023

Not sure what's wrong, as I am not familiar with the encoding, but I can point out a couple of details.
First, you're getting the replacement character U+FFFD, which means there is something wrong with that character according to x/text. That is interesting. You can see this by printing things differently, and you can also simplify your example significantly since fmt.Printf can do all the hex/string work for you:

https://go.dev/play/p/kDgB3ybMa8c

Finally, you should always check your errors, especially when debugging, although that didn't help here.
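A minimal standard-library sketch of that advice (not the code behind the playground link above): decode each hex string, check every error, and let fmt's %U and % x verbs show the code point and its UTF-8 bytes. The helper name `describe` is hypothetical. It shows that efbfbd is U+FFFD, the replacement character, while ee90bb is U+E43B.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf8"
)

// describe decodes a hex string and returns the first UTF-8 rune in it,
// reporting an error for bad hex or invalid UTF-8. A genuine U+FFFD
// (three bytes) is accepted; a one-byte RuneError means invalid input.
func describe(h string) (rune, error) {
	b, err := hex.DecodeString(h)
	if err != nil {
		return 0, err
	}
	r, size := utf8.DecodeRune(b)
	if r == utf8.RuneError && size <= 1 {
		return 0, fmt.Errorf("%q is not valid UTF-8", h)
	}
	return r, nil
}

func main() {
	for _, h := range []string{"efbfbd", "ee90bb"} {
		r, err := describe(h)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s: %U % x\n", h, r, string(r))
	}
}
```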

@seankhliao
Member

It would appear that the decode table is just lacking data: the given test case corresponds to index 23705 in the decode table, and the table has no data for it.
https://go.googlesource.com/text/+/refs/heads/master/encoding/simplifiedchinese/tables.go#22009

WHATWG seems to have changed the URLs for its table data, so I'm not sure what a new table would be generated from (presumably one of the indexes at https://encoding.spec.whatwg.org/#indexes).
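The number 23705 matches the index pointer that the WHATWG Encoding Standard's gb18030 decoder computes for a two-byte sequence: pointer = (lead - 0x81) × 190 + (trail - offset), where offset is 0x40 for trail bytes below 0x7F and 0x41 otherwise. A minimal sketch of that computation follows; the function name is hypothetical and this is not the x/text implementation.

```go
package main

import "fmt"

// gb18030Pointer computes the WHATWG index pointer for a two-byte gb18030
// sequence, following the decoder steps in the Encoding Standard.
// Hypothetical helper; ok is false if the bytes are not a valid
// lead/trail pair.
func gb18030Pointer(lead, trail byte) (pointer int, ok bool) {
	if lead < 0x81 || lead > 0xFE {
		return 0, false
	}
	if !(trail >= 0x40 && trail <= 0x7E) && !(trail >= 0x80 && trail <= 0xFE) {
		return 0, false
	}
	offset := 0x40
	if trail >= 0x7F {
		offset = 0x41
	}
	return (int(lead)-0x81)*190 + (int(trail) - offset), true
}

func main() {
	p, ok := gb18030Pointer(0xFD, 0xD2)
	fmt.Println(p, ok) // 23705 true
}
```

Looking up that pointer in a regenerated index would then give the code point for FD D2.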

@seankhliao seankhliao changed the title text/x: incorrect convert from gb18030 to utf8 x/text/encoding/simplifiedchinese: missing decoding data Jul 4, 2023
@gopherbot gopherbot added this to the Unreleased milestone Jul 4, 2023
@bcmills bcmills added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jul 5, 2023
@bcmills
Contributor

bcmills commented Jul 5, 2023

(CC @mpvl per https://dev.golang.org/owners)


5 participants