x/text/encoding/traditionalchinese: wrong coding mapping #21910

Closed
beikege opened this issue Sep 16, 2017 · 9 comments

beikege commented Sep 16, 2017

What version of Go are you using (go version)?

1.9
go get -u golang.org/x/text/

What did you do?

package main

import (
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/traditionalchinese"
)

func main() {
	str := "包"
	b, err := traditionalchinese.Big5.NewEncoder().Bytes([]byte(str))
	if err != nil {
		log.Fatalln(err)
	}
	r, _ := utf8.DecodeRuneInString(str)
	fmt.Printf("unicode:0x%X big5:0x%X\n", r, b) // big5 prints 0xFABD; expected 0xA55D
}

What did you expect to see?

unicode:0x5305 big5:0xA55D

What did you see instead?

unicode:0x5305 big5:0xFABD

References:
http://moztw.org/docs/big5/table/cp950-u2b.txt
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP950.TXT
https://encoding.spec.whatwg.org/index-big5.txt
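
The WHATWG index listed above is keyed by pointer rather than by byte pair, so looking up 0xA55D or 0xFABD in it needs a conversion first. A minimal sketch of that conversion, following the Big5 decoder formula in the WHATWG Encoding Standard; the function name `big5Pointer` is hypothetical and is not part of golang.org/x/text:

```go
package main

import "fmt"

// big5Pointer converts a two-byte Big5 sequence into the pointer used as the
// key in index-big5.txt, following the Big5 decoder formula in the WHATWG
// Encoding Standard. It reports false if either byte is outside the valid range.
// The name is hypothetical; it is not an x/text API.
func big5Pointer(lead, trail byte) (int, bool) {
	if lead < 0x81 || lead > 0xFE {
		return 0, false
	}
	var offset byte
	switch {
	case trail >= 0x40 && trail <= 0x7E:
		offset = 0x40
	case trail >= 0xA1 && trail <= 0xFE:
		offset = 0x62
	default:
		return 0, false
	}
	return int(lead-0x81)*157 + int(trail-offset), true
}

func main() {
	// The expected and the observed encodings of 包 from this report.
	for _, code := range [][2]byte{{0xA5, 0x5D}, {0xFA, 0xBD}} {
		p, ok := big5Pointer(code[0], code[1])
		fmt.Printf("0x%02X%02X -> pointer %d (valid: %v)\n", code[0], code[1], p, ok)
	}
}
```

With these inputs the sketch gives pointer 5681 for 0xA55D and 19088 for 0xFABD, which are the rows to check in index-big5.txt.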

@mpvl

gopherbot added this to the Unreleased milestone on Sep 16, 2017

ghost commented Sep 20, 2017

Does a grep of the following files clarify anything?

encoding/simplifiedchinese/tables.go

encoding/traditionalchinese/tables.go


beikege commented Sep 20, 2017


ghost commented Sep 20, 2017

% grep A55D golang.org/x/text/encoding/traditionalchinese/tables.go
% grep B0A8 golang.org/x/text/encoding/traditionalchinese/tables.go
39340 - 11904: 0xB0A8,
%

Try substituting the character "馬".
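
The grep hit above appears to be keyed by decimal code point rather than by Big5 value: 39340 is 0x99AC (馬) and 11904 is 0x2E80. A small sketch that builds the matching search string for any character, assuming every entry in that part of tables.go has the same `<rune> - 11904:` form (inferred only from the one line of grep output; `grepKey` is a hypothetical helper):

```go
package main

import "fmt"

// tableBase is the offset seen in the grep output (11904 == 0x2E80).
// Assumption: the entries of interest in tables.go all use this base.
const tableBase = 11904

// grepKey returns the "<rune> - <base>" text to search for in tables.go.
// Hypothetical helper, not part of x/text.
func grepKey(r rune) string {
	return fmt.Sprintf("%d - %d", r, tableBase)
}

func main() {
	for _, r := range "包馬" {
		fmt.Printf("%c U+%04X: grep %q\n", r, r, grepKey(r))
	}
}
```

For 包 (U+5305) this yields the search string `21253 - 11904`, which is a more direct lookup than grepping for the Big5 value.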


beikege commented Sep 20, 2017

package main

import (
	"bytes"
	"fmt"
	"log"
	"unicode/utf8"

	"golang.org/x/text/encoding/traditionalchinese"
)

func main() {

	src := []byte{165, 93} // Big5 0xA55D: 包

	// Big5 to UTF-8
	b1, err := traditionalchinese.Big5.NewDecoder().Bytes(src)
	if err != nil {
		log.Fatalln(err)
	}

	r, _ := utf8.DecodeRune(b1)
	fmt.Printf("包 unicode:0x%X big5:0x%X\n", r, src)

	// UTF-8 back to Big5
	b2, err := traditionalchinese.Big5.NewEncoder().Bytes(b1)
	if err != nil {
		log.Fatalln(err)
	}

	// Not equal: the round trip does not return the original bytes
	fmt.Println(src, b2, bytes.Equal(src, b2))

	fmt.Println("--------------------------")

	src = []byte{176, 168} // Big5 0xB0A8: 馬

	// Big5 to UTF-8
	b1, err = traditionalchinese.Big5.NewDecoder().Bytes(src)
	if err != nil {
		log.Fatalln(err)
	}

	r, _ = utf8.DecodeRune(b1)
	fmt.Printf("馬 unicode:0x%X big5:0x%X\n", r, src)

	// UTF-8 back to Big5
	b2, err = traditionalchinese.Big5.NewEncoder().Bytes(b1)
	if err != nil {
		log.Fatalln(err)
	}

	// Equal: the round trip returns the original bytes
	fmt.Println(src, b2, bytes.Equal(src, b2))

}

Output:

包 unicode:0x5305 big5:0xA55D
[165 93] [250 189] false
--------------------------
馬 unicode:0x99AC big5:0xB0A8
[176 168] [176 168] true
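
One possible explanation for a round trip that fails for 包 but succeeds for 馬 is a decode table in which two Big5 codes map to the same code point, combined with an encode table that keeps the wrong one. The toy sketch below illustrates only that mechanism. It assumes that 0xFABD also decodes to U+5305, which this thread does not show, and it does not reproduce the actual x/text table generator; `decodeTable` and `invert` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// decodeTable is a toy stand-in for a Big5 decode table.
// Assumption: 0xFABD also decodes to U+5305. The thread only shows 0xFABD
// as encoder output, so this duplicate is not confirmed.
var decodeTable = map[uint16]rune{
	0xA55D: 0x5305, // 包
	0xB0A8: 0x99AC, // 馬
	0xFABD: 0x5305, // assumed duplicate of 包
}

// invert builds an encode table by walking the decode table in ascending
// code order. With keepFirst the lowest code for a rune wins; otherwise
// later codes overwrite earlier ones and the highest code wins.
func invert(dec map[uint16]rune, keepFirst bool) map[rune]uint16 {
	codes := make([]int, 0, len(dec))
	for c := range dec {
		codes = append(codes, int(c))
	}
	sort.Ints(codes)

	enc := make(map[rune]uint16, len(dec))
	for _, c := range codes {
		r := dec[uint16(c)]
		if _, seen := enc[r]; seen && keepFirst {
			continue
		}
		enc[r] = uint16(c)
	}
	return enc
}

func main() {
	first := invert(decodeTable, true)
	last := invert(decodeTable, false)
	for _, r := range []rune{0x5305, 0x99AC} {
		fmt.Printf("U+%04X first-wins:0x%X last-wins:0x%X\n", r, first[r], last[r])
	}
}
```

Under that assumption, a last-wins inversion yields 0xFABD for U+5305 while a first-wins inversion yields 0xA55D; 馬 has a single code, so both strategies agree on 0xB0A8.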


ghost commented Sep 22, 2017

Have you read this?

https://golangtc.com/t/541560ab320b527a3b0001d9


ghost commented Sep 22, 2017

ianlancetaylor commented

CC @mpvl


ghost commented Oct 4, 2017

包 bag
馬 horse

seankhliao changed the title from "x/text/encoding/traditionalchinese:wrong coding mapping" to "x/text/encoding/traditionalchinese: wrong coding mapping" on Jun 18, 2021

beikege commented Oct 19, 2022

Fixes #43581

beikege closed this as completed on Oct 19, 2022
golang locked and limited conversation to collaborators on Oct 19, 2023