x/text/collate: collation does not work for Korean #19087

Open
MickMonaghan opened this issue Feb 14, 2017 · 0 comments
Labels: NeedsFix (The path to resolution is known, but the work has not been done.)
Milestone: Unreleased


MickMonaghan commented Feb 14, 2017

What version of Go are you using (go version)?

go1.8beta2 linux/amd64

What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/anx/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build492080285=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"

What did you do?

I attempted to sort some strings according to Korean rules.
These rules say that Korean characters should be sorted before Latin characters.

package main

import (
  "fmt"

  "golang.org/x/text/collate"
  "golang.org/x/text/language"
)

func main() {
  strs := []string{"abc", "나는"}
  cl := collate.New(language.Korean) // Korean collator
  cl.SortStrings(strs)               // sorts strs in place
  fmt.Println(strs)
}

What did you expect to see?

Expected output: [나는 abc]

  • Korean sorted before Latin
  • ICU gives this correct behavior

What did you see instead?

Actual output: [abc 나는]

  • Latin sorted before Korean

This issue was discussed in #12750, at which point @mpvl noted that:

...the implementation is based on the CLDR UCA tables. If I look at the collation elements of both the DUCET (Unicode's tables) and CLDR (the tailorings) they both show Hangul to have a higher primary collation value than Latin. So that explains why Korean is sorted later.
What is probably happening in ICU is that the script for the selected language is sorted before other scripts. The Go implementation currently does not support script reordering, though. This is a TODO, but depends on changing the implementation to using fractional weights...
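
To illustrate what script reordering means here, the following is a hypothetical, stdlib-only sketch. It is not the x/text API and not how ICU implements reordering. It compares strings rune by rune, ranks Hangul ahead of Latin ahead of everything else, and falls back to code-point order within a script. Code-point order is only a stand-in for real collation weights (for example, it is case-sensitive for Latin), and the names `scriptRank`, `lessReordered` and `sortReordered`, as well as the fixed Hangul-first ranking, are assumptions made for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"unicode"
)

// scriptRank assigns a hypothetical reorder rank to a rune: Hangul first,
// then Latin, then everything else. This mimics placing the locale's own
// script ahead of other scripts.
func scriptRank(r rune) int {
	switch {
	case unicode.Is(unicode.Hangul, r):
		return 0
	case unicode.Is(unicode.Latin, r):
		return 1
	default:
		return 2
	}
}

// lessReordered compares two strings rune by rune, ordering first by script
// rank and then by code point within the same script (a stand-in for real
// collation weights). A string that is a prefix of the other sorts first.
func lessReordered(a, b string) bool {
	ra, rb := []rune(a), []rune(b)
	for i := 0; i < len(ra) && i < len(rb); i++ {
		ka, kb := scriptRank(ra[i]), scriptRank(rb[i])
		if ka != kb {
			return ka < kb
		}
		if ra[i] != rb[i] {
			return ra[i] < rb[i]
		}
	}
	return len(ra) < len(rb)
}

// sortReordered sorts strs in place using lessReordered.
func sortReordered(strs []string) {
	sort.SliceStable(strs, func(i, j int) bool {
		return lessReordered(strs[i], strs[j])
	})
}

func main() {
	strs := []string{"abc", "나는"}
	sortReordered(strs)
	fmt.Println(strs) // prints [나는 abc]
}
```

With the issue's input this yields the order ICU produces. A fuller workaround would likely keep the script ranking but delegate comparisons within a script to a real collator instead of comparing code points.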

ianlancetaylor changed the title from "Go collation does not work for Korean" to "test/collate: collation does not work for Korean" Feb 14, 2017
ianlancetaylor changed the title from "test/collate: collation does not work for Korean" to "text/collate: collation does not work for Korean" Feb 14, 2017
ianlancetaylor changed the title from "text/collate: collation does not work for Korean" to "x/text/collate: collation does not work for Korean" Feb 14, 2017
ianlancetaylor added this to the Unreleased milestone Feb 14, 2017
ALTree added the NeedsFix label (The path to resolution is known, but the work has not been done.) Sep 22, 2018
rsc unassigned mpvl Jun 23, 2022