x/text/collate: Norwegian collation order differs from Danish #59908
Labels: NeedsInvestigation
What version of Go are you using (`go version`)?

Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (`go env`)?

What did you do?
Code on playground
The collation order for `language.Norwegian` sorts the letters `æ`, `ø`, and `å` after `a`, `o`, and `a` respectively, rather than as the last three letters in the alphabet (in that order). The collation for `language.Danish` puts those three letters at the end of the alphabet, as expected. It's my understanding that Norwegian and Danish use the same alphabetic order, which is the same initial 26 letter order as English, followed by the three others, which are not treated as diacritics. This ordering for both Norwegian and Danish is called out in the introduction to Unicode Technical Standard #10: Unicode Collation Algorithm and is also described in the "Danish and Norwegian alphabet" Wikipedia page.

What did you expect to see?
Norwegian and Danish should collate the same, with Æ, Ø, and Å at the end of the alphabet. These are U+00C6 LATIN CAPITAL LETTER AE, U+00D8 LATIN CAPITAL LETTER O WITH STROKE, U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE, and "SMALL" variants for lower case.
What did you see instead?
Norwegian (but not Danish) sorts these letters the way letters with diacritics are sorted in other European languages, rather than treating them as independent letters.