x/text/unicode/cldr: Decoder fails to load cldr-common-41.0.zip #53016

Open
dolmen opened this issue May 20, 2022 · 4 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@dolmen
Contributor

dolmen commented May 20, 2022

What version of Go are you using (go version)?

$ go version
go version go1.18.2 darwin/amd64
$ go list -m golang.org/x/text
golang.org/x/text v0.3.7

Does this issue reproduce with the latest release?

Yes. golang.org/x/text v0.3.7

What did you do?

Using the latest CLDR release v41.0: http://unicode.org/Public/cldr/41/cldr-common-41.0.zip

	zip, err := os.Open(cldrArchivePath)
	if err != nil {
		log.Fatalf("%s: %s", cldrArchivePath, err)
	}

	cldrDecoder := &cldr.Decoder{}
	log.Println("Loading...")
	db, err := cldrDecoder.DecodeZip(zip)
	if err != nil {
		log.Fatalf("%s: %s", cldrArchivePath, err)
	}
	log.Println("success.")

Full code: https://github.com/blueboardio/cldr/blob/master/currency/currencies_gen.go#L73

What did you expect to see?

success.

(like with cldr-common-40.0.zip)

What did you see instead?

cldr-common-41.0.zip: supplemental-temp/coverageLevels2: missing identity element

@gopherbot gopherbot added this to the Unreleased milestone May 20, 2022
@mknyszek mknyszek added the NeedsInvestigation label May 20, 2022
@neild neild self-assigned this May 20, 2022
dolmen added a commit to blueboardio/cldr that referenced this issue May 20, 2022
Regenerate files with CLDR 40.0 data. No change in data produced since
CLDR 38.1.

Note: CLDR 41.0 is available but can't be loaded with
golang.org/x/text/unicode/cldr. See golang/go#53016
@mknyszek
Contributor

CC @mpvl also.

@neild
Contributor

neild commented May 28, 2022

I attempted to fix this, and spent rather more time learning about CLDR than I intended. Documenting what I found so far:

  • CLDR 41 has a supplemental-temp directory which needs to be handled in unicode/cldr.Decoder.Decode.
  • The latest version of iana/assignments/language-subtag-registry has an entry where the three-letter language code adp is deprecated in favor of dz. This confuses the generator (internal/language/gen.go#814) while it generates the lang table, because it assumes a three-letter code is never deprecated in favor of a two-letter one. I think the fix is to generate this as adp\x00.
  • CLDR 41 violates the specification of parent locales (TR 35 4.1.3) with two entries that change the base language code of a locale:
    <parentLocale parent="en_IN" locales="hi_Latn"/>
    <parentLocale parent="no" locales="nb nn"/>
    
    This confuses writeParents. I'm not sure what the correct fix for this is.

@neild neild removed their assignment Jun 6, 2022
@dolmen
Contributor Author

dolmen commented Jan 11, 2023

This still happens with the latest code (golang.org/x/text v0.6.0) and data (cldr-common-42.0.zip).

@dolmen
Contributor Author

dolmen commented Jan 11, 2023

I found a workaround for my use case, which is to set a DirFilter on the decoder:

cldrDecoder.SetDirFilter("main", "supplemental")

dolmen added a commit to blueboardio/cldr that referenced this issue Jan 13, 2023
Use a filter on CLDR data to avoid the issue when parsing recent CLDR data, which is not supported by golang.org/x/text/unicode/cldr.

See
golang/go#53016 (comment)