x/text/unicode/norm: index out of range error (runtime panic) #60860

Open
alasdairforsythe opened this issue Jun 18, 2023 · 5 comments
Labels
NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

@alasdairforsythe

go version go1.20.5 linux/arm64

I can confirm this runtime panic occurs on v0.9 and v0.10 of x/text.

panic: runtime error: index out of range [32] with length 32

goroutine 1 [running]:
golang.org/x/text/unicode/norm.(*reorderBuffer).insertOrdered(...)
        /root/tokenize/v3/vendor/golang.org/x/text/unicode/norm/composition.go:201
golang.org/x/text/unicode/norm.(*reorderBuffer).insertSingle(0x4000028280, {{0x11712b, 0x2}, {0x0, 0x0, 0x0}}, 0x0?, {0x0, 0x2, 0x0, ...})
        /root/tokenize/v3/vendor/golang.org/x/text/unicode/norm/composition.go:269 +0x160
golang.org/x/text/unicode/norm.(*reorderBuffer).insertCGJ(...)
        /root/tokenize/v3/vendor/golang.org/x/text/unicode/norm/composition.go:274
golang.org/x/text/unicode/norm.decomposeSegment(0x4000028280, 0x1, 0x1)
        /root/tokenize/v3/vendor/golang.org/x/text/unicode/norm/normalize.go:540 +0x3f0
golang.org/x/text/unicode/norm.doAppend(0x4000028280, {0x400014e000?, 0xfc2?, 0x2500?}, 0x0?)
        /root/tokenize/v3/vendor/golang.org/x/text/unicode/norm/normalize.go:235 +0x270
golang.org/x/text/unicode/norm.(*normWriter).Write(0x4000028280, {0x4022a80000?, 0x2939ecaf?, 0x29395ef2?})
        /root/tokenize/v3/vendor/golang.org/x/text/unicode/norm/readwriter.go:30 +0xf8
main.norm_UTF8_NFD({0x4022a80000, 0x293a1d4e, 0x339815bd})
        /root/tokenize/v3/getalltokens.go:126 +0x1a4
main.main()
        /root/tokenize/v3/getalltokens.go:994 +0x998

func norm_UTF8_NFD(input []byte) ([]byte, error) {
	normalized := bytes.NewBuffer(make([]byte, 0, len(input) + (len(input) / 3) + 4))
	normalizer := norm.NFD.Writer(normalized)
	_, err := normalizer.Write(input) // ------------- This is line 126 -------------
	if err != nil {
		return nil, err
	}
	err = normalizer.Close()
	if err != nil {
		return nil, err
	}
	return normalized.Bytes(), nil
}

If you want the exact input causing the error, it's 600MB of text. I can send it somewhere if it's required.
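
A 600MB attachment is awkward to share, so it may help to shrink the input before uploading it. The following is only a sketch of one way to do that, not something taken from the report: `panics`, `runeBoundary`, and `shrink` are hypothetical helpers, and `fn` stands in for whatever call triggers the crash (for example, a closure that runs `norm_UTF8_NFD` on its argument). It keeps halving the data at a UTF-8 rune boundary for as long as one half still panics on its own, so the result is smaller but not guaranteed to be minimal.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// panics reports whether fn panics when called with data.
func panics(fn func([]byte), data []byte) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	fn(data)
	return false
}

// runeBoundary advances i to the next byte that can start a UTF-8 rune,
// so that a cut at the returned index does not split a multi-byte character.
func runeBoundary(data []byte, i int) int {
	for i < len(data) && !utf8.RuneStart(data[i]) {
		i++
	}
	return i
}

// shrink keeps whichever half of data still makes fn panic and repeats.
// It stops as soon as neither half panics by itself (the trigger may
// straddle the cut), so the result is smaller, not necessarily minimal.
func shrink(fn func([]byte), data []byte) []byte {
	for len(data) > 1 {
		mid := runeBoundary(data, len(data)/2)
		if mid >= len(data) {
			return data // no usable cut point left
		}
		switch {
		case panics(fn, data[:mid]):
			data = data[:mid]
		case panics(fn, data[mid:]):
			data = data[mid:]
		default:
			return data
		}
	}
	return data
}

func main() {
	// Stand-in for the real failing call: panics when it sees a '!' byte.
	// In practice fn would feed its argument to the normalizer instead.
	fn := func(b []byte) {
		for _, c := range b {
			if c == '!' {
				panic("boom")
			}
		}
	}
	small := shrink(fn, []byte("aaaaaaaa!bbbbbbb"))
	fmt.Printf("reduced to %d byte(s): %q\n", len(small), small)
}
```

Each probe runs the failing call once on half of the remaining data, so the total work is roughly two passes over the original input.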

@gopherbot added this to the Unreleased milestone on Jun 18, 2023
@dmitshur changed the title from "x/text: Index out of range error from golang.org/x/text/unicode/norm (runtime panic)" to "x/text/unicode/norm: index out of range error (runtime panic)" on Jun 18, 2023
@seankhliao
Member

Can you upload a zipped file of the input here?

@ianlancetaylor
Contributor

CC @mpvl

@alasdairforsythe
Author

Here is a zip containing the text file that causes the issue and a small code sample that reproduces the error.

@dmitshur added the NeedsInvestigation label on Jun 22, 2023
@alasdairforsythe
Author

Since v0.11 the error is now "panic: runtime error: index out of range [1] with length 1" (instead of [32] with length 32).

Any timeline on the fix for this? Currently I'm having to work around it.
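
The thread does not say which workaround is in use. One generic option, sketched here under that caveat, is to wrap the normalization call so that a panic inside it comes back as an ordinary error. `safeNormalize` is a hypothetical helper name, and its `normalize` parameter stands in for a function such as `norm_UTF8_NFD` from the original report. The demo uses a deliberately buggy stand-in rather than x/text itself so that it runs with the standard library alone.

```go
package main

import "fmt"

// safeNormalize calls normalize and converts any panic raised inside it
// into an error, so that one bad input does not crash the whole program.
// normalize is a placeholder for a function such as norm_UTF8_NFD.
func safeNormalize(normalize func([]byte) ([]byte, error), input []byte) (out []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			out = nil
			err = fmt.Errorf("normalization panicked: %v", r)
		}
	}()
	return normalize(input)
}

func main() {
	// Stand-in that fails the same way as the report: an index that is
	// one past the end of a fixed 32-element buffer.
	buggy := func(b []byte) ([]byte, error) {
		var buf [32]byte
		i := len(b) // 32 for the input used below
		buf[i] = 1  // out-of-range index, triggers a runtime panic
		return buf[:], nil
	}

	_, err := safeNormalize(buggy, make([]byte, 32))
	fmt.Println("error:", err)
}
```

This only keeps the process alive. The input that triggered the panic is still not normalized, so the caller has to decide whether to skip it, pass it through unchanged, or handle it some other way.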

@mbwmbw1337

I see the same issue. Cross-linking: ledgerwatch/erigon#8235
