Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x/text/language: index out of range #41617

Closed
hongrich opened this issue Sep 24, 2020 · 10 comments
Closed

x/text/language: index out of range #41617

hongrich opened this issue Sep 24, 2020 · 10 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Milestone

Comments

@hongrich
Copy link

This following program on the latest stable release, go1.14.8 and golang.org/x/text@v0.3.3:

https://play.golang.org/p/0MCiLN3ojeZ

package main

import (
	"golang.org/x/text/language"
)

func main() {
	language.ParseExtension("t_pt_MLt")
}

crashes as follows:

panic: runtime error: index out of range [4] with length 4

goroutine 1 [running]:
golang.org/x/text/internal/language.(*scanner).toLower(0xc00009eeb8, 0x0, 0x5)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/internal/language/parse.go:116 +0x65
golang.org/x/text/internal/language.parseExtension(0xc00009eeb8, 0x0)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/internal/language/parse.go:553 +0xea5
golang.org/x/text/internal/language.ParseExtension(0x4ce884, 0x8, 0x0, 0x0, 0x4b3560, 0xc000094058)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/internal/language/language.go:260 +0xbe
golang.org/x/text/language.ParseExtension(...)
	/tmp/gopath963636377/pkg/mod/golang.org/x/text@v0.3.3/language/language.go:370
main.main()
	/tmp/sandbox787883426/prog.go:8 +0x36

This is found with go-fuzz

@gopherbot gopherbot added this to the Unreleased milestone Sep 24, 2020
@bcmills
Copy link
Contributor

bcmills commented Sep 25, 2020

Is this also reproducible with the current head commit (v0.3.4-0.20200826142016-a8b467125457)?

@hongrich
Copy link
Author

Yeah. Here's a playground using the current head commit:
https://play.golang.org/p/R7-w1-xWBa1

panic: runtime error: index out of range [4] with length 4

goroutine 1 [running]:
golang.org/x/text/internal/language.(*scanner).toLower(0xc000066eb8, 0x0, 0x5)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/internal/language/parse.go:116 +0x65
golang.org/x/text/internal/language.parseExtension(0xc000066eb8, 0x0)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/internal/language/parse.go:553 +0xea5
golang.org/x/text/internal/language.ParseExtension(0x4ce884, 0x8, 0x0, 0x0, 0x4b3560, 0xc00005e058)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/internal/language/language.go:260 +0xbe
golang.org/x/text/language.ParseExtension(...)
	/tmp/gopath897727467/pkg/mod/golang.org/x/text@v0.3.4-0.20200826142016-a8b467125457/language/language.go:370
main.main()
	/tmp/sandbox940662716/prog.go:8 +0x36

@darkLord19
Copy link
Contributor

"t_pt_MLt" ends up being t-mt in scan.b for some reason and so s.b[4] is throwing index out of bound. I tried testing for other strings with similar pattern as given string but they were working fine.

@bcmills bcmills added help wanted NeedsFix The path to resolution is known, but the work has not been done. labels Sep 25, 2020
@anton-kuklin
Copy link
Contributor

I'll work on this.

@andresag01
Copy link

I had some spare time so decided to take pick up a Go issue for the first time. But I am not terribly familiar with the Go source

I agree with @darkLord19. Internally, the call to ParseExtension("t_pt_MLt") ends up in call to parseTag(scan) with the following arguments: scan.b:"t-pt-mlt" scan.token:"pt" scan.start:2 scan.end:4. This means that the call getLangID("pt") from parseTag() returns (t.LangID:"mt", e:dont_care).

Later on, parseTag(scan) calls scan.scan() , so scan ends up with the following internal state: scan.b:"t-pt-mlt" scan.token:"mlt" scan.start:5 scan.end:8. Then this block of code is executed and scan.b:"t-mt-" and the other call to scan.scan() modifies that buffer to scan.b:"t-mb" and returns end = 5 which happens to be out-of-bounds. Therefore, this call ends up as scan.toLower(start:0, end:5) which is clearly an out-of-bounds range.

Again, I am not very familiar with this code (so please let me know if I am wrong), but I think there are a few issues here:

  • The return value of the scan() function here is a bit confusing. It states that the function returns the end position of the last token consumed, but what is end when there is no more data in the buffer? what is end when there is an empty tag (e.g. the last character is -)?
    • I suggest to document these corner cases and possibly tidy up the return value
  • It is not clear to me why the scan.toLower() call here that triggers the failure is needed at all. Note that the call to parseTag() already calls scan.toLower() here before returning, so the second call (the one that fails) seems redundant to me (I might be wrong though).
  • There probably are a few missing test cases in this list. In particular, it would be good to add things that trigger corner cases for scan.scan() when the input buffer is modified.

Anyways, please let me know if this is helpful and if you would like me to submit a patch to make the changes. I am aware that someone else already said that they are working on it...

@anton-kuklin
Copy link
Contributor

@andresag01 I'm currently a little bit busy, so you are absolutely welcome to contribute a patch for this issue.

@hongrich
Copy link
Author

hongrich commented Oct 7, 2020

I have a patch that I can submit for this if no one else has submitted one yet. As @andresag01 mentioned, the relevant part is inside parseTag where it will try to normalize <lang>-<extlang> into <extlang>. In this case, it tries to normalize pt-mlt into mt becuase getLangID("mlt").String() return mt. Then it tries to adjust the positions after doing this replacement, but it currently assumes that getLangID(token).String() returns a 3-character string which is not always the case. I believe we can just change the hardcoded numbers in this part to be based on the length of getLangID(token).String().

https://github.com/golang/text/blob/a8b4671254579a87fadf9f7fa577dc7368e9d009/internal/language/parse.go#L300-L308

@andresag01
Copy link

@hongrich: I do not have a patch and I won't work on it if you plan to submit yours. If it helps, I am happy to review your change once you've raised the PR

@gopherbot
Copy link

Change https://golang.org/cl/260177 mentions this issue: internal/language: fix canonicalization of extlang

@mpvl
Copy link
Contributor

mpvl commented Nov 25, 2020

The problem is that it should not do the extlang normalization within a -t extension.

@golang golang locked and limited conversation to collaborators Sep 14, 2022
xhit pushed a commit to xhit/text that referenced this issue Oct 10, 2022
parseTag tries to replace <lang>-<extlang> with <extlang>, but <extlang>
itself can also be replaced with its canonical form which can be a
different length than the original <extlang>. The existing
implementation assumes that the length of <extlang> is 3 and would leave
scanner positions in an incorrect state if the length of <extlang> is
not 3.

Fixes golang/go#41617

Change-Id: Ie0da320530e2545f9b521e7b8cf503d854c50b45
Reviewed-on: https://go-review.googlesource.com/c/text/+/260177
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Trust: Cherry Mui <cherryyz@google.com>
Trust: Marcel van Lohuizen <mpvl@golang.org>
Reviewed-by: Marcel van Lohuizen <mpvl@golang.org>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done.
Projects
None yet
Development

No branches or pull requests

7 participants