x/text/cases: Upper drops the 129th character #11460

chowey · 2015-06-29T04:54:22Z

Here is a simple program that should uppercase some text:

package main

import (
    "golang.org/x/text/cases"
    "golang.org/x/text/language"
)

func main() {
    const a = "abcdefghijklmnopqrstuvwx\n"
    text := a + a + a + a + a + a
    print(cases.Upper(language.Make("en")).String(text))
}

I would expect to get:

ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX

but instead I get:

ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCEFGHIJKLMNOPQRSTUVWX

The "D" on the last line is dropped. It is the 129th character of the input.


chowey · 2015-06-29T05:06:38Z

The same input works fine with the Greek language tag:

print(cases.Upper(language.Make("el")).String(text))

ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX
ABCDEFGHIJKLMNOPQRSTUVWX

chowey · 2015-07-16T03:35:28Z

The problem starts in transform/transform.go (line 571):

        nDst, nSrc, err = t.Transform(dst[pDst:], src[:n], pSrc+n == len(s))

At this point, dst[pDst:] will be a zero-length slice. This calls cases/map.go (lines 135-144):

func (t *undUpperCaser) Transform(dst, src []byte, atEOF bool) (nDst, nSrc int, err error) {
    c := context{dst: dst, src: src, atEOF: atEOF}
    for c.next() {
        upper(&c)
    }
    // Standard upper case does not need any lookahead so we can safely not use
    // the checkpointing mechanism. pDst and pSrc will always point to the
    // furthest possible position.
    return c.pDst, c.pSrc, c.err
}

But unfortunately c.next() adds c.sz to c.pSrc before it checks c.err, so on its second call it still increments c.pSrc. See cases/context.go (lines 73-93):

func (c *context) next() bool {
    c.pSrc += c.sz
    if c.pSrc == len(c.src) || c.err != nil {
        c.info, c.sz = 0, 0
        return false
    }
    v, sz := trie.lookup(c.src[c.pSrc:])
    c.info, c.sz = info(v), sz
    if c.sz == 0 {
        if c.atEOF {
            // A zero size means we have an incomplete rune. If we are atEOF,
            // this means it is an illegal rune, which we will consume one
            // byte at a time.
            c.sz = 1
        } else {
            c.err = transform.ErrShortSrc
            return false
        }
    }
    return true
}

Back in transform/transform.go, this means dst is grown and the next read starts from pSrc+1. This is wrong: the byte at pSrc was never written to dst, so that letter is skipped.
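
The sequence can be reproduced in isolation with a toy model. This is only a sketch, not the x/text code: toyContext, transformToy and errShortDst are hypothetical stand-ins, the model handles ASCII only, and it assumes the write step records a short-destination error when dst has no room.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortDst stands in for transform.ErrShortDst in this toy model.
var errShortDst = errors.New("short destination buffer")

// toyContext is a hypothetical, simplified stand-in for the context type
// quoted above. It only handles ASCII input.
type toyContext struct {
	dst, src   []byte
	pDst, pSrc int
	sz         int
	err        error
}

// next keeps the ordering of the quoted context.next: pSrc is advanced by the
// previous size before err is checked.
func (c *toyContext) next() bool {
	c.pSrc += c.sz
	if c.pSrc == len(c.src) || c.err != nil {
		c.sz = 0
		return false
	}
	c.sz = 1 // every ASCII byte is one byte wide
	return true
}

// upper writes the upper-cased current byte to dst, or records errShortDst
// (an assumption about the real write step) when dst is full.
func (c *toyContext) upper() {
	if c.pDst == len(c.dst) {
		c.err = errShortDst
		return
	}
	b := c.src[c.pSrc]
	if 'a' <= b && b <= 'z' {
		b -= 'a' - 'A'
	}
	c.dst[c.pDst] = b
	c.pDst++
}

// transformToy has the same loop shape as the quoted Transform method.
func transformToy(dst, src []byte) (nDst, nSrc int, err error) {
	c := toyContext{dst: dst, src: src}
	for c.next() {
		c.upper()
	}
	return c.pDst, c.pSrc, c.err
}

func main() {
	// A zero-length dst, as in the call from transform/transform.go.
	nDst, nSrc, err := transformToy(nil, []byte("defg"))
	fmt.Println(nDst, nSrc, err)
}
```

In this model the call reports nDst = 0 and nSrc = 1 together with the short-destination error: one source byte is counted as consumed although nothing was written.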

I see two solutions. First, if passing an empty dst is supposed to result in undefined behavior, then a simple fix is to avoid it by changing transform/transform.go (line 554) to:

    if pDst+nDst < initialBufSize {

If instead an empty dst should be handled robustly, then context.next() should be fixed like so:

func (c *context) next() bool {
    if len(c.dst) == 0 {
        c.err = transform.ErrShortDst
        return false
    }
    c.pSrc += c.sz
    if c.pSrc == len(c.src) || c.err != nil {
        c.info, c.sz = 0, 0
        return false
    }
    v, sz := trie.lookup(c.src[c.pSrc:])
    c.info, c.sz = info(v), sz
    if c.sz == 0 {
        if c.atEOF {
            // A zero size means we have an incomplete rune. If we are atEOF,
            // this means it is an illegal rune, which we will consume one
            // byte at a time.
            c.sz = 1
        } else {
            c.err = transform.ErrShortSrc
            return false
        }
    }
    return true
}
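
The effect of the guard can be sketched with a toy model. This is an illustration under stated assumptions, not the actual change: toyContext, consumed and errShortDst are hypothetical names, the model is ASCII-only, and the write step is left out so that only the bookkeeping of pSrc is visible.

```go
package main

import (
	"errors"
	"fmt"
)

// errShortDst stands in for transform.ErrShortDst in this toy model.
var errShortDst = errors.New("short destination buffer")

// toyContext is a hypothetical, ASCII-only stand-in for the real context type.
type toyContext struct {
	dst, src []byte
	pSrc, sz int
	err      error
}

// next applies the proposed guard first: an empty dst is reported as
// errShortDst before pSrc is touched.
func (c *toyContext) next() bool {
	if len(c.dst) == 0 {
		c.err = errShortDst
		return false
	}
	c.pSrc += c.sz
	if c.pSrc == len(c.src) || c.err != nil {
		c.sz = 0
		return false
	}
	c.sz = 1 // every ASCII byte is one byte wide
	return true
}

// consumed runs the next loop over src and returns how many source bytes the
// model reports as consumed, plus any error. Writing to dst is omitted.
func consumed(dst, src []byte) (int, error) {
	c := toyContext{dst: dst, src: src}
	for c.next() {
	}
	return c.pSrc, c.err
}

func main() {
	n, err := consumed(nil, []byte("defg"))
	fmt.Println(n, err)
}
```

With the guard in place the model reports 0 consumed bytes for an empty dst, so a caller that grows dst and retries would start again from the same source position.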

chowey · 2015-08-06T17:05:41Z

Resolved, see https://go-review.googlesource.com/#/c/13076/.

chowey changed the title ~~x/text/cases: Upper drops the 128th character after a non-breaking space (nbsp)~~ x/text/cases: Upper drops the 129th character Jun 29, 2015

ianlancetaylor added this to the Unreleased milestone Jun 29, 2015

ianlancetaylor assigned mpvl Jun 29, 2015

chowey closed this as completed Aug 6, 2015

golang locked and limited conversation to collaborators Aug 5, 2016

gopherbot added the FrozenDueToAge label Aug 5, 2016

rsc unassigned mpvl Jun 23, 2022
