encoding/base64: decoder output depends on chunking of underlying reader #31626

AxbB36 · 2019-04-23T04:49:09Z

The output of a decoder created with base64.NewDecoder differs depending on how the input to it is chunked. I noticed these differences:

1. The decoder may ignore "internal" padding ("=" characters not at the end of the stream). For example, decoding ["QQ==Qg=="] (correctly) results in an error, but ["QQ==" "Qg=="] (incorrectly) decodes to "AB".
2. The byte offset in error messages may get reset to 0 instead of indicating the absolute offset in the stream. For example, decoding ["AAAA####"] says the error occurs at offset 4, but decoding ["AAAA" "####"] says the error occurs at offset 0.

I think that the output of a decoder should always be the same as if the entire Reader were serialized to a string and then passed to DecodeString.

Item 1 is more important IMO. Item 2 was unexpected, but I can live with inconsistent byte offsets in error messages. However, seeing as CorruptInputError is already an int64, absolute offsets would be nice to have if they don't complicate the internals too much.

This bug is somewhat similar to #25296 for encoding/base32.

What version of Go are you using (`go version`)?

$ go version
go version go1.11.5 linux/amd64

Does this issue reproduce with the latest release?

Yes, using go1.12 on play.golang.org

What operating system and processor architecture are you using (`go env`)?

$ go env
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"

What did you do?

https://play.golang.org/p/6rcDYtro36S

package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"io/ioutil"
)

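// test feeds chunks to a streaming base64 decoder, one Write per chunk,
// and prints the decoded output and the error.
func test(chunks []string) {
	fmt.Printf("\n")
	fmt.Printf("%+q\n", chunks)

	// io.Pipe is unbuffered, so a single Read never returns data from
	// more than one Write; the chunk boundaries reach the decoder intact.
	pr, pw := io.Pipe()
	go func() {
		for _, chunk := range chunks {
			pw.Write([]byte(chunk))
		}
		pw.Close()
	}()
	dec := base64.NewDecoder(base64.StdEncoding, pr)
	output, err := ioutil.ReadAll(dec)
	fmt.Printf("%+q %v\n", output, err)
}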

func main() {
	fmt.Printf("DecodeString(%+q)\n", "QQ==Qg==")
	output, err := base64.StdEncoding.DecodeString("QQ==Qg==")
	fmt.Printf("%+q %v\n", output, err)
	fmt.Printf("DecodeString(%+q)\n", "AAAA####")
	output, err = base64.StdEncoding.DecodeString("AAAA####")
	fmt.Printf("%+q %v\n", output, err)

	for _, chunks := range [][]string{
		{"QQ==Qg=="},
		{"Q", "Q==Qg=="},
		{"QQ==", "Qg=="},
		{"QQ==Qg=", "="},
		{"Q", "Q", "=", "=", "Q", "g", "=", "="},
		{"AAAA####"},
		{"AAAA", "####"},
	} {
		test(chunks)
	}
}

What did you expect to see?

DecodeString("QQ==Qg==")
"A" illegal base64 data at input byte 4
DecodeString("AAAA####")
"\x00\x00\x00" illegal base64 data at input byte 4

["QQ==Qg=="]
"A" illegal base64 data at input byte 4

["Q" "Q==Qg=="]
"A" illegal base64 data at input byte 4

["QQ==" "Qg=="]
"A" illegal base64 data at input byte 4

["QQ==Qg=" "="]
"A" illegal base64 data at input byte 4

["Q" "Q" "=" "=" "Q" "g" "=" "="]
"A" illegal base64 data at input byte 4

["AAAA####"]
"\x00\x00\x00" illegal base64 data at input byte 4

["AAAA" "####"]
"\x00\x00\x00" illegal base64 data at input byte 4

What did you see instead?

DecodeString("QQ==Qg==")
"A" illegal base64 data at input byte 4
DecodeString("AAAA####")
"\x00\x00\x00" illegal base64 data at input byte 4

["QQ==Qg=="]
"A" illegal base64 data at input byte 4

["Q" "Q==Qg=="]
"A" illegal base64 data at input byte 4

["QQ==" "Qg=="]
"AB" <nil>

["QQ==Qg=" "="]
"AB" <nil>

["Q" "Q" "=" "=" "Q" "g" "=" "="]
"AB" <nil>

["AAAA####"]
"\x00\x00\x00" illegal base64 data at input byte 4

["AAAA" "####"]
"\x00\x00\x00" illegal base64 data at input byte 0

josharian · 2019-04-23T14:12:17Z

cc @zegl

AxbB36 · 2020-04-25T16:33:38Z

The bug still exists with 1.14.2. I'm not sure why this issue got the Proposal label; it's just a bug in the base64 package.

$ go version
go version go1.14.2 linux/amd64

Here is another demonstration of the bug. The same input is given to a decoder several times, split at a different position each time. The output should always be the same, but it is not. It's not hard to imagine a case where a decoder is reading from a network socket, say, and accepts or rejects an input depending on where packet boundaries happen to fall.

package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"io/ioutil"
)

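// test writes each chunk to a pipe with its own Write call, decodes the
// read side of the pipe, and prints the chunks, the output, and the error.
func test(chunks [][]byte) {
	pr, pw := io.Pipe()
	go func() {
		for _, chunk := range chunks {
			pw.Write(chunk)
		}
		pw.Close()
	}()
	output, err := ioutil.ReadAll(base64.NewDecoder(base64.StdEncoding, pr))
	fmt.Printf("%+q -> %+q %v\n", chunks, output, err)
}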

func main() {
	input := []byte("Rw==bw==")
	for i := 0; i < len(input)+1; i++ {
		test([][]byte{input[:i], input[i:]})
	}
}

["" "Rw==bw=="] -> "G" illegal base64 data at input byte 4
["R" "w==bw=="] -> "G" illegal base64 data at input byte 4
["Rw" "==bw=="] -> "G" illegal base64 data at input byte 4
["Rw=" "=bw=="] -> "G" illegal base64 data at input byte 4
["Rw==" "bw=="] -> "Go" <nil>
["Rw==b" "w=="] -> "Go" <nil>
["Rw==bw" "=="] -> "Go" <nil>
["Rw==bw=" "="] -> "Go" <nil>
["Rw==bw==" ""] -> "G" illegal base64 data at input byte 4
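
The same comparison can be made without a pipe or goroutine. The sketch below uses iotest.OneByteReader from the standard library's testing/iotest package to force the finest possible chunking, then checks the streamed result against DecodeString. `sameAsDecodeString` is a hypothetical helper name, and what it reports for the inputs with internal padding or invalid bytes depends on the Go version it runs on, so no expected output is shown.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
	"io/ioutil"
	"strings"
	"testing/iotest"
)

// sameAsDecodeString is a hypothetical helper. It reports whether decoding
// input one byte at a time through base64.NewDecoder yields the same bytes
// and the same error text as DecodeString on the whole input.
func sameAsDecodeString(input string) bool {
	want, wantErr := base64.StdEncoding.DecodeString(input)

	// iotest.OneByteReader returns at most one byte per Read call.
	r := iotest.OneByteReader(strings.NewReader(input))
	got, gotErr := ioutil.ReadAll(base64.NewDecoder(base64.StdEncoding, r))

	return bytes.Equal(got, want) && fmt.Sprint(gotErr) == fmt.Sprint(wantErr)
}

func main() {
	for _, input := range []string{"QUJD", "Rw==bw==", "AAAA####"} {
		fmt.Printf("%q consistent: %v\n", input, sameAsDecodeString(input))
	}
}
```

A check like this could serve as a regression test: for a decoder that does not depend on chunking, it should report true for every input.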

ianlancetaylor · 2020-04-25T18:59:37Z

Yes, this looks like a bug. Not sure why it got a proposal label.

aweglteo · 2020-07-01T11:31:23Z

I will work on this.

gopherbot · 2020-08-02T18:23:51Z

Change https://golang.org/cl/246377 mentions this issue: encoding/base64: fix base64 encoding when stream input comes

katiehockman added the Proposal label Apr 29, 2019

AxbB36 mentioned this issue Apr 25, 2020

encoding/base32: decoder output depends on chunking of underlying reader #38657

Closed

ianlancetaylor added the help wanted and NeedsFix labels and removed the Proposal label Apr 25, 2020

ianlancetaylor added this to the Backlog milestone Apr 25, 2020
