encoding/base64: decoder output depends on chunking of underlying reader #31626

Open
AxbB36 opened this issue Apr 23, 2019 · 5 comments
Labels
help wanted NeedsFix The path to resolution is known, but the work has not been done.

@AxbB36

AxbB36 commented Apr 23, 2019

The output of a decoder produced from Encoding.NewDecoder differs depending on how you chunk the input to it. I noticed these differences:

  1. The decoder may ignore "internal" padding (= characters not at the end of the stream). For example, decoding ["QQ==Qg=="] (correctly) results in an error, but ["QQ==", "Qg=="] (incorrectly) decodes to "AB".
  2. The byte offset in error messages may get reset to 0 instead of indicating the absolute offset in the stream. For example, decoding ["AAAA####"] says the error occurs at offset 4, but decoding ["AAAA" "####"] says the error occurs at offset 0.

I think that the output of a decoder should always be the same as if the entire Reader were serialized to a string and then passed to DecodeString.

Item 1 is more important IMO. Item 2 was unexpected, but I can live with inconsistent byte offsets in error messages. However, seeing as CorruptInputError is already an int64, a correct absolute offset would be nice to have if it doesn't complicate the internals too much.

This bug is somewhat similar to #25296 for encoding/base32.

What version of Go are you using (go version)?

$ go version
go version go1.11.5 linux/amd64

Does this issue reproduce with the latest release?

Yes, using go1.12 on play.golang.org

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"

What did you do?

https://play.golang.org/p/6rcDYtro36S

package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"io/ioutil"
)

func test(chunks []string) {
	fmt.Printf("\n")
	fmt.Printf("%+q\n", chunks)

	pr, pw := io.Pipe()
	go func() {
		for _, chunk := range chunks {
			pw.Write([]byte(chunk))
		}
		pw.Close()
	}()
	dec := base64.NewDecoder(base64.StdEncoding, pr)
	output, err := ioutil.ReadAll(dec)
	fmt.Printf("%+q %v\n", output, err)
}

func main() {
	fmt.Printf("DecodeString(%+q)\n", "QQ==Qg==")
	output, err := base64.StdEncoding.DecodeString("QQ==Qg==")
	fmt.Printf("%+q %v\n", output, err)
	fmt.Printf("DecodeString(%+q)\n", "AAAA####")
	output, err = base64.StdEncoding.DecodeString("AAAA####")
	fmt.Printf("%+q %v\n", output, err)

	for _, chunks := range [][]string{
		{"QQ==Qg=="},
		{"Q", "Q==Qg=="},
		{"QQ==", "Qg=="},
		{"QQ==Qg=", "="},
		{"Q", "Q", "=", "=", "Q", "g", "=", "="},
		{"AAAA####"},
		{"AAAA", "####"},
	} {
		test(chunks)
	}
}

What did you expect to see?

DecodeString("QQ==Qg==")
"A" illegal base64 data at input byte 4
DecodeString("AAAA####")
"\x00\x00\x00" illegal base64 data at input byte 4

["QQ==Qg=="]
"A" illegal base64 data at input byte 4

["Q" "Q==Qg=="]
"A" illegal base64 data at input byte 4

["QQ==" "Qg=="]
"A" illegal base64 data at input byte 4

["QQ==Qg=" "="]
"A" illegal base64 data at input byte 4

["Q" "Q" "=" "=" "Q" "g" "=" "="]
"A" illegal base64 data at input byte 4

["AAAA####"]
"\x00\x00\x00" illegal base64 data at input byte 4

["AAAA" "####"]
"\x00\x00\x00" illegal base64 data at input byte 4

What did you see instead?

DecodeString("QQ==Qg==")
"A" illegal base64 data at input byte 4
DecodeString("AAAA####")
"\x00\x00\x00" illegal base64 data at input byte 4

["QQ==Qg=="]
"A" illegal base64 data at input byte 4

["Q" "Q==Qg=="]
"A" illegal base64 data at input byte 4

["QQ==" "Qg=="]
"AB" <nil>

["QQ==Qg=" "="]
"AB" <nil>

["Q" "Q" "=" "=" "Q" "g" "=" "="]
"AB" <nil>

["AAAA####"]
"\x00\x00\x00" illegal base64 data at input byte 4

["AAAA" "####"]
"\x00\x00\x00" illegal base64 data at input byte 0

@josharian
Contributor

cc @zegl

@AxbB36
Author

AxbB36 commented Apr 25, 2020

The bug still exists with 1.14.2. I'm not sure why this issue got the Proposal label; it's just a bug in the base64 package.

$ go version
go version go1.14.2 linux/amd64

Here is another demonstration of the bug. Here, the same input is given to a decoder, each time split differently. The output should always be the same, but it is not. It's not hard to imagine a case where a decoder is reading from a network socket, say, and accepts or rejects an input depending on where packet boundaries happen to fall.

package main

import (
	"encoding/base64"
	"fmt"
	"io"
	"io/ioutil"
)

func test(chunks [][]byte) {
	pr, pw := io.Pipe()
	go func() {
		for _, chunk := range chunks {
			pw.Write(chunk)
		}
		pw.Close()
	}()
	output, err := ioutil.ReadAll(base64.NewDecoder(base64.StdEncoding, pr))
	fmt.Printf("%+q -> %+q %v\n", chunks, output, err)
}

func main() {
	input := []byte("Rw==bw==")
	for i := 0; i < len(input)+1; i++ {
		test([][]byte{input[:i], input[i:]})
	}
}

["" "Rw==bw=="] -> "G" illegal base64 data at input byte 4
["R" "w==bw=="] -> "G" illegal base64 data at input byte 4
["Rw" "==bw=="] -> "G" illegal base64 data at input byte 4
["Rw=" "=bw=="] -> "G" illegal base64 data at input byte 4
["Rw==" "bw=="] -> "Go" <nil>
["Rw==b" "w=="] -> "Go" <nil>
["Rw==bw" "=="] -> "Go" <nil>
["Rw==bw=" "="] -> "Go" <nil>
["Rw==bw==" ""] -> "G" illegal base64 data at input byte 4

@ianlancetaylor
Contributor

Yes, this looks like a bug. Not sure why it got a proposal label.

@ianlancetaylor ianlancetaylor added help wanted NeedsFix The path to resolution is known, but the work has not been done. and removed Proposal labels Apr 25, 2020
@ianlancetaylor ianlancetaylor added this to the Backlog milestone Apr 25, 2020
@aweglteo

aweglteo commented Jul 1, 2020

I will work on this.

@gopherbot

Change https://golang.org/cl/246377 mentions this issue: encoding/base64: fix base64 encoding when stream input comes
