mime/multipart: allow unescaped CR and LF through #4771

gopherbot · 2013-02-07T00:43:28Z

by runningmaster:

Please see for details:
http://play.golang.org/p/CPJFI_qMM4

This is as an extension
https://golang.org/issue/4411

go version devel +6837136b5df8 Thu Feb 07 00:39:52 2013 +0800 linux/amd64

bradfitz · 2013-02-07T02:21:06Z

Comment 1:

That's too much code to understand.  Could you reduce that to a minimal example showing
the problem?
And the play.golang.org link is fine, but please also copy/paste the relevant parts into
the bug here also.

Status changed to WaitingForReply.

gopherbot · 2013-02-07T10:09:06Z

Comment 2 by runningmaster:

This is minimal example showing the problem:
http://play.golang.org/p/fOQSjqr8Ri

gopherbot · 2013-02-07T10:35:11Z

Comment 3 by runningmaster:

package main
import (
    "bytes"
    "fmt"
    "io"
    "mime/multipart"
    "strings"
)
var file string = `MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_01CABE02.6F15EB80"
------=_NextPart_01CABE02.6F15EB80
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"
<html>
<body>
runningmaste�=
14;`s
issue
</body>
</html>
------=_NextPart_01CABE02.6F15EB80--
`
var want string = `<html>
<body>
runningmaster`s
issue
</body>
</html>
`
func main() {
    rdr := multipart.NewReader(strings.NewReader(file), "----=_NextPart_01CABE02.6F15EB80")
    prt, _ := rdr.NextPart()
    buf := &bytes.Buffer{}
    io.Copy(buf, prt)
    got := buf.String()
    if got != want {
        fmt.Printf("wrong part value:\n got: %q\n want: %q\n", got, want)
    }
    // Output: MUST BE "runningmaster`s issue" NOT "runningmaster`sissue" in browser
    /*
        wrong part value:
         got: "<html><body>runningmaster`sissue</body></html>"
         want: "<html>\n<body>\nrunningmaster`s\nissue\n</body>\n</html>\n\n"
    */
}

minux · 2013-02-07T11:49:53Z

Comment 4:

the problem is whether qp-decode should preserve non-quoted "\r\n" or "\n"
sequences (or translate them into local line ending).
that is:
decode("foo\n\rbar") should return "foo\n\rbar".
from my reading of RFC 2045, i think it should.
however, "\r" and "\n" are explicitly ignored in pkg/mime/multipart/quotedprintable.go.
Brad?

bradfitz · 2013-02-07T20:13:05Z

Comment 5:

"foo\n\rbar" is a LFCR, not a CRLF.
So those bytes are invalid QP according to:
    (4)   Control characters other than TAB, or CR and LF as
          parts of CRLF pairs, must not appear. The same is true
          for octets with decimal values greater than 126.  If
          found in incoming quoted-printable data by a decoder, a
          robust implementation might exclude them from the
          decoded data and warn the user that illegal characters
          were discovered.
Did you mean \r\n instead?

gopherbot · 2013-02-07T22:16:46Z

Comment 6 by runningmaster:

> "foo\n\rbar"
IMHO, this is typo, '\r\n' is correct.

minux · 2013-02-08T06:32:23Z

Comment 7:

yeah, sorry, i mean \r\n.
i experimented with some other implementations, and i discovered that they
all seem to recognize \n or even \r as hard line break and preserve them, i'm
not sure whether we should do the same.
two little examples:
echo '<? echo quoted_printable_decode("a\rb\r\nc\nd\n\re"); ?>' | php
echo 'import quopri; print(quopri.decodestring("a\rb\r\nc\nd\n\re"))' | python

gopherbot · 2013-02-11T10:38:52Z

Comment 8 by runningmaster:

> i experimented with some other implementations, and i discovered that they
all seem to recognize \n
Please inform me, will you fix this problem in Go or I must find a some workaround for
it?

minux · 2013-02-11T13:52:57Z

Comment 9:

brad, what's your opinion?

bradfitz · 2013-02-11T19:13:59Z

Comment 10:

Yes, I guess we could allow bare \r and \n when decoding.
I wouldn't use Python as our exemplar, though, since it can't even round-trip its input: 
>>> import quopri
>>> quopri.decodestring(quopri.encodestring("A\nB\r\nC"))
'A\nB\nC'
... not very impressive.

Labels changed: added go1.1, removed go1.1maybe.

Status changed to Accepted.

bradfitz · 2013-02-11T22:51:17Z

Comment 11:

Status changed to Started.

bradfitz · 2013-02-11T22:54:56Z

Comment 12:

Minux, please review this thoroughly: https://golang.org/cl/7300092
Please do not trust that it's correct.  Read the spec, patch this CL in, mess with it,
and compare Python and other popular decoders' answers.
I was even considering checking Python's answer with os/exec for each possible sequence
in the new test.  If you use sys.stdout.write(quopri.decodestring(..)) instead of print,
Python doesn't write the final newline.

Owner changed to @minux.

Status changed to WaitingForReply.

minux · 2013-02-14T18:47:05Z

Comment 13:

i will compare Go's behavior with that of php and python's.
I also want to cross-check with at least one C/C++ implementation,
as there are many out there, any suggestions?

bradfitz · 2013-02-14T19:22:31Z

Comment 14:

Mozilla Thunderbird? Send yourself an email with an attachment MIME part encoded as
quoted-printable, and then see the file that Thunderbird decodes?
Or this:
http://packages.debian.org/sid/qprint
There's also:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/net/QuotedPrintableCodec.html

bradfitz · 2013-02-14T22:33:08Z

Comment 15:

Hah. Everybody sucks at this.
In addition to that Python module not being able to decode its own encodings, the
"qprint" command gets into states where it craps itself:
$ echo -n "0000= " | qprint -d
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
(spins forever)
This was from me trying to use qprint to check all possible sequences of up to 6 bytes
with an alphabet of "0A \r\n=":
+       if useQprint {
+           cmd := exec.Command("qprint", "-d")
+           cmd.Stdin = strings.NewReader(s)
+           want, err := cmd.Output()
+           if err != nil {
+               t.Errorf("qprint(%q) = err: %v", s, err)
+           } else {
+               if string(want) != buf.String() {
+                   t.Errorf("go decode(%q) = %q; qprint = %q", s, want, buf.String())
+               }
+           }
+       }
I'll try to detect qprint spinning now and work around it.
But at this point I trust nothing.

bradfitz · 2013-02-14T22:57:09Z

Comment 16:

Modifying my test to be tolerant of qprint -d spinning on malformed input, the
exhaustive test reports plenty of differences.
I don't have time to look into them now, though.
quotedprintable_test.go:157:    go decode("=\nA\r\r ") = "A\n\n "; qprint = "A"
quotedprintable_test.go:157:    go decode("=\nA\r\r\r") = "A\n\n\n"; qprint = "A"
quotedprintable_test.go:157:    go decode("=\nA\r\r\n") = "A\n\n"; qprint = "A\r\n"
quotedprintable_test.go:157:    go decode("=\nA\r\n0") = "A\n0"; qprint = "A\r\n0"
quotedprintable_test.go:157:    go decode("=\nA\r\nA") = "A\nA"; qprint = "A\r\nA"
quotedprintable_test.go:157:    go decode("=\nA\r\n ") = "A\n "; qprint = "A\r\n"
quotedprintable_test.go:157:    go decode("=\nA\r\n\r") = "A\n\n"; qprint = "A\r\n"
quotedprintable_test.go:157:    go decode("=\nA\r\n\n") = "A\n\n"; qprint = "A\r\n\n"
quotedprintable_test.go:160:    qprint -d("=\nA\r= ") = Error: invalid character
"0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input.
quotedprintable_test.go:157:    go decode("=\nA\r=\r") = "A\n"; qprint = "A\r"
quotedprintable_test.go:157:    go decode("=\nA\r=\n") = "A\n"; qprint = "A\r"
quotedprintable_test.go:157:    go decode("=\nA\n0 ") = "A\n0 "; qprint = "A\n0"
quotedprintable_test.go:157:    go decode("=\nA\n0\r") = "A\n0\n"; qprint = "A\n0"
quotedprintable_test.go:157:    go decode("=\nA\nA ") = "A\nA "; qprint = "A\nA"
quotedprintable_test.go:157:    go decode("=\nA\nA\r") = "A\nA\n"; qprint = "A\nA"
quotedprintable_test.go:157:    go decode("=\nA\n 0") = "A\n 0"; qprint = "A\n0"
quotedprintable_test.go:157:    go decode("=\nA\n A") = "A\n A"; qprint = "A\nA"
....
(more attached)
Also attached my quotedprintable_test.go with the os/exec stuff in it.

Attachments:

fail.txt (38005 bytes)
quotedprintable_test.go (5138 bytes)

minux · 2013-02-19T10:05:43Z

Comment 17:

yeah, every implementation is different in these aspects:
1. "=\r\r\r\n"  should it decode to "\r\r\n" (php), or "" (python)? (RFC 2045 says this
is illegal)
2. while space/tab at end of line, remove them or not? (RFC 2045 says so, but both
implementations
keep them)
relevant source code:
http://hg.python.org/cpython/file/aa77f7eb2bf1/Lib/quopri.py  [py3] or
http://hg.python.org/cpython/file/70274d53c1dd/Lib/quopri.py [py2] and
http://hg.python.org/cpython/file/aa77f7eb2bf1/Modules/binascii.c#l1224 [py3]
http://hg.python.org/cpython/file/70274d53c1dd/Modules/binascii.c#l1191 [py2]
(normally python will use binascii's C version, but C version and python version doesn't
agree with each other)
http://lxr.php.net/xref/php-src/ext/standard/quot_print.c
http://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/WebCore/platform/text/QuotedPrintable.cpp&sq=package:chromium
all three decoder disregard the 2nd space rule (except the pure python version, but
normally it's not used).
webkit's qp decoder is the most simple of the three, only ignore \r\n if immediately
preceded by =,
and in all other cases, it will pass the source data through.
php's decoder only treats "=\r", "=\n" and "=\r\n" as soft line breaks and remove them.
python's decoder will treat "=\n” or "=\r[^\n]*\n" as soft line breaks and remove
them, even "=\raaa\n" will decoded as "".
Given that all existing implementation don't agree with each other, my proposal for our
decoder is:
1. pass through \r, \n if they are not immediately preceded by =, this should fix this
particular issue.
2. disregard the aforementioned point 2 about removing while spaces at the line end, and
hope that
there won't be any broken MTA anymore.
3. treat "=\n", "=\r\n" as soft line breaks only, and give error for other cases (in
particular, "=\r" without
being followed by "\n" is an error)
What do you think?

bradfitz · 2013-02-19T15:17:00Z

Comment 18:

SGTM. Thanks for doing this research!
What a mess.

bradfitz · 2013-02-19T18:10:47Z

Comment 19:

The latest version of https://golang.org/cl/7300092 does and has tests for 1)
and 3), but I don't think it's worth intentionally breaking it to violate the RFC for 2)
because 2) doesn't seem related to this bug.  Thoughts?
Please review https://golang.org/cl/7300092

bradfitz · 2013-02-22T18:40:32Z

Comment 20:

This issue was closed by revision 9c7aa5f.

Status changed to Fixed.

bradfitz · 2013-04-18T03:05:18Z

Comment 21:

This issue was updated by revision 24555c7.

Fixes issue #5295
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/8536045

gopherbot added fixed labels Apr 18, 2013

rsc added this to the Go1.1 milestone Apr 14, 2015

rsc removed the go1.1 label Apr 14, 2015

golang locked and limited conversation to collaborators Jun 24, 2016

gopherbot added the FrozenDueToAge label Jun 24, 2016

This issue was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

mime/multipart: allow unescaped CR and LF through #4771

mime/multipart: allow unescaped CR and LF through #4771

gopherbot commented Feb 7, 2013

bradfitz commented Feb 7, 2013

gopherbot commented Feb 7, 2013

gopherbot commented Feb 7, 2013

minux commented Feb 7, 2013

bradfitz commented Feb 7, 2013

gopherbot commented Feb 7, 2013

minux commented Feb 8, 2013

gopherbot commented Feb 11, 2013

minux commented Feb 11, 2013

bradfitz commented Feb 11, 2013

bradfitz commented Feb 11, 2013

bradfitz commented Feb 11, 2013

minux commented Feb 14, 2013

bradfitz commented Feb 14, 2013

bradfitz commented Feb 14, 2013

bradfitz commented Feb 14, 2013

minux commented Feb 19, 2013

bradfitz commented Feb 19, 2013

bradfitz commented Feb 19, 2013

bradfitz commented Feb 22, 2013

bradfitz commented Apr 18, 2013

mime/multipart: allow unescaped CR and LF through #4771

mime/multipart: allow unescaped CR and LF through #4771

Comments

gopherbot commented Feb 7, 2013

bradfitz commented Feb 7, 2013

gopherbot commented Feb 7, 2013

gopherbot commented Feb 7, 2013

minux commented Feb 7, 2013

bradfitz commented Feb 7, 2013

gopherbot commented Feb 7, 2013

minux commented Feb 8, 2013

gopherbot commented Feb 11, 2013

minux commented Feb 11, 2013

bradfitz commented Feb 11, 2013

bradfitz commented Feb 11, 2013

bradfitz commented Feb 11, 2013

minux commented Feb 14, 2013

bradfitz commented Feb 14, 2013

bradfitz commented Feb 14, 2013

bradfitz commented Feb 14, 2013

minux commented Feb 19, 2013

bradfitz commented Feb 19, 2013

bradfitz commented Feb 19, 2013

bradfitz commented Feb 22, 2013

bradfitz commented Apr 18, 2013