Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mime/multipart: allow unescaped CR and LF through #4771

Closed
gopherbot opened this issue Feb 7, 2013 · 21 comments
Closed

mime/multipart: allow unescaped CR and LF through #4771

gopherbot opened this issue Feb 7, 2013 · 21 comments
Milestone

Comments

@gopherbot
Copy link

by runningmaster:

Please see for details:
http://play.golang.org/p/CPJFI_qMM4

This is as an extension
https://golang.org/issue/4411

go version devel +6837136b5df8 Thu Feb 07 00:39:52 2013 +0800 linux/amd64
@bradfitz
Copy link
Contributor

bradfitz commented Feb 7, 2013

Comment 1:

That's too much code to understand.  Could you reduce that to a minimal example showing
the problem?
And the play.golang.org link is fine, but please also copy/paste the relevant parts into
the bug here also.

Status changed to WaitingForReply.

@gopherbot
Copy link
Author

Comment 2 by runningmaster:

This is minimal example showing the problem:
http://play.golang.org/p/fOQSjqr8Ri

@gopherbot
Copy link
Author

Comment 3 by runningmaster:

package main
import (
    "bytes"
    "fmt"
    "io"
    "mime/multipart"
    "strings"
)
var file string = `MIME-Version: 1.0
Content-Type: multipart/related; boundary="----=_NextPart_01CABE02.6F15EB80"
------=_NextPart_01CABE02.6F15EB80
Content-Transfer-Encoding: quoted-printable
Content-Type: text/html; charset="us-ascii"
<html>
<body>
runningmaste�=
14;`s
issue
</body>
</html>
------=_NextPart_01CABE02.6F15EB80--
`
var want string = `<html>
<body>
runningmaster`s
issue
</body>
</html>
`
func main() {
    rdr := multipart.NewReader(strings.NewReader(file), "----=_NextPart_01CABE02.6F15EB80")
    prt, _ := rdr.NextPart()
    buf := &bytes.Buffer{}
    io.Copy(buf, prt)
    got := buf.String()
    if got != want {
        fmt.Printf("wrong part value:\n got: %q\n want: %q\n", got, want)
    }
    // Output: MUST BE "runningmaster`s issue" NOT "runningmaster`sissue" in browser
    /*
        wrong part value:
         got: "<html><body>runningmaster`sissue</body></html>"
         want: "<html>\n<body>\nrunningmaster`s\nissue\n</body>\n</html>\n\n"
    */
}

@minux
Copy link
Member

minux commented Feb 7, 2013

Comment 4:

the problem is whether qp-decode should preserve non-quoted "\r\n" or "\n"
sequences (or translate them into local line ending).
that is:
decode("foo\n\rbar") should return "foo\n\rbar".
from my reading of RFC 2045, i think it should.
however, "\r" and "\n" are explicitly ignored in pkg/mime/multipart/quotedprintable.go.
Brad?

@bradfitz
Copy link
Contributor

bradfitz commented Feb 7, 2013

Comment 5:

"foo\n\rbar" is a LFCR, not a CRLF.
So those bytes are invalid QP according to:
    (4)   Control characters other than TAB, or CR and LF as
          parts of CRLF pairs, must not appear. The same is true
          for octets with decimal values greater than 126.  If
          found in incoming quoted-printable data by a decoder, a
          robust implementation might exclude them from the
          decoded data and warn the user that illegal characters
          were discovered.
Did you mean \r\n instead?

@gopherbot
Copy link
Author

Comment 6 by runningmaster:

> "foo\n\rbar"
IMHO, this is typo, '\r\n' is correct.

@minux
Copy link
Member

minux commented Feb 8, 2013

Comment 7:

yeah, sorry, i mean \r\n.
i experimented with some other implementations, and i discovered that they
all seem to recognize \n or even \r as hard line break and preserve them, i'm
not sure whether we should do the same.
two little examples:
echo '<? echo quoted_printable_decode("a\rb\r\nc\nd\n\re"); ?>' | php
echo 'import quopri; print(quopri.decodestring("a\rb\r\nc\nd\n\re"))' | python

@gopherbot
Copy link
Author

Comment 8 by runningmaster:

> i experimented with some other implementations, and i discovered that they
all seem to recognize \n
Please inform me, will you fix this problem in Go or I must find a some workaround for
it?

@minux
Copy link
Member

minux commented Feb 11, 2013

Comment 9:

brad, what's your opinion?

@bradfitz
Copy link
Contributor

Comment 10:

Yes, I guess we could allow bare \r and \n when decoding.
I wouldn't use Python as our exemplar, though, since it can't even round-trip its input: 
>>> import quopri
>>> quopri.decodestring(quopri.encodestring("A\nB\r\nC"))
'A\nB\nC'
... not very impressive.

Labels changed: added go1.1, removed go1.1maybe.

Status changed to Accepted.

@bradfitz
Copy link
Contributor

Comment 11:

Status changed to Started.

@bradfitz
Copy link
Contributor

Comment 12:

Minux, please review this thoroughly: https://golang.org/cl/7300092
Please do not trust that it's correct.  Read the spec, patch this CL in, mess with it,
and compare Python and other popular decoders' answers.
I was even considering checking Python's answer with os/exec for each possible sequence
in the new test.  If you use sys.stdout.write(quopri.decodestring(..)) instead of print,
Python doesn't write the final newline.

Owner changed to @minux.

Status changed to WaitingForReply.

@minux
Copy link
Member

minux commented Feb 14, 2013

Comment 13:

i will compare Go's behavior with that of php and python's.
I also want to cross-check with at least one C/C++ implementation,
as there are many out there, any suggestions?

@bradfitz
Copy link
Contributor

Comment 14:

Mozilla Thunderbird? Send yourself an email with an attachment MIME part encoded as
quoted-printable, and then see the file that Thunderbird decodes?
Or this:
http://packages.debian.org/sid/qprint
There's also:
http://commons.apache.org/codec/apidocs/org/apache/commons/codec/net/QuotedPrintableCodec.html

@bradfitz
Copy link
Contributor

Comment 15:

Hah. Everybody sucks at this.
In addition to that Python module not being able to decode its own encodings, the
"qprint" command gets into states where it craps itself:
$ echo -n "0000= " | qprint -d
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of
input.
(spins forever)
This was from me trying to use qprint to check all possible sequences of up to 6 bytes
with an alphabet of "0A \r\n=":
+       if useQprint {
+           cmd := exec.Command("qprint", "-d")
+           cmd.Stdin = strings.NewReader(s)
+           want, err := cmd.Output()
+           if err != nil {
+               t.Errorf("qprint(%q) = err: %v", s, err)
+           } else {
+               if string(want) != buf.String() {
+                   t.Errorf("go decode(%q) = %q; qprint = %q", s, want, buf.String())
+               }
+           }
+       }
I'll try to detect qprint spinning now and work around it.
But at this point I trust nothing.

@bradfitz
Copy link
Contributor

Comment 16:

Modifying my test to be tolerant of qprint -d spinning on malformed input, the
exhaustive test reports plenty of differences.
I don't have time to look into them now, though.
quotedprintable_test.go:157:    go decode("=\nA\r\r ") = "A\n\n "; qprint = "A"
quotedprintable_test.go:157:    go decode("=\nA\r\r\r") = "A\n\n\n"; qprint = "A"
quotedprintable_test.go:157:    go decode("=\nA\r\r\n") = "A\n\n"; qprint = "A\r\n"
quotedprintable_test.go:157:    go decode("=\nA\r\n0") = "A\n0"; qprint = "A\r\n0"
quotedprintable_test.go:157:    go decode("=\nA\r\nA") = "A\nA"; qprint = "A\r\nA"
quotedprintable_test.go:157:    go decode("=\nA\r\n ") = "A\n "; qprint = "A\r\n"
quotedprintable_test.go:157:    go decode("=\nA\r\n\r") = "A\n\n"; qprint = "A\r\n"
quotedprintable_test.go:157:    go decode("=\nA\r\n\n") = "A\n\n"; qprint = "A\r\n\n"
quotedprintable_test.go:160:    qprint -d("=\nA\r= ") = Error: invalid character
"0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input.
quotedprintable_test.go:157:    go decode("=\nA\r=\r") = "A\n"; qprint = "A\r"
quotedprintable_test.go:157:    go decode("=\nA\r=\n") = "A\n"; qprint = "A\r"
quotedprintable_test.go:157:    go decode("=\nA\n0 ") = "A\n0 "; qprint = "A\n0"
quotedprintable_test.go:157:    go decode("=\nA\n0\r") = "A\n0\n"; qprint = "A\n0"
quotedprintable_test.go:157:    go decode("=\nA\nA ") = "A\nA "; qprint = "A\nA"
quotedprintable_test.go:157:    go decode("=\nA\nA\r") = "A\nA\n"; qprint = "A\nA"
quotedprintable_test.go:157:    go decode("=\nA\n 0") = "A\n 0"; qprint = "A\n0"
quotedprintable_test.go:157:    go decode("=\nA\n A") = "A\n A"; qprint = "A\nA"
....
(more attached)
Also attached my quotedprintable_test.go with the os/exec stuff in it.

Attachments:

  1. fail.txt (38005 bytes)
  2. quotedprintable_test.go (5138 bytes)

@minux
Copy link
Member

minux commented Feb 19, 2013

Comment 17:

yeah, every implementation is different in these aspects:
1. "=\r\r\r\n"  should it decode to "\r\r\n" (php), or "" (python)? (RFC 2045 says this
is illegal)
2. while space/tab at end of line, remove them or not? (RFC 2045 says so, but both
implementations
keep them)
relevant source code:
http://hg.python.org/cpython/file/aa77f7eb2bf1/Lib/quopri.py  [py3] or
http://hg.python.org/cpython/file/70274d53c1dd/Lib/quopri.py [py2] and
http://hg.python.org/cpython/file/aa77f7eb2bf1/Modules/binascii.c#l1224 [py3]
http://hg.python.org/cpython/file/70274d53c1dd/Modules/binascii.c#l1191 [py2]
(normally python will use binascii's C version, but C version and python version doesn't
agree with each other)
http://lxr.php.net/xref/php-src/ext/standard/quot_print.c
http://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/WebCore/platform/text/QuotedPrintable.cpp&sq=package:chromium
all three decoder disregard the 2nd space rule (except the pure python version, but
normally it's not used).
webkit's qp decoder is the most simple of the three, only ignore \r\n if immediately
preceded by =,
and in all other cases, it will pass the source data through.
php's decoder only treats "=\r", "=\n" and "=\r\n" as soft line breaks and remove them.
python's decoder will treat "=\n” or "=\r[^\n]*\n" as soft line breaks and remove
them, even "=\raaa\n" will decoded as "".
Given that all existing implementation don't agree with each other, my proposal for our
decoder is:
1. pass through \r, \n if they are not immediately preceded by =, this should fix this
particular issue.
2. disregard the aforementioned point 2 about removing while spaces at the line end, and
hope that
there won't be any broken MTA anymore.
3. treat "=\n", "=\r\n" as soft line breaks only, and give error for other cases (in
particular, "=\r" without
being followed by "\n" is an error)
What do you think?

@bradfitz
Copy link
Contributor

Comment 18:

SGTM. Thanks for doing this research!
What a mess.

@bradfitz
Copy link
Contributor

Comment 19:

The latest version of https://golang.org/cl/7300092 does and has tests for 1)
and 3), but I don't think it's worth intentionally breaking it to violate the RFC for 2)
because 2) doesn't seem related to this bug.  Thoughts?
Please review https://golang.org/cl/7300092

@bradfitz
Copy link
Contributor

Comment 20:

This issue was closed by revision 9c7aa5f.

Status changed to Fixed.

@bradfitz
Copy link
Contributor

Comment 21:

This issue was updated by revision 24555c7.

Fixes issue #5295
R=golang-dev, r
CC=golang-dev
https://golang.org/cl/8536045

@rsc rsc added this to the Go1.1 milestone Apr 14, 2015
@rsc rsc removed the go1.1 label Apr 14, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants