-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mime/multipart: allow unescaped CR and LF through #4771
Labels
Milestone
Comments
This is minimal example showing the problem: http://play.golang.org/p/fOQSjqr8Ri |
package main import ( "bytes" "fmt" "io" "mime/multipart" "strings" ) var file string = `MIME-Version: 1.0 Content-Type: multipart/related; boundary="----=_NextPart_01CABE02.6F15EB80" ------=_NextPart_01CABE02.6F15EB80 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset="us-ascii" <html> <body> runningmaste�= 14;`s issue </body> </html> ------=_NextPart_01CABE02.6F15EB80-- ` var want string = `<html> <body> runningmaster`s issue </body> </html> ` func main() { rdr := multipart.NewReader(strings.NewReader(file), "----=_NextPart_01CABE02.6F15EB80") prt, _ := rdr.NextPart() buf := &bytes.Buffer{} io.Copy(buf, prt) got := buf.String() if got != want { fmt.Printf("wrong part value:\n got: %q\n want: %q\n", got, want) } // Output: MUST BE "runningmaster`s issue" NOT "runningmaster`sissue" in browser /* wrong part value: got: "<html><body>runningmaster`sissue</body></html>" want: "<html>\n<body>\nrunningmaster`s\nissue\n</body>\n</html>\n\n" */ } |
the problem is whether qp-decode should preserve non-quoted "\r\n" or "\n" sequences (or translate them into local line ending). that is: decode("foo\n\rbar") should return "foo\n\rbar". from my reading of RFC 2045, i think it should. however, "\r" and "\n" are explicitly ignored in pkg/mime/multipart/quotedprintable.go. Brad? |
"foo\n\rbar" is a LFCR, not a CRLF. So those bytes are invalid QP according to: (4) Control characters other than TAB, or CR and LF as parts of CRLF pairs, must not appear. The same is true for octets with decimal values greater than 126. If found in incoming quoted-printable data by a decoder, a robust implementation might exclude them from the decoded data and warn the user that illegal characters were discovered. Did you mean \r\n instead? |
yeah, sorry, i mean \r\n. i experimented with some other implementations, and i discovered that they all seem to recognize \n or even \r as hard line break and preserve them, i'm not sure whether we should do the same. two little examples: echo '<? echo quoted_printable_decode("a\rb\r\nc\nd\n\re"); ?>' | php echo 'import quopri; print(quopri.decodestring("a\rb\r\nc\nd\n\re"))' | python |
Yes, I guess we could allow bare \r and \n when decoding. I wouldn't use Python as our exemplar, though, since it can't even round-trip its input: >>> import quopri >>> quopri.decodestring(quopri.encodestring("A\nB\r\nC")) 'A\nB\nC' ... not very impressive. Labels changed: added go1.1, removed go1.1maybe. Status changed to Accepted. |
Minux, please review this thoroughly: https://golang.org/cl/7300092 Please do not trust that it's correct. Read the spec, patch this CL in, mess with it, and compare Python and other popular decoders' answers. I was even considering checking Python's answer with os/exec for each possible sequence in the new test. If you use sys.stdout.write(quopri.decodestring(..)) instead of print, Python doesn't write the final newline. Owner changed to @minux. Status changed to WaitingForReply. |
Mozilla Thunderbird? Send yourself an email with an attachment MIME part encoded as quoted-printable, and then see the file that Thunderbird decodes? Or this: http://packages.debian.org/sid/qprint There's also: http://commons.apache.org/codec/apidocs/org/apache/commons/codec/net/QuotedPrintableCodec.html |
Hah. Everybody sucks at this. In addition to that Python module not being able to decode its own encodings, the "qprint" command gets into states where it craps itself: $ echo -n "0000= " | qprint -d Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. (spins forever) This was from me trying to use qprint to check all possible sequences of up to 6 bytes with an alphabet of "0A \r\n=": + if useQprint { + cmd := exec.Command("qprint", "-d") + cmd.Stdin = strings.NewReader(s) + want, err := cmd.Output() + if err != nil { + t.Errorf("qprint(%q) = err: %v", s, err) + } else { + if string(want) != buf.String() { + t.Errorf("go decode(%q) = %q; qprint = %q", s, want, buf.String()) + } + } + } I'll try to detect qprint spinning now and work around it. But at this point I trust nothing. |
Modifying my test to be tolerant of qprint -d spinning on malformed input, the exhaustive test reports plenty of differences. I don't have time to look into them now, though. quotedprintable_test.go:157: go decode("=\nA\r\r ") = "A\n\n "; qprint = "A" quotedprintable_test.go:157: go decode("=\nA\r\r\r") = "A\n\n\n"; qprint = "A" quotedprintable_test.go:157: go decode("=\nA\r\r\n") = "A\n\n"; qprint = "A\r\n" quotedprintable_test.go:157: go decode("=\nA\r\n0") = "A\n0"; qprint = "A\r\n0" quotedprintable_test.go:157: go decode("=\nA\r\nA") = "A\nA"; qprint = "A\r\nA" quotedprintable_test.go:157: go decode("=\nA\r\n ") = "A\n "; qprint = "A\r\n" quotedprintable_test.go:157: go decode("=\nA\r\n\r") = "A\n\n"; qprint = "A\r\n" quotedprintable_test.go:157: go decode("=\nA\r\n\n") = "A\n\n"; qprint = "A\r\n\n" quotedprintable_test.go:160: qprint -d("=\nA\r= ") = Error: invalid character "0xFFFFFFFF" in soft line break sequence at byte 5 (0x5) of input. quotedprintable_test.go:157: go decode("=\nA\r=\r") = "A\n"; qprint = "A\r" quotedprintable_test.go:157: go decode("=\nA\r=\n") = "A\n"; qprint = "A\r" quotedprintable_test.go:157: go decode("=\nA\n0 ") = "A\n0 "; qprint = "A\n0" quotedprintable_test.go:157: go decode("=\nA\n0\r") = "A\n0\n"; qprint = "A\n0" quotedprintable_test.go:157: go decode("=\nA\nA ") = "A\nA "; qprint = "A\nA" quotedprintable_test.go:157: go decode("=\nA\nA\r") = "A\nA\n"; qprint = "A\nA" quotedprintable_test.go:157: go decode("=\nA\n 0") = "A\n 0"; qprint = "A\n0" quotedprintable_test.go:157: go decode("=\nA\n A") = "A\n A"; qprint = "A\nA" .... (more attached) Also attached my quotedprintable_test.go with the os/exec stuff in it. Attachments:
|
yeah, every implementation is different in these aspects: 1. "=\r\r\r\n" should it decode to "\r\r\n" (php), or "" (python)? (RFC 2045 says this is illegal) 2. while space/tab at end of line, remove them or not? (RFC 2045 says so, but both implementations keep them) relevant source code: http://hg.python.org/cpython/file/aa77f7eb2bf1/Lib/quopri.py [py3] or http://hg.python.org/cpython/file/70274d53c1dd/Lib/quopri.py [py2] and http://hg.python.org/cpython/file/aa77f7eb2bf1/Modules/binascii.c#l1224 [py3] http://hg.python.org/cpython/file/70274d53c1dd/Modules/binascii.c#l1191 [py2] (normally python will use binascii's C version, but C version and python version doesn't agree with each other) http://lxr.php.net/xref/php-src/ext/standard/quot_print.c http://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/WebCore/platform/text/QuotedPrintable.cpp&sq=package:chromium all three decoder disregard the 2nd space rule (except the pure python version, but normally it's not used). webkit's qp decoder is the most simple of the three, only ignore \r\n if immediately preceded by =, and in all other cases, it will pass the source data through. php's decoder only treats "=\r", "=\n" and "=\r\n" as soft line breaks and remove them. python's decoder will treat "=\n” or "=\r[^\n]*\n" as soft line breaks and remove them, even "=\raaa\n" will decoded as "". Given that all existing implementation don't agree with each other, my proposal for our decoder is: 1. pass through \r, \n if they are not immediately preceded by =, this should fix this particular issue. 2. disregard the aforementioned point 2 about removing while spaces at the line end, and hope that there won't be any broken MTA anymore. 3. treat "=\n", "=\r\n" as soft line breaks only, and give error for other cases (in particular, "=\r" without being followed by "\n" is an error) What do you think? |
The latest version of https://golang.org/cl/7300092 does and has tests for 1) and 3), but I don't think it's worth intentionally breaking it to violate the RFC for 2) because 2) doesn't seem related to this bug. Thoughts? Please review https://golang.org/cl/7300092 |
This issue was closed by revision 9c7aa5f. Status changed to Fixed. |
This issue was updated by revision 24555c7. Fixes issue #5295 R=golang-dev, r CC=golang-dev https://golang.org/cl/8536045 |
This issue was closed.
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
by runningmaster:
The text was updated successfully, but these errors were encountered: