net/mail: Subject header is not RFC2047-decoded #4687

Closed
gopherbot opened this issue Jan 22, 2013 · 26 comments
@gopherbot

by mstplbrg:

What steps will reproduce the problem?
If possible, include a link to a program on play.golang.org.
1. Obtain an email which contains any non US-ASCII characters in its subject.
2. Use a Go program such as this one: http://play.golang.org/p/qTBmJMvE4y
3. Observe that the “From” address gets correctly decoded while the “Subject”
header doesn’t.

What is the expected output?
from Foo Käm
subject [PATCH] generate-command-parser: support <number>s, state ID replacing
and…

What do you see instead?
from Foo Käm
subject [PATCH] =?UTF-8?q?generate-command-parser:=20support=20<number>s,=20?=
=?UTF-8?q?state=20ID=20replacing=20and=E2=80=A6?=


Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Linux

Which version are you using?  (run 'go version')
go1, but the relevant code is unchanged in trunk
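
For readers on a current Go release, the behavior in this report can be reproduced and worked around with the standard library's mime.WordDecoder. The sketch below is only an illustration: the sample message is made up from the subject quoted above (the original playground program is not reproduced here), and `subjects` and `sampleMessage` are hypothetical names.

```go
package main

import (
	"fmt"
	"mime"
	"net/mail"
	"strings"
)

// sampleMessage is a made-up message modeled on the subject in the report.
// The second Subject line starts with a space, so it is a folded continuation.
const sampleMessage = "Subject: [PATCH] =?UTF-8?q?generate-command-parser:=20support=20<number>s,=20?=\r\n" +
	" =?UTF-8?q?state=20ID=20replacing=20and=E2=80=A6?=\r\n" +
	"\r\n" +
	"body\r\n"

// subjects (hypothetical helper) returns the Subject header as net/mail
// hands it back, plus the same value after RFC 2047 decoding.
func subjects(rawMessage string) (raw, decoded string, err error) {
	msg, err := mail.ReadMessage(strings.NewReader(rawMessage))
	if err != nil {
		return "", "", err
	}
	raw = msg.Header.Get("Subject") // still contains the encoded-words
	decoded, err = new(mime.WordDecoder).DecodeHeader(raw)
	return raw, decoded, err
}

func main() {
	raw, decoded, err := subjects(sampleMessage)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("raw:    ", raw)
	fmt.Println("decoded:", decoded)
}
```

Header.Get is left untouched here; decoding is an explicit second step on the returned string.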
@gopherbot
Author

Comment 1 by sunfmin:

I think it would be better to have a proper Reader and Writer for RFC 2047 only?

@bradfitz
Contributor

Comment 2:

An RFC 2047 Reader/Writer would be fine, but we should probably just do the right thing
by default and decode headers.

Labels changed: added suggested.

Status changed to Accepted.

@rsc
Contributor

rsc commented Jan 29, 2013

Comment 3:

It's not entirely trivial. If you start decoding headers you have to allow the user to
drop in new character set definitions, which means defining character sets and so on. I
thought we had this code somewhere in the tree already, though.
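
As a rough illustration of the "drop in new character set definitions" concern: in current Go releases, mime.WordDecoder has a CharsetReader field that is called for charsets it does not handle itself. The sketch below plugs in a tiny ISO-8859-15 converter. The names `latin9Reader`, `latin9Overrides`, and `newDecoder` are hypothetical, and a real program would more likely use a full charset library.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
)

// latin9Overrides lists the bytes where ISO-8859-15 differs from ISO-8859-1.
var latin9Overrides = map[byte]rune{
	0xA4: '€', 0xA6: 'Š', 0xA8: 'š', 0xB4: 'Ž',
	0xB8: 'ž', 0xBC: 'Œ', 0xBD: 'œ', 0xBE: 'Ÿ',
}

// latin9Reader (hypothetical) converts ISO-8859-15 input to UTF-8.
// WordDecoder passes the charset name in lower case.
func latin9Reader(charset string, input io.Reader) (io.Reader, error) {
	if charset != "iso-8859-15" {
		return nil, fmt.Errorf("unsupported charset %q", charset)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	var out bytes.Buffer
	for _, b := range raw {
		if r, ok := latin9Overrides[b]; ok {
			out.WriteRune(r)
		} else {
			out.WriteRune(rune(b)) // same code point as ISO-8859-1
		}
	}
	return &out, nil
}

// newDecoder returns a decoder that knows the extra charset.
func newDecoder() *mime.WordDecoder {
	return &mime.WordDecoder{CharsetReader: latin9Reader}
}

func main() {
	s, err := newDecoder().DecodeHeader("=?ISO-8859-15?Q?10_=A4?=")
	fmt.Println(s, err)
}
```

Charsets the callback rejects still surface as an error from DecodeHeader, so the caller decides how to handle unknown encodings.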

@rsc
Contributor

rsc commented Jan 29, 2013

Comment 4:

Or maybe I am thinking of http://code.google.com/p/rsc/source/browse/imap/decode.go.

@dsymonds
Contributor

Comment 5:

The From header is considered structured by RFC 5322, and has a particular format, which
is why net/mail decodes it.
Subject and most other mail headers are considered unstructured
(https://tools.ietf.org/html/rfc5322#section-2.2.1), and may carry arbitrary US-ASCII
characters.
I think we should expose decoders (and maybe encoders) for the relevant formats (e.g.
the UTF-8 "Q" encoding) and leave it at that.
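
A minimal sketch of what "exposed" encoders and decoders look like in current Go releases, where the mime package offers mime.QEncoding (an encoder) and mime.WordDecoder (a decoder). This is not the API discussed in this comment, only what the standard library provides today; `roundTrip` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"mime"
)

// roundTrip (hypothetical helper) Q-encodes s as UTF-8 encoded-words and
// decodes the result again. Pure ASCII input is returned unencoded by Encode,
// and DecodeHeader passes such text through unchanged.
func roundTrip(s string) (encoded, decoded string, err error) {
	encoded = mime.QEncoding.Encode("utf-8", s)
	decoded, err = new(mime.WordDecoder).DecodeHeader(encoded)
	return encoded, decoded, err
}

func main() {
	encoded, decoded, err := roundTrip("Foo Käm")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(encoded)
	fmt.Println(decoded)
}
```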

Labels changed: added packagechange.

@rsc
Contributor

rsc commented Jan 30, 2013

Comment 6:

Labels changed: added priority-later, removed priority-triage.

@rsc
Contributor

rsc commented Jan 30, 2013

Comment 7:

If you want to suggest this, please describe the API you want in the CL.

Labels changed: removed suggested.

@gopherbot
Author

Comment 8 by sunfmin:

Just for your reference, https://github.com/sloonz/go-qprintable and
https://github.com/sloonz/go-mime-message are related Go packages.

@gopherbot
Author

Comment 9 by goneri@rulezlan.org:

I changed the decodeRFC2047Word() function a bit in net/mail/message.go to be able to
reuse it in the Get() method. This does the trick for me. What I changed:
- do not return "" in case of error
- relax the RFC 2047 validity check a bit; this was required for some of my sample
 mails and I think it's harmless.
--- message.go.goneri   2013-03-09 17:04:34.216090509 +0100
+++ message.go  2013-03-09 18:27:36.480099769 +0100
@@ -107,7 +107,8 @@
 // Get gets the first value associated with the given key.
 // If there are no values associated with the key, Get returns "".
 func (h Header) Get(key string) string {
-       return textproto.MIMEHeader(h).Get(key)
+       v, _ := decodeRFC2047Word(textproto.MIMEHeader(h).Get(key))
+       return v
 }
 
 var ErrHeaderNotPresent = errors.New("mail: header not in message")
@@ -437,14 +438,13 @@
 
 func decodeRFC2047Word(s string) (string, error) {
        fields := strings.Split(s, "?")
-       if len(fields) != 5 || fields[0] != "=" || fields[4] != "=" {
-               return "", errors.New("mail: address not RFC 2047 encoded")
+       if len(fields) != 5 || fields[0] != "=" {
+               return s, errors.New("mail: address not RFC 2047 encoded")
        }
        charset, enc := strings.ToLower(fields[1]), strings.ToLower(fields[2])
        if charset != "iso-8859-1" && charset != "utf-8" {
-               return "", fmt.Errorf("mail: charset not supported: %q", charset)
+               return s, fmt.Errorf("mail: charset not supported: %q", charset)
        }
-
        in := bytes.NewBufferString(fields[3])
        var r io.Reader
        switch enc {
@@ -453,12 +453,12 @@
        case "q":
                r = qDecoder{r: in}
        default:
-               return "", fmt.Errorf("mail: RFC 2047 encoding not supported: %q", enc)
+               return s, fmt.Errorf("mail: RFC 2047 encoding not supported: %q", enc)
        }
 
        dec, err := ioutil.ReadAll(r)
        if err != nil {
-               return "", err
+               return s, err
        }
 
        switch charset {
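
The "do not return "" in case of error" idea from the patch above can also be sketched outside net/mail, without changing Header.Get. The helper below is an assumption for illustration only: `decodedGet` and `sampleHeader` are hypothetical names, and the decoding uses mime.WordDecoder from current Go releases rather than the unexported decodeRFC2047Word.

```go
package main

import (
	"fmt"
	"mime"
	"net/mail"
)

// sampleHeader is made-up data: one decodable subject and one header
// that uses a charset the default decoder does not know.
var sampleHeader = mail.Header{
	"Subject": {"=?UTF-8?q?state=20ID=20replacing=20and=E2=80=A6?="},
	"X-Note":  {"=?x-unknown?q?abc?="},
}

// decodedGet (hypothetical helper) returns the RFC 2047 decoded value of
// the first header named key. If decoding fails, it falls back to the raw
// value instead of returning "".
func decodedGet(h mail.Header, key string) string {
	raw := h.Get(key)
	decoded, err := new(mime.WordDecoder).DecodeHeader(raw)
	if err != nil {
		return raw
	}
	return decoded
}

func main() {
	fmt.Println(decodedGet(sampleHeader, "Subject"))
	fmt.Println(decodedGet(sampleHeader, "X-Note"))
}
```

Swallowing the error this way is convenient for display purposes, but callers that need to know whether decoding succeeded would want the error returned as well.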

@rsc
Contributor

rsc commented Mar 11, 2013

Comment 10:

I think we've missed the cutoff for Go 1.1. I'd like to take the time to get the API
right, and we don't have that luxury right now. This will probably have to wait.

Labels changed: removed go1.1.

@gopherbot
Author

Comment 11 by webustany:

I've also come across emails where only some words are RFC2047 encoded... The patch I
have currently is
abustany/go@c2a663c , which
seems to work quite OK. On massive email parsing tasks, I guess the regexp has a
performance cost, but I haven't quantified it.

@ianlancetaylor
Contributor

Comment 12:

Labels changed: added go1.2maybe.

@rsc
Contributor

rsc commented Jul 30, 2013

Comment 13:

Labels changed: added feature.

@robpike
Contributor

robpike commented Aug 19, 2013

Comment 14:

Not ready for Go 1.2.

Labels changed: added go1.3maybe, removed go1.2maybe.

@robpike
Contributor

robpike commented Aug 20, 2013

Comment 15:

Labels changed: removed go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 16:

Labels changed: added go1.3maybe.

@rsc
Contributor

rsc commented Nov 27, 2013

Comment 17:

Labels changed: removed feature.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 18:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor

rsc commented Dec 4, 2013

Comment 19:

Labels changed: added repo-main.

@mark-kubacki

Comment 20:

I've updated Adrien Abustany's contribution by adding:
• elimination of whitespace between 'encoded-word's
• unit tests
You can find it here:
https://github.com/wmark/ossdl-overlay/blob/a5518438cb56f5e39f8c531df279ff9d37e904a6/dev-lang/go/files/go-1.3.0-net-mail-Decode-RFC2047-encoded-headers.patch

@mark-kubacki

Comment 21:

I've updated Adrien Abustany's contribution by adding:
• elimination of whitespace between 'encoded-word's
• unit tests
You can find it here (new link):
https://github.com/wmark/ossdl-overlay/blob/master/dev-lang/go/files/go-1.3.0-net-mail-Decode-RFC2047-encoded-headers.patch

@ianlancetaylor
Contributor

Comment 22:

To contribute to Go please follow the directions at
http://golang.org/doc/contribute.html.  Thanks.

@gopherbot
Author

Comment 23:

CL https://golang.org/cl/101330049 mentions this issue.

@alexcesaro
Contributor

Comment 24:

I made a CL that can fix this issue: https://golang.org/cl/101330049
Headers can be decoded with the DecodeHeader function. This function returns the decoded
text and the charset name. It does not convert the decoded string to UTF-8.
I think it is the simplest and most flexible way to solve this issue. Users often ask for
more charset support: issue #6611, issue #7140, issue #7079. But converting to UTF-8
inside this function means that we will have either a bloated binary or a lack of
charset support.
One small caveat is that this function cannot decode headers with multiple encoded-words
in different charsets, but I think that is quite unusual.
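
For comparison only, and not the function proposed in the CL above (whose signature is not reproduced here): the mime.WordDecoder in current Go releases converts each encoded-word to UTF-8 on its own, so a header mixing charsets can be decoded. The sketch assumes a made-up header value; `mixedHeader` and `decodeMixed` are hypothetical names.

```go
package main

import (
	"fmt"
	"mime"
)

// mixedHeader is a made-up value with one ISO-8859-1 encoded-word followed
// by one UTF-8 encoded-word.
const mixedHeader = "=?ISO-8859-1?Q?caf=E9?= =?UTF-8?Q?_=E2=80=A6?="

// decodeMixed (hypothetical helper) decodes a header whose encoded-words
// may use different charsets; each word is converted to UTF-8 separately.
func decodeMixed(header string) (string, error) {
	return new(mime.WordDecoder).DecodeHeader(header)
}

func main() {
	s, err := decodeMixed(mixedHeader)
	fmt.Println(s, err)
}
```

The space between the two encoded-words is dropped during decoding, which is why the second word carries its own leading space as `_`.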

@gopherbot
Author

Comment 25:

CL https://golang.org/cl/132680044 mentions this issue.

@gopherbot
Author

CL https://golang.org/cl/7890 mentions this issue.

@mikioh mikioh modified the milestones: Unplanned, Go1.5 May 15, 2015
@golang golang locked and limited conversation to collaborators Jun 24, 2016