net/mail: Subject header is not RFC2047-decoded #4687

gopherbot · 2013-01-22T14:39:38Z

by mstplbrg:

What steps will reproduce the problem?
If possible, include a link to a program on play.golang.org.
1. Obtain an email which contains any non US-ASCII characters in its subject.
2. Use a Go program such as this one: http://play.golang.org/p/qTBmJMvE4y
3. Observe that the “From” address gets correctly decoded while the “Subject”
header doesn’t.

What is the expected output?
from Foo Käm
subject [PATCH] generate-command-parser: support <number>s, state ID replacing
and…

What do you see instead?
from Foo Käm
subject [PATCH] =?UTF-8?q?generate-command-parser:=20support=20<number>s,=20?=
=?UTF-8?q?state=20ID=20replacing=20and=E2=80=A6?=


Which compiler are you using (5g, 6g, 8g, gccgo)?
6g

Which operating system are you using?
Linux

Which version are you using?  (run 'go version')
go1, but the relevant code is unchanged in trunk
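
For reference, the report can be reproduced and the expected output obtained with a short program. This is only a sketch: it assumes Go 1.5 or later, where the standard library's mime package provides WordDecoder. The message text is a made-up sample modeled on the report, and decodeSubject is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"mime"
	"net/mail"
	"strings"
)

// sampleMessage is a made-up message modeled on the report (assumption:
// the original mail folded the Subject header across two lines).
const sampleMessage = "From: =?UTF-8?q?Foo_K=C3=A4m?= <foo@example.com>\r\n" +
	"Subject: [PATCH] =?UTF-8?q?generate-command-parser:=20support=20<number>s,=20?=\r\n" +
	" =?UTF-8?q?state=20ID=20replacing=20and=E2=80=A6?=\r\n" +
	"\r\n" +
	"body\r\n"

// decodeSubject (hypothetical helper) parses raw and returns the Subject
// header with its RFC 2047 encoded-words decoded.
func decodeSubject(raw string) (string, error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", err
	}
	// Header.Get returns the value as transmitted; decoding is a separate step.
	return new(mime.WordDecoder).DecodeHeader(msg.Header.Get("Subject"))
}

func main() {
	subject, err := decodeSubject(sampleMessage)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("subject", subject)
}
```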

gopherbot · 2013-01-25T06:29:22Z

Comment 1 by sunfmin:

I think it would be better to have a proper Reader and Writer for RFC 2047 only?

bradfitz · 2013-01-29T16:30:01Z

Comment 2:

An RFC 2047 Reader/Writer would be fine, but we should probably just do the right thing
by default and decode headers.

Labels changed: added suggested.

Status changed to Accepted.

rsc · 2013-01-29T16:34:56Z

Comment 3:

It's not entirely trivial. If you start decoding headers you have to allow the user to
drop in new character set definitions, which means defining character sets and so on. I
thought we had this code somewhere in the tree already, though.
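
One way to let users drop in character sets is a conversion hook. The sketch below uses the CharsetReader field of mime.WordDecoder from the current standard library (an assumption: Go 1.16 or later for io.ReadAll), which is not necessarily the design considered at the time. The ISO-8859-15 table, charsetReader, and decodeWithLatin9 are illustrative names, not existing library code.

```go
package main

import (
	"fmt"
	"io"
	"mime"
	"strings"
)

// latin9 lists the eight bytes where ISO-8859-15 differs from ISO-8859-1.
var latin9 = map[byte]rune{
	0xA4: '€', 0xA6: 'Š', 0xA8: 'š', 0xB4: 'Ž',
	0xB8: 'ž', 0xBC: 'Œ', 0xBD: 'œ', 0xBE: 'Ÿ',
}

// charsetReader (hypothetical) converts ISO-8859-15 input to UTF-8 and
// rejects every other charset. WordDecoder passes the name lower-cased.
func charsetReader(charset string, input io.Reader) (io.Reader, error) {
	if charset != "iso-8859-15" {
		return nil, fmt.Errorf("unhandled charset %q", charset)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	var sb strings.Builder
	for _, b := range raw {
		if r, ok := latin9[b]; ok {
			sb.WriteRune(r)
		} else {
			sb.WriteRune(rune(b)) // same code point as in ISO-8859-1
		}
	}
	return strings.NewReader(sb.String()), nil
}

// decodeWithLatin9 (hypothetical) decodes a header with the hook installed.
func decodeWithLatin9(header string) (string, error) {
	dec := mime.WordDecoder{CharsetReader: charsetReader}
	return dec.DecodeHeader(header)
}

func main() {
	out, err := decodeWithLatin9("=?ISO-8859-15?Q?Preis:_5_=A4?=")
	fmt.Println(out, err)
}
```

With a hook like this the package itself only needs to know a few charsets, and callers who need more can supply their own tables.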

rsc · 2013-01-29T16:36:15Z

Comment 4:

Or maybe I am thinking of http://code.google.com/p/rsc/source/browse/imap/decode.go.

dsymonds · 2013-01-29T23:14:34Z

Comment 5:

The From header is considered structured by RFC 5322, and has a particular format, which
is why net/mail decodes it.
Subject and most other mail headers are considered unstructured
(https://tools.ietf.org/html/rfc5322#section-2.2.1), and may carry arbitrary US-ASCII
characters.
I think we should expose decoders (and maybe encoders) for the relevant formats (e.g.
the UTF-8 "Q" encoding) and leave it at that.
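
To make the "Q" encoding concrete, here is a minimal hand-written decoder for the encoded-text part of an encoded-word, following RFC 2047 section 4.2. It is a sketch only: decodeQ is a hypothetical name, it returns raw bytes in the word's charset, and it does not validate which literal characters are permitted.

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// decodeQ (hypothetical) decodes "Q"-encoded text: "_" stands for a space
// and "=XX" for the byte with hexadecimal value XX. Other bytes are copied.
func decodeQ(text string) ([]byte, error) {
	out := make([]byte, 0, len(text))
	for i := 0; i < len(text); i++ {
		switch c := text[i]; c {
		case '_':
			out = append(out, ' ')
		case '=':
			if i+2 >= len(text) {
				return nil, errors.New("truncated =XX escape")
			}
			b, err := hex.DecodeString(text[i+1 : i+3])
			if err != nil {
				return nil, err
			}
			out = append(out, b[0])
			i += 2 // skip the two hex digits just consumed
		default:
			out = append(out, c)
		}
	}
	return out, nil
}

func main() {
	out, err := decodeQ("state=20ID=20replacing=20and=E2=80=A6")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The bytes are UTF-8 here because the surrounding word declared UTF-8.
	fmt.Printf("%s\n", out)
}
```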

Labels changed: added packagechange.

rsc · 2013-01-30T18:07:03Z

Comment 6:

Labels changed: added priority-later, removed priority-triage.

rsc · 2013-01-30T18:07:25Z

Comment 7:

If you want to suggest this, please describe the API you want in the CL.

Labels changed: removed suggested.

gopherbot · 2013-02-01T02:38:32Z

Comment 8 by sunfmin:

Just for your reference, https://github.com/sloonz/go-qprintable and
https://github.com/sloonz/go-mime-message are related Go packages.

gopherbot · 2013-03-09T17:43:36Z

Comment 9 by goneri@rulezlan.org:

I changed the decodeRFC2047Word() function a bit in net/mail/message.go to be able to
reuse it in the Get() method. This does the trick for me. What I changed:
- do not return "" in case of error
- relax the RFC2047 validity check a bit; this was required for some of my sample
 mails and I think it's harmless.
--- message.go.goneri   2013-03-09 17:04:34.216090509 +0100
+++ message.go  2013-03-09 18:27:36.480099769 +0100
@@ -107,7 +107,8 @@
 // Get gets the first value associated with the given key.
 // If there are no values associated with the key, Get returns "".
 func (h Header) Get(key string) string {
-       return textproto.MIMEHeader(h).Get(key)
+       v, _ := decodeRFC2047Word(textproto.MIMEHeader(h).Get(key))
+       return v
 }
 
 var ErrHeaderNotPresent = errors.New("mail: header not in message")
@@ -437,14 +438,13 @@
 
 func decodeRFC2047Word(s string) (string, error) {
        fields := strings.Split(s, "?")
-       if len(fields) != 5 || fields[0] != "=" || fields[4] != "=" {
-               return "", errors.New("mail: address not RFC 2047 encoded")
+       if len(fields) != 5 || fields[0] != "=" {
+               return s, errors.New("mail: address not RFC 2047 encoded")
        }
        charset, enc := strings.ToLower(fields[1]), strings.ToLower(fields[2])
        if charset != "iso-8859-1" && charset != "utf-8" {
-               return "", fmt.Errorf("mail: charset not supported: %q", charset)
+               return s, fmt.Errorf("mail: charset not supported: %q", charset)
        }
-
        in := bytes.NewBufferString(fields[3])
        var r io.Reader
        switch enc {
@@ -453,12 +453,12 @@
        case "q":
                r = qDecoder{r: in}
        default:
-               return "", fmt.Errorf("mail: RFC 2047 encoding not supported: %q", enc)
+               return s, fmt.Errorf("mail: RFC 2047 encoding not supported: %q", enc)
        }
 
        dec, err := ioutil.ReadAll(r)
        if err != nil {
-               return "", err
+               return s, err
        }
 
        switch charset {

rsc · 2013-03-11T16:36:35Z

Comment 10:

I think we've missed the cutoff for Go 1.1. I'd like to take the time to get the API
right, and we don't have that luxury right now. This will probably have to wait.

Labels changed: removed go1.1.

gopherbot · 2013-04-14T12:01:36Z

Comment 11 by webustany:

I've also come across emails where only some words are RFC2047 encoded... The patch I
have currently is abustany/go@c2a663c, which seems to work quite OK. On massive email
parsing tasks, I guess the regexp has a performance cost, but I haven't quantified it.
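
A regexp-based approach to partially encoded headers could look roughly like the sketch below. It is not the code from the patch mentioned above; the pattern, the decodeMixed name, and the use of mime.WordDecoder (standard library, Go 1.5 or later) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"mime"
	"regexp"
)

// encodedWord matches one RFC 2047 encoded-word: =?charset?encoding?text?=
var encodedWord = regexp.MustCompile(`=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=`)

// decodeMixed (hypothetical) decodes every encoded-word found in s and
// leaves plain text and undecodable words untouched.
func decodeMixed(s string) string {
	dec := new(mime.WordDecoder)
	return encodedWord.ReplaceAllStringFunc(s, func(w string) string {
		out, err := dec.Decode(w)
		if err != nil {
			return w // keep the original word rather than lose data
		}
		return out
	})
}

func main() {
	fmt.Println(decodeMixed("Re: =?UTF-8?q?caf=C3=A9?= menu"))
}
```

Note that this sketch keeps the whitespace between two adjacent encoded-words, which RFC 2047 says should be ignored when displaying them.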

ianlancetaylor · 2013-07-21T04:21:41Z

Comment 12:

Labels changed: added go1.2maybe.

rsc · 2013-07-30T22:41:07Z

Comment 13:

Labels changed: added feature.

robpike · 2013-08-19T05:39:19Z

Comment 14:

Not ready for Go 1.2.

Labels changed: added go1.3maybe, removed go1.2maybe.

robpike · 2013-08-20T23:35:29Z

Comment 15:

Labels changed: removed go1.3maybe.

rsc · 2013-11-27T18:49:27Z

Comment 16:

Labels changed: added go1.3maybe.

rsc · 2013-11-27T20:29:16Z

Comment 17:

Labels changed: removed feature.

rsc · 2013-12-04T01:28:16Z

Comment 18:

Labels changed: added release-none, removed go1.3maybe.

rsc · 2013-12-04T01:48:38Z

Comment 19:

Labels changed: added repo-main.

mark-kubacki · 2014-06-03T13:34:31Z

Comment 20:

I've updated Adrien Abustany's contribution by adding:
• elimination of whitespace between 'encoded-word's
• unit tests
You can find it here:
https://github.com/wmark/ossdl-overlay/blob/a5518438cb56f5e39f8c531df279ff9d37e904a6/dev-lang/go/files/go-1.3.0-net-mail-Decode-RFC2047-encoded-headers.patch

mark-kubacki · 2014-06-03T13:44:23Z

Comment 21:

I've updated Adrien Abustany's contribution by adding:
• elimination of whitespace between 'encoded-word's
• unit tests
You can find it here (new link):
https://github.com/wmark/ossdl-overlay/blob/master/dev-lang/go/files/go-1.3.0-net-mail-Decode-RFC2047-encoded-headers.patch

ianlancetaylor · 2014-06-03T13:49:53Z

Comment 22:

To contribute to Go please follow the directions at
http://golang.org/doc/contribute.html.  Thanks.

gopherbot · 2014-06-19T10:06:55Z

Comment 23:

CL https://golang.org/cl/101330049 mentions this issue.

alexcesaro · 2014-06-19T10:26:49Z

Comment 24:

I made a CL that can fix this issue: https://golang.org/cl/101330049
Headers can be decoded with the DecodeHeader function. This function returns the decoded
text and the charset name. It does not convert the decoded string to UTF-8.
I think it is the most simple and flexible way to solve this issue. Users often ask for
more charsets support: issue #6611, issue #7140, issue #7079. But converting to UTF-8
inside this function means that we will have either a bloated binary or a lack of
charset support.
One small caveat is that this function cannot decode headers with multiple encoded-words
with different charsets but I think that is quite unusual.
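
The mixed-charset caveat can be illustrated with a header whose two encoded-words declare different charsets. The sketch below uses mime.WordDecoder from the current standard library (Go 1.5 or later), which is not necessarily the API proposed in the CL; it converts each word to UTF-8 on its own, so the words can be joined into one string. decodeHeader is a hypothetical wrapper and the header value is made up.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeHeader (hypothetical wrapper) decodes all encoded-words in header
// and returns the result as UTF-8.
func decodeHeader(header string) (string, error) {
	return new(mime.WordDecoder).DecodeHeader(header)
}

func main() {
	// First word is UTF-8, second is ISO-8859-1 (0xE8 is "è" there).
	// The space separating the two encoded-words is dropped on decoding;
	// the space in the output comes from the "_" inside the second word.
	header := "=?UTF-8?q?caf=C3=A9?= =?ISO-8859-1?q?_cr=E8me?="
	out, err := decodeHeader(header)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(out)
}
```

An API that returns a single charset name alongside undecoded bytes has no natural answer for such a header, which is the trade-off described above.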

gopherbot · 2014-10-15T09:55:20Z

Comment 25:

CL https://golang.org/cl/132680044 mentions this issue.

gopherbot · 2015-04-25T20:13:31Z

CL https://golang.org/cl/7890 mentions this issue.

gopherbot added accepted labels Oct 15, 2014

jonhoo mentioned this issue Mar 28, 2015

UTF-8 mail subjects are not decoded correctly jonhoo/hasmail#4

Closed

rsc added this to the Unplanned milestone Apr 10, 2015

rsc removed priority-later labels Apr 10, 2015

bradfitz closed this as completed in 2b03610 May 11, 2015

mikioh modified the milestones: Unplanned, Go1.5 May 15, 2015

golang locked and limited conversation to collaborators Jun 24, 2016

gopherbot added the FrozenDueToAge label Jun 24, 2016
