net/mail: Subject header is not RFC2047-decoded #4687
Or maybe I am thinking of http://code.google.com/p/rsc/source/browse/imap/decode.go.
The From header is considered structured by RFC 5322, and has a particular format, which is why net/mail decodes it. Subject and most other mail headers are considered unstructured (https://tools.ietf.org/html/rfc5322#section-2.2.1), and may carry arbitrary US-ASCII characters. I think we should expose decoders (and maybe encoders) for the relevant formats (e.g. the UTF-8 "Q" encoding) and leave it at that.
Labels changed: added packagechange.
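As an illustration of what exposed decoders and encoders could look like, here is a minimal sketch using mime.WordDecoder and mime.QEncoding, which exist in the standard library of later Go releases (they were not available when this comment was written). The helper names decodeSubject and encodeSubject are hypothetical.

```go
package main

import (
	"fmt"
	"mime"
)

// decodeSubject (hypothetical helper) decodes any RFC 2047 encoded-words
// in an unstructured header value such as Subject.
func decodeSubject(raw string) (string, error) {
	dec := new(mime.WordDecoder)
	return dec.DecodeHeader(raw)
}

// encodeSubject (hypothetical helper) produces a "Q"-encoded word from
// UTF-8 text. ASCII-only input is returned unchanged by the encoder.
func encodeSubject(text string) string {
	return mime.QEncoding.Encode("utf-8", text)
}

func main() {
	s, err := decodeSubject("=?UTF-8?Q?Caf=C3=A9_menu?=")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(s) // prints "Café menu"

	// Round trip: encode, then decode again.
	back, err := decodeSubject(encodeSubject("Café menu"))
	fmt.Println(back, err)
}
```

The caller decides which headers to decode, so net/mail itself does not need to know which headers are unstructured.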
Just for your reference, https://github.com/sloonz/go-qprintable and https://github.com/sloonz/go-mime-message are related Go packages.
Comment 9 by goneri@rulezlan.org:
I changed the decodeRFC2047Word() function a bit in net/mail/message.go so that it can be reused in the Get() method. This does the trick for me.
What I changed:
- do not return "" in case of error
- relax the RFC 2047 validity check a bit; this was required for some of my sample mails and I think it's harmless.

--- message.go.goneri 2013-03-09 17:04:34.216090509 +0100
+++ message.go 2013-03-09 18:27:36.480099769 +0100
@@ -107,7 +107,8 @@
 // Get gets the first value associated with the given key.
 // If there are no values associated with the key, Get returns "".
 func (h Header) Get(key string) string {
-	return textproto.MIMEHeader(h).Get(key)
+	v, _ := decodeRFC2047Word(textproto.MIMEHeader(h).Get(key))
+	return v
 }
 var ErrHeaderNotPresent = errors.New("mail: header not in message")
@@ -437,14 +438,13 @@
 func decodeRFC2047Word(s string) (string, error) {
 	fields := strings.Split(s, "?")
-	if len(fields) != 5 || fields[0] != "=" || fields[4] != "=" {
-		return "", errors.New("mail: address not RFC 2047 encoded")
+	if len(fields) != 5 || fields[0] != "=" {
+		return s, errors.New("mail: address not RFC 2047 encoded")
 	}
 	charset, enc := strings.ToLower(fields[1]), strings.ToLower(fields[2])
 	if charset != "iso-8859-1" && charset != "utf-8" {
-		return "", fmt.Errorf("mail: charset not supported: %q", charset)
+		return s, fmt.Errorf("mail: charset not supported: %q", charset)
 	}
-
 	in := bytes.NewBufferString(fields[3])
 	var r io.Reader
 	switch enc {
@@ -453,12 +453,12 @@
 	case "q":
 		r = qDecoder{r: in}
 	default:
-		return "", fmt.Errorf("mail: RFC 2047 encoding not supported: %q", enc)
+		return s, fmt.Errorf("mail: RFC 2047 encoding not supported: %q", enc)
 	}
 	dec, err := ioutil.ReadAll(r)
 	if err != nil {
-		return "", err
+		return s, err
 	}
 	switch charset {
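The "do not return an empty string on error" idea from this patch can also be sketched outside net/mail, without modifying Header.Get. The example below is an assumption-laden illustration, not the patch itself: subjectOrRaw is a hypothetical helper, and it relies on mime.WordDecoder from later Go releases instead of the unexported decodeRFC2047Word.

```go
package main

import (
	"fmt"
	"mime"
	"net/mail"
	"strings"
)

// subjectOrRaw (hypothetical helper) parses a message and returns its
// Subject header with RFC 2047 encoded-words decoded. If decoding fails,
// for example because the charset is unknown, it returns the raw header
// value instead of an empty string.
func subjectOrRaw(msg string) (string, error) {
	m, err := mail.ReadMessage(strings.NewReader(msg))
	if err != nil {
		return "", err
	}
	raw := m.Header.Get("Subject")
	dec := new(mime.WordDecoder)
	decoded, err := dec.DecodeHeader(raw)
	if err != nil {
		return raw, nil // keep the undecoded text rather than losing it
	}
	return decoded, nil
}

func main() {
	ok := "Subject: =?iso-8859-1?q?caf=E9?=\r\n\r\nbody\r\n"
	unknown := "Subject: =?x-unknown?q?abc?=\r\n\r\nbody\r\n"

	s, err := subjectOrRaw(ok)
	fmt.Println(s, err) // decoded subject: "café"

	s, err = subjectOrRaw(unknown)
	fmt.Println(s, err) // raw header value is preserved
}
```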
I've also come across emails where only some words are RFC 2047 encoded. The patch I currently have is abustany/go@c2a663c, which seems to work quite well. On massive email parsing tasks, I guess the regexp has a performance cost, but I haven't quantified it.
I've updated Adrien Abustany's contribution by adding:
• elimination of whitespace between 'encoded-word's
• unit tests
You can find it here: https://github.com/wmark/ossdl-overlay/blob/a5518438cb56f5e39f8c531df279ff9d37e904a6/dev-lang/go/files/go-1.3.0-net-mail-Decode-RFC2047-encoded-headers.patch
I've updated Adrien Abustany's contribution by adding:
• elimination of whitespace between 'encoded-word's
• unit tests
You can find it here (new link): https://github.com/wmark/ossdl-overlay/blob/master/dev-lang/go/files/go-1.3.0-net-mail-Decode-RFC2047-encoded-headers.patch
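For readers who want to see the two behaviours discussed here (headers where only some words are encoded, and dropping whitespace between adjacent encoded-words), the following is a rough regexp-based sketch. It is not the code from either linked patch; the patterns and the decodeMixed helper are assumptions made for illustration, and single words are decoded with mime.WordDecoder from later Go releases.

```go
package main

import (
	"fmt"
	"mime"
	"regexp"
)

var (
	// encodedWord matches one RFC 2047 encoded-word: =?charset?encoding?text?=
	encodedWord = regexp.MustCompile(`=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=`)
	// gap matches whitespace that separates two adjacent encoded-words.
	gap = regexp.MustCompile(`(\?=)\s+(=\?)`)
)

// decodeMixed (hypothetical helper) decodes every encoded-word in raw and
// leaves plain text, and words that fail to decode, untouched.
func decodeMixed(raw string) string {
	dec := new(mime.WordDecoder)
	// RFC 2047 says whitespace between adjacent encoded-words is ignored.
	joined := gap.ReplaceAllString(raw, "$1$2")
	return encodedWord.ReplaceAllStringFunc(joined, func(w string) string {
		out, err := dec.Decode(w)
		if err != nil {
			return w
		}
		return out
	})
}

func main() {
	fmt.Println(decodeMixed("Re: =?UTF-8?Q?caf=C3=A9?= tomorrow")) // Re: café tomorrow
	fmt.Println(decodeMixed("=?UTF-8?Q?a?= =?UTF-8?Q?b?="))        // ab
}
```

The patterns are compiled once at package level, so the per-header cost is the matching itself; mime.WordDecoder.DecodeHeader in later Go releases handles both cases without a regexp.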
To contribute to Go please follow the directions at http://golang.org/doc/contribute.html. Thanks.
CL https://golang.org/cl/101330049 mentions this issue.
I made a CL that can fix this issue: https://golang.org/cl/101330049
Headers can be decoded with the DecodeHeader function. This function returns the decoded text and the charset name. It does not convert the decoded string to UTF-8. I think it is the simplest and most flexible way to solve this issue. Users often ask for more charset support: issue #6611, issue #7140, issue #7079. But converting to UTF-8 inside this function means that we would have either a bloated binary or limited charset support. One small caveat is that this function cannot decode headers containing multiple encoded-words with different charsets, but I think that is quite unusual.
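The trade-off described here (charset tables in the standard library versus limited charset support) can also be handled with a caller-supplied conversion hook. The sketch below is not the API proposed in the CL above; it uses the CharsetReader field of mime.WordDecoder from later Go releases, and the hand-written ISO-8859-15 table and the latin9Reader name are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"mime"
)

// latin9 lists the eight positions where ISO-8859-15 differs from ISO-8859-1.
var latin9 = map[byte]rune{
	0xA4: '€', 0xA6: 'Š', 0xA8: 'š', 0xB4: 'Ž',
	0xB8: 'ž', 0xBC: 'Œ', 0xBD: 'œ', 0xBE: 'Ÿ',
}

// latin9Reader (hypothetical) converts ISO-8859-15 input to UTF-8.
// WordDecoder passes the charset name in lower case.
func latin9Reader(charset string, input io.Reader) (io.Reader, error) {
	if charset != "iso-8859-15" {
		return nil, fmt.Errorf("unsupported charset %q", charset)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	var out bytes.Buffer
	for _, b := range raw {
		if r, ok := latin9[b]; ok {
			out.WriteRune(r)
		} else {
			out.WriteRune(rune(b)) // same code point as ISO-8859-1
		}
	}
	return &out, nil
}

// decodeLatin9 decodes a header using the custom charset hook.
func decodeLatin9(raw string) (string, error) {
	dec := &mime.WordDecoder{CharsetReader: latin9Reader}
	return dec.DecodeHeader(raw)
}

func main() {
	s, err := decodeLatin9("=?ISO-8859-15?Q?100_=A4?=")
	fmt.Println(s, err) // 100 € <nil>
}
```

With a hook like this, the library only ships the charsets it handles by default, and programs that need more pay the binary-size cost themselves.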
CL https://golang.org/cl/132680044 mentions this issue.
CL https://golang.org/cl/7890 mentions this issue.
by mstplbrg: