mime: BEncoding and QEncoding don't respect the 75 character limit in RFC2047 #12300

Closed
joegrasse opened this issue Aug 24, 2015 · 13 comments

@joegrasse

The new mime.BEncoding.Encode and mime.QEncoding.Encode functions don't respect the 75-character limit that RFC 2047 places on an encoded-word.

Excerpt from RFC2047:

An 'encoded-word' may not be more than 75 characters long, including
'charset', 'encoding', 'encoded-text', and delimiters. If it is
desirable to encode more text than will fit in an 'encoded-word' of
75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may
be used.

@alexcesaro
Contributor

Are you noticing bugs because of it?

Automatically breaking encoded-words is complicated because of this:

Each 'encoded-word' MUST represent an integral number of characters.
A multi-octet character may not be split across adjacent 'encoded-word's.

Users can pick any charset, and without knowing how that charset delimits characters we cannot be sure where it is safe to break an encoded-word.

Also:

  1. The 75-char limit is optional according to the RFC.
  2. Users can manually break words where they want.
  3. Popular services like Gmail do not respect this 75-char limit.

That is why the 75-char limit was not implemented.
However, if it is really causing bugs in email clients, we could automatically break encoded-words when the charset is UTF-8, or when there is a space character, for example.

@mikioh mikioh changed the title The mime BEncoding and QEncoding don't respect the 75 character limit in RFC2047 mime: BEncoding and QEncoding don't respect the 75 character limit in RFC2047 Aug 24, 2015
@joegrasse
Author

  1. I am not seeing where the 75-char limit is optional in RFC2047. I could be overlooking it though.
  2. That is true. However, if the functions are said to encode according to RFC 2047, and the 75-char limit isn't optional, then they should do it for you.
  3. What makes you think Gmail doesn't respect the 75-char limit?

@alexcesaro
Contributor

  1. The word "may" usually indicates that a rule is optional. See RFC 2119.
  2. Just try sending an email with Gmail with a long subject containing special characters.

Again, are you getting bugs with an email client because of long encoded-words?

@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Aug 25, 2015
@joegrasse
Author

  1. Good to know.
  2. I have tested Gmail and it was doing line folding.

I have not done extensive testing yet.

I think at the very least the functions should do folding for UTF-8.
Also, is the header field value included or excluded from the char limit? I have seen some popular languages allow for this as a parameter.
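
One reading of this question is that the field name, the colon, and the space after it use up room on the first line of a header, so an encoder may want to know how much is left. A minimal sketch of that arithmetic follows. `firstLineRoom` is a hypothetical helper, and the limit of 76 passed in `main` is an assumption based on the per-line limit RFC 2047 gives for header lines that contain encoded-words.

```go
package main

import "fmt"

// firstLineRoom reports how many characters remain on the first line of a
// header field after the field name, the colon, and one space.
// Hypothetical helper for illustration; lineLimit is chosen by the caller.
func firstLineRoom(fieldName string, lineLimit int) int {
	used := len(fieldName) + len(": ")
	if used >= lineLimit {
		return 0
	}
	return lineLimit - used
}

func main() {
	for _, name := range []string{"To", "Subject"} {
		fmt.Printf("%-8s prefix=%d room=%d\n", name, len(name)+2, firstLineRoom(name, 76))
	}
}
```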

@alexcesaro
Contributor

I don't understand your last question. What do you mean by "header field value"?

@joegrasse
Author

For example:

To: "Test" <test@test.com>
Subject: This is the subject

"To" and "Subject" would be what I was referring to. Some of the popular languages allow passing the length of that prefix as a parameter: 4 for the "To" header ("To: ") and 9 for the "Subject" header ("Subject: "). That way they can take it into account when folding to 75 chars.

@joegrasse
Author

After looking at this a little more, I am not sure that line folding is optional. You seem to think it is optional because of the use of "may" in the excerpt above. However, I believe that whenever the RFC authors indicate requirement levels, they capitalize the keywords. If you look at RFC 2047, there are several instances where MAY is capitalized (along with other keywords). The keywords are also capitalized in RFC 2119 (even though it does say "These words are often capitalized"). So the lowercase "may" in the excerpt in question is suspect as a requirement-level keyword.

@akavel
Contributor

akavel commented Sep 9, 2015

I may be wrong, but I believe that in English, while "may" is a "soft" qualifier, "may not" is a "hard" qualifier (as with "can" and "cannot"). RFC 2119 mentions MAY, but it doesn't mention MAY NOT, while it does mention both SHOULD and SHOULD NOT, etc. Also, in the specific RFC 2047 discussed here, there is (among others) another fragment with "may not", which I believe hints at the "hard"/"cannot" meaning (emphasis added by me):

Each 'encoded-word' MUST encode an integral number of octets. The
'encoded-text' in each 'encoded-word' must be well-formed according
to the encoding specified; the 'encoded-text' may not be continued in
the next 'encoded-word'. (For example, "=?charset?Q?=?=
=?charset?Q?AB?=" would be illegal, because the two hex digits "AB"
must follow the "=" in the same 'encoded-word'.)

and another similar one just below it:

Each 'encoded-word' MUST represent an integral number of characters.
A multi-octet character may not be split across adjacent 'encoded-word's.

@alexcesaro
Contributor

I MAY do a CL to break words in UTF-8 😄
I had it done in a previous CL, I will try to find it.

@gopherbot

CL https://golang.org/cl/14957 mentions this issue.

@joegrasse
Author

@alexcesaro or @bradfitz, do either of you know which release this fix will be included in?

@bradfitz
Contributor

1.6

@joegrasse
Author

Thanks.

@golang golang locked and limited conversation to collaborators Oct 24, 2016