deal with files using \r\n or \r line endings #680

rsc · 2010-03-19T20:08:46Z

right now things misbehave.

rsc · 2010-03-19T20:13:18Z

Comment 1:

by things i mean the compilers do the wrong thing
because they expect \n.  we need to decide in the
language spec what to do and then do it.
in addition to semicolon insertion problems
there is the problem that `` strings spanning
lines get different bytes on different machines.

bradfitz · 2011-05-03T16:52:24Z

Comment 2:

I'd lex them all to \n, including inside ``. People wanting \r in backticks can
strings.Replace() them back into existence, which is already common practice in
net/http/mime testing code.
Additional problem is what gofmt should do. People's editors on Windows might not like
having their \r\n collapsed to \n.
gofmt could detect & preserve the line ending style, similar to what gri is doing with
detecting 1 or 2 line spacing between top-level declarations.
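A minimal sketch of the strings.Replace workaround mentioned above, for code that really needs CR LF bytes out of a raw string: write the literal with plain newlines and put the carriage returns back explicitly. The helper name `withCRLF` and the sample request text are illustrative only and are not taken from the net/http tests.

```go
package main

import (
	"fmt"
	"strings"
)

// withCRLF is a hypothetical helper: it turns every "\n" in s into "\r\n".
// It assumes s does not already contain any "\r".
func withCRLF(s string) string {
	return strings.ReplaceAll(s, "\n", "\r\n")
}

func main() {
	// The raw string is written with ordinary newlines in the source file.
	req := withCRLF(`GET / HTTP/1.1
Host: example.com

`)
	fmt.Printf("%q\n", req)
}
```

Because the carriage returns are added by code rather than stored in the source file, the result does not depend on how the file's line endings were saved.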

rsc · 2011-05-03T16:55:19Z

Comment 3:

that got squashed.  content-based heuristics are a slippery slope.

griesemer · 2011-05-04T23:10:35Z

Comment 4:

Typically, languages consider the following characters or character pairs as new line
indicators:
LF
CR
CR LF
(C#, Java, Python). C# also specifies three additional Unicode chars (next line char
U+0085, line sep char U+2028, and paragraph sep char U+2029).
Single CRs were used by old Macintosh OSs. We can probably ignore them. Programs
containing only CRs are likely not going to compile and thus at least a user is alerted
to the problem.
We could allow the additional Unicode chars, but I am not convinced it is important - in
the interest of simplicity I would not add them.
Since CR LF contains an LF, it will be properly recognized as a newline and things work
as expected with respect to parsing lines and semicolon insertion.
Thus, additionally inserted CRs in CR LF sequences are treated as white space and are
not visible to a program except if they occur in multi-line raw strings. Files compiled
on different platforms should all behave the same, so compiling a file on a Windows
machine should not result in different raw strings than the same file compiled on a Unix
machine.
It looks like there are two possible ways to go:
a) We don't change the language spec. It is the source file creator's responsibility to
be aware of potential extra CRs in multi-line raw strings (and often, it won't matter).
b) We change the language spec. As proposed before, one option is to say that all CR LF
sequences are replaced by LFs, in the scanning phase.
gofmt effectively does b) with every program except for raw strings which it preserves
untouched.
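Option b) can be sketched as a preprocessing step over the source bytes. This is only an illustration of the idea under the assumption that the replacement happens before tokenization; it is not how any Go compiler is actually structured, and `normalizeNewlines` is a made-up name.

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeNewlines sketches option b): every CR LF pair becomes a single LF.
// A lone CR is left untouched.
func normalizeNewlines(src []byte) []byte {
	return bytes.ReplaceAll(src, []byte("\r\n"), []byte("\n"))
}

func main() {
	// The same one-line program saved with Windows and Unix line endings.
	windows := []byte("x := `a\r\nb`\r\n")
	unix := []byte("x := `a\nb`\n")
	fmt.Println(bytes.Equal(normalizeNewlines(windows), unix))
}
```

After normalization both files are byte-for-byte identical, so a multi-line raw string gets the same value on either platform.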

rsc · 2011-05-06T19:25:35Z

Comment 5:

I would be happy to say that in a Go source file,
even inside a raw string, \r\n is treated as \n.
Russ

rsc · 2011-12-09T19:49:03Z

Comment 7:

Labels changed: added priority-later.

rsc · 2011-12-12T20:00:40Z

Comment 8:

Labels changed: added priority-go1.

robpike · 2011-12-15T05:53:09Z

Comment 9:

Spec has been updated to state that \r is stripped from raw literals; compilers do not
enforce the rule yet.
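A rough sketch of what the spec rule asks of an implementation: when computing the value of a raw string literal, drop every carriage return found between the back quotes. The function name `rawValue` is hypothetical and this is not the compiler's code.

```go
package main

import "fmt"

// rawValue is a hypothetical helper: given the bytes between the back quotes
// of a raw string literal, it returns the string value with every '\r' dropped.
func rawValue(body []byte) string {
	out := make([]byte, 0, len(body))
	for _, c := range body {
		if c != '\r' {
			out = append(out, c)
		}
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", rawValue([]byte("line1\r\nline2")))
}
```

Note that this removes any \r, not just one that is followed by \n, so it is slightly broader than replacing CR LF with LF.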

griesemer · 2011-12-15T06:59:53Z

Comment 10:

On second thought, I am not convinced anymore this is good enough.
Should \r (the utf-8 byte) also be stripped from interpreted string literals? And if
not, why not?
In the spec we refer to "line breaks" explicitly (or mostly) as newline (which is
defined as \n), but for interpreted string literals we say that they cannot span
multiple lines. On a \r-based system, an interpreted string containing a \r byte will
make it appear on multiple lines; on a \n-based system it will appear on one. Is the
string legal or not?
Similarly for comments: Multi-line comments act like a newline, which matters for
semicolon insertion. When is a comment multi-line? It may depend on the system (newline
is defined, on the other hand).
I think we want to be able to take a given source (\r, \r\n, or \n-based) to be
reproduced into an equivalent program (e.g. w/ gofmt) on a system with a different line
break.
One way out might be:
- We don't care about \r-based systems; a line break is present if there is a \n byte.
- Consequently, a string or comment spans multiple lines if there is a \n byte.
- \r bytes outside strings act as white space; they are ignored inside (all?) strings.
Thus, avoiding the notion of "multiple lines":
- A general comment containing newlines acts like a newline; otherwise it acts like a
space.
- An interpreted string may not contain newlines.
- \r chars are ignored in all strings. (?)
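The comment rule in the list above (a general comment containing a newline acts like a newline, otherwise like a space) can be observed with go/scanner in current Go releases. The sketch below is an illustration with a made-up helper, `tokenNames`; it relies on the scanner reporting an automatically inserted semicolon as a ";" token.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenNames scans src with go/scanner and returns the String form of each
// token before EOF. Automatically inserted semicolons show up as ";".
func tokenNames(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // mode 0: comments are skipped
	var names []string
	for {
		_, tok, _ := s.Scan()
		if tok == token.EOF {
			return names
		}
		names = append(names, tok.String())
	}
}

func main() {
	// Comment without a newline: acts like a space between x and y.
	fmt.Println(tokenNames("x /* one line */ y"))
	// Comment containing a newline: acts like a newline after x.
	fmt.Println(tokenNames("x /* two\nlines */ y"))
}
```

In the second call a semicolon token appears between the two identifiers, because the comment spans a \n byte; whether the file also contains \r bytes does not change that.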

ianlancetaylor · 2011-12-15T07:18:35Z

Comment 11:

I don't think we care about systems which end lines with a plain \r.
I wouldn't bother to change the rules for interpreted string literals.  We set the rule
for raw string literals because of the potential confusion if a file changes from \r\n
lines to \n lines or vice-versa.  There is no such potential confusion for an
interpreted string literal, so there is no need to change anything.
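To illustrate the distinction drawn here: in an interpreted literal a carriage return is normally written as the escape \r, which is plain ASCII in the source and survives any line-ending conversion, while a raw literal picks up whatever bytes the file contains. The sketch below uses strconv.Unquote as a stand-in for the compiler, on the assumption that recent Go releases apply the same \r-stripping rule to raw literals there; `literalValue` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// literalValue returns the string value of a Go string literal given as
// source text, using strconv.Unquote. It panics on a malformed literal.
func literalValue(lit string) string {
	v, err := strconv.Unquote(lit)
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	// Interpreted literal: \r and \n are two-character escapes in the source.
	interpreted := `"a\r\nb"`
	// Raw literals: the line break is an actual byte sequence in the source.
	rawLF := "`a\nb`"
	rawCRLF := "`a\r\nb`"

	fmt.Printf("%q\n", literalValue(interpreted))
	fmt.Println(literalValue(rawLF) == literalValue(rawCRLF))
}
```

The interpreted literal keeps its explicit \r, and the two raw literals end up with the same value regardless of which line ending the file used.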

gopherbot · 2011-12-15T07:34:50Z

Comment 12 by robert.griesemer:

We should still be more precise about what the meaning of "multiple lines" is, though.

rsc · 2011-12-15T15:50:36Z

Comment 13:

The recent spec change is implemented in the compiler.
I believe the only remaining change is to implement the
rule in go/token.
changeset:   f130f78eefa4
user:        Russ Cox <rsc@golang.org>
date:        Thu Dec 15 10:47:09 2011 -0500
summary:     gc: implement and test \r in raw strings
changeset:   2d9ac660f013
user:        Rob Pike <r@golang.org>
date:        Wed Dec 14 21:52:41 2011 -0800
summary:     spec: skip carriage returns in raw literals

robpike · 2011-12-19T23:31:05Z

Comment 14:

Owner changed to @griesemer.

griesemer · 2011-12-19T23:36:04Z

Comment 15:

This was implemented on the go/scanner side as well:
http://golang.org/cl/5495049
Marking as fixed.

Status changed to Fixed.
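
The go/scanner behavior can be checked with a short program. This is a sketch against the go/scanner API in current Go releases, not the code from the CL above; `firstStringLit` is a made-up helper name.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// firstStringLit scans src and returns the literal text of the first STRING
// token, or "" if the source contains none.
func firstStringLit(src string) string {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0)
	for {
		_, tok, lit := s.Scan()
		if tok == token.STRING {
			return lit
		}
		if tok == token.EOF {
			return ""
		}
	}
}

func main() {
	// A source line saved with CR LF endings, containing a multi-line raw string.
	src := "x := `a\r\nb`\r\n"
	fmt.Printf("%q\n", firstStringLit(src))
}
```

The literal text returned by the scanner still includes the back quotes, but the carriage return inside the raw string is gone.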

rsc added fixed labels Dec 19, 2011

rsc assigned griesemer Dec 19, 2011

rsc added this to the Go1 milestone Apr 10, 2015

rsc removed the priority-go1 label Apr 10, 2015

golang locked and limited conversation to collaborators Jun 24, 2016

gopherbot added the FrozenDueToAge label Jun 24, 2016

rsc unassigned griesemer Jun 22, 2022

This issue was closed.
