Why is the line terminator CR+LF?

Date: March 18, 2004 / year-entry #104
Tags: history
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20040318-00/?p=40193
Comments: 40

This protocol dates back to the days of teletypewriters. CR stands for "carriage return" - the CR control character returned the print head ("carriage") to column 0 without advancing the paper. LF stands for "linefeed" - the LF control character advanced the paper one line without moving the print head. So if you wanted to return the print head to column zero (ready to print the next line) and advance the paper (so it prints on fresh paper), you needed both CR and LF.
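
To make the division of labor concrete, here is a toy model of a teletype page in C. It is a sketch, not a description of any real device: the names `Teletype`, `tty_init`, and `tty_feed` and the fixed page size are made up for illustration. CR only moves the head back to column 0 and LF only moves the paper up a line, so feeding it LF alone produces the stairstep effect described below.

```c
#include <string.h>

#define ROWS 8
#define COLS 32

/* Hypothetical toy model of a teletype page (not a real device or API). */
typedef struct {
    char page[ROWS][COLS + 1]; /* +1 for a terminating NUL per row */
    int row;                   /* current paper line */
    int col;                   /* current print-head column */
} Teletype;

/* Blank the page and park the head at row 0, column 0. */
void tty_init(Teletype *t)
{
    for (int r = 0; r < ROWS; r++) {
        memset(t->page[r], ' ', COLS);
        t->page[r][COLS] = '\0';
    }
    t->row = 0;
    t->col = 0;
}

/* CR moves the head to column 0 without touching the paper; LF advances
   the paper one row without moving the head; any other byte is printed
   at the current position and the head moves one column right. In this
   toy model, bytes past the last column and rows past the last row are
   simply dropped. */
void tty_feed(Teletype *t, const char *bytes)
{
    for (const char *p = bytes; *p != '\0'; p++) {
        if (*p == '\r') {
            t->col = 0;                      /* carriage return */
        } else if (*p == '\n') {
            if (t->row < ROWS - 1)
                t->row++;                    /* linefeed: head stays put */
        } else if (t->col < COLS) {
            t->page[t->row][t->col++] = *p;  /* print, then advance head */
        }
    }
}
```

Feeding `"ab\r\ncd"` puts `cd` at the start of the second row; feeding `"ab\ncd"` puts it at columns 2 and 3, right where `ab` left off.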

If you go to the various internet protocol documents, such as RFC 0821 (SMTP), RFC 1939 (POP), RFC 2060 (IMAP), or RFC 2616 (HTTP), you'll see that they all specify CR+LF as the line termination sequence. So the real question is not "Why do CP/M, MS-DOS, and Win32 use CR+LF as the line terminator?" but rather "Why did other people choose to differ from these standards documents and use some other line terminator?"
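
Because those protocols fix CR+LF as the terminator, a receiver can look for that exact two-byte sequence. A minimal sketch in C, assuming a strict reading in which a bare CR or bare LF is just data; `next_crlf_line` is a hypothetical helper, and real servers often choose to be more lenient than this.

```c
#include <stddef.h>

/* Hypothetical helper: returns the length of the first line in
   buf[0..len), not counting its CR+LF terminator, or -1 if buf does not
   yet contain a complete CR+LF-terminated line. A lone CR or lone LF is
   treated as ordinary data, not as a terminator. */
ptrdiff_t next_crlf_line(const char *buf, size_t len)
{
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            return (ptrdiff_t)i;
    }
    return -1; /* incomplete: the caller should wait for more input */
}
```

For the 18-byte buffer `"HELO example\r\nMAIL"` it returns 12; for a buffer with no CR+LF yet it returns -1.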

Unix adopted plain LF as the line termination sequence. If you look at the stty options, you'll see that the onlcr option specifies whether an LF should be changed into CR+LF. If you get this setting wrong, you get stairstep text, where

each
    line
        begins

where the previous line left off. So even unix, when left in raw mode, requires CR+LF to terminate lines. The implicit CR before LF is a unix invention, probably as an economy, since it saves one byte per line.
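
On output, onlcr amounts to supplying that implicit CR again. The function below sketches the mapping in C; it is a simplification (the real terminal driver also honors related flags such as ocrnl and onlret), and `expand_lf_to_crlf` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical sketch of the onlcr output mapping: copies src into dst,
   expanding every LF into CR+LF. dst must have room for up to
   2 * strlen(src) + 1 bytes. Returns the number of bytes written, not
   counting the terminating NUL. */
size_t expand_lf_to_crlf(char *dst, const char *src)
{
    size_t n = 0;
    for (; *src != '\0'; src++) {
        if (*src == '\n')
            dst[n++] = '\r'; /* supply the implicit carriage return */
        dst[n++] = *src;
    }
    dst[n] = '\0';
    return n;
}
```

Turning `"each\nline\n"` into `"each\r\nline\r\n"` grows it from 10 bytes to 12, which is the one byte per line that the plain-LF convention saves.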

The unix ancestry of the C language carried this convention into the C language standard, which requires only "\n" (which encodes LF) to terminate lines, putting the burden on the runtime libraries to convert raw file data into logical lines.
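
On a platform whose files use CR+LF, the library's side of that bargain is to collapse each CR+LF into the single "\n" the program expects. A minimal in-place sketch, assuming the whole buffer is already in memory; `crlf_to_lf` is a hypothetical name, and a real runtime also has to deal with details this ignores, such as a CR+LF pair split across two reads.

```c
#include <stddef.h>

/* Hypothetical sketch of text-mode input translation: collapses each
   CR+LF pair in buf[0..len) into a single LF, in place. A CR that is not
   followed by LF is left alone. Returns the new length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t out = 0;
    for (size_t in = 0; in < len; in++) {
        if (buf[in] == '\r' && in + 1 < len && buf[in + 1] == '\n')
            continue; /* drop the CR; the LF is copied on the next pass */
        buf[out++] = buf[in];
    }
    return out;
}
```

Because the output never runs ahead of the input, the translation can safely overwrite the buffer it is reading from.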

The C language also introduced the term "newline" to express the concept of "generic line terminator". I'm told that the ASCII committee changed the name of character 0x0A to "newline" around 1996, so the confusion level has been raised even higher.

Here's another discussion of the subject, from a unix perspective.


Comments (40)
  1. Could Microsoft consider adding Unix style text file support to Longhorn version of Notepad?

    Go on.

    Please.

  2. Tim Robinson says:

    Visual Studio gets this almost right: it has an Advanced Save Options dialog for choosing the character set — and the line terminator. It defaults to Current Setting but also has options for Windows, Mac and Unix.

    However, I believe by "Current Setting", it means "Windows". If I want a text file to stay in Unix format, I have to choose Unix each time before I save it.

    I’d prefer it if the EDIT control (i.e. Notepad) just displayed LF lines properly. RichEdit (Wordpad) already handles this right.

  3. Raymond Chen says:

    Notepad is just a glorified edit control.

    What should happen if a file contains mixed line terminators? Say a third of the lines end in CR, a third in CRLF and a third in LF. Should each line preserve its own terminator? What terminator should be assigned to new lines? Does this triple the number of options in the "Save as" box?

    I don’t have the numbers to prove it, but I suspect "lack of support for files that use only LF as line terminator" does not rank very high on the "most frequently requested features" list for Windows.

    Besides, if you’re a unix geek you probably hate Notepad and use vi or emacs for all your editing anyway. (Personally I use vi.)

  4. Ben Hutchings says:

    Tim: In my experience, the Visual Studio editor preserves line endings and automatically decides what characters to use when adding new line breaks. Saving with "Current Setting" preserves these. Sometimes the automatic detection goes wrong and it uses plain CRs for line breaks even though we never choose to use these. Then I get weird error messages from the C++ compiler because it doesn’t see the CR after a preprocessor directive as being a line break (it accepts LF or CR+LF but not plain CR).

  5. I think Tim has the right solution. The data stays the same, the file written out to disk stays the same but the edit control starts a new line if it sees a CRLF or a LF (based on a window style so it doesn’t break other applications).

    This might cause problems if the user were to load a file that only used LF line breaks and inserted a new line (should Notepad insert a CRLF or try and do something clever?).

    As for it not being high on the list of requested features, I agree it isn’t that important but I am sure there are loads of new features in each version of Windows that would come further down most people’s wish lists than this. How many requests were there for an animated dog in the file finder window for XP?

  6. Raymond Chen says:

    There were requests that file searching be made easier. The animated dog was one interpretation of how that could be accomplished.

  7. David Kafrissen says:

    Greetings,

    My question about two characters to represent newline revolves around random access files.

    In comparison, suppose you are reading a file sequentially with read. In this case, if the C runtime converts those two characters to \n for me, no big deal.

    But what happens if I am using the file in random access mode, and absolute positioning matters, for example, I am using fseek or seek.

    Dave

  8. Joku says:

    I would definitely appreciate it if Notepad would read Unix files properly. A lot of useful GPL etc. utilities can run on Windows, and I can’t tell you how often I’m annoyed to find that reading the help files or whatever requires either getting a Notepad replacement or loading them in some too-heavy-app-for-the-task-at-hand.

    Please!

  9. Dan Crevier says:

    And, of course, just to be different, the Macintosh uses just CR. But then, on Mac OS X, with its unix roots, files either have the unix-style LF, or the old-Mac style CR. Or, maybe you have a file from Windows with CRLF. What a mess.

    WordPad actually does a good job handling the different sorts of line endings.

  10. B.Y. says:

    >There were requests that file searching be made easier. The animated dog was one interpretation of how that could be accomplished.

    Hey, I really like that animated dog office assistant (Rocky). I think most people don’t dislike the office assistant feature per se, but they dislike those assistants. The Office team should get rid of characters like Clippy and Einstein and put in more cute animals, flowers, swans, sexy babes, etc. Then people will use the help more often and may actually read some manuals.

  11. Joel Dinda says:

    A former teletype operator chimes in (only slightly relevant, I suppose, but what the hey!):

    When I was in the army, they taught us to end lines <CR><CR><LF> so the teletypes actually had time to move the printhead back to the left side before beginning the new line. The reason, apparently, was that the machines were tuned to operate at a higher speed than their design.

    Strange what memories you trigger in folks, Raymond.

    jowo

  12. Um, the dog in Windows is Rover (right-click on him and select "choose a different animated character"), not Rocky. And Microsoft can’t add sexy babes (or sexy guys, for that matter).

    Cute Animals, Flowers, Trees, Hot Rods, Motorcycles, NASCAR rigs would be fine, but you can’t do sexy people.

  13. Mike Weiss says:

    We just found a problem with some of our (C++) source files. It seems some of our VS.NET macros that inserted code into the active file (for coding-standards stuff) were using just "vbLf" and not "vbCrLf" (remember that VS.NET macros are VB.NET). So we had files with mixed line endings. We actually never noticed any problems (compiling and editing worked just fine). I don’t know if "cl.exe" is tolerant of this or we just got "lucky" since lines with the UNIX-style line endings were always comment lines…?

    We only noticed this when converting from Visual SourceSafe to ClearCase. ClearCase converted the line endings to CR LFs, which showed up as differences when comparing the old VSS working directories to the new ClearCase "Views".

  14. Andreas Häber says:

    B.Y. wrote:

    "The office team should get rid of characters like clippy and Einstein, and put in more cute animals, flowers, swans, sexy babes, etc. Then people will use the help more often and may actually read some manuals."

    I actually like the office assistants too :) Maybe we should start a fan-club? :p

    It really sounds like nirvana if more people started using help/reading manuals.

    The world would become a much nicer place! People who have trouble accomplishing a task would sit down, read for ten to fifteen minutes, and be able to accomplish it, instead of going downtown angry and starting fights before calling up some poor man/girl who has read the manual to ask for help. :)

  15. Raymond Chen says:

    David: seek() is not a C standard function, and fseek() is supported on text files only if you seek to 0 or to a value previously returned by ftell().

    If you want to do random seeking then you must open the file in binary mode if you intend to remain C-standards-compliant.

  16. asdf says:

    Anyone know the history of why text files need to end in ‘\n’ (the text-mode ‘\n’ that can be converted to whatever, I mean)? You can see this in a few places like header files and C/C++ source files, for example. But I’ve never seen it enforced in practice.

  17. mikew says:

    The only annoying thing about Rover is when you select the "Without an animated screen character" option, he *animates* going away for about 5 seconds. If I don’t want the animation, why would I want animated feedback that the animation is off?

    OT but: I read somewhere that Clippy originally used Bayesian analysis to decide when you needed help, but marketing decided to increase the frequency so the feature would be more visible. Thereby turning a useful feature into an annoying one. (Can’t cite the source, so I don’t know how accurate this is.)

  18. Jerry Pisk says:

    I think that *nix uses a single character terminator because it’s just simpler (and faster) to match a single character than a sequence of two characters.

  19. Jordan Russell says:

    asdf wrote:

    "Anyone know the history of why text files need to end in ‘\n’ (the text-mode ‘\n’ that can be converted to whatever I mean). You can see this in a few places like header files and C/C++ source files for example. But I’ve never seen it enforced in practice."

    I can think of two reasons:

    1) When writing to a file, it’s easier to just print "\n" after every line than to special-case the last line.

    2) On Unix, if you "cat" a text file whose last line does not end in "\n", the next prompt ends up being displayed on the same line, i.e.:

    [user@localhost]$ cat filename

    test[user@localhost]$

  20. Raymond Chen says:

    I blame punchcards.

  21. Sam Carter says:

    Mike Weiss wrote:

    I don’t know if "cl.exe" is tolerant to this or we just got "lucky" since lines with the UNIX style line endings were always on comment line…?

    Any conforming ANSI C compiler is required to treat all whitespace sequences as equivalent to a single space, where whitespace includes the characters ‘\r’, ‘\n’, ‘\t’, and ‘ ’ (it may include everything lower than 0x20 but I’m not sure about that). In short, you didn’t get lucky. Microsoft wrote a conforming compiler.

    –Sam

  22. David Kafrissen says:

    That’s interesting because we have the opposite problem.

    In one company that I worked at the standard development environment was Visual Studio 6. But the deployment environment became Solaris.

    If we copied the files over verbatim to Unix the C++ compiler on the Solaris box would choke on the line termination.

    What we used was an option in Source Safe which can "get" the files with the proper line termination for the appropriate platform.

    Regards,

    David

  23. David Kafrissen says:

    Raymond, you wrote:

    > David: seek() is not a C standard function, and fseek() is supported on text files only if you seek to 0 or to a value previously returned by ftell().

    You reminded me that there is a difference between standard C and some things that are standard UNIX C but almost standard C, for example some C runtime routines that are listed in the K&R C book.

    These routines are prefaced with an underscore under windows.

    Is there a reason they were implemented with an underscore? In general I think Microsoft products are great, but in this case, it seems the only reason was to break compatibility.

    Regards,

    David Kafrissen

  24. njkayaker says:

    Visual Studio works just fine (basically) with UNIX line termination. Things get a bit messy when copying and pasting. I think the standard for text in the clipboard is the CR-NL line termination.

    I suspect that NL (only) was used in UNIX because it was easier to write programs to process line-oriented data with a single line termination character. NL was chosen because it was the second char of the CR-NL standard (and the CR would be treated like trailing white space, which is typically not a problem).

  25. Raymond Chen says:

    David: Including the functions without the leading underscore by default would have been a violation of the C standard. (7.1.3.2: "No other identifiers are reserved.")

    You can link with oldnames.lib to get the old nonstandard names back.

  26. Shane King says:

    Unix didn’t differ from those RFCs, those RFCs differ from Unix in my opinion. Unix predates them. It’s hard to differ from something that hasn’t been invented yet!

  27. I guess that LF+CR should also terminate a line? Viewed in notepad (Win2k) it seems to put in two linefeeds? ;-)

  28. Bleath says:

    That wretched animated dog is intolerable for two reasons:

    It’s distracting in the same way animated banner ads are. We evolved to pay attention to movement, and it’s not easy to focus on something *static* when there’s something *flashing* and *jumping* *around* right next to it. Our instincts say the animated dog is what we should be watching, but of course there’s no information there at all. It isn’t just harmlessly wasted screen real estate (like the bizarre "web content" white area I have to disable in explorer.exe, where it tells me that my text files contain text and .exes can be executed); it’s aggressively counterproductive. It’s an ANTI-feature. However benign the intent may have been, the effect is malignant.

    Secondly, it adds another completely unnecessary step to searching for files. I hit F3 because I want to search. Instead, I get a cartoon desperately trying to draw my attention away from a list of irritatingly verbose options — and the only meaningful option on the list appears third. The text of the options is black on a dim background, too, which inhibits readability.

    I’m sure the interface scored well in usability tests with people who don’t use computers, but people who *do* use computers use them, too. However, nobody reads low-contrast text more easily than high-contrast.

    You can turn off the stupid dog, at least, but you can’t turn off the list of stupid options. It’s a constant irritation.

    I’ve got Server 2003 on my desktop, thank God, wherein they fixed that monstrosity.

  29. njkayaker says:

    What I really like (sarcasm) is that when you disable the dog, you lose functionality. In particular, the context-sensitive help in Excel’s function wizard (for Excel 2000 and earlier) seems to require the irritating animations. Anyway, how hard would it have been to include an unanimated animation (that was not an advertisement) in the choices?

    (By the way, the dog is "Rocky" in Excel 2000.)

  30. Mark D. Brown says:

    CR-LF pairing permitted a fixed-element printer (IBM Selectric technology, for example) to make two passes for underlining and to accept commands for externally-controlled Bi-directional printing.

  31. Peter Torr says:

    What I want to know is why so much documentation written on / for / about / by Unix-like OSes uses double back-ticks and double apostrophes to get the equivalent of "smart quotes." It might look OK on a dumb text terminal, but it just looks ugly in any modern web browser.

  32. Norman Diamond says:

    3/18/2004 3:58 PM njkayaker:

    > I suspect that NL (only) was used in UNIX because it was easier to write programs to process line-oriented data with a single line termination character.

    Yes, but the same was true of CR (only) which was really pretty conventional by then. By the way, remember that keyboards used to have a key labeled Return instead of Enter, and it was in the same place as a typewriter’s carriage return key? If a Teletype was being used for communication with another Teletype then it was necessary to add a linefeed manually, but if it was being used for input to a computer then no one wanted the pedantic irritation of having to add a linefeed manually. Even in Unix you can type a return (or Enter) which sends CR (only) to the computer, and the computer will echo it as CR-LF (unless you change it with stty) and provide the end-of-line indication to the program.

    > NL was chosen because it was the second char of the CR-NL standard (and the CR would be treated like trailing white space (typically, not a problem).

    Yeah, but anyone who used any other OS before Unix still hates that. CR was more conventional and remains more intuitive.

    3/18/2004 1:35 PM Raymond Chen:

    > I blame punchcards.

    But that’s nonsense. Punched cards have an end-of-line after column 80. They don’t need a CR or LF or NL or EOL character at all. Paper tape needed an EOL (end-of-line) character, and CR-LF was a common convention for Teletype paper tape.

    3/18/2004 9:15 AM Joel Dinda:

    > A former teletype operator chimes in (only slightly relevant,

    Very relevant.

    > When I was in the army, they taught us to end lines <CR><CR><LF> so the teletypes actually had time to move the printhead back to the left side before beginning the new line.

    And stty has options for the number of NUL characters to be output after a CR, for exactly the same reason.

    3/18/2004 8:51 AM Raymond Chen:

    > There were requests that file searching be made easier.

    Fine.

    > The animated dog was one interpretation of how that could be accomplished.

    <PUKE>

    By the way, in what way does the animated dog make searching easier? Does it read the user’s mind and automatically fill in the appropriate fields in the search options?

    This isn’t your car thread, but it fits. Suppose you’re driving in a country where the rules are the mirror image from what you’re used to. The floor pedals are in the same order that you’re used to, but controls such as the turn signal and windshield wipers are mirrored. Which of the following possibilities will make it easier for you to find controls when you need them?

    1. An animated dog display in the middle of the steering wheel.

    2. Labels on the controls.

  33. Anyone who’s ever done any string manipulations beyond the ones provided by an API library has probably wondered why it is that all the Windows files have 0x0d 0x0a internally as line terminators. Whereas if you download the stuff…

  34. Pauline says:

    I’m writing a program

  35. Raymond Chen says:

    Commenting on this article has been closed.

  36. Breaking the file into lines is taking a lot of time.

  37. I have talked about Chris Walker before.

    He is one of the guys behind Notepad.exe for several versions,…

Comments are closed.
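
Raymond's question in the comments, about a file in which some lines end in CR, some in CR+LF, and some in LF, presupposes that an editor can first tell them apart. A minimal sketch of that first step in C, assuming the file is already in memory; `TerminatorCounts` and `count_terminators` are hypothetical names. Deciding what to do with the counts (convert to the majority, preserve each line's own terminator, or ask the user) is the open design question he raised, and this code does not answer it.

```c
#include <stddef.h>

/* Hypothetical tally of the three line-terminator styles. */
typedef struct {
    size_t crlf; /* CR immediately followed by LF */
    size_t lf;   /* LF with no CR before it */
    size_t cr;   /* CR with no LF after it */
} TerminatorCounts;

/* Scans buf[0..len) once and counts each terminator style. */
TerminatorCounts count_terminators(const char *buf, size_t len)
{
    TerminatorCounts c = {0, 0, 0};
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                c.crlf++;
                i++; /* consume the LF that belongs to this pair */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\n') {
            c.lf++;
        }
    }
    return c;
}
```

For `"a\r\nb\nc\rd\r\n"` it reports two CR+LF, one LF, and one CR. An LF+CR sequence, which one commenter asks about, counts as one LF followed by one CR.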


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.

WHY DID I DUPLICATE THIS CONTENT HERE? Let me first say this site has never had anything to sell and has never shown ads of any kind. I have nothing monetarily to gain by duplicating content here. Because I had made my own local copy of this content throughout the years, for ease of using tools like grep, I decided to put it online after I discovered that some of the original content, previously publicly available, had disappeared around early to mid 2019. At the same time, I present the content in an easily accessible, theme-agnostic way.

The information provided by Raymond's blog is, for all practical purposes, more authoritative on Windows Development than Microsoft's own MSDN documentation and should be considered supplemental reading to that documentation. The wealth of missing details provided by this blog that Microsoft could not or did not document about Windows over the years is vital enough that many would agree an online "backup" of these details is a necessary endeavor. Specifics include:
