How can code that tries to prevent a buffer overflow end up causing one?

Date:January 7, 2005 / year-entry #6
Tags:code;history
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20050107-00/?p=36773
Comments:    65
Summary:If you read your language specification, you'll find that the ...ncpy functions have extremely strange semantics. The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If...

If you read your language specification, you'll find that the ...ncpy functions have extremely strange semantics.

The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If count is greater than the length of strSource, the destination string is padded with null characters up to length count.

In pictures, here's what happens in various string copying scenarios.

strncpy(strDest, strSrc, 5)
    strSource:  W e l c o m e \0
    strDest:    W e l c o
    observe no null terminator

strncpy(strDest, strSrc, 5)
    strSource:  H e l l o \0
    strDest:    H e l l o
    observe no null terminator

strncpy(strDest, strSrc, 5)
    strSource:  H i \0
    strDest:    H i \0 \0 \0
    observe null padding to end of strDest
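
To try these three cases on a real compiler, here is a minimal sketch. The helper name copy_into_field and the 5-byte field size are made up for illustration, and the '#' pre-fill exists only so that any padding is visible. The helper reports whether a terminator landed anywhere inside the field, using memchr so that it never reads past the end.

```c
#include <string.h>

enum { FIELD_SIZE = 5 };  /* illustrative size, matching the diagrams */

/* Hypothetical helper: copy src into a FIELD_SIZE-byte field with strncpy.
   Returns 1 if the field contains a null terminator afterwards, else 0. */
int copy_into_field(const char *src, char field[FIELD_SIZE])
{
    /* Pre-fill with a marker so it is obvious which bytes strncpy wrote. */
    memset(field, '#', FIELD_SIZE);

    strncpy(field, src, FIELD_SIZE);

    /* memchr inspects only the FIELD_SIZE bytes of the field itself. */
    return memchr(field, '\0', FIELD_SIZE) != NULL;
}
```

With this field size, copy_into_field("Welcome", field) and copy_into_field("Hello", field) both return 0, while copy_into_field("Hi", field) returns 1 and leaves the remaining three bytes zeroed.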

Why do these functions have such strange behavior?

Go back to the early days of UNIX. Personally, I only go back as far as System V. In System V, file names could be up to 14 characters long. Anything longer was truncated to 14. And the field for storing the file name was exactly 14 characters. Not 15. The null terminator was implied. This saved one byte.

Here are some file names and their corresponding directory entries:

passwd
    p a s s w d \0 \0 \0 \0 \0 \0 \0 \0
newsgroups.old
    n e w s g r o u p s . o l d
newsgroups.old.backup
    n e w s g r o u p s . o l d

Notice that newsgroups.old and newsgroups.old.backup are actually the same file name, due to truncation. The too-long name was silently truncated; no error was raised. This has historically been the source of unintended data loss bugs.

The strncpy function was used by the file system to store the file name into the directory entry. This explains one part of the odd behavior of strncpy, namely why it does not null-terminate when the destination fills. The null terminator was implied by the end of the array. (It also explains the silent file name truncation behavior.)

But why null-pad short file names?

Because that makes scanning for file names faster. If you guarantee that all the "garbage bytes" are null, then you can use memcmp to compare them.
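
As a rough illustration (this is not the actual System V source; the dir_entry structure and the function names are invented here), the fixed-width field and the memcmp comparison might look something like this:

```c
#include <string.h>

enum { NAME_LEN = 14 };  /* width of the file name field described above */

/* Hypothetical directory entry; only the name field matters here. */
struct dir_entry {
    char name[NAME_LEN];  /* no byte is reserved for a terminator */
};

/* strncpy truncates long names and null-pads short ones, so every byte
   of the field has a defined value afterwards. */
void set_name(struct dir_entry *entry, const char *filename)
{
    strncpy(entry->name, filename, NAME_LEN);
}

/* Because unused bytes are always zero, a memcmp over the whole field
   is enough to decide whether two entries hold the same name. */
int same_name(const struct dir_entry *a, const struct dir_entry *b)
{
    return memcmp(a->name, b->name, NAME_LEN) == 0;
}
```

Storing newsgroups.old and newsgroups.old.backup through set_name yields entries that same_name reports as identical, which is the truncation collision described earlier.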

For compatibility reasons, the C language committee decided to carry forward this quirky behavior of strncpy.

So what about the title of this entry? How did code that tried to prevent a buffer overflow end up causing one?

Here's one example. (Sadly I don't read Japanese, so I am operating only from the code.) Observe that it uses _tcsncpy to fill the lpstrFile and lpstrFileTitle, being careful not to overflow the buffers. That's great, but it also leaves off the null terminator if the string is too long. The caller may very well copy the result out of that buffer to a second buffer. But the lpstrFile buffer lacks a proper null terminator and therefore exceeds the length the caller specified. Result: Second buffer overflows.

Here's another example. Observe that the function uses _tcsncpy to copy the result into the output buffer. This author was mindful of the quirky behavior of the strncpy family of functions and manually slapped a null terminator in at the end of the buffer.

But what if cchTextMax = 0? Then the attempt to force a null terminator dereferences past the beginning of the array and corrupts a random character.
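
If you do go the route of strncpy plus a manual terminator, the zero-length case needs its own check. A minimal sketch, with copy_and_terminate as a hypothetical helper name rather than anything taken from the linked sample:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src into dest (capacity dest_size bytes),
   truncating if necessary. Returns 1 if dest holds a null-terminated
   string afterwards, 0 if dest_size was 0 and nothing was written. */
int copy_and_terminate(char *dest, size_t dest_size, const char *src)
{
    if (dest_size == 0) {
        /* dest[dest_size - 1] would land before the start of the buffer. */
        return 0;
    }

    /* Leave the last byte alone, then terminate it explicitly. */
    strncpy(dest, src, dest_size - 1);
    dest[dest_size - 1] = '\0';
    return 1;
}
```

The early return is the whole point: without it, a zero capacity turns dest_size - 1 into a huge unsigned value and the terminator is written somewhere it does not belong.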

What's the conclusion of all this? Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.
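
For contrast, here is a sketch of what a string-oriented copy could look like. The name copy_string is hypothetical; the convention of returning the source length is borrowed from the BSD strlcpy function. It always terminates when the destination has room for at least one byte, never pads, and lets the caller detect truncation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string-oriented copy. Writes at most dest_size bytes,
   including the terminator, and touches nothing beyond the terminator.
   Returns strlen(src); a result >= dest_size means truncation occurred. */
size_t copy_string(char *dest, size_t dest_size, const char *src)
{
    size_t src_len = strlen(src);

    if (dest_size > 0) {
        /* Copy as much as fits while keeping one byte for the terminator. */
        size_t n = (src_len < dest_size) ? src_len : dest_size - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return src_len;
}
```

A caller that cares about truncation compares the return value against the capacity it passed in; a caller that does not can ignore it and still gets a valid string.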


Comments (65)
  1. Fair catch Raymond. Although, I have to say that I’ve never seen cchTextMax = 0.

    /me wanders off to post an update to that code sample…

  2. Anonymous says:

    What about zero-filling the buffer before using strncpy (and friends)?

  3. NoInfo says:

    "Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding."

    Easy to say, but what do you encourage instead?

    I think simply testing whether strncpy was able to copy the full amount is sufficient (and then null-terminate it). It’s more work, but that’s security for ya.

    The extra null padding is never going to be a worry. (Or if you’ve been bitten by it, please elaborate.)

  4. Raymond Chen says:

    "What about zero-filling the buffer before using strncpy (and friends)?" Since strncpy always fills the buffer, you can paint the buffer neon yellow before calling strncpy, won’t make any difference. See the diagrams.

    "What do you encourage instead?" Functions that operate on strings rather than buffers. lstrcpyn or the StrSafe.h functions, for example.

  5. Ray Trent says:

    Another example of optimization screwing things up, if you ask me.

    Just allocate a buffer of length N+1, set buff[n]=0, and always pass N to strncpy.

  6. Adrian says:

    Just another example why nearly all of the standard C library should be avoided in production code.

    String functions are unsafe. Formatted input is unsafe. File I/O is clumsy and error prone. Steve Maguire does a great job skewering malloc and friends in Writing Solid Code. Typical implementations of rand() are horrible. setjmp()/longjmp() — yikes! The floating point functions and assert() are about the only bits you can rely on.

    If you’re writing a console application and trying to be portable, then wrap the standard library calls with safer interfaces.

    But if you’re writing for Windows, consider avoiding the CRT altogether. You can save a lot of headaches by using the Windows APIs instead (not to mention have a smaller, faster loading program with fewer DLL Hell headaches and deep understanding of the redistributable agreements).

  7. Anonymous says:

    " Another example of optimization screwing things up, if you ask me.

    Just allocate a buffer of length N+1, set buff[n]=0, and always pass N to strncpy."

    Why would this screw things up?

  8. Anonymous says:

    "Since strncpy always fills the buffer, you can paint the buffer neon yellow before calling strncpy, won’t make any difference. See the diagrams. "

    Strncpy won’t bother bytes beyond the limit you give it.. If you fill it to the end with 0 then your string will be null terminated, and there are no more problems… right?

  9. JamesW says:

    Hmm – my last paragraph could have been written better! I was wondering aloud what goes on inside Windows with regards to strings – not suggesting using the TOP SEKRIT stuff as a good C string library!

  10. One warning about StrSafe.h: While the functions have length parameters and thus make you think about things, they don’t guarantee you’ll get those things right. For example, I’ve run across at least two examples in MSDN that do things like this:

    hres = StringCbCopy(lpszPath, sizeof(lpszPath), szGotPath);

    instead of:

    hres = StringCbCopy(lpszPath, pathSize, szGotPath);

    You might say, "Well, that’s easily fixed," but then you look at the function this appears in and find that lpszPath is a parameter but pathSize is not, and there’s no documented minimum size for the buffer.

  11. Raymond Chen says:

    "If you fill it to the end with 0 then your string will be null terminated, and there are no more problems… right?" If you fill it to the end, strncpy will just fill it with stuff again. See the "Welco" example above. If you want to preserve that last zero you need to pass a shorter buffer size to strncpy.

  12. strlcpy boy says:

    You really want to use strlcpy wherever possible, not strncpy. It has sane semantics. Better yet is to use a language without pointers, such as Ruby. If you’re not writing a kernel, you shouldn’t be using raw C/C++.

  13. Chris Boucher says:

    I always use strlcpy/strlcat now, see: http://www.courtesan.com/todd/papers/strlcpy.html

    Using these instead of strcpy/strcat/strncpy/strncat saves a whole lot of potential grief.

  14. your name says:

    Yeah, another voice here for strlcpy() and strlcat(). Guaranteed to be null-terminated, and with the same function signature as strncpy() so you can switch to it with a quick search/replace.

  15. Anonymous says:

    "If you fill it to the end, strncpy will just fill it with stuff again. See the "Welco" example above. If you want to preserve that last zero you need to pass a shorter buffer size to strncpy."

    Yes… But, if you have the following:

    char buf[6];

    buf[6]=0;

    strncpy(buf,"welcome",5);

    You get

    [w][e][l][c][o][]

    What’s the problem, exactly?

  16. lowercase josh says:

    What’s worse, I’ve seen people recommend using strncat instead, only to get its parameters wrong as well.

    And if all else fails, it’s not hard to write your own string functions that do what you expect.

  17. Enigma2e says:

    What about using lstrcpy() lstrlen() and the other lstr* functions? Do they have the same issues with them?

  18. Anonymous says:

    buf[6]=0;

    D’oh

    buf[5]=0;

  19. Raymond Chen says:

    Anonymous: Yes, but notice that you also changed the size of buf[] from 5 to 6. That’s a very important step.

    Enigma2e: I believe the MSDN documentation for the lstr* functions already explains their behavior adequately.

  20. Tom says:

    Translation of example one:

    >> Could you tell me how to select a handle?

    >

    > Try to dbl click the handle and select OK but CDN_FILEOK comes

    > up. To open the handle it can be selected from CDN_FOLDERCHANGE

    > but as a result of that selection, I cannot close the dialog.

    >

    > In any case I think it is for “file selection”

    Self-Less. (That is direct translation, I don’t understand it.)

    It was possible by this way, however, I do not guarantee anything.

    The following is MFC sample.

  21. orcmid says:

    Yes, I always fancy the pattern of

    char buf[N+1];

    buf[N] = 0;

    … strncpy(buf, src, N);

    and having the reassurance that there is always a null terminator and it is never creamed by an edge case.

    I hadn’t worried about the padding in the past, but I do like the simplicity of predictable values for the entire buffer.

    I can see that

    … strlcpy(buf, src, N+1)

    saves that part and is "smoother" in some sense. When I am dealing with fixed-field parameters, though, I always prefer the strncpy technique anyhow, since the length maximum tends to be a well-known parameter.

    Nice tip.

  22. TheDude says:

    Here’s an example of where this can go wrong… Funny that you posted this today, we were observing a buffer overflow in the code linked to below. This bug shows up when the length of string in the string table was a multiple of 16, which is another bug entirely but, a symptom of misusing/misunderstanding the *ncopy functions in this case.

    http://support.microsoft.com/default.aspx?scid=kb;en-us;200893

  23. Raymond Chen says:

    "… when they can choose from [string classes]"

    Even if you use a string class you still have to worry about buffers when interfacing with other code. Is there a way to convert a std::string to a std::wstring that doesn’t involve buffer manipulation?

  24. Matt Green says:

    string s = "a narrow string";

    std::wstring result;

    result.reserve(s.length());

    std::copy(s.begin(), s.end(), std::back_inserter(result));

  25. Raymond Chen says:

    That doesn’t work if the std::string has any characters above 127 in it. What if it were

    // This is Chinese for "Chinese"

    // using the Big5 character set.

    string s = "\xA4\xA4\xA4\xE5";

    and I want the result to be

    wstring ws = L"\x4E2D\x6587";

  26. Vorn says:

    Adrian said "File I/O is clumsy and error prone. Steve Maguire does a great job skewering malloc and friends in Writing Solid Code."

    Okay. What should we use instead? I’m sure as hell not writing my own file i/o or dynamic memory allocation, because I have no choice but to do so in assembler.

    Vorn

  27. Matt Green says:

    I figured you’d reply saying something like that. You’d have to call MultiByteToWideChar, which implies you have to access the physical buffer. Wrap it up in a function and add it to the personal toolbox.

    Lack of decent strings is a language problem. But using string classes at least saves you the grief of cleanup and performing common operations without exposing yourself to unnecessary risk. Any decent string class should let you get to the internal string easily.

  28. Vorn: As suggested above, use the Win32 API instead.

    File I/O: http://msdn.microsoft.com/library/?url=/library/en-us/dnanchor/html/filesio.asp?frame=true

    Memory allocation: http://msdn.microsoft.com/library/?url=/library/en-us/dnanchor/html/memoryank.asp?frame=true

    [And of course some other stuff by Microsoft which shall not be discussed here (check subtitle for a hint) ;)]

  29. For REALLY useful wide to narrow character (and vice versa) stuff, look up CA2T, CT2A, CT2W, CA2W, CW2A, CW2T, CA2T, etc etc etc in ATL 7.0.

    Wonderful little wrappers, all pre-done and tested for you.

    If you’re not using WTL and ATL for your Windows development and you’re programming in C++, you should certainly consider taking it for a spin.

  30. Ken Buchanan says:

    The issue is well known to anyone who has read Michael Howard’s secure coding book.

    The normal mitigation is indeed to slap the nul terminator on the end of the destination buffer. The cchTextMax = 0 possibility, while valid, is a pretty exceptional case. Generally it amounts to asking ‘After I copy a string into a zero-length buffer, will I cause problems by nul terminating it?’

    The solution is that, when the length of the destination buffer is supplied to you, you should *always* ensure it is valid. As long as you have basic sanity checks in place, strncpy + explicit nul-termination is fine. Various safe coding libraries have functions and macros to make this easy.

  31. JamesW says:

    @Raymond

    Personally, my conclusion is simply to avoid strncpy and all its friends

    @NoInfo

    Easy to say, but what do you encourage instead?

    I can’t answer for Raymond but I’d rather be using a string class whilst munging and copying strings about rather than raw char*s – take your pick: std::string, CString, NSString(!), … You can always retrieve a char array once you’re done messing around and pass that on to APIs that require such things.

    My examples are all for OO languages but that’s not a requirement – you can find good string libraries in C – looking at ntdll.dll there seems to be a fair few Rtl*** string methods.

  32. Vorn says:

    Win32 API doesn’t work on Slack or Mac. :)

    Vorn

  33. James Schend says:

    The bstring library works in C and C++ and solves all of these issues gracefully:

    http://bstring.sourceforge.net/

    My problem now is that I have a huge project in C that we’re converting to C++, and it has about 30,000 char* strings in it that I’d like to convert into std::string… yet those two datatypes aren’t even close to being compatible with each other. bstring solves some of the problems of char*, but I’d like to move to std::string so that the program is "pure" C++.

  34. That’s why Microsoft’s SSCLI/"Rotor" team created their PAL(Platform Abstraction Layer) [1].

    Or you can of course use #if/#else all over your source code to make it a joy to maintain it :P.

    Platform agnostic frameworks are also helpful :)

    [1] http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/pal/index.html

  35. asdf says:

    There are two things to remember when using strncpy (unless you’re using it to write something like lstrcpy):

    1. If you use strncpy to write it, you have to make sure *everybody* using the buffer knows it may not be a NUL terminated string.

    2. You can’t write code like:

    if (s[n] == '\r' && s[n+1] == '\n') {}

    because having a NUL terminated string gives you the property of having a 1 character look-ahead. If you wanted to do the above you would have to check if n+1 != length first. Sure it’s obvious here, but not when you’re writing s[n] as *s/s[0] and s[n+1] as *(s+1)/s[1] and advancing the ptr instead of using an index into it.

  36. Matt Green says:

    Why do *application* programmers using C++ futz with the zillion different variations of str* functions when they can choose from std::string, CString, or a custom String class? It boggles my mind.

  37. Vorn – while true, unfortunately, Slack and Mac don’t appear to be too interested in fixing the problems.

  38. Norman Diamond says:

    1/7/2005 8:27 AM Anonymous

    >> " Another example of optimization screwing

    >> things up, if you ask me.

    >>

    >> Just allocate a buffer of length N+1, set

    >> buff[n]=0, and always pass N to strncpy."

    >

    > Why would this screw things up?

    It wouldn’t. The "Just allocate" sentence is a solution to the screwup that was caused by an attempted optimization. (The attempted optimization was saving a byte of memory.)

    The same solution might be in use in code not visible in the Japanese article that Mr. Chen linked to. The visible portion of the code uses a pointer, but we don’t see how much memory was actually allocated when the pointer was assigned.

    The Japanese text in the article has the author first quoting her previously posted question asking for help on how to detect a folder change from a file selection, and then saying that she found the answer in an MFC sample. I wonder if the portion of code not visible in the article might be in the MFC sample. Does anyone recognize that MFC sample, and does it work reliably?

  39. autist0r says:

    To solve this problem I designed a CString equivalent class that allocates the memory on a separate heap (this way, if heap corruption occurs, your strings remain safe, and vice versa). Plus, you get the benefits of all the checks win32 heap management offers (assert(HeapValidate()) is everywhere). It has got the disadvantage (? :p) of being a windows-only class though.

    Otherwise for plain C code or for driver code (where I may not be able to use string safe functions for backward compatibility) I simply use a ripped (^^) OpenBSD’s strlcpy or I do as shown above : I have an extra byte for the NULL terminator (but it’s not very sound, is it ?).

  40. Anthony Wieser says:

    I always wrote it like this

    char dest[N+1];

    strncpy(dest, src, N)[N]=0

    if I wanted it guaranteed to be null terminated

  41. igor1960 says:

    I think that weird discussion you guys are having here is baseless. You probably have a lot of free time on your hands.

    Anyway, the reason I consider it baseless is the following:

    And who said that strncpy is "designed to prevent a buffer overflow"? Just because you put that notion into the caption of that entry doesn’t mean it’s true.

    Just my humble opinion: strncpy has nothing to do with "buffer overflow/overrun" protection; it’s just a convenient function for those who know how to use it.

    In everything else I agree with the article, but the conclusion is weird. It’s like making the following conclusion:

    "Don’t use "strcpy" and its friends because your destination buffer may not be big enough to hold the resultant length of bytes"…

    Looks, strange and obvious isn’t it?

  42. Raymond Chen says:

    And who said that strncpy is "designed to prevent a buffer overflow"?

    You did. I don’t know where you got that phrase from; its first appearance on this page is when you yourself wrote it.

  43. autist0r says:

    The Holy MSDN says, at the strcpy entry :

    Security Note : Because strcpy does not check for sufficient space in strDestination before copying strSource, it is a potential cause of buffer overruns. Consider using strncpy instead.

    I think Raymond was referring to this when writing his article.

  44. Saurabh Jain says:

    Well, help from the CRT is on its way in Whidbey in the form of the secure CRT (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure03102004.asp). There are *_s (strcpy_s, strncpy_s) versions of the str functions that guarantee null termination. They also invoke error handlers when the parameters are incorrect (not enough buffer, source/destination not being null terminated, and so forth), which allows one to track these errors more easily.

  45. Frank says:

    @Anthony:

    strncpy(dest, src, N)[N]=0

    Why not simply:

    strncpy(dest, src, N);

    dest[N]=0;

    Which i.m.o. does exactly the same and doesn’t confuse the code reviewers/maintainers!

    I think people should stop inventing smart C constructions that save 5 characters of source code but make the code less readable. Unless of course when you’re trying to win a "write the shortest C source that accomplishes this task" contest :-)

  46. Beginner says:

    Microsoft recommends: For more secure string handling please use the StrSafe.h functions.

    Well, I agree but, why should I have to download from that activex and badly designed site and install a massive Platform SDK in the first place? Please do not make life difficult for us by placing everything in a massive Platform SDK, which has to be downloaded in a very specific way and installed in a very specific way. I might just want to use a few Windows functions. I might want to download only the documentation, only download a header file like StrSafe.h, I might want to install them in any way I like. Why should I use a Platform SDK installer?

    As things stand, developing for standard C is very easy, just get a free compiler. Learning and developing for Windows is very difficult for a beginner.

  47. Matt Green says:

    If you’re a beginner, why do you want to install something in a specific way?

  48. Wow, did this make my head entirely hurt!

    /scrambles for aspirin.

  49. igor1960 says:

    > You did. I don’t know where you got that phrase from; its first appearance on this page

    We are playing the game of who said what first.

    It doesn’t change the subject. OK, I was wrong.

    But you titled the thread "How can code that tries to prevent a buffer overflow end up causing one?"

    Show me the code you are talking about. From the samples you are referring to, it’s not that obvious that strncpy (and derivatives) is used in a segment of the code that protects against "buffer overflow".

    Where is it? Am I missing something?

    Now, for me strncpy is a convenient function of just copying string array bytes of known length. Nothing else and nothing more. Now, to screw up the buffers — you don’t need to use strncpy, in fact you can use any other str… function and just provide bad pointer and/or not long enough destination. So, what’s the point? Does that mean that we shouldn’t use str… functions.

    I completely agree that it’s better and safer to use data types other than just null-terminated byte arrays. But if you are using byte arrays, what’s wrong with having an extra function that you can use or not, as you choose? Now when you are claiming "They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong", this is not true. They do that only if you made a mistake while using them.

    But if this is a case then you should make the following conclusion about just strcpy. Something like that: "Don’t use strcpy, because if you pass bad dest pointer to it — your program may just crash"…

    We are talking about part of the "C Standard" here, as strncpy is in it. For me another advantage is the readability of code that uses strncpy. Meaning, I can send this code to others and it will be understood. Also, in the majority of cases, if you have to read/debug somebody’s code, strncpy is very helpful as it usually gives you a clue about the dest buffer length. Now, in your own development feel free to use whatever function/method you prefer and are sure about, but when you have to share your algorithms with the world, nothing is better than using functions provided by the "C Standard", and strncpy is one of them…

  50. Ken Jackson says:

    I appreciate the Unix history. I didn’t know.

    This is the way I always code:

    char buf[100];

    strncpy(buf,src,sizeof(buf));

    buf[ sizeof(buf)-1 ] = 0;

    Now I KNOW that buf is nul-terminated.

    The string it contains may or may not be truncated.

  51. Raymond Chen says:

    The code samples I referenced were using strncpy (or its variants) to ensure that they didn’t overflow a string buffer. But they used it incorrectly and ended up overflowing a buffer when the previous code (that presumably used the unsafe "strcpy" function) didn’t. That’s what I mean by "code that tried to prevent a buffer overflow ends up causing one."

    My point is that unlike the other "str" functions, strncpy does *not* produce null-terminated strings. Do not use it as if it did.

  52. @FRANK

    I write it thus:

    strncpy(dest, src, N)[N]=0

    because dest may be expensive to evaluate, where a I know dest must be returned by the function itself.

    Also, the compiler may be able to optimize the result, because dest is the return value.

    Sure it’s a small optimization, and yes, it may not be immediately obvious, but it does become an idiom, which means I always write it that way, and therefore I never make the error of omission.

    Anthony Wieser

    Wieser Software Ltd

  53. autist0r says:

    I do it this way,

    ::strncpy(dest, src, N);

    ::ZeroMemory(dest, sizeof(dest));

    This way I know my data is secure ! :)

  54. Tim Smith says:

    That is a big trap. Some day somebody will change the code to something like:

    > char *buf = malloc(100);

    > strncpy(buf,src,sizeof(buf));

    > buf[ sizeof(buf)-1 ] = 0;

    If they do, they should be killed. Not only have they introduced a bug into the code, but they have also added a threading sync point to the code. Allocating memory is very expensive compared to adding 100 to ESP. MT heap allocation is expensive.

  55. Tim,

    Allocating memory is cheap when compared to blowing the stack (and thus terminating the application).

    There are situations where it’s appropriate.

  56. Michael J says:

    > char buf[100];

    > strncpy(buf,src,sizeof(buf));

    > buf[ sizeof(buf)-1 ] = 0;

    That is a big trap. Some day somebody will change the code to something like:

    char *buf = malloc(100);

    strncpy(buf,src,sizeof(buf));

    buf[ sizeof(buf)-1 ] = 0;

    and sizeof(buf) becomes "4".

    Better to do something like this:

    const int size=100;

    char buf[size];

    strncpy(buf,src,size);

    buf[ size-1 ] = 0;

    I have always used a wrapper around strncpy to make sure that I never left off the nul.

    char *safecpy(char *pDest, const char *pSrc, size_t len)

    {

    strncpy(pDest, pSrc, len);

    pDest[len-1] = '\0';

    return pDest;

    }

  57. Paul C. says:

    I just looked over at the functions provided by strsafe.h, and couldn’t help but notice that there was no replacement for any of the scanf functions. Is there anything in the Win32 API off the top of your head that replicates the fscanf functionality safely?

  58. igor1960 says:
    1. A heap "buffer overrun" may be as dangerous as a static (stack) "buffer overrun":

      2. Another error in your sample is that sizeof(buf) is not 100, as you probably meant:

      // That’s better solution:

      char *buf = malloc(100);

      buf[ 99 ] = 0;

      strncpy(buf,src,100);

      if(buf[ 99 ] != 0 )

      {

      // Overflow….

      }

      =====================

      But I still disagree with considering strncpy more dangerous than any other str… functions.

      By the same token you could say that strcpy is dangerous and should not be used, since the src passed to it may not be a null-terminated string…

  59. igor1960: Umm.. strcpy is vastly more dangerous than strncpy – the difference is that strncpy allows you to THINK you’ve fixed a security hole when you haven’t.

    Paul C: I don’t believe that there is a safe scanf. I also don’t think it’s possible to write a safe scanf in C (it may be possible in C++, I’m not sure).

    For scanf, you’re better off parsing it yourself, IMHO.

  60. igor1960 says:

    Larry: Umm.. strcpy is vastly more dangerous than strncpy…

    The problem is the guys here are pretty crazy about security, while in reality nothing in the discussions above really relates to security. In fact the purpose of strncpy is not to protect against or decrease the probability of buffer overrun at all. It’s just a copying function. Whether you use it properly in a secure context is a different issue. But could you please show me a case of buffer overrun using my sample described above:

    char *buf = malloc(100);

    buf[ 99 ] = 0;

    strncpy(buf,src,100);

    if(buf[ 99 ] != 0 )

    {

    // Overflow….overrun — name it whatever you want

    }

    I think you agree with me that what I think is exactly what it is: security hole is fixed…

  61. Raymond Chen says:

    If "src" is longer than 99 characters, then "buf" has no null terminator. As many people have pointed out, you have to whack a null terminator in manually. If you forget, then you create an illegal string.

    The strange behavior of the strn* functions is not well known; they are the exceptions to the general rule that "str" functions produce null-terminated strings. And strings that aren’t properly terminated lead to read errors at a minimum, and possibly write errors if the string is copied on the assumption that the function properly terminated the buffer.

  62. Paul C. says:

    Larry: Actually, I gave it a little thought. I could use fgets (which takes a size parameter, and null terminates) to safely pull an entire line out of the file, then use sscanf to parse the line. While reading the line, if the line is bigger than my fgets buffer, I’ll know because the line returned by fgets will be the size of the buffer, the character before the null terminator will not be a newline, and I won’t be at EOF, so I’ll know to allocate a bigger buffer and read the rest of the line in (or give up because the line is too big and a giant binary mess :-) ). I now know the max size of the line because I built it with fgets, no string in that line should be bigger than the line itself. So, if all the temp string buffers I pass to sscanf are as large as the string I pass to sscanf to be parsed, I should be safe.
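
The fgets-then-sscanf approach described in this comment could be sketched roughly as follows. The names read_line and parse_pair and the "name value" line format are assumptions made for illustration; only fgets, strlen, and sscanf are standard library calls.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf (capacity size, at least 2).
   Returns  1 if a complete line was read (trailing newline stripped),
            0 if no newline was seen (buffer too small, or a final line
              that lacks a newline),
           -1 at end of file, on a read error, or if size is too small. */
int read_line(FILE *fp, char *buf, size_t size)
{
    size_t len;

    if (size < 2 || fgets(buf, (int)size, fp) == NULL) {
        return -1;
    }
    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    return 0;
}

/* Parse "name value" out of a line already held in memory. The caller must
   make name at least strlen(line) + 1 bytes, so that no %s token taken
   from the line can be longer than the buffer receiving it.
   Returns 1 if both fields were parsed, 0 otherwise. */
int parse_pair(const char *line, char *name, int *value)
{
    return sscanf(line, "%s %d", name, value) == 2;
}
```

A return of 0 from read_line is the cue to grow the buffer and keep reading, or to give up on the line, as the comment suggests.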

  63. encourage memcpy instead says:

    Because of the unreliable behaviour of strncpy, I always use memcpy instead, which forces me to terminate the destination buffer myself (if buf length>0).

    I didn’t know of the null-padding which strncpy performs, and it’s always nice when Raymond presents such information with an explanation of the underlying reason.
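    One way the memcpy approach might look, as a sketch only: the helper name `copy_with_memcpy` is hypothetical, and the choice to truncate silently is an assumption rather than something the comment specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: bounded copy built on memcpy. Copies at most
 * dst_size - 1 bytes of src and then terminates dst explicitly.
 * Unlike strncpy, it does not pad the rest of dst with '\0'.
 * Returns the number of characters copied (excluding the terminator). */
static size_t copy_with_memcpy(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_size == 0)
        return 0;                 /* no room even for a terminator */

    size_t n = strlen(src);
    if (n > dst_size - 1)
        n = dst_size - 1;         /* truncate to what fits */

    memcpy(dst, src, n);
    dst[n] = '\0';                /* explicit terminator, every time */
    return n;
}
```

    Comparing the return value with `strlen(src)` tells the caller whether truncation happened.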

  64. Paul Hsieh says:

    Aha! So this is where all that traffic came from! :)

    I am the author of "The Better String Library" (http://bstring.sf.net/). This discussion about strncpy, strlcpy, etc. is exactly what "The Better String Library" (Bstrlib) is all about. The real problem is that in the C language, when you want to work with strings you are forced to think about buffers. This becomes a source of problems, and the interesting weaknesses of strncpy and strlcpy (since both have to decide on some sort of compromise for situations where the buffer is too small) highlight this problem. The same generic problem exists for the equally important strcat and fgets functions as well of course.

    This lesson has not been lost on the designers of just about every other programming language in existence. The solution is to perform automatic buffer management with every string manipulation. This frees the programmer to think about and deal with strings as just that: strings. This poses the question: can such automatic buffer management be implemented in a library for the C language? Bstrlib is an answer to this question in the affirmative. Bstrlib gives a complete string ADT which has a large superset of the functionality of the standard C library but because it hides all buffer management it avoids all of the typical string related buffer overflow problems. In practice it is truly significantly safer and easier to use than standard ‘\0’ terminated char * based strings.
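    To make "automatic buffer management" concrete, here is a minimal sketch of the general idea. It is emphatically not Bstrlib's API or implementation: the `struct dstr` type and the `dstr_append` and `dstr_free` functions are hypothetical names invented for this illustration, and unlike what the comment claims for Bstrlib, the sketch makes no attempt at aliasing safety.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string; NOT Bstrlib's actual types or API. */
struct dstr {
    char  *data;   /* null-terminated once anything has been appended */
    size_t len;    /* characters in use, excluding the terminator */
    size_t cap;    /* bytes allocated */
};

/* Append a C string, growing the buffer as needed.
 * Returns 0 on success, -1 on allocation failure (s is left unchanged).
 * Assumption: text must not point into s->data (no aliasing support). */
static int dstr_append(struct dstr *s, const char *text)
{
    assert(s != NULL && text != NULL);

    size_t add = strlen(text);
    size_t need = s->len + add + 1;          /* +1 for the terminator */

    if (need > s->cap) {
        size_t new_cap = s->cap ? s->cap : 16;
        while (new_cap < need)
            new_cap *= 2;
        char *p = realloc(s->data, new_cap); /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = new_cap;
    }

    memcpy(s->data + s->len, text, add + 1); /* copies the '\0' too */
    s->len += add;
    return 0;
}

static void dstr_free(struct dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

    A caller starts from `struct dstr s = {0};` and never computes a buffer size, which is the point being made: the capacity check lives in one place instead of at every call site.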

    Because I wanted to put Bstrlib forth as a complete substitute for string manipulation needs for C and C++ I also needed to solve a number of other problems:

    1. Portability: it’s open source (BSD license, so even Microsoft can use it) and has been verified to compile with 16, 32 and 64 bit compilers on DOS, Windows and Linux. (I am waiting for some feedback from the Mac OS X community, though I am almost sure that it will compile there with no issues on either gcc or Metrowerks.)

    2. Speed: benchmarks I’ve written indicate that Bstrlib, despite its extra implicit buffer management code, actually *outperforms* the C library most of the time for most functions.

    3. C++ API: This is where I got the most feedback from other developers who are more experienced with C++ than I. The result is a powerful C++ API that is comparable to both Microsoft’s MFC CString class as well as the STL’s std::string class in terms of functionality. In terms of performance, Bstrlib’s C++ API leaves the competition in the dust.

    4. Compatibility with ‘\0’ terminated char * strings: the fact is that other precompiled libraries will typically stick to using the standard C string concepts. http://www.pcre.org is a prime example of a very valuable library that makes this assumption. So it was very important to make sure Bstrlib supported backward compatibility modes with ‘\0’ terminated char * strings very well, which it does. Pointer addition tricks, for example, are emulated as a constant reference substring macro. Remember, as inline strings, the language itself only supports backslash-escape-interpreted symbols rammed between a pair of quotes. So it doesn’t make sense to break off backward compatibility or to make it cumbersome.

    5. Good IO support: I implemented a fully abstracted file IO interface which is integrated with Bstrlib. Because of the subtle difference between text mode and binary mode files, bstrings have to be able to hold binary content (specifically including the ‘\0’ character as an ordinary character.) Thus Bstrlib’s implicit buffer management capabilities can be leveraged in its IO routines as well. A key example of this is the support for an arbitrary number of "ungetc"-like calls.

    6. Functionality of other languages: The primary consideration here was to implement split/join, insert/delete, and find+replace functions as well as write protection and complete parameter aliasing safety. This functionality can be found in other languages such as Python or Perl.

    By delivering all this, I believe that Bstrlib is a far superior solution to anything else out there (for string manipulation). I have a comparison table here: http://bstring.sf.net/features.html (though obviously you might find it a little biased.)

    As to specific comments made in this thread, here are my thoughts:

    – As to using Mac OS X/Slackware (Linux), obviously Bstrlib is a better solution than the Win32 or .NET API. Because Bstrlib is a platform neutral source library under active maintenance, enhancements, bugfixes and other updates will be equal for everyone; there is no risk of a Mac OS X version being "orphaned" because of its smaller community size, for example.

    – Unicode/widechar support is beyond the scope of what Bstrlib solves. However, since Bstrlib is very interoperable with char * based strings, the conversion mechanisms for existing Unicode/widechar string library functionality should still be able to work with Bstrlib.

    – Strsafe was a legitimate attempt by Microsoft to improve the safety of char * strings. But since it makes no attempt to improve upon the functionality of the C library and still leaves the problem of precise buffer management in the hands of the programmer, it falls far short of the bar set by Bstrlib. For C++ users MFC’s CString is a far superior solution which in turn is inferior to Bstrlib’s C++ API.

    – scanf is a calamity of design errors. It’s fairly rare to desire string input delimited by spaces. Simply breaking down scanf into the functions fgets then sscanf is far superior, and the only mechanism I would recommend for basic parsing of strings using the C library facilities. Of course you can use Bstrlib’s IO to read input even more safely, then parse things out with sscanf.
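    The fgets-then-sscanf pattern recommended in the last point might look like the following sketch. The record format ("name age"), the `struct person` type, and the `read_person` function are all hypothetical, chosen only to show a field width that matches the destination buffer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record format for illustration: "<name> <age>". */
struct person {
    char name[32];
    int  age;
};

/* Read one line with fgets, then parse it with sscanf.
 * Returns 1 on success, 0 on EOF or a malformed line. */
static int read_person(FILE *fp, struct person *out)
{
    char line[128];

    assert(fp != NULL && out != NULL);

    /* fgets never writes more than sizeof line bytes and always
     * null-terminates, so the input to sscanf is a valid string. */
    if (fgets(line, sizeof line, fp) == NULL)
        return 0;

    /* %31s stores at most 31 characters plus the terminator,
     * which matches name[32]. */
    return sscanf(line, "%31s %d", out->name, &out->age) == 2;
}
```

    Because the parse works on an in-memory line, a malformed record consumes exactly one line of input rather than leaving the stream in an unknown position, which is one of the practical problems with calling scanf directly. Lines longer than the 128-byte buffer would be split across calls in this sketch; real code would need to detect that case.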


