If you read your language specification, you'll find that the ...ncpy
functions have extremely strange semantics.
The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If count is greater than the length of strSource, the destination string is padded with null characters up to length count.
In pictures, here's what happens in various string copying scenarios.
strncpy(strDest, strSrc, 5) |
strSource |
|
strDest |
|
observe no null terminator |
|
strncpy(strDest, strSrc, 5) |
strSource |
|
strDest |
|
observe no null terminator |
|
strncpy(strDest, strSrc, 5) |
strSource |
|
strDest |
|
observe null padding to end of strDest |
Why do these functions have such strange behavior?
Go back to the early days of UNIX. Personally, I only go back as far as System V. In System V, file names could be up to 14 characters long. Anything longer was truncated to 14. And the field for storing the file name was exactly 14 characters. Not 15. The null terminator was implied. This saved one byte.
Here are some file names and their corresponding directory entries:
passwd |
p |
a |
s |
s |
w |
d |
\0 |
\0 |
\0 |
\0 |
\0 |
\0 |
\0 |
\0 |
|
newsgroups.old |
n |
e |
w |
s |
g |
r |
o |
u |
p |
s |
. |
o |
l |
d |
|
newsgroups.old.backup |
n |
e |
w |
s |
g |
r |
o |
u |
p |
s |
. |
o |
l |
d |
|
Notice that newsgroups.old
and newsgroups.old.backup
are actually the same file name, due to truncation. The too-long name was silently truncated; no error was raised. This has historically been the source of unintended data loss bugs.
The strncpy
function was used by the file system to store the file name into the directory entry. This explains one part of the odd behavior of strcpy
, namely why it does not null-terminate when the destination fills. The null terminator was implied by the end of the array. (It also explains the silent file name truncation behavior.)
But why null-pad short file names?
Because that makes scanning for file names faster. If you guarantee that all the "garbage bytes" are null, then you can use memcmp
to compare them.
For compatibility reasons, the C language committee decided to carry forward this quirky behavior of strncpy
.
So what about the title of this entry? How did code that tried to prevent a buffer overflow end up causing one?
Here's one example. (Sadly I don't read Japanese, so I am operating only from the code.) Observe that it uses _tcsncpy
to fill the lpstrFile
and lpstrFileTitle
, being careful not to overflow the buffers. That's great, but it also leaves off the null terminator if the string is too long. The caller may very well copy the result out of that buffer to a second buffer. But the lstrFile
buffer lacks a proper null terminator and therefore exceeds the length the caller specified. Result: Second buffer overflows.
Here's another example. Observe that the function uses _tcsncpy
to copy the result into the output buffer. This author was mindful of the quirky behavior of the strncpy
family of functions and manually slapped a null terminator in at the end of the buffer.
But what if ccTextMax
= 0? Then the attempt to force a null terminator dereferences past the beginning of the array and corrupts a random character.
What's the conclusion of all this? Personally, my conclusion is simply to avoid strncpy
and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.
Fair catch Raymond. Although, I have to say that I’ve never seen cchTextMax = 0.
/me wanders off to post an update to that code sample…
What about zero-filling the buffer before using strncpy (and friends)?
<blockquote><i>Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.</i></blockquote>
Easy to say, but what do you encourage instead?
I think simply testing whether strncpy was able to copy the full amount is sufficient (and then null-terminate it). It’s more work, but that’s security for ya.
The extra null padding is never going to be a worry. (Or if you’ve been bitten by it, please elaborate.)
"What about zero-filling the buffer before using strncpy (and friends)?" Since strncpy always fills the buffer, you can paint the buffer neon yellow before calling strncpy, won’t make any difference. See the diagrams.
"What do you encourage instead?" Functions that operate on strings rather than buffers. lstrcpyn or the StrSafe.h functions, for example.
Another example of optimization screwing things up, if you ask me.
Just allocate a buffer of length N+1, set buff[n]=0, and always pass N to strncpy.
Just another example why nearly all of the standard C library should be avoided in production code.
String functions are unsafe. Formatted input is unsafe. File I/O is clumsy and error prone. Steve Maguire does a great job skewering malloc and friends in Writing Solid Code. Typical implementations of rand() are horrible. setjmp()/longjmp() — yikes! The floating point functions and assert() are about the only bits you can rely on.
If you’re writing a console application and trying to be portable, then wrap the standard library calls with safer interfaces.
But if you’re writing for Windows, consider avoiding the CRT altogether. You can save a lot of headaches by using the Windows APIs instead (not to mention have a smaller, faster loading program with fewer DLL Hell headaches and deep understanding of the redistributable agreements.
" Another example of optimization screwing things up, if you ask me.
Just allocate a buffer of length N+1, set buff[n]=0, and always pass N to strncpy."
Why would this screw things up?
"Since strncpy always fills the buffer, you can paint the buffer neon yellow before calling strncpy, won’t make any difference. See the diagrams. "
Strncpy won’t bother bytes beyond the limit you give it.. If you fill it to the end with 0 then your string will be null terminated, and there are no more problems… right?
Hmm – my last paragraph could have been written better! I was wondering aloud what goes on inside Windows with regards to strings – not suggesting using the TOP SEKRIT stuff as a good C string library!
One warning about StrSafe.h: While the functions have length parameters and thus make you think about things, they don’t guarantee you’ll get those things right. For example, I’ve run across at least two examples in MSDN that do things like this:
hres = StringCbCopy(lpszPath, sizeof(lpszPath), szGotPath);
instead of:
hres = StringCbCopy(lpszPath, pathSize, szGotPath);
You might say, "Well, that’s easily fixed," but then you look at the function this appears in and find that lpszPath is a parameter but pathSize is not, and there’s no documented minimum size for the buffer.
"If you fill it to the end with 0 then your string will be null terminated, and there are no more problems… right?" If you fill it to the end, strncpy will just fill it with stuff again. See the "Welco" example above. If you want to preserve that last zero you need to pass a shorter buffer size to strncpy.
You really want to use strlcpy wherever possible, not strncpy. It has sane semantics. Better yet is to use a language without pointers, such as Ruby. If you’re not writing a kernel, you shouldn’t be using raw C/C++.
I always use strlcpy/strlcat now, see: http://www.courtesan.com/todd/papers/strlcpy.html
Using these instead of strcpy/strcat/strncpy/strncat saves a whole lot of potential grief.
Yeah, another voice here for strlcpy() and strlcat(). Guaranteed to be null-terminated, and with the same function signature as strncpy() so you can switch to it with a quick search/replace.
"If you fill it to the end, strncpy will just fill it with stuff again. See the "Welco" example above. If you want to preserve that last zero you need to pass a shorter buffer size to strncpy."
Yes… But, if you have the following:
char buf[6];
buf[6]=0;
strncpy(buf,"welcome",5);
You get
[w][e][l][c][o][ ]
What’s the problem, exactly?
What’s worse, I’ve seen people recommend using strncat instead, only to get its parameters wrong as well.
And if all else fails, it’s not hard to write your own string functions that do what you expect.
What about using lstrcpy() lstrlen() and the other lstr* functions? Do they have the same issues with them?
buf[6]=0;
D’oh
buf[5]=0;
Anonymous: Yes, but notice that you also changed the size of buf[] from 5 to 6. That’s a very important step.
Enigma2e: I believe the MSDN documentation for the lstr* functions already explains their behavior adequately.
Translation of example one:
>> Could you tell me how to select a handle?
>
> Try to dbl click the handle and select OK but CDN_FILEOK comes
> up. To open the handle it can be selected from CDN_FOLDERCHANGE
> but as a result of that selection, I cannot close the dialog.
>
> In any case I think it is for “file selection”
Self-Less. (That is direct translation, I don’t understand it.)
It was possible by this way, however, I do not guarantee anything.
The following is MFC sample.
Yes, I always fancy the pattern of
char buf[N+1];
buf[N] = 0′
… strncpy(buf, src, N);
and having the reassurance that there is always a null terminator and it is never creamed by an edge case.
I hadn’t worried about the padding in the past, but I do like the simplicity of predictable values for the entire buffer.
I can see that
… strlcpy(buf, src, N+1)
saves that part and is "smoother" in some sense. When I am dealing with fixed-field parameters, though, I always prefer the strncpy technique anyhow, since the length maximum tends to be a well-known parameter.
Nice tip.
Another look at str* gotchas here: http://blogs.msdn.com/michael_howard/archive/2004/12/10/279639.aspx
Here’s an example of where this can go wrong… Funny that you posted this today, we were observing a buffer overflow in the code linked to below. This bug shows up when the length of string in the string table was a multiple of 16, which is another bug entirely but, a symptom of misusing/misunderstanding the *ncopy functions in this case.
http://support.microsoft.com/default.aspx?scid=kb;en-us;200893
"… when they can choose from [string classes]"
Even if you use a string class you still have to worry about buffers when interfacing with other code. Is there a way to convert a std::string to a std::wstring that doesn’t involve buffer manipulation?
string s = "a narrow string";
std::wstring result;
result.reserve(s.length());
std::copy(s.begin(), s.end(), std::back_inserter(result));
That doesn’t work if the std::string has any characters above 127 in it. What if it were
// This is Chinese for "Chinese"
// using the Big5 character set.
string s = "xA4xA4xA4xE5";
and I want the result to be
wstring ws = L"x4E2Dx6587";
Adrian said "File I/O is clumsy and error prone. Steve Maguire does a great job skewering malloc and friends in Writing Solid Code."
Okay. What should we use instead? I’m sure as hell not writing my own file i/o or dynamic memory allocation, because I have no choice but to do so in assempler.
Vorn
I figured you’d reply saying something like that. You’d have to call MultiByteToWideStr, which implies you have to access the physical buffer. Wrap it up in a function and add it to the personal toolbox.
Lack of decent strings is a language problem. But using string classes at least saves you the grief of cleanup and performing common operations without exposing yourself to unnecessary risk. Any decent string class should let you get to the internal string easily.
Vorn: As suggested above, use the Win32 API instead.
File I/O: http://msdn.microsoft.com/library/?url=/library/en-us/dnanchor/html/filesio.asp?frame=true
Memory allocation: http://msdn.microsoft.com/library/?url=/library/en-us/dnanchor/html/memoryank.asp?frame=true
[And of course some other stuff by Microsoft which shall not be discussed here (check subtitle for a hint) ;)]
For REALLY useful wide to narrow character (and vice versa) stuff, look up CA2T, CT2A, CT2W, CA2W, CW2A, CW2T, CA2T, etc etc etc in ATL 7.0.
Wonderful little wrappers, all pre-done and tested for you.
If you’re not using WTL and ATL for your Windows development and you’re programming in C++, you should certainly consider taking it for a spin.
The issue is well known to anyone who has read Michael Howard’s secure coding book.
The normal mitigation is indeed to slap the nul terminator on the end of the destination buffer. The ccTextMax = 0 possibility, while valid, is a pretty exceptional case. Generally it amounts to asking ‘After I copy a string into a zero-length buffer, will I cause problems by nul terminating it?’
The solution is that, when the length of the destination buffer is supplied to you, you should *always* ensure it is valid. As long as you have basic sanity checks in place, strncpy + explicit nul-termination is fine. Various safe coding libraries have functions and macros to make this easy.
@Raymond
Personally, my conclusion is simply to avoid strncpy and all its friends
@NoInfo
Easy to say, but what do you encourage instead?
I can’t answer for Raymond but I’d rather be using a string class whilst munging and copying strings about rather than raw char*s – take your pick: std::string, CString, NSString(!), … You can always retreive a char array once you’re done messing around and pass that on to APIs that require such things.
My examples are all for OO language’s but that’s not a requirement – you can find good string libraries in C – looking at ntdll.dll there seems to be a fair few Rtl*** string methods.
Win32 API doesn’t work on Slack or Mac. :)
Vorn
The bstring library works in C and C++ and solves all of these issues gracefully:
http://bstring.sourceforge.net/
My problem now is that I have a huge project in C that we’re converting to C++, and it has about 30,000 char* strings in it that I’d like to convert into std::string… yet those two datatypes aren’t even close to being compatible with each other. bstring solves some of the problems of char*, but I’d like to move to std:string so that the program is "pure" C++.
That’s why Microsoft’s SSCLI/"Rotor" team created their PAL(Platform Abstraction Layer) [1].
Or you can of course use #if/#else all over your source code to make it a joy to maintain it :P.
Platform agnostic frameworks are also helpful :)
[1] http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/pal/index.html
There are two things to remember when using strncpy (unless you’re using it to write something like lstrcpy):
1. If you use strncpy to write it, you have to make sure *everybody* using the buffer knows it may not be a NUL terminated string.
2. You can’t write code like:
if (s[n] == ‘r’ && s[n+1] == ‘n’) {}
because having a NUL terminated string gives you the property of having a 1 character look-ahead. If you wanted to do the above you would have to check if n+1 != length first. Sure it’s obvious here, but not when you’re writing s[n] as *s/s[0] and s[n+1] as *(s+1)/s[1] and advancing the ptr instead of using an index into it.
Why do *application* programmers using C++ futz with the zillion different variations of str* functions when they can choose from std::string, CString, or a custom String class? It boggles my mind.
Vorn – while true, unfortunately, Slack and Mac don’t appear to be too interested in fixing the problems.
1/7/2005 8:27 AM Anonymous
>> " Another example of optimization screwing
>> things up, if you ask me.
>>
>> Just allocate a buffer of length N+1, set
>> buff[n]=0, and always pass N to strncpy."
>
> Why would this screw things up?
It wouldn’t. The "Just allocate" sentence is a solution to the screwup that was caused by an attempted optimization. (The attempted optimization was saving a byte of memory.)
The same solution might be in use in code not visible in the Japanese article that Mr. Chen linked to. The visible portion of the code uses a pointer, but we don’t see how much memory was actually allocated when the pointer was assigned.
The Japanese text in the article has the author first quoting her previously posted question asking for help on how to detect a folder change from a file selection, and then saying that she found the answer in an MFC sample. I wonder if the portion of code not visible in the article might be in the MFC sample. Does anyone recognize that MFC sample, and does it work reliably?
To solve this problem I designed a CString equivalent class that allocates the memory on a separate heap (this way, if heap corruption occurs, your strings remain safe, and vice versa). Plus, you get the benefits of all the checks win32 heap management offers (assert(HeapValidate()) is everywhere). It has got the disadvantage (? :p) of being a windows-only class though.
Otherwise for plain C code or for driver code (where I may not be able to use string safe functions for backward compatibility) I simply use a ripped (^^) OpenBSD’s strlcpy or I do as shown above : I have an extra byte for the NULL terminator (but it’s not very sound, is it ?).
I always wrote it like this
char dest[N+1];
strncpy(dest, src, N)[N]=0
if I wanted it guaranteed to be null terminated
I think that weird discussion you guys are having here is baseless. You probably have alot of free time on your hands.
Anyway, the reson I consider it baseless is the following:
And who said that strncpy is "designed to prevent a buffer overflow"? Just because you put that notion into the caption of that entry doesn’t meen it’s true.
Just my humble opinion: strnpy has nothing to do with "buffer overflow/overrun" protection — it’s just a convenient function for those who know how to use it.
In everything else I agree with an article — but conclusion is weird. It’s like making the following conclusion:
"Don’t use "strcpy" an it’s friends because your destination buffer may not be big enough to hold the resultant length of bytes"…
Looks, strange and obvious isn’t it?
The Holy MSDN says, at the strcpy entry :
Security Note : Because strcpy does not check for sufficient space in strDestination before copying strSource, it is a potential cause of buffer overruns. Consider using strncpy instead.
I think Raymond was referring to this when writing his article.
Well, help from CRT is underway in Whidbey in from secure CRT (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure03102004.asp). There are *_s (strcpy_s, strncpy_s) versions of the str functions that guarantee null termination. They also invoke error handlers when the parameters are incorrect (not enough buffer, source/destination not being null terminated, and so forth), which allows one to track these error more easily.
@Anthony:
strncpy(dest, src, N)[N]=0
Why not simply:
strncpy(dest, src, N);
dest[N]=0;
Which i.m.o. does exactly the same and doesn’t confuse the code reviewers/maintainers!
I think people should stop inventing smart C constructions that save 5 characters of source code but make the code less readable. Unless of course when you’re trying to win a "write the shortest C source that accomplishes this task" contest :-)
Microsoft recommends: For more secure string handling please use the StrSafe.h functions.
Well, I agree but, why should I have to download from that activex and badly designed site and install a massive Platform SDK in the first place? Please do not make life difficult for us by placing everything in a massive Platform SDK, which has to be downloaded in a very specific way and installed in a very specific way. I might just want to use a few Windows functions. I might want to download only the documentation, only download a header file like StrSafe.h, I might want to install them in any way I like. Why should I use a Platform SDK installer?
As things stand, developing for standard C is very easy, just get a free compiler. Learning and developing for Windows is very difficult for a beginner.
If you’re a beginner, why do you want to install something in a specific way?
Wow, did this make my head entirely hurt!
/scrambles for aspirin.
I appreciate the Unix history. I didn’t know.
This is the way I always code:
char buf[100];
strncpy(buf,src,sizeof(buf));
buf[ sizeof(buf)-1 ] = 0;
Now I KNOW that buf is nul-terminated.
The string it contains may or may not be truncated.
The code samples I referenced were using strncpy (or its variants) to ensure that they didn’t overflow a string buffer. But they used it incorrectly and ended up overflowing a buffer when the previous code (that presumably used the unsafe "strcpy" function) didn’t. That’s what I mean by "code that tried to prevent a buffer overflow ends up causing one."
My point is that unlike the other "str" functions, strncpy does *not* produce null-terminated strings. Do not use it as if it did.
@FRANK
I write it thus:
strncpy(dest, src, N)[N]=0
because dest may be expensive to evaluate, where a I know dest must be returned by the function itself.
Also, the compiler may be able to optimize the result, because dest is the return value.
Sure it’s a small optimization, and yes, it may not be immediately obvious, but it does become an idiom, which means I always write it that way, and therefore I never make the error of omission.
Anthony Wieser
Wieser Software Ltd
I to it this way,
::strncpy(dest, src, N);
::ZeroMemory(dest, sizeof(dest));
This way I know my data is secure ! :)
Tim,
Allocating memory is cheap when compared to blowing the stack (and thus terminating the application).
There are situations where it’s appropriate.
> char buf[100];
> strncpy(buf,src,sizeof(buf));
> buf[ sizeof(buf)-1 ] = 0;
That is a big trap. Some day somebody will change the code to something like:
char *buf = malloc(100);
strncpy(buf,src,sizeof(buf));
buf[ sizeof(buf)-1 ] = 0;
and sizeof(buf) becomes "4".
Better to do something like this:
const int size=100;
char buf[size];
strncpy(buf,src,size);
buf[ size-1 ] = 0;
I have always used a wrapper around strncpy to make sure that I never left off the nul.
char *safecpy(char *pDest, const char *pSrc, size_t len)
{
strncpy(pDest, pSrc, len);
pDest[len-1] = ‘