How can code that tries to prevent a buffer overflow end up causing one?

Comments (65)

Roger Lipscombe says:

January 7, 2005 at 7:12 am

Fair catch Raymond. Although, I have to say that I’ve never seen cchTextMax = 0.

/me wanders off to post an update to that code sample…
Anonymous says:

January 7, 2005 at 7:57 am

What about zero-filling the buffer before using strncpy (and friends)?
NoInfo says:

January 7, 2005 at 7:59 am

> Personally, my conclusion is simply to avoid strncpy and all its friends if you are dealing with null-terminated strings. Despite the "str" in the name, these functions do not produce null-terminated strings. They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong. Not only do you fail to get proper null termination if the source is too long, but if the source is short you get unnecessary null padding.

Easy to say, but what do you encourage instead?

I think simply testing whether strncpy was able to copy the full amount is sufficient (and then null-terminate it). It’s more work, but that’s security for ya.

The extra null padding is never going to be a worry. (Or if you’ve been bitten by it, please elaborate.)
Raymond Chen says:

January 7, 2005 at 8:14 am

"What about zero-filling the buffer before using strncpy (and friends)?" Since strncpy always fills the buffer, you can paint the buffer neon yellow before calling strncpy, won’t make any difference. See the diagrams.

"What do you encourage instead?" Functions that operate on strings rather than buffers. lstrcpyn or the StrSafe.h functions, for example.
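
A minimal sketch of the "neon yellow" point, assuming a 5-byte buffer like the one in the "Welco" example. The helper name is hypothetical; it only shows that strncpy writes all 5 bytes, either padding with nulls or leaving no terminator at all:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: pre-fills a 5-byte buffer with 'Y', copies src
   with strncpy, and reports whether any null terminator ended up in it. */
int strncpy_result_is_terminated(const char *src, char out[5])
{
    memset(out, 'Y', 5);   /* "paint the buffer neon yellow" */
    strncpy(out, src, 5);  /* always writes exactly 5 bytes */
    return memchr(out, '\0', 5) != NULL;
}
```

With a short source such as "Hi" the remaining bytes become nulls, so none of the 'Y' fill survives. With "Welcome" the buffer holds "Welco" and nothing else.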
Ray Trent says:

January 7, 2005 at 8:22 am

Another example of optimization screwing things up, if you ask me.

Just allocate a buffer of length N+1, set buff[n]=0, and always pass N to strncpy.
Adrian says:

January 7, 2005 at 8:25 am

Just another example why nearly all of the standard C library should be avoided in production code.

String functions are unsafe. Formatted input is unsafe. File I/O is clumsy and error prone. Steve Maguire does a great job skewering malloc and friends in Writing Solid Code. Typical implementations of rand() are horrible. setjmp()/longjmp() — yikes! The floating point functions and assert() are about the only bits you can rely on.

If you’re writing a console application and trying to be portable, then wrap the standard library calls with safer interfaces.

But if you’re writing for Windows, consider avoiding the CRT altogether. You can save a lot of headaches by using the Windows APIs instead (not to mention having a smaller, faster-loading program with fewer DLL Hell headaches and no need for a deep understanding of the redistributable agreements).
Anonymous says:

January 7, 2005 at 8:27 am

" Another example of optimization screwing things up, if you ask me.

Just allocate a buffer of length N+1, set buff[n]=0, and always pass N to strncpy."

Why would this screw things up?
Anonymous says:

January 7, 2005 at 8:31 am

"Since strncpy always fills the buffer, you can paint the buffer neon yellow before calling strncpy, won’t make any difference. See the diagrams. "

Strncpy won’t bother bytes beyond the limit you give it. If you fill it to the end with 0 then your string will be null terminated, and there are no more problems… right?
JamesW says:

January 7, 2005 at 8:31 am

Hmm – my last paragraph could have been written better! I was wondering aloud what goes on inside Windows with regards to strings – not suggesting using the TOP SEKRIT stuff as a good C string library!
Doug Harrison says:

January 7, 2005 at 8:39 am

One warning about StrSafe.h: While the functions have length parameters and thus make you think about things, they don’t guarantee you’ll get those things right. For example, I’ve run across at least two examples in MSDN that do things like this:

hres = StringCbCopy(lpszPath, sizeof(lpszPath), szGotPath);

instead of:

hres = StringCbCopy(lpszPath, pathSize, szGotPath);

You might say, "Well, that’s easily fixed," but then you look at the function this appears in and find that lpszPath is a parameter but pathSize is not, and there’s no documented minimum size for the buffer.
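
A rough sketch of why the size has to travel with the pointer, in plain C rather than StrSafe.h. The function name copy_path and its return convention are assumptions for illustration only; inside it, sizeof(dst) would be the size of a pointer, so the capacity is an explicit parameter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst, which holds dst_size bytes.
   Returns 0 on success, -1 on bad arguments or truncation.
   Note that sizeof(dst) here is sizeof(char *), not the buffer size. */
int copy_path(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    int n = snprintf(dst, dst_size, "%s", src);
    if (n < 0 || (size_t)n >= dst_size)
        return -1;  /* truncated, but dst is still null-terminated */
    return 0;
}
```

The caller, who owns the array, is the only one who can write sizeof(path) and get the real capacity.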
Raymond Chen says:

January 7, 2005 at 8:41 am

"If you fill it to the end with 0 then your string will be null terminated, and there are no more problems… right?" If you fill it to the end, strncpy will just fill it with stuff again. See the "Welco" example above. If you want to preserve that last zero you need to pass a shorter buffer size to strncpy.
strlcpy boy says:

January 7, 2005 at 8:45 am

You really want to use strlcpy wherever possible, not strncpy. It has sane semantics. Better yet is to use a language without pointers, such as Ruby. If you’re not writing a kernel, you shouldn’t be using raw C/C++.
Chris Boucher says:

January 7, 2005 at 8:46 am

I always use strlcpy/strlcat now, see: http://www.courtesan.com/todd/papers/strlcpy.html

Using these instead of strcpy/strcat/strncpy/strncat saves a whole lot of potential grief.
your name says:

January 7, 2005 at 9:23 am

Yeah, another voice here for strlcpy() and strlcat(). Guaranteed to be null-terminated, and with the same function signature as strncpy() so you can switch to it with a quick search/replace.
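
One caveat on the search/replace: strlcpy takes the same three arguments, but it returns a size_t (the length of the source) rather than the destination pointer. A sketch of those semantics, under a made-up name so it does not clash with a system-provided strlcpy:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of strlcpy-style semantics (hypothetical name).
   Copies at most dst_size - 1 bytes and always terminates when
   dst_size > 0. Returns strlen(src); a result >= dst_size means
   the copy was truncated. */
size_t sketch_strlcpy(char *dst, const char *src, size_t dst_size)
{
    size_t src_len = strlen(src);
    if (dst_size != 0) {
        size_t n = src_len < dst_size - 1 ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A zero-sized destination is left untouched, which covers the cchTextMax = 0 case raised earlier in the thread.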
Anonymous says:

January 7, 2005 at 9:46 am

"If you fill it to the end, strncpy will just fill it with stuff again. See the "Welco" example above. If you want to preserve that last zero you need to pass a shorter buffer size to strncpy."

Yes… But, if you have the following:

char buf[6];

buf[6]=0;

strncpy(buf,"welcome",5);

You get

[w][e][l][c][o][]

What’s the problem, exactly?
lowercase josh says:

January 7, 2005 at 9:46 am

What’s worse, I’ve seen people recommend using strncat instead, only to get its parameters wrong as well.

And if all else fails, it’s not hard to write your own string functions that do what you expect.
Enigma2e says:

January 7, 2005 at 9:46 am

What about using lstrcpy() lstrlen() and the other lstr* functions? Do they have the same issues with them?
Anonymous says:

January 7, 2005 at 9:49 am

buf[6]=0;

D’oh

buf[5]=0;
Raymond Chen says:

January 7, 2005 at 9:51 am

Anonymous: Yes, but notice that you also changed the size of buf[] from 5 to 6. That’s a very important step.

Enigma2e: I believe the MSDN documentation for the lstr* functions already explains their behavior adequately.
Tom says:

January 7, 2005 at 10:11 am

Translation of example one:

>> Could you tell me how to select a handle?

>

> Try to dbl click the handle and select OK but CDN_FILEOK comes

> up. To open the handle it can be selected from CDN_FOLDERCHANGE

> but as a result of that selection, I cannot close the dialog.

>

> In any case I think it is for “file selection”

Self-Less. (That is direct translation, I don’t understand it.)

It was possible by this way, however, I do not guarantee anything.

The following is MFC sample.
orcmid says:

January 7, 2005 at 10:27 am

Yes, I always fancy the pattern of

char buf[N+1];

buf[N] = 0;

… strncpy(buf, src, N);

and having the reassurance that there is always a null terminator and it is never creamed by an edge case.

I hadn’t worried about the padding in the past, but I do like the simplicity of predictable values for the entire buffer.

I can see that

… strlcpy(buf, src, N+1)

saves that part and is "smoother" in some sense.  When I am dealing with fixed-field parameters, though, I always prefer the strncpy technique anyhow, since the length maximum tends to be a well-known parameter.

Nice tip.
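
For reference, the N+1 pattern can be sketched as follows. FIELD_MAX and copy_field are hypothetical names standing in for N and for whatever code fills the field:

```c
#include <assert.h>
#include <string.h>

#define FIELD_MAX 5  /* hypothetical maximum field length N */

/* dest must have room for FIELD_MAX + 1 bytes. */
void copy_field(char dest[FIELD_MAX + 1], const char *src)
{
    strncpy(dest, src, FIELD_MAX);  /* touches only dest[0..N-1] */
    dest[FIELD_MAX] = '\0';         /* the extra byte is never overwritten */
}
```

Because strncpy is told about N bytes only, the terminator in the last slot survives every call, and short sources leave the rest of the field zero-filled.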
tomw says:

January 7, 2005 at 10:37 am

Another look at str* gotchas here: http://blogs.msdn.com/michael_howard/archive/2004/12/10/279639.aspx
TheDude says:

January 7, 2005 at 12:00 pm

Here’s an example of where this can go wrong… Funny that you posted this today, we were observing a buffer overflow in the code linked to below. This bug shows up when the length of string in the string table was a multiple of 16, which is another bug entirely but, a symptom of misusing/misunderstanding the *ncopy functions in this case.

http://support.microsoft.com/default.aspx?scid=kb;en-us;200893
Raymond Chen says:

January 7, 2005 at 12:10 pm

"… when they can choose from [string classes]"

Even if you use a string class you still have to worry about buffers when interfacing with other code. Is there a way to convert a std::string to a std::wstring that doesn’t involve buffer manipulation?
Matt Green says:

January 7, 2005 at 12:24 pm

string s = "a narrow string";

std::wstring result;

result.reserve(s.length());

std::copy(s.begin(), s.end(), std::back_inserter(result));
Raymond Chen says:

January 7, 2005 at 12:41 pm

That doesn’t work if the std::string has any characters above 127 in it. What if it were

// This is Chinese for "Chinese"

// using the Big5 character set.

string s = "\xA4\xA4\xA4\xE5";

and I want the result to be

wstring ws = L"\x4E2D\x6587";
Vorn says:

January 7, 2005 at 1:08 pm

Adrian said "File I/O is clumsy and error prone. Steve Maguire does a great job skewering malloc and friends in Writing Solid Code."

Okay. What should we use instead? I’m sure as hell not writing my own file I/O or dynamic memory allocation, because I would have no choice but to do so in assembler.

Vorn
Matt Green says:

January 7, 2005 at 1:41 pm

I figured you’d reply saying something like that. You’d have to call MultiByteToWideChar, which implies you have to access the physical buffer. Wrap it up in a function and add it to the personal toolbox.

Lack of decent strings is a language problem. But using string classes at least saves you the grief of cleanup and performing common operations without exposing yourself to unnecessary risk. Any decent string class should let you get to the internal string easily.
Andreas Häber says:

January 7, 2005 at 1:41 pm

Vorn: As suggested above, use the Win32 API instead.

File I/O: http://msdn.microsoft.com/library/?url=/library/en-us/dnanchor/html/filesio.asp?frame=true

Memory allocation: http://msdn.microsoft.com/library/?url=/library/en-us/dnanchor/html/memoryank.asp?frame=true

[And of course some other stuff by Microsoft which shall not be discussed here (check subtitle for a hint) ;)]
Simon Cooke [exMSFT] says:

January 7, 2005 at 3:42 pm

For REALLY useful wide to narrow character (and vice versa) stuff, look up CA2T, CT2A, CT2W, CA2W, CW2A, CW2T, etc etc etc in ATL 7.0.

Wonderful little wrappers, all pre-done and tested for you.

If you’re not using WTL and ATL for your Windows development and you’re programming in C++, you should certainly consider taking it for a spin.
Ken Buchanan says:

January 7, 2005 at 7:59 am

The issue is well known to anyone who has read Michael Howard’s secure coding book.

The normal mitigation is indeed to slap the nul terminator on the end of the destination buffer. The cchTextMax = 0 possibility, while valid, is a pretty exceptional case. Generally it amounts to asking ‘After I copy a string into a zero-length buffer, will I cause problems by nul terminating it?’

The solution is that, when the length of the destination buffer is supplied to you, you should *always* ensure it is valid. As long as you have basic sanity checks in place, strncpy + explicit nul-termination is fine. Various safe coding libraries have functions and macros to make this easy.
JamesW says:

January 7, 2005 at 8:25 am

@Raymond

Personally, my conclusion is simply to avoid strncpy and all its friends

@NoInfo

Easy to say, but what do you encourage instead?

I can’t answer for Raymond but I’d rather be using a string class whilst munging and copying strings about rather than raw char*s – take your pick: std::string, CString, NSString(!), … You can always retrieve a char array once you’re done messing around and pass that on to APIs that require such things.

My examples are all for OO languages but that’s not a requirement – you can find good string libraries in C – looking at ntdll.dll there seem to be a fair few Rtl*** string methods.
Vorn says:

January 7, 2005 at 4:51 pm

Win32 API doesn’t work on Slack or Mac. :)

Vorn
James Schend says:

January 7, 2005 at 9:05 am

The bstring library works in C and C++ and solves all of these issues gracefully:

http://bstring.sourceforge.net/

My problem now is that I have a huge project in C that we’re converting to C++, and it has about 30,000 char* strings in it that I’d like to convert into std::string… yet those two datatypes aren’t even close to being compatible with each other. bstring solves some of the problems of char*, but I’d like to move to std::string so that the program is "pure" C++.
Andreas Häber says:

January 7, 2005 at 5:43 pm

That’s why Microsoft’s SSCLI/"Rotor" team created their PAL(Platform Abstraction Layer) [1].

Or you can of course use #if/#else all over your source code to make it a joy to maintain it :P.

Platform agnostic frameworks are also helpful :)

[1] http://dotnet.di.unipi.it/Content/sscli/docs/doxygen/pal/index.html
asdf says:

January 7, 2005 at 10:45 am

There are two things to remember when using strncpy (unless you’re using it to write something like lstrcpy):

1. If you use strncpy to write it, you have to make sure *everybody* using the buffer knows it may not be a NUL terminated string.

2. You can’t write code like:

if (s[n] == '\r' && s[n+1] == '\n') {}

because having a NUL terminated string gives you the property of having a 1 character look-ahead. If you wanted to do the above you would have to check if n+1 != length first. Sure it’s obvious here, but not when you’re writing s[n] as *s/s[0] and s[n+1] as *(s+1)/s[1] and advancing the ptr instead of using an index into it.
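
A sketch of the bounds-checked version of that test for a buffer that carries its own length and may have no terminator. The function name and parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Returns nonzero if a CR LF pair starts at index n of a counted
   buffer of `length` bytes. Both indexes are checked before any read,
   because there is no terminator to provide the free look-ahead. */
int crlf_at(const char *s, size_t length, size_t n)
{
    return n < length && n + 1 < length
        && s[n] == '\r' && s[n + 1] == '\n';
}
```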
Matt Green says:

January 7, 2005 at 11:27 am

Why do *application* programmers using C++ futz with the zillion different variations of str* functions when they can choose from std::string, CString, or a custom String class? It boggles my mind.
Simon Cooke [exMSFT] says:

January 7, 2005 at 9:05 pm

Vorn – while true, unfortunately, Slack and Mac don’t appear to be too interested in fixing the problems.
Norman Diamond says:

January 7, 2005 at 11:07 pm

1/7/2005 8:27 AM Anonymous

>> " Another example of optimization screwing

>> things up, if you ask me.

>>

>> Just allocate a buffer of length N+1, set

>> buff[n]=0, and always pass N to strncpy."

>

> Why would this screw things up?

It wouldn’t. The "Just allocate" sentence is a solution to the screwup that was caused by an attempted optimization. (The attempted optimization was saving a byte of memory.)

The same solution might be in use in code not visible in the Japanese article that Mr. Chen linked to. The visible portion of the code uses a pointer, but we don’t see how much memory was actually allocated when the pointer was assigned.

The Japanese text in the article has the author first quoting her previously posted question asking for help on how to detect a folder change from a file selection, and then saying that she found the answer in an MFC sample. I wonder if the portion of code not visible in the article might be in the MFC sample. Does anyone recognize that MFC sample, and does it work reliably?
autist0r says:

January 8, 2005 at 12:35 am

To solve this problem I designed a CString equivalent class that allocates the memory on a separate heap (this way, if heap corruption occurs, your strings remain safe, and vice versa). Plus, you get the benefits of all the checks win32 heap management offers (assert(HeapValidate()) is everywhere). It has got the disadvantage (? :p) of being a windows-only class though.

Otherwise for plain C code or for driver code (where I may not be able to use string safe functions for backward compatibility) I simply use a ripped (^^) OpenBSD’s strlcpy or I do as shown above : I have an extra byte for the NULL terminator (but it’s not very sound, is it ?).
Anthony Wieser says:

January 8, 2005 at 12:36 am

I always wrote it like this

char dest[N+1];

strncpy(dest, src, N)[N]=0

if I wanted it guaranteed to be null terminated
igor1960 says:

January 8, 2005 at 1:12 am

I think this weird discussion you guys are having here is baseless. You probably have a lot of free time on your hands.

Anyway, the reason I consider it baseless is the following:

And who said that strncpy is "designed to prevent a buffer overflow"? Just because you put that notion into the caption of that entry doesn’t mean it’s true.

Just my humble opinion: strncpy has nothing to do with "buffer overflow/overrun" protection; it’s just a convenient function for those who know how to use it.

In everything else I agree with the article, but the conclusion is weird. It’s like making the following conclusion:

"Don’t use "strcpy" and its friends because your destination buffer may not be big enough to hold the resultant length of bytes"…

Looks strange and obvious, doesn’t it?
Raymond Chen says:

January 8, 2005 at 1:25 am

And who said that strncpy is "designed to prevent a buffer overflow"?

You did. I don’t know where you got that phrase from; its first appearance on this page is when you yourself wrote it.
autist0r says:

January 8, 2005 at 2:28 am

The Holy MSDN says, at the strcpy entry :

Security Note : Because strcpy does not check for sufficient space in strDestination before copying strSource, it is a potential cause of buffer overruns. Consider using strncpy instead.

I think Raymond was referring to this when writing his article.
Saurabh Jain says:

January 8, 2005 at 2:36 am

Well, help from the CRT is underway in Whidbey in the form of the secure CRT (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dncode/html/secure03102004.asp). There are *_s (strcpy_s, strncpy_s) versions of the str functions that guarantee null termination. They also invoke error handlers when the parameters are incorrect (not enough buffer, source/destination not being null terminated, and so forth), which allows one to track these errors more easily.
Frank says:

January 8, 2005 at 4:58 am

@Anthony:

strncpy(dest, src, N)[N]=0

Why not simply:

strncpy(dest, src, N);

dest[N]=0;

Which i.m.o. does exactly the same and doesn’t confuse the code reviewers/maintainers!

I think people should stop inventing smart C constructions that save 5 characters of source code but make the code less readable. Unless of course when you’re trying to win a "write the shortest C source that accomplishes this task" contest :-)
Beginner says:

January 8, 2005 at 7:12 am

Microsoft recommends: For more secure string handling please use the StrSafe.h functions.

Well, I agree but, why should I have to download from that activex and badly designed site and install a massive Platform SDK in the first place? Please do not make life difficult for us by placing everything in a massive Platform SDK, which has to be downloaded in a very specific way and installed in a very specific way. I might just want to use a few Windows functions. I might want to download only the documentation, only download a header file like StrSafe.h, I might want to install them in any way I like. Why should I use a Platform SDK installer?

As things stand, developing for standard C is very easy, just get a free compiler. Learning and developing for Windows is very difficult for a beginner.
Matt Green says:

January 8, 2005 at 8:04 am

If you’re a beginner, why do you want to install something in a specific way?
Insanity Infusion says:

January 8, 2005 at 8:13 am

Wow, did this make my head entirely hurt!

/scrambles for aspirin.
igor1960 says:

January 8, 2005 at 11:21 am

> You did. I don’t know where you got that phrase from; its first appearance on this page

We are playing the game of who said what first.

It doesn’t change the subject. OK, I was wrong.

But you titled the thread "How can code that tries to prevent a buffer overflow end up causing one?"

Show me the code you are talking about. From the samples you are referring to, it’s not obvious that strncpy (and its derivatives) is used in a segment of code that prevents a "buffer overflow".

Where is it? Am I missing something?

Now, for me strncpy is a convenient function for just copying a known number of bytes of a string. Nothing else and nothing more. Now, to screw up the buffers you don’t need to use strncpy; in fact you can use any other str… function and just provide a bad pointer and/or a destination that is not long enough. So, what’s the point? Does that mean that we shouldn’t use str… functions?

I completely agree that it’s better and safer to use data types other than plain null-terminated byte arrays. But if you are using byte arrays, what’s wrong with having an extra function that you can use or not, as you choose? Now, when you claim "They convert a null-terminated string into a raw character buffer. Using them where a null-terminated string is expected as the second buffer is plain wrong", this is not true. They do that only if you made a mistake while using them.

But if this is a case then you should make the following conclusion about just strcpy. Something like that: "Don’t use strcpy, because if you pass bad dest pointer to it — your program may just crash"…

We are talking about part of the "C Standard" here, as strncpy is in it. For me another advantage is the readability of code that uses strncpy: I can send this code to others and it will be understood. Also, in the majority of cases, if you have to read or debug somebody else’s code, strncpy is very helpful, as it usually gives you a clue about the dest buffer length. Now, in your own development feel free to use whatever function/method you prefer and are sure about, but when you have to share your algorithms with the world, nothing is better than using functions provided by the "C Standard", and strncpy is one of them…
Ken Jackson says:

January 8, 2005 at 3:52 pm

I appreciate the Unix history. I didn’t know.

This is the way I always code:

char buf[100];

strncpy(buf,src,sizeof(buf));

buf[ sizeof(buf)-1 ] = 0;

Now I KNOW that buf is nul-terminated.

The string it contains may or may not be truncated.
Raymond Chen says:

January 8, 2005 at 1:14 pm

The code samples I referenced were using strncpy (or its variants) to ensure that they didn’t overflow a string buffer. But they used it incorrectly and ended up overflowing a buffer when the previous code (that presumably used the unsafe "strcpy" function) didn’t. That’s what I mean by "code that tried to prevent a buffer overflow ends up causing one."

My point is that unlike the other "str" functions, strncpy does *not* produce null-terminated strings. Do not use it as if it did.
Anthony Wieser says:

January 9, 2005 at 12:08 am

@FRANK

I write it thus:

strncpy(dest, src, N)[N]=0

because dest may be expensive to evaluate, whereas I know dest must be returned by the function itself.

Also, the compiler may be able to optimize the result, because dest is the return value.

Sure it’s a small optimization, and yes, it may not be immediately obvious, but it does become an idiom, which means I always write it that way, and therefore I never make the error of omission.

Anthony Wieser

Wieser Software Ltd
autist0r says:

January 9, 2005 at 12:22 am

I do it this way,

::strncpy(dest, src, N);

::ZeroMemory(dest, sizeof(dest));

This way I know my data is secure ! :)
Tim Smith says:

January 9, 2005 at 8:55 am

That is a big trap. Some day somebody will change the code to something like:

> char *buf = malloc(100);

> strncpy(buf,src,sizeof(buf));

> buf[ sizeof(buf)-1 ] = 0;

If they do, they should be killed. Not only have they introduced a bug into the code, but they have also added a threading sync point to the code. Allocating memory is very expensive compared to adding 100 to ESP. MT heap allocation is expensive.
Larry Osterman says:

January 9, 2005 at 9:08 am

Tim,

Allocating memory is cheap when compared to blowing the stack (and thus terminating the application).

There are situations where it’s appropriate.
Michael J says:

January 9, 2005 at 7:24 am

> char buf[100];

> strncpy(buf,src,sizeof(buf));

> buf[ sizeof(buf)-1 ] = 0;

That is a big trap. Some day somebody will change the code to something like:

char *buf = malloc(100);

strncpy(buf,src,sizeof(buf));

buf[ sizeof(buf)-1 ] = 0;

and sizeof(buf) becomes "4".

Better to do something like this:

const int size=100;

char buf[size];

strncpy(buf,src,size);

buf[ size-1 ] = 0;

I have always used a wrapper around strncpy to make sure that I never left off the nul.

char *safecpy(char *pDest, const char *pSrc, size_t len)

{

if (len == 0) return pDest; /* no room even for the terminator */

strncpy(pDest, pSrc, len);

pDest[len-1] = '\0';

return pDest;

}
Paul C. says:

January 10, 2005 at 9:00 am

I just looked over at the functions provided by strsafe.h, and couldn’t help but notice that there was no replacement for any of the scanf functions. Is there anything in the Win32 API off the top of your head that replicates the fscanf functionality safely?
igor1960 says:

January 10, 2005 at 4:49 pm
1. A heap "buffer overrun" may be as dangerous as a static (stack) "buffer overrun".

2. Another error in your sample is that sizeof(buf) is not 100, as you probably meant:

// That’s a better solution:

char *buf = malloc(100);

buf[ 99 ] = 0;

strncpy(buf,src,100);

if(buf[ 99 ] != 0 )

{

// Overflow….

}

=====================

But I still disagree with considering strncpy more dangerous than any other str… functions.

By the same token you could say that strcpy is dangerous and shouldn’t be used, because src may not be a null-terminated string…
Larry Osterman says:

January 10, 2005 at 5:51 pm

igor1960: Umm.. strcpy is vastly more dangerous than strncpy – the difference is that strncpy allows you to THINK you’ve fixed a security hole when you haven’t.

Paul C: I don’t believe that there is a safe scanf. I also don’t think it’s possible to write a safe scanf in C (it may be possible in C++, I’m not sure).

For scanf, you’re better off parsing it yourself, IMHO.
igor1960 says:

January 10, 2005 at 7:05 pm

Larry: Umm.. strcpy is vastly more dangerous than strncpy…

The problem is the guys here are pretty crazy about security, while in reality nothing in the discussions above really relates to security. In fact the purpose of strncpy is not to protect against or decrease the probability of buffer overrun at all. It’s just a copying function. Whether you use it properly in a secure context is a different issue. But could you please show me a case of buffer overrun using my sample described above:

char *buf = malloc(100);

buf[ 99 ] = 0;

strncpy(buf,src,100);

if(buf[ 99 ] != 0 )

{

// Overflow….overrun — name it whatever you want

}

I think you agree with me that what I think is exactly what it is: security hole is fixed…
Raymond Chen says:

January 10, 2005 at 7:27 pm

If "src" is longer than 99 characters, then "buf" has no null terminator. As many people have pointed out, you have to whack a null terminator in manually. If you forget, then you create an illegal string.

The strange behavior of the strn* functions is not well known; they are the exceptions to the general rule that "str" functions produce null-terminated strings. And strings that aren’t properly terminated lead to read errors at a minimum, and possibly write errors if the string is copied on the assumption that the function properly terminated the buffer.
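
A sketch of how the sentinel check from the sample above can be combined with the manual termination, so that truncation is reported and the buffer is still a legal string afterwards. The function name and return convention are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if src did not fit (or dst_size is 0),
   0 if it fit. For dst_size > 0, dst is always null-terminated on return. */
int copy_and_detect_truncation(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 1;
    dst[dst_size - 1] = '\0';         /* sentinel in the last slot */
    strncpy(dst, src, dst_size);
    if (dst[dst_size - 1] != '\0') {  /* sentinel overwritten: src too long */
        dst[dst_size - 1] = '\0';     /* whack the terminator in manually */
        return 1;
    }
    return 0;
}
```

Detecting the overflow alone is not enough; without the second assignment the buffer would be left unterminated for whoever reads it next.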
Paul C. says:

January 11, 2005 at 2:23 pm

Larry: Actually, I gave it a little thought. I could use fgets (which takes a size parameter, and null terminates) to safely pull an entire line out of the file, then use sscanf to parse the line. While reading the line, if the line is bigger than my fgets buffer, I’ll know because the line returned by fgets will be the size of the buffer, the character before the null terminator will not be a newline, and I won’t be at EOF, so I’ll know to allocate a bigger buffer and read the rest of the line in (or give up because the line is too big and a giant binary mess :-) ). I now know the max size of the line because I built it with fgets, no string in that line should be bigger than the line itself. So, if all the temp string buffers I pass to sscanf are as large as the string I pass to sscanf to be parsed, I should be safe.
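
The fgets-then-sscanf idea might look roughly like this. The function names, the "name value" line format, and the 16-byte name field are all assumptions for the sake of the sketch; over-long lines are simply discarded here rather than re-read into a bigger buffer, and embedded null bytes in the input are not handled:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads one line into buf and strips the newline.
   Returns 1 for a complete line, 0 if the line was too long (the rest
   of it is discarded), -1 on EOF or error. */
int read_line(FILE *fp, char *buf, size_t buf_size)
{
    if (buf_size < 2 || fgets(buf, (int)buf_size, fp) == NULL)
        return -1;
    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }
    if (len < buf_size - 1)
        return 1;                 /* final line without a newline */
    int ch = getc(fp);
    if (ch == EOF || ch == '\n')
        return 1;                 /* line filled the buffer exactly */
    while ((ch = getc(fp)) != EOF && ch != '\n')
        ;                         /* skip the remainder of the long line */
    return 0;
}

/* Parses "name value"; the %15s width keeps name within its 16 bytes. */
int parse_pair(const char *line, char name[16], int *value)
{
    return sscanf(line, "%15s %d", name, value) == 2;
}
```

Giving %s an explicit width is a second line of defense on top of sizing the temporary buffers to the line length.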
encourage memcpy instead says:

January 12, 2005 at 2:36 am

Because of the unreliable behaviour of strncpy, I always use memcpy instead, which forces me to terminate the destination buffer myself (if buf length>0).

I didn’t know of the null-padding which strncpy performs, and it’s always nice when Raymond presents such information with an explanation of the underlying reason.
Paul Hsieh says:

January 18, 2005 at 11:25 pm

Aha! So this is where all that traffic came from! :)

I am the author of "The Better String Library" (http://bstring.sf.net/). This discussion about strncpy, strlcpy, etc is exactly what "The Better String Library" (Bstrlib) is all about. The real problem is that in the C language, when you want to work with strings you are forced to think about buffers. This becomes a source of problems, and the interesting weaknesses of strncpy and strlcpy (since both have to decide on some sort of compromise for situations where the buffer is too small) highlight this problem. The same generic problem exists for the equally important strcat and fgets functions as well of course.

This lesson has not been lost on the designers of just about every other programming language in existence. The solution is to perform automatic buffer management with every string manipulation. This frees the programmer to think about and deal with strings as just that: strings. This poses the question: can such automatic buffer management be implemented in a library for the C language? Bstrlib is an answer to this question in the affirmative. Bstrlib gives a complete string ADT which has a large superset of the functionality of the standard C library but because it hides all buffer management it avoids all of the typical string related buffer overflow problems. In practice it is truly significantly safer and easier to use than standard '\0'-terminated char * based strings.

Because I wanted to put Bstrlib forth as a complete substitute for string manipulation needs for C and C++ I also needed to solve a number of other problems:

1. Portability: it’s open source (BSD license, so even Microsoft can use it) and has been verified to compile with 16, 32 and 64 bit compilers on DOS, Windows and Linux. (I am waiting for some feedback from the Mac OS X community, though I am almost sure that it will compile there with no issues on either gcc or Metrowerks.)

2. Speed: benchmarks I’ve written indicate that Bstrlib, despite its extra implicit buffer management code, actually *outperforms* the C library most of the time for most functions.

3. C++ API: this is where I got the most feedback from other developers who are more experienced with C++ than I. The result is a powerful C++ API that is comparable to both Microsoft’s MFC CString class as well as the STL’s std::string class in terms of functionality. In terms of performance, Bstrlib’s C++ API leaves the competition in the dust.

4. Compatibility with '\0'-terminated char * strings: the fact is that other precompiled libraries will typically stick to using the standard C string concepts. http://www.pcre.org is a prime example of a very valuable library that makes this assumption. So it was very important to make sure Bstrlib supported backward compatibility modes with '\0'-terminated char * strings very well, which it does. Pointer addition tricks, for example, are emulated as a constant reference substring macro. Remember, as inline strings, the language itself only supports backslash-escape-interpreted symbols rammed between a pair of quotes. So it doesn’t make sense to break off backward compatibility or to make it cumbersome.

5. Good IO support: I implemented a fully abstracted file IO interface which is integrated with Bstrlib. Because of the subtle difference between text mode and binary mode files, bstrings have to be able to hold binary content (specifically including the '\0' character as an ordinary character). Thus Bstrlib’s implicit buffer management capabilities can be leveraged in its IO routines as well. A key example of this is the support for an arbitrary number of "ungetc"-like calls.

6. Functionality of other languages: the primary consideration here was to implement split/join, insert/delete and find+replace functions as well as write protection and complete parameter aliasing safety. This functionality can be found in other languages such as Python or Perl.

By delivering all this, I believe that Bstrlib is a far superior solution to anything else out there (for string manipulation). I have a comparison table here: http://bstring.sf.net/features.html (though obviously you might find it a little biased.)

As to specific comments made in this thread, here are my thoughts:

– As to using Mac OS X/Slackware (Linux), obviously Bstrlib is a better solution than the Win32 or .NET API. Because Bstrlib is a platform-neutral source library under active maintenance, enhancements, bugfixes and other updates will be equal for everyone; there is no risk of a Mac OS X version being "orphaned" because of its smaller community size, for example.

– Unicode/widechar support is beyond the scope of what Bstrlib solves. However, since Bstrlib is very interoperable with char * based strings, the conversion mechanisms for existing Unicode/widechar string library functionality should still be able to work with Bstrlib.

– Strsafe was a legitimate attempt by Microsoft to improve the safety of char * strings. But since it makes no attempt to improve upon the functionality of the C library and still leaves the problem of precise buffer management in the hands of the programmer, it falls far short of the bar set by Bstrlib. For C++ users MFC’s CString is a far superior solution which in turn is inferior to Bstrlib’s C++ API.

– scanf is a calamity of design errors. It’s fairly rare to desire string input delimited by spaces. Simply breaking down scanf into the functions fgets then sscanf is far superior, and the only mechanism I would recommend for basic parsing of strings using the C library facilities. Of course you can use Bstrlib’s IO to read input even more safely, then parse things out with sscanf.

Comments are closed.

Date:	January 7, 2005 / year-entry #6
Tags:	code;history
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20050107-00/?p=36773
Comments:	65
Summary:	If you read your language specification, you'll find that the ...ncpy functions have extremely strange semantics. The strncpy function copies the initial count characters of strSource to strDest and returns strDest. If count is less than or equal to the length of strSource, a null character is not appended automatically to the copied string. If...