Copyright © Microsoft Corporation. This document is an archived reproduction of a version originally published by Microsoft. It may have slight formatting modifications for consistency and to improve readability.
July 1999

Jeffrey Richter wrote Advanced Windows, Third Edition (Microsoft Press, 1998) and Windows 95: A Developer's Guide (M&T Books, 1995). Jeff is a consultant and teaches Win32 programming courses.

Q I have been writing Win32®-based applications for years now and have always created my threads by calling the CreateThread function. Recently a coworker told me that I should not call CreateThread but should use _beginthreadex instead. So far my apps have all worked just fine with CreateThread. Can you explain what all this is about?
Brady Trace
Redmond, WA

A Well, first, I must tell you that your coworker is correct. If you are writing Win32-based applications using C or C++, you should create threads using the _beginthreadex function and not use CreateThread. Now I'll tell you why.
Microsoft® ships six C/C++ runtime (CRT) libraries with Visual C++®. Figure 1 lists the names of the libraries and their descriptions. When implementing any type of project, you must know which of these libraries you're linking with your project. To select the desired runtime library, open the Project Settings dialog box and choose the C/C++ tab. Under the Code Generation category, pick one of the six options from the "Use run-time library" combobox (see Figure 2).

Figure 2 Choosing a CRT

Why is there one library for single-threaded applications and an additional library for multithreaded applications? The standard CRT library was invented around 1970, long before threads were available on any operating system. Its designers therefore never considered the problems of using the library from multithreaded applications.
Consider, for example, the standard CRT global variable errno. Some functions set this variable when an error occurs. Let's say you have the following code fragment:

 BOOL fFailure = (system("NOTEPAD.EXE README.TXT") == -1);
 if (fFailure) {
     switch (errno) {
     case E2BIG:   // Argument list or environment too big
         break;
     case ENOENT:  // Command interpreter cannot be found
         break;
     case ENOEXEC: // Command interpreter has bad format
         break;
     case ENOMEM:  // Insufficient memory to run command
         break;
     }
 }
Now let's imagine that the thread executing the previous code is interrupted after the call to the system function and before the if statement. The thread is being interrupted to allow a second thread in the same process to execute, and this new thread will execute another CRT function that sets the global variable errno. When the CPU is later assigned back to the first thread, the value of errno no longer reflects the proper error code for the call to system in the previous code. To solve this problem, each thread requires its own errno variable. In addition, there must be some mechanism that allows a thread to reference its own errno variable, but not touch another thread's errno variable.
This is only one example of how the standard CRT library was not designed for multithreaded applications. Some of the CRT variables and functions that have problems in multithreaded environments are errno, _doserrno, strtok, _wcstok, strerror, _strerror, tmpnam, tmpfile, asctime, _wasctime, gmtime, _ecvt, and _fcvt.
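The strtok problem can be seen in a minimal sketch written in portable C, even without threads. strtok keeps its position in hidden static storage, so any other use of strtok between two calls has the same effect as a second thread being scheduled in between. The function names here are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tokenizes `inner` to completion. It stands in
   for work done by a second thread that also calls strtok. */
static void tokenize_elsewhere(char *inner)
{
    char *t = strtok(inner, ",");
    while (t != NULL)
        t = strtok(NULL, ",");
}

/* Counts the space-separated tokens of `outer`. If `inner` is not NULL,
   it is tokenized between two strtok calls on `outer`, which overwrites
   the single position that strtok remembers. */
int count_tokens(char *outer, char *inner)
{
    int count = 0;
    char *tok = strtok(outer, " ");
    while (tok != NULL) {
        ++count;
        if (inner != NULL)
            tokenize_elsewhere(inner);
        tok = strtok(NULL, " ");   /* resumes from whatever was tokenized last */
    }
    return count;
}
```

Called on a writable copy of "a b c" with inner set to NULL, count_tokens returns 3. Called with inner pointing at a writable copy of "x,y", it returns 1, because the second strtok(NULL, " ") continues from the exhausted inner string instead of from outer.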
For multithreaded C and C++ programs to work properly, a data structure must be created and associated with each thread that uses CRT library functions. Then, when you make CRT library calls, those functions must know to look in the calling thread's data block so that no other thread is adversely affected.
How does the system know to allocate this data block when a new thread is created? The answer is that it doesn't. The system has no idea that your application is written in C/C++ and that you are calling functions that are not natively thread-safe. The onus is on you to make sure that everything is done correctly.
Here is what you must do. To create a new thread, do not call the operating system's CreateThread function. Instead, call the CRT function, _beginthreadex:

 unsigned long _beginthreadex(void *security,
     unsigned stack_size,
     unsigned (__stdcall *start_address)(void *), void *arglist,
     unsigned initflag, unsigned *thrdaddr);
_beginthreadex has the same parameter list as the CreateThread function, although the parameter names and types are not exactly the same. This is because Microsoft feels that CRT functions should not have any dependencies on Windows® data types. The _beginthreadex function also returns the handle of the newly created thread just like CreateThread. So, if you have been calling CreateThread in your source code, it is fairly easy to globally replace it with calls to _beginthreadex.
Since the data types are not quite the same, you may have to perform some casting to make the compiler happy. To make things easier, I created a macro called chBEGINTHREADEX:

 typedef unsigned (__stdcall *PTHREAD_START) (void *);

 #define chBEGINTHREADEX(psa, cbStack, pfnStartAddr, \
     pvParam, fdwCreate, pdwThreadID)                \
       ((HANDLE) _beginthreadex(                     \
          (void *) (psa),                            \
          (unsigned) (cbStack),                      \
          (PTHREAD_START) (pfnStartAddr),            \
          (void *) (pvParam),                        \
          (unsigned) (fdwCreate),                    \
          (unsigned *) (pdwThreadID))) 
Note that the _beginthreadex function exists only in the multithreaded versions of the CRT library. If you are linking to a single-threaded runtime library, you will get an "unresolved external symbol" error reported from the linker. This is by design, of course, since the single-threaded library will not work properly in a multithreaded application. Also, note that Visual Studio defaults to selecting the single-threaded library whenever you create a new project. This is not the safest default, and for multithreaded applications you must explicitly change to a multithreaded CRT library.
Since Microsoft ships the source code to the CRT library, it's easy to determine exactly what _beginthreadex does that CreateThread doesn't do. In fact, I searched the Visual Studio CD-ROM and found the source code for _beginthreadex in THREADEX.C. Rather than reprint the source code for it here, I'll give you a pseudocode version (see Figure 3) and highlight the interesting points.
There are a few important things to note about _beginthreadex. First, each thread gets its very own tiddata memory block allocated from the CRT's heap. The tiddata structure (see Figure 4) can be found in the Visual C++ source code in MTDLL.H. The address of the thread function passed to _beginthreadex is saved in the tiddata memory block. The parameter to be passed to this function is also saved in this data block. _beginthreadex does call CreateThread internally since this is the only way that the operating system knows how to create a new thread. When CreateThread is called, it is told to start executing the new thread with a function called _threadstartex, not pfnStartAddr. Also, note that the parameter passed to the thread function will be the address of the tiddata structure, not pvParam. Finally, if all goes well, the thread handle is returned just like CreateThread. If any operation fails, NULL is returned.
So now that a tiddata structure has been allocated and initialized for the new thread, you need to see how this structure is associated with the thread. Let's take a look at the _threadstartex function (which can also be found in the CRT's THREADEX.C file). Figure 5 shows a pseudocode version of this function.
There are a number of things to note about _threadstartex. The new thread begins executing with BaseThreadStart (in Kernel32.DLL), and then jumps to _threadstartex. _threadstartex is passed the address of this new thread's tiddata block as its only parameter. TlsSetValue is an operating system function that allows you to associate a value with the calling thread. This is called Thread Local Storage (TLS). _threadstartex associates the tiddata block with the new thread.
A structured exception handling frame is placed around the desired thread function. This frame is responsible for handling many things related to the runtime library, such as runtime errors (like throwing C++ exceptions that are not caught) and the CRT's signal function. This is critical. If you created a thread using CreateThread and then called the CRT's signal function, the function would not work correctly.
The desired thread function is called and passed the desired parameter. Recall that the address of the function and the parameter were saved in the tiddata block by _beginthreadex. The return value from the desired thread function is supposed to be the thread's exit code. Note that _threadstartex does not simply return to BaseThreadStart. If it did, the thread would die and its exit code would be set correctly, but the thread's tiddata memory block would not be destroyed. This would cause a leak in your application. To prevent this leak, another CRT function, _endthreadex, is called and passed the exit code.
_endthreadex is also in the CRT's THREADEX.C file. Here is my pseudocode version of this function:

 void __cdecl _endthreadex (unsigned retcode) {
     _ptiddata ptd;    // Pointer to thread's data block

     // Cleanup floating-point support (code not shown)

     // Get the address of this thread's tiddata block
     ptd = _getptd();

     // Free the tiddata block
     _freeptd(ptd);

     // Terminate the thread
     ExitThread(retcode);
 }
Note that the CRT's _getptd function internally calls the operating system's TlsGetValue function, which retrieves the address of the calling thread's tiddata memory block. This data block is then freed and the operating system's ExitThread function is called to truly destroy the thread. Of course, the exit code is passed and set correctly.
I strongly suggest that you never call the ExitThread function when you want your thread to terminate. The best thing to do is simply return from your thread function and have the thread die naturally. Another reason you shouldn't call ExitThread is that it will prevent the thread's tiddata memory block from being freed and your application will leak memory (until the whole process terminates).
The Microsoft Visual C++ team realized that people like to call ExitThread anyway, and they wanted to make this possible without forcing your application to leak memory. So if you really want to explicitly exit your thread, you can have it call _endthreadex (instead of ExitThread) to free the thread's tiddata block and then exit. But calling _endthreadex is still discouraged.
By now you should understand why the CRT library's functions need a separate data block for each thread created, and you should also see how calling _beginthreadex allocates, initializes, and associates this data block with the newly created thread. You should also be able to understand how the _endthreadex function frees the data block when the thread terminates.
Once this data block is initialized and associated with the thread, any CRT library functions that require per-thread instance data can easily retrieve the address to the calling thread's data block (via TlsGetValue) and manipulate the thread's data.
This is fine for functions, but you might be wondering how this works for a global variable such as errno. Well, errno is defined in the standard C headers like this:

 #if defined(_MT) || defined(_DLL)
 extern int * __cdecl _errno(void);
 #define errno (*_errno())
 #else /* ndef _MT && ndef _DLL */
 extern int errno;
 #endif /* _MT || _DLL */
If you're creating a multithreaded application, you'll need to specify the /MT (multithreaded application) or /MD (multithreaded DLL) switch on the compiler's command line. This causes the compiler to define the _MT identifier. Then, whenever you reference errno, you are actually making a call to the internal CRT library function _errno. This function returns the address to the errno data member in the calling thread's associated data block.
Notice that the errno macro is defined as taking the contents of this address. This definition is necessary because it's possible to write code like this:

 int *p = &errno;
 if (*p == ENOMEM) {
     // Handle the out-of-memory error
 }
If the internal _errno function simply returned the value of errno, the previous code wouldn't compile.
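The same pattern can be tried in a small portable sketch. The names my_errno, my_errno_location, and set_then_read_through_pointer are hypothetical, and a single static variable stands in for the errno member of the calling thread's data block.

```c
/* Stand-in for the errno member of the calling thread's data block.
   (Assumption: one static variable; the CRT keeps one per thread.) */
static int g_block_errno;

/* Plays the role of _errno: returns the address, not the value. */
int *my_errno_location(void)
{
    return &g_block_errno;
}

/* Dereferencing the returned address makes the macro an lvalue. */
#define my_errno (*my_errno_location())

int set_then_read_through_pointer(int code)
{
    int *p = &my_errno;   /* expands to &(*my_errno_location()) */
    my_errno = code;      /* assignment through the macro */
    return *p;            /* reads the same storage through the pointer */
}
```

Because the macro expands to a dereferenced pointer, both taking its address and assigning to it compile. A function that returned a plain int would allow neither.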
The multithreaded version of the CRT library also places synchronization primitives around certain functions. For example, if two threads call malloc simultaneously, the heap could possibly become corrupted. The multithreaded version of the CRT library prevents two threads from allocating memory from the heap at the same time by making the second thread wait until the first has returned from malloc. Then the second thread is allowed to enter. Obviously the performance of the multithreaded version of the CRT library is affected by all this additional work, which is why Microsoft supplies the single-threaded version of the statically linked CRT library in addition to the multithreaded version.
The dynamically linked version of the CRT library was written to be generic so that it could be shared by any and all running applications and DLLs. For this reason, the library exists only in a multithreaded version. Because the CRT library is supplied in a DLL, applications (EXE files) and DLLs don't need to include the code for the CRT library functions and are smaller as a result. Also, if Microsoft fixes a bug in the CRT library DLL, applications will automatically gain the fix as well.
As you might expect, the CRT library's startup code allocates and initializes a data block for your application's primary thread. This allows the primary thread to call any of the CRT functions safely. When your primary thread returns from its entry-point function, the CRT library frees the associated data block. In addition, the startup code sets up the proper structured exception handling code so that the primary thread can successfully call the CRT's signal function.
By now you're probably wondering why your Win32-based applications seemed to work over the years even though you've been calling CreateThread instead of _beginthreadex. When a thread calls a CRT function that requires the tiddata structure (most CRT functions are thread-safe and do not require this structure), here is what happens. First, the CRT function attempts to get the address of the thread's data block (by calling TlsGetValue). Second, if NULL is returned as the address of the tiddata block, then the calling thread doesn't have a tiddata block associated with it. At this point, the CRT function allocates and initializes a tiddata block for the calling thread right on the spot. The block is then associated with the thread (via TlsSetValue), and this block will stay with the thread for as long as the thread continues to run. Third, the CRT function can now use the thread's tiddata block, and so can any CRT functions that are called in the future.
This, of course, is fantastic because your thread runs without a hitch (almost). Well, actually there are a few problems here. If the thread uses the CRT's signal function, the entire process will terminate because the structured exception handling frame has not been prepared. Also, if the thread terminates without calling _endthreadex, the data block cannot be destroyed and a memory leak occurs. (And who would call _endthreadex for a thread created with CreateThread? Very unlikely.)
Some closing remarks: first, if you call _beginthreadex, you'll get back a handle to the thread. At some point, that thread's handle must be closed. _endthreadex doesn't do it. Normally, the thread that called _beginthreadex (possibly the main thread) will call CloseHandle on the newly created thread's handle when the thread handle is no longer needed. Second, you only need to use _beginthreadex if your app uses CRT functions. If it doesn't, then you can just use CreateThread. Also, you can use CreateThread if only one thread (the main thread) in your app uses the CRT. If newly created threads don't use the CRT, you don't need _beginthreadex or the multithreaded CRT.


From the July 1999 issue of Microsoft Systems Journal.