In recent columns for MSJ (June 1999), I've discussed COM type libraries and database access layers such as ActiveX® Data Objects (ADO) and OLE DB. Longtime readers of my MSJ writings (both of them) probably think I've gone soft. To redeem myself, this month I'll tour part of the Windows NT®
loader code where the operating system and your code come together.
I'll also demonstrate a nifty trick for getting status
information out of the loader, and a related trick you can use in the
Developer Studio® debugger.
Consider
what you know about EXEs, DLLs, and how they're loaded and initialized.
You probably know that when a C++ DLL is loaded, its DllMain function
is called. Think about what happens when your EXE implicitly links to
some set of DLLs (for example, KERNEL32.DLL and USER32.DLL). In what
order will those DLLs be initialized? Is it possible for one of your
DLLs to be initialized before another DLL that you depend on? The
Platform SDK has this to say in the "Dynamic-Link Library Entry-Point
Function" section:
Your
function should perform only simple initialization tasks, such as
setting up thread local storage (TLS), creating synchronization
objects, and opening files. It must not call the LoadLibrary function,
because this may create dependency loops in the DLL load order. This can
result in a DLL being used before the system has executed its
initialization code. Similarly, you must not call the FreeLibrary
function in the entry-point function, because this can result in a DLL
being used after the system has executed its termination code.
Calling Win32®
functions other than TLS, synchronization, and file functions may also
result in problems that are difficult to diagnose. For example, calling
User, Shell, and COM functions can cause access violation errors,
because some functions in their DLLs call LoadLibrary to load other
system components.
Something
I've learned firsthand is that the above documentation is still way too
vague. For example, reading a registry key is a natural thing you'd
want to do inside your DllMain function. It certainly qualifies as
initialization. Unfortunately, under some circumstances ADVAPI32.DLL
hasn't been initialized yet when your DllMain code runs, and the registry
APIs will simply fail.
Given
the stern warning about using LoadLibrary in the documentation, it's
especially interesting that the Windows NT USER32.DLL explicitly ignores
the preceding advice. You may be aware of a Windows NT-only registry
value called AppInit_Dlls that lists DLLs to be loaded into each process. It
turns out that the actual loading of these DLLs occurs as part of
USER32's initialization. USER32 reads this registry value and calls
LoadLibrary for these DLLs in its DllMain code. A little thought here
reveals that the AppInit_Dlls trick doesn't work if your app doesn't use
USER32.DLL. But I digress.
My
point in bringing this up is that DLL loading and initialization is
still a gray area. In most cases, a simplified view of how the OS loader
works is sufficient. In those oddball 5 percent of cases, however, you
can go nuts unless you have a more detailed working model of how the OS
loader behaves.
Load 'er Up!
What
most programmers think of as module loading is actually two distinct
steps. Step one is to map the EXE or DLL into memory. As this occurs,
the loader reads the module's import table (which points to the Import Address Table, or IAT) and
determines whether the module depends on additional DLLs. If the DLLs
aren't already loaded in that process, the loader maps them in as well.
This procedure recurses until all of the dependent modules have been
mapped into memory. A great way to see all the implicitly dependent DLLs
for a given executable is the DEPENDS program from the Platform SDK.
Step
two of module loading is to initialize all of the DLLs. Stop and ponder
this. While the OS loader is mapping the EXE and/or DLLs into memory in
step one, it's not calling the initialization routines. The
initialization routines are called after all the modules have been
mapped into memory. Key point: the order in which DLLs are mapped into
memory is not necessarily the same as the order in which the DLLs are
initialized. I've seen people look at the DLL mapping notifications as
they appear in the Developer Studio debugger and mistakenly assume that
the DLLs were initialized in that same order.
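To make the two-step distinction concrete, here is a toy model in portable C++. It is not the loader's code. It assumes, purely for illustration, that a module is mapped before its imports are visited and that its entry point is queued only after all of its imports have been queued. The names ImportGraph, LoadResult, mapModule, and simulateLoad are made up, as are the module names in the usage note.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical import graph: module name -> DLLs it implicitly links to.
using ImportGraph = std::map<std::string, std::vector<std::string>>;

struct LoadResult {
    std::vector<std::string> mapOrder;   // order in which modules were mapped (step one)
    std::vector<std::string> initOrder;  // order in which entry points would run (step two)
};

// Step one: map a module, then recurse into its imports.
// A module is recorded as mapped before its imports are visited, but it is
// queued for initialization only after every import has been queued.
static void mapModule(const ImportGraph& graph, const std::string& name,
                      std::set<std::string>& mapped, LoadResult& result) {
    if (!mapped.insert(name).second) {
        return;  // already mapped into this (simulated) process
    }
    result.mapOrder.push_back(name);

    auto it = graph.find(name);
    if (it != graph.end()) {
        for (const std::string& dep : it->second) {
            mapModule(graph, dep, mapped, result);
        }
    }

    // Assumption of this toy model: dependencies initialize first.
    result.initOrder.push_back(name);
}

LoadResult simulateLoad(const ImportGraph& graph, const std::string& root) {
    std::set<std::string> mapped;
    LoadResult result;
    mapModule(graph, root, mapped, result);
    return result;
}
```

With a hypothetical FOO.DLL that imports BAR.DLL and BAZ.DLL, where BAR.DLL also imports BAZ.DLL, the model maps FOO, BAR, BAZ but queues initialization as BAZ, BAR, FOO. The two lists contain the same modules in different orders, which is exactly the trap described above.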
In
Windows NT, the routine that invokes the entry point of EXEs and DLLs
is called LdrpRunInitializeRoutines, and it's worth taking a look at
here. In my own work, I've stepped through the assembler code for
LdrpRunInitializeRoutines many times. However, looking at a ream of
assembler code isn't the best way to understand it. Therefore, I rewrote
LdrpRunInitializeRoutines from Windows NT 4.0 SP3 in C++-like
pseudocode, with the results shown in Figure 1.
To be completely accurate, in NTDLL.DBG the routine name carries __stdcall
decoration: _LdrpRunInitializeRoutines@4. Also, in my pseudocode, any
variable or structure name that isn't prefixed with an underscore is a
name I made up.
LdrpRunInitializeRoutines
is the final stop in the Windows NT loader code before an EXE's or
DLL's specified entry point is called. (In the following discussion,
I'll use "entry point" and "initialization routine" interchangeably.)
This loader code executes in the context of the process that loaded the
DLL; that is, it's not part of some special loader process.
LdrpRunInitializeRoutines is called at least once during process
startup to handle implicitly loaded DLLs. It is also called every time
one or more DLLs are dynamically loaded, usually because of a call to
LoadLibrary.
Each
time LdrpRunInitializeRoutines executes, it seeks out and calls the
entry point of all DLLs that have been mapped into memory, but not yet
initialized. In examining the pseudocode, take note of all the extra
code that provides trace output, even in the non-checked (free) builds of
Windows NT. I'm referring to all the code that uses the _ShowSnaps
variable and the _DbgPrint function. I'll come back to these players
later.
At
a high level, the function breaks up into four distinct sections. The
first portion of the code calls _LdrpClearLoadInProgress. This NTDLL
function returns the number of DLLs that have just been mapped into
memory. For example, if you called LoadLibrary on FOO.DLL and FOO had
implicit links to BAR.DLL and BAZ.DLL, _LdrpClearLoadInProgress would
return 3 since three DLLs were mapped into memory.
After
the number of DLLs to be concerned with is known,
LdrpRunInitializeRoutines calls _RtlAllocateHeap (the NTDLL routine behind
HeapAlloc) to get memory for an array of pointers. In the pseudocode
I've called this array pInitNodeArray. Each pointer in pInitNodeArray
will eventually point to a structure containing information about a
newly loaded (but not yet initialized) DLL.
In
the second part of LdrpRunInitializeRoutines, the code digs into
internal process data structures to obtain a linked list containing each
of the newly loaded DLLs. As the code iterates through the linked list,
it checks to see if the loader has somehow seen this DLL before (not
likely). It also checks to ensure that the DLL has an entry point. If
both tests pass, the code appends the module information pointer
to pInitNodeArray. The pseudocode refers to the module information
as pModuleLoaderInfo. Note that it's entirely possible for a DLL to have
no entry point (a resource-only DLL, for example). Thus, the number
of entries in pInitNodeArray may be smaller than the value returned
earlier by _LdrpClearLoadInProgress.
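The filtering in this second section can be sketched in a few lines of portable C++. This is only an illustration: the ModuleLoaderInfo structure and its fields are hypothetical stand-ins for the loader's real per-module record, collectInitNodes is a made-up name, and a std::vector replaces both the loader's linked list and its heap-allocated pointer array.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the loader's per-module record.
struct ModuleLoaderInfo {
    std::string name;
    void* entryPoint;       // nullptr when the DLL has no entry point
    bool alreadyProcessed;  // true if the loader has seen this module before
};

// Collect pointers to the newly mapped modules whose entry points still
// need to be called. The result plays the role of pInitNodeArray.
std::vector<const ModuleLoaderInfo*> collectInitNodes(
        const std::vector<ModuleLoaderInfo>& newlyMapped) {
    std::vector<const ModuleLoaderInfo*> initNodes;
    initNodes.reserve(newlyMapped.size());  // upper bound, like the mapped-DLL count

    for (const ModuleLoaderInfo& module : newlyMapped) {
        if (module.alreadyProcessed) {
            continue;  // seen before: nothing to initialize
        }
        if (module.entryPoint == nullptr) {
            continue;  // for example, a resource-only DLL
        }
        initNodes.push_back(&module);
    }
    return initNodes;
}
```

The space reserved up front corresponds to the count of mapped DLLs, while the number of pointers actually stored can come out smaller, which mirrors the relationship between the _LdrpClearLoadInProgress return value and the entries in pInitNodeArray.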
The
third (and largest) section of LdrpRunInitializeRoutines is where
things really start to happen. The code's mission here is to walk
through each element in pInitNodeArray and call its entry point. Because
of the very real possibility that a DLL's initialization code may
fault, the entire third section of code is surrounded by a __try block.
This is why a dynamically loaded DLL can fault in its DllMain without
bringing the whole process down.
Iterating
through an array and calling an entry point for each node should be a
small task. However, some relatively obscure features of Windows NT add
to the complexity. For starters, consider whether the process is being
debugged by a Win32 debugger such as MSDEV.EXE. Windows NT has an option
that allows you to suspend a process and send control to the debugger
before a DLL is initialized. This feature works on a per-DLL basis, and is
enabled by adding a string value (BreakOnDllLoad) to a registry key with
the name of the DLL (for instance, FOO.DLL). See the pseudocode comment
above the call to _LdrQueryImageFileExecutionOptions in Figure 1 for more information.
Another
bit of extra code that may execute before a DLL's entry point
invocation is the TLS initialization. When you declare TLS variables
using __declspec(thread), the linker includes data that causes this
condition to be triggered. Right before the DLL's entry point is called,
LdrpRunInitializeRoutines checks to see if a TLS initialization is
necessary and, if so, calls _LdrpCallTlsInitializers. More on this
later.
The
moment of truth finally comes when LdrpRunInitializeRoutines calls the
DLL's entry point. I deliberately left this part of the pseudocode in
assembly language. You'll see why later. The crucial instruction is CALL
EDI. Here, EDI points to the DLL's entry point, which is specified in
the DLL's PE header. When CALL EDI returns, the DLL in question has
completed its initialization. For DLLs written in C++, this means that
the DllMain code has executed its DLL_PROCESS_ATTACH code. Also, note
the third parameter to the entry point, normally referred to as
pvReserved. In truth, this parameter is nonzero for DLLs that the EXE
implicitly links to directly or through another DLL. The third parameter
is zero for all other DLLs (that is, DLLs loaded as a result of a
LoadLibrary call).
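If you want to act on the third parameter, a DllMain can record how its DLL arrived. The following is a minimal sketch that relies on the Windows NT behavior just described. The names LoadKind, classifyLoad, kProcessAttach, and g_loadKind are made up for illustration, and the classification helper is kept free of Windows headers so it can be compiled and exercised anywhere.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

enum class LoadKind { NotAnAttach, Implicit, Dynamic };

// DLL_PROCESS_ATTACH is defined as 1 in the Win32 headers. The value is
// repeated here only so that the helper compiles without <windows.h>.
constexpr unsigned long kProcessAttach = 1;

// Classify a DllMain notification by its reason code and third parameter:
// nonzero third parameter -> implicitly linked, zero -> LoadLibrary.
LoadKind classifyLoad(unsigned long reason, const void* reserved) {
    if (reason != kProcessAttach) {
        return LoadKind::NotAnAttach;
    }
    return reserved != nullptr ? LoadKind::Implicit : LoadKind::Dynamic;
}

#ifdef _WIN32
// Remembered for later use by the rest of the (hypothetical) DLL.
static LoadKind g_loadKind = LoadKind::NotAnAttach;

BOOL WINAPI DllMain(HINSTANCE /*instance*/, DWORD reason, LPVOID reserved) {
    if (reason == DLL_PROCESS_ATTACH) {
        g_loadKind = classifyLoad(reason, reserved);
    }
    return TRUE;  // report successful initialization
}
#endif
```

The DllMain itself does nothing but store a value, which keeps it well within the "simple initialization" the Platform SDK asks for.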
After
the DLL entry point is invoked, LdrpRunInitializeRoutines does a
sanity check to make sure the DLL entry point code was defined properly.
The loader code looks at the stack pointer (ESP) value from before and
after the entry point call. If they're different, something's wrong with
the DLL's initialization function. Since most programmers never define
the real DLL entry point function, this scenario rarely happens.
However, when it does, you're informed of the problem via an onerous
dialog (see Figure 2). I had to use a debugger and modify a register value at just the right spot to produce this dialog.