Ever
since I first encountered a definition of dynamic link libraries in a
description of the then-new operating system OS/2, the idea of DLLs has
always fascinated me. This beautifully simple concept of modules that
could be loaded and unloaded as needed, with well-defined interfaces that
outlined routines written beforehand and perhaps by other programmers,
was a powerful jolt to me because I was more accustomed to statically
linked code in mainframe or MS-DOS® programs. And, like many others new
to programming for Windows®, the first utility I built enumerated DLLs
that were already loaded into the system in order to demonstrate this
concept at work. Now, even with the Windows world changing at a frenetic
pace, employing COM interfaces and their ActiveX® components, and
moving toward common language runtimes with their assemblies of managed
code, the humble DLL remains at the center of things, providing services
to the system on an as-needed basis.
During
this long association with DLLs, I accepted their loading by the
operating system as if it were magic and never truly appreciated the
amount of work required by LoadLibrary and its variations. This article
is an attempt to rectify that oversight by looking inside NTDLL.DLL.
Since I do not have access to the source files, much of what I discuss
here falls under that nebulous category of undocumented information and
is therefore subject to change or obsolescence in future releases of the
operating system.
The
details that I'll cover are based on an examination of the binaries
available when I wrote this article, Windows 2000 Professional (Build
2195: Service Pack 1). Access to a properly installed set of debug
symbol files, .DBG and .PDB dated July 9, 2000, and working with a
suitable debugger will make the information easier to understand.
This article covers the steps that lead up to those described in Matt Pietrek's Under The Hood column in the September 1999 issue of MSJ,
where Matt writes pseudocode for the operation of
LdrpRunInitializeRoutines for Windows NT® 4.0 SP3 and describes how a
library is initialized and when DllMain gets called. Note that I will
refer to this column frequently.
My
discussion will begin with a brief look at LoadLibrary, starting with
LdrLoadDll, and will conclude when LdrpRunInitializeRoutines is invoked.
While trying to follow the execution path needed to load a simple DLL
using a debugger, you can easily become confused by the numerous
unconditional jump statements and lost in the recursion common in the
later stages of DLL loading, so I'll guide you carefully through the
call to LoadLibrary.
Note that all code modules mentioned in this article can be found at the link at the top of this article.
All Paths Lead to LoadLibraryExW
There
are several ways to get to LoadLibraryExW. For example, LoadTypeLibEx
and CoLoadLibrary in the COM universe eventually call LoadLibraryExW.
The two most familiar routes to LoadLibraryExW are LoadLibraryA and
LoadLibraryW. All you need to do is specify one parameter—the name of
the DLL—and you are on your way.
But
if you examine a disassembly of LoadLibraryA and LoadLibraryW, you will
discover that they are merely thin wrappers around the more versatile
LoadLibraryExA and LoadLibraryExW APIs, respectively. With LoadLibraryA,
there is a curious test for the DLL twain_32.dll, but normally two
zeroes are passed as the second and third parameters to LoadLibraryExA
before continuing. LoadLibraryW is even more direct; the two zeroes are
pushed onto the stack and the code moves directly on to LoadLibraryExW.
The two paths merge once you examine LoadLibraryExA, which calls
a helper routine to convert the DLL's name into a Unicode string
before proceeding on to LoadLibraryExW.
LoadLibraryExW
is a fairly involved routine which must decide between at least four
different variations on the type of DLL-loading that the program wants
to perform. In addition to the first parameter (which contains the name
of the DLL), and the second parameter (which must be NULL according to
the SDK documentation), there is a flag parameter that specifies the
action to take when loading the module. You can ignore the flag value
LOAD_LIBRARY_AS_DATAFILE, since it does not lead to the desired goal:
LdrLoadDll, an API exported by NTDLL.DLL. The remaining valid values—0,
DONT_RESOLVE_DLL_REFERENCES, and LOAD_WITH_ALTERED_SEARCH_PATH—all
eventually find their way to LdrLoadDll.
It
is interesting to note that there is no sanity check for the flag
parameter that returns something like STATUS_INVALID_PARAMETER if values
other than the documented ones are passed to LoadLibraryExW. For
example, plugging in an undocumented value such as 4 results in a normal
DLL load. In any case, the alternate paths taken with
DONT_RESOLVE_DLL_REFERENCES or LOAD_WITH_ALTERED_SEARCH_PATH will be
noted later in this article. For now, I will concentrate on the normal
case in which dwFlags has a value of 0.
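The routing just described can be modeled in a few lines. This is a minimal sketch, not the disassembled logic: the three constants carry their winbase.h values, while LoadPlan and plan_for_flags are hypothetical names invented for the illustration. It assumes, as observed above, that unknown bits are never rejected.

```cpp
#include <cstdint>

// Flag values as defined in the SDK header winbase.h.
constexpr std::uint32_t kDontResolveDllReferences  = 0x00000001;
constexpr std::uint32_t kLoadLibraryAsDatafile     = 0x00000002;
constexpr std::uint32_t kLoadWithAlteredSearchPath = 0x00000008;

// Hypothetical summary of what a dwFlags value leads to.
struct LoadPlan {
    bool reachesLdrLoadDll;   // false only for the data-file case
    bool resolveImports;      // cleared by DONT_RESOLVE_DLL_REFERENCES
    bool alteredSearchPath;   // set by LOAD_WITH_ALTERED_SEARCH_PATH
};

// Model only: no bit is validated, so an undocumented value such as 4
// produces the same plan as a plain load with dwFlags == 0.
LoadPlan plan_for_flags(std::uint32_t dwFlags) {
    LoadPlan plan;
    plan.reachesLdrLoadDll = (dwFlags & kLoadLibraryAsDatafile) == 0;
    plan.resolveImports    = (dwFlags & kDontResolveDllReferences) == 0;
    plan.alteredSearchPath = (dwFlags & kLoadWithAlteredSearchPath) != 0;
    return plan;
}
```

Calling plan_for_flags(4) yields the same plan as plan_for_flags(0), which mirrors the missing sanity check.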
There's
one more API exported from NTDLL.DLL that leads to LdrLoadDll, but I
have found neither documentation for it nor any references to it in
existing DLLs. The name of the API is LoadOle32Export. It accepts two
parameters, the image base for Ole32.DLL and the name of the routine to
find, and it returns the address of the requested function. Whether this
is a mysterious secret path or some kind of holdover from past versions
of the operating system is uncertain.
APIs Exported by NTDLL.DLL
Figure 1 lists the exported APIs in NTDLL.DLL beginning with the prefix "Ldr", and Figure 2 lists the internal routines beginning with "Ldrp". The main difference between the names for the LdrXXX and LdrpXXX
routines is that the "p" indicates that these functions are
private—hidden from the outside world. As you can see by examining the
lists, familiar APIs such as GetProcAddress are wrappers around NTDLL
exports like LdrGetProcedureAddress. In fact, many APIs in Kernel32 that
are familiar to the SDK programmer, such as GetProcAddress,
ExitProcess, ExpandEnvironmentStrings, and CreateSemaphore, pass the
real work along to an NTDLL surrogate (LdrGetProcedureAddress,
NtTerminateProcess, RtlExpandEnvironmentStrings_U, and
NtCreateSemaphore, respectively).
Now, let's take a close look at LdrLoadDll using DumpBin or my disassembler, PEBrowse (available from http://www.smidgeonsoft.com).
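0X77F889A9 PUSH 0X1                    ; fifth argument: 1 = normal processing
0X77F889AB PUSH DWORD PTR [ESP+0X14]   ; caller's fourth argument
0X77F889AF PUSH DWORD PTR [ESP+0X14]   ; caller's third argument
0X77F889B3 PUSH DWORD PTR [ESP+0X14]   ; caller's second argument
0X77F889B7 PUSH DWORD PTR [ESP+0X14]   ; caller's first argument
0X77F889BB CALL 0X77F887E0             ; SYM:LdrpLoadDll
0X77F889C0 RET 0X10                    ; pop the caller's four arguments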
Do not be deceived by the apparent simplicity of this routine, which
looks like a mere wrapper around the internal procedure, LdrpLoadDll
(which I'll discuss in the next section). The first parameter points to a
wide-string representation of the search path. The second parameter
will hold a DWORD with the value of 2 if DONT_RESOLVE_DLL_REFERENCES was
specified in the call to LoadLibraryExW. The third parameter contains a
pointer to a Unicode string structure that encases the name of the DLL
that is to be loaded. The fourth item is an output parameter and will
receive the address at which the module was loaded when LdrpLoadDll has
finished its work.
But
what does the fifth parameter, hardcoded with a value of 1, mean? As
you will see later, LdrpLoadDll can be called recursively in
LdrpSnapThunk. Here, the value means normal processing, but later you
will see that a value of 0 means process forwarded APIs.
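Read as C++, the listing amounts to the sketch below. The parameter names and types are assumptions based on the description above rather than declarations from any header (in particular, whether the second parameter is a DWORD or a pointer to one is a guess), and the two _model functions are hypothetical stand-ins so that the example runs without NTDLL.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Unicode string structure.
struct UnicodeString {
    std::uint16_t  Length;         // in bytes
    std::uint16_t  MaximumLength;  // in bytes
    const wchar_t* Buffer;
};

// Records the fifth argument so the forwarding can be observed.
static int g_lastNormalProcessing = -1;

// Stand-in for the private routine; the real work is omitted.
long LdrpLoadDll_model(const wchar_t* searchPath, std::uint32_t* flags,
                       const UnicodeString* dllName, void** imageBase,
                       int normalProcessing) {
    (void)searchPath; (void)flags; (void)dllName;
    g_lastNormalProcessing = normalProcessing;
    if (imageBase != nullptr) *imageBase = nullptr;
    return 0;  // STATUS_SUCCESS
}

// Mirrors the disassembly: pass the four arguments through unchanged
// and add a literal 1 as the fifth.
long LdrLoadDll_model(const wchar_t* searchPath, std::uint32_t* flags,
                      const UnicodeString* dllName, void** imageBase) {
    return LdrpLoadDll_model(searchPath, flags, dllName, imageBase, 1);
}
```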
LdrpLoadDll
I
have provided a small project in the code download for this article
that contains a main executable, ingenuously named Test, and three DLLs:
TestDll, Forwarder, and Forwarded. The DLLs demonstrate different
variations that illustrate some of the scenarios LdrpLoadDll will
commonly encounter. See the Readme.txt file in the code download for
important information about setting environment variables and predefined
breakpoints.
Figure 3
shows some of the internal loader routines you will bump into when you
pass one of the documented flags to LoadLibraryExW. If you concentrate
for a moment on the typical situation (#1 in Figure 3),
you will see that there are six subroutines called directly by
LdrpLoadDll: LdrpCheckForLoadedDll, LdrpMapDll,
LdrpWalkImportDescriptor, LdrpUpdateLoadCount,
LdrpRunInitializeRoutines, and LdrpClearLoadInProgress. (I'll discuss
the first four subroutines later in this article.)
LdrpRunInitializeRoutines has already been described in Matt Pietrek's
column, so I won't go into it here. LdrpClearLoadInProgress is briefly
mentioned in that column as well.
Let's take a high-level look at the steps taken by LdrpLoadDll, which occur as follows:
- Check to see if the module is already loaded.
- Map the module and supporting information into memory.
- Walk the module's import descriptor table (that is, find out what other modules this one is adding).
- Update the module's load count as well as any others brought in by this DLL.
- Initialize the module.
- Clear some sort of flag, indicating that the load has finished.
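The six steps can be acted out with a toy model. The sketch below is not the loader's code: ToyProcess, ToyModule, and toy_load are hypothetical names, the "disk" is just a map from file name to import list, and a failed dependency leaves partial state behind, which the real routine would have to undo.

```cpp
#include <map>
#include <string>
#include <vector>

struct ToyModule {
    int  loadCount = 0;
    bool loadInProgress = false;
    bool initialized = false;
};

struct ToyProcess {
    std::map<std::string, std::vector<std::string>> disk;  // file -> imports
    std::map<std::string, ToyModule> loaded;               // module list
    std::vector<std::string> initOrder;                    // order of init calls
};

// Steps 2 and 3: map the module, then walk its imports recursively.
static bool map_and_walk(ToyProcess& p, const std::string& name,
                         std::vector<std::string>& pending) {
    auto file = p.disk.find(name);
    if (file == p.disk.end()) return false;           // file not found
    p.loaded[name].loadInProgress = true;             // "mapped"
    for (const std::string& dep : file->second) {
        if (p.loaded.count(dep) != 0) continue;       // already present
        if (!map_and_walk(p, dep, pending)) return false;
    }
    pending.push_back(name);   // dependencies land ahead of their importer
    return true;
}

bool toy_load(ToyProcess& p, const std::string& name) {
    auto hit = p.loaded.find(name);
    if (hit != p.loaded.end()) {                      // step 1
        ++hit->second.loadCount;
        return true;
    }
    std::vector<std::string> pending;
    if (!map_and_walk(p, name, pending)) return false;
    for (const std::string& n : pending) {
        ToyModule& m = p.loaded[n];
        ++m.loadCount;                                // step 4
        m.initialized = true;                         // step 5
        p.initOrder.push_back(n);
        m.loadInProgress = false;                     // step 6
    }
    return true;
}
```

Loading a module whose imports are not yet present pulls them in first, so they are initialized ahead of the module that asked for them.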
Looking
more closely at the routines listed, you will notice that
LdrpCheckForLoadedDll can be and is called from several locations. It is
therefore reasonable to assume that this is a helper function. Also,
note that LdrpSnapThunk and LdrpUpdateLoadCount, as well as the
previously mentioned case for LdrpLoadDll, are all candidates for
recursion.
The
file LdrpLoadDll.cpp contains pseudocode for the operations found
in this and the other loader routines. My pseudocode is not a copy of
the actual code for LdrpLoadDll and should not be compiled; it
represents intelligent guesswork obtained by studying a disassembly of
the routine. It also shows how much information can be brought together
about any section of code just by taking the time to study the available
binaries. The pseudocode by no means covers every detail.
In
many cases, the variable names were assigned based on their role in the
code. Others were taken from parameter descriptions found in Windows NT/2000 Native API Reference
by Gary Nebbett (New Riders Publishing, 2000). Take a look at this book
to see what takes place inside NTDLL.DLL. It provides a reference for
the Nt/Zw routines—the so-called native APIs—and where possible relates
these routines back to their Win32® counterparts.
Now
I'll describe in some detail what happens when your code loads the
typical DLL, then I'll throw in a few options for variety. LdrpLoadDll
first sets up a __try/__except block, then checks the flag,
LdrpInLdrInit. If the flag is turned on, then it sets up a critical
section block to prevent updates in the data structures that it will be
referencing and modifying. Next, it checks the length of the incoming
DLL's name against the maximum, 532 bytes or 266 wide characters, which
is close to the better-known constant MAX_PATH and its value of 260. If
the length of the name exceeds the maximum value, then the routine
quits with a return code of STATUS_NAME_TOO_LONG.
At
this point, LdrpLoadDll attempts to locate the position of the period
character (dot) in the module's name. If the dot is missing, then the
routine appends the string ".dll" to the end of the module's name,
assuming this does not exceed the maximum size. There are two items to
note here: first, you do not need to pass the .dll extension in your
calls to LoadLibrary, and second, some of the features, such as the
API-forwarding discussed later in this article, will not work with any
extension other than .dll. (For a preview, try the following: change the
output name of the Forwarded project to "Forwarded.cpl" and run the
test program, making certain that the GetProcAddress stuff is included.
You'll see that the GetProcAddress call fails and the messagebox that
should display "Hello World!" will not appear!)
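The length check and the extension rule can be sketched as follows. normalize_dll_name is a hypothetical helper, not a loader routine, and it makes two assumptions the text does not settle: the dot is searched for in the whole name, and a name that would grow past the limit once ".dll" is appended is treated as a failure.

```cpp
#include <cstddef>
#include <string>

// Limit quoted in the text: 532 bytes, that is, 266 wide characters.
constexpr std::size_t kMaxNameChars = 266;

// Returns false where the real routine would fail with
// STATUS_NAME_TOO_LONG; otherwise stores the name to search for.
bool normalize_dll_name(const std::wstring& name, std::wstring& out) {
    if (name.size() > kMaxNameChars) return false;
    if (name.find(L'.') != std::wstring::npos) {
        out = name;                        // has a dot: leave it alone
        return true;
    }
    std::wstring withExt = name + L".dll"; // no dot: supply the default
    if (withExt.size() > kMaxNameChars) return false;  // assumption
    out = withExt;
    return true;
}
```

Passing L"TestDll" yields L"TestDll.dll", while L"Forwarded.cpl" comes back unchanged because it already contains a dot.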
The
test for ShowSnaps and the steps for enabling this handy debugging aid on your
system were explained in Matt's column. ShowSnaps occupies the second
bit position in the Registry value GlobalFlag, located at
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager.
When enabled, this provides feedback in a debugger's output window about
actions the loader is taking. If you decide not to enable "snaps" on
your system, then the text strings appearing throughout the code will
provide useful diagnostic hints. (In the download is an ASCII text file,
LdrSnap(Forwarder).TXT, that includes the snap outputs produced by
loading the Forwarder test DLL. Later in the article you will find
figures showing this information for Forwarded.DLL and TestDll.DLL.)
LdrpLoadDll then constructs a Unicode string of the module's name that
will be used by the first subroutine, LdrpCheckForLoadedDll.
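For reference, testing or setting the ShowSnaps bit in a GlobalFlag value is a one-line mask operation. The constant below matches FLG_SHOW_LDR_SNAPS (0x2, the second bit position); the two helper functions are hypothetical and operate on a plain integer rather than on the Registry.

```cpp
#include <cstdint>

// FLG_SHOW_LDR_SNAPS: the "show loader snaps" bit of GlobalFlag.
constexpr std::uint32_t kShowLdrSnaps = 0x00000002;

// True when the loader would emit its diagnostic output.
bool show_snaps_enabled(std::uint32_t globalFlag) {
    return (globalFlag & kShowLdrSnaps) != 0;
}

// Returns the value to write back, leaving every other bit untouched.
std::uint32_t with_show_snaps(std::uint32_t globalFlag) {
    return globalFlag | kShowLdrSnaps;
}
```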
LdrpCheckForLoadedDll
The
pseudocode for this routine can be found in LdrpCheckForLoadedDll.cpp.
You'll see that the code checks all of the possible lists for loaded
DLLs and, if the initial search fails, tries the alternative.
Following every path through this routine is rather complicated, but
its essence is simple to understand. There are
two places to start a search: an optimization based on a hash table
found inside of NTDLL.DLL, and walking the module list maintained inside
the process's environment block (PEB). The definition of the module
list and its location in memory will follow shortly. LdrpLoadDll, the
calling routine, starts the search by setting the UseLdrpHashTable
parameter to 0. Later on, I'll show you a similar case in which the
parameter is set to 1 and the hash table is used. If the DllName
parameter does not contain a path, the comparison using
RtlEqualUnicodeString is made on the simple file name entries.
There
is a small wrinkle in the search option when the hash table is not
used, however. If the incoming DllName contains a path,
LdrpCheckForLoadedDll calls on RtlDosSearchPath_U to validate the path,
then examines the entries in the module list that contain fully
qualified file names. In case you were wondering how the Microsoft® .NET
Common Language Runtime (CLR) supports side-by-side execution of
different versions of the same assembly, this information should give
you a head start. If the module is found in the PEB list but contains a
load count of 0, LdrpCheckForLoadedDll forces the load by issuing a
sequence of commands similar to what will be seen in LdrpMapDll. But
describing this sequence now would be jumping too far ahead.
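The list-walking half of the search, without the hash table, can be sketched like this. ModuleEntry and find_loaded are hypothetical names, and the case-insensitive comparison is an assumption standing in for the RtlEqualUnicodeString call; path validation through RtlDosSearchPath_U is left out.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical, trimmed-down view of one module-list entry.
struct ModuleEntry {
    std::wstring fullName;  // fully qualified file name
    std::wstring baseName;  // simple file name
};

// Assumed case-insensitive comparison of two names.
static bool equal_no_case(const std::wstring& a, const std::wstring& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::towlower(a[i]) != std::towlower(b[i])) return false;
    }
    return true;
}

// Names that contain a path are matched against the full names, bare
// names against the simple file names. Returns nullptr on a miss.
const ModuleEntry* find_loaded(const std::vector<ModuleEntry>& list,
                               const std::wstring& dllName) {
    const bool hasPath = dllName.find(L'\\') != std::wstring::npos;
    for (const ModuleEntry& e : list) {
        const std::wstring& candidate = hasPath ? e.fullName : e.baseName;
        if (equal_no_case(candidate, dllName)) return &e;
    }
    return nullptr;
}
```

Because a name with a path is compared in full, two files that share a simple name but live in different directories do not match each other, which is the property the side-by-side remark above relies on.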
If
the search has succeeded and the loading DLL is already part of the
current process, then LdrpLoadDll has a few remaining details to take
care of. The load count for this image will be incremented if the
module is a DLL and if the load count field has not been set to -1
(which seems to mean "do not update the load count"). The module flag
will also be cleared of its load-pending bit, which was set by the
routine LdrpUpdateLoadCount. LdrpLoadDll will then leave the critical
section block and set the ImageBase parameter to the address where the
DLL was loaded.
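The bookkeeping for an already loaded module fits in a few lines. In this sketch the structure, the function name, and the two bit values are placeholders chosen for illustration (LdrpLoadDll.h in the download lists the values I actually observed), and the width of the count field is an assumption.

```cpp
#include <cstdint>

// Placeholder bit values for the sketch.
constexpr std::uint32_t kImageDll       = 0x0004;
constexpr std::uint32_t kLoadInProgress = 0x1000;

// Hypothetical cut-down module item.
struct ModuleItemModel {
    std::int16_t  loadCount;    // -1 seems to mean "do not update"
    std::uint32_t moduleFlags;
};

// What happens when the search finds the module already in the process.
void on_already_loaded(ModuleItemModel& m) {
    if ((m.moduleFlags & kImageDll) != 0 && m.loadCount != -1) {
        ++m.loadCount;
    }
    m.moduleFlags &= ~kLoadInProgress;   // clear the load-pending bit
}
```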
More
than likely, though, this is not the quick exit you are looking
for, unless you are wasting machine cycles by using LoadLibrary to return
an HINSTANCE of an already loaded module. In that case, GetModuleHandle is
a better alternative. Now let's see what happens when the search
fails.
LdrpMapDll
Now
that LdrpLoadDll knows that the requested DLL has not already been
loaded into the process, it calls upon its second helper routine,
LdrpMapDll, to perform the duties of finding the DLL's housing (the
actual file), loading the DLL into memory (but not initializing it), and
creating a structure that I have tagged MODULEITEM and adding it to the
PEB's module list. First, orient yourself in Figure 3
and become familiar with the support routines you can see in
LdrpMapDll (LdrpCheckForKnownDll, LdrpResolveDllName,
LdrpCreateDllSection, LdrpAllocateDataTableEntry,
LdrpFetchAddressOfEntryPoint, and LdrpInsertMemoryTableEntry). In the
file LdrpMapDll.cpp, you will find the pseudocode that shows the fine
points of what actually takes place here.
LdrpCheckForKnownDll
checks to see if the loading DLL can be located in the directory
specified in the Unicode string, LdrpKnownDllPath, located at
0x77FCE008. Examining this memory location with a debugger reveals that
this variable contains the value "C:\WINNT\SYSTEM32" (or something
similar, depending upon how your system is configured).
If you start up WinObj (found at http://www.sysinternals.com) or my utility, NtObjects (found at http://www.smidgeonsoft.com),
and select View | Executive Objects, you will find an object directory
called KnownDlls. Under this directory, you will see the item
KnownDllPath (a symbolic-link object) that contains the value for your
system. LdrpCheckForKnownDll allocates two Unicode strings to hold the
fully qualified DLL file name and the file name by itself. By calling
NtOpenSection with the fully qualified file name, a FileExists operation
is performed and LdrpCheckForKnownDll returns a section handle if the
DLL is known. Otherwise, the two Unicode strings are freed and
LdrpMapDll must call on the services of LdrpResolveDllName and
LdrpCreateDllSection.
LdrpResolveDllName
returns two Unicode strings, the loading DLL's file name and its fully
qualified file name. If a search path was not provided, the routine uses
the path found at the hardcoded address 0X77FCE30C in NTDLL.DLL, which
points to a default search path. RtlDosSearchPath_U, though, performs
the real work in this routine. If RtlDosSearchPath_U can find the
loading DLL, it returns the length of the path where it resolved the
DLL's name.
If
you try to load a DLL that cannot be found in the search path, the
search returns 0. LdrpMapDll then responds to this result and returns
with the status code STATUS_DLL_NOT_FOUND, which leads to a quick exit
from LdrpLoadDll. If you change the test program so that it tries to
load a DLL with the name "bogus," you can check this for yourself.
Assuming
all is well and the loading DLL has been found, LdrpCreateDllSection
must take over and create the all-important section handle. Since
sections are classified as kernel objects, the path needs to be
converted into something that the Executive Object Manager can
understand. This is where the routine RtlDosPathNameToNtPathName comes
into play. It takes a fully qualified file name such as
"C:\Projects\LoadLibrary\Debug\TestDll.dll" and returns something like
"\??\C:\Projects\LoadLibrary\Debug\TestDll.dll". If the file name cannot
be interpreted, then the routine returns STATUS_OBJECT_PATH_SYNTAX_BAD
and ends execution.
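For the drive-letter case shown above, the conversion boils down to a prefix. The helper below is a deliberately narrow sketch with a hypothetical name: it accepts only paths of the form X:\..., whereas the real routine also copes with relative names, UNC paths, and device names.

```cpp
#include <cwctype>
#include <string>

// Returns false where the real routine would report
// STATUS_OBJECT_PATH_SYNTAX_BAD; only drive-qualified paths are handled.
bool dos_path_to_nt_path(const std::wstring& dosPath, std::wstring& ntPath) {
    const bool driveQualified = dosPath.size() >= 3 &&
                                std::iswalpha(dosPath[0]) != 0 &&
                                dosPath[1] == L':' &&
                                dosPath[2] == L'\\';
    if (!driveQualified) return false;
    ntPath = L"\\??\\" + dosPath;   // object-manager prefix for DOS devices
    return true;
}
```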
LdrpCreateDllSection
is just a thin wrapper around the so-called native API,
NtCreateSection. First, a file handle is obtained with NtOpenFile. If
that call fails, then a return code of STATUS_INVALID_IMAGE_FORMAT is
generated. Otherwise, the Object Manager creates the section handle and
the file handle is closed via NtClose.
Now
that LdrpMapDll has the section handle, it can actually load the DLL
into the process's address space. The DLL is brought in as a memory-mapped
file through the services of NtMapViewOfSection. First, the defaults are
set for the base address and the size of the mapping object (for the
latter, a value of 0 tells the system that the entire section
object will be mapped). Then, a field in the PEB reserved for subsystem
calls is loaded with the value of the fully qualified file name.
Those
of you who have written debug loops and have handled the debug event
LOAD_DLL_DEBUG_EVENT may be interested to know that the lpImageName
field of the LOAD_DLL_DEBUG_INFO structure is filled with the contents
of this field, which is reserved for subsystem calls in the process
that's being debugged.
Now,
NtMapViewOfSection is called. This triggers the notification of the
loading DLLs as seen in debuggers like the one in Visual C++®.
LdrpMapDll restores the previous value of the PEB's subsystem data field
and checks on the success of the operation. If the DLL has been
successfully mapped, LdrpMapDll now possesses an actual memory address
with which it can work. How NtMapViewOfSection returns the image base
when no hint was passed to it is a question that could be resolved with
further exploration.
PEB Load List
After
a sanity check on the image now mapped into memory, LdrpMapDll
continues with some bookkeeping. Earlier I mentioned a data structure I
named MODULEITEM that is created and added to the process's PEB. In the
code for LdrpLoadDll.h you will find a reconstruction of this structure.
The job of LdrpAllocateDataTableEntry is to allocate this object and
initialize three of its fields: ImageBase, using the HMODULE handle
returned by NtMapViewOfSection; ImageSize, using the
SizeOfImage value found in the portable executable (PE) file's optional
header; and TimeDateStamp, from the equivalent field in the PE file
header. If the memory allocation fails, the routine returns a NULL
pointer. LdrpMapDll then removes the file mapping, closes the section
handle, and fails, returning STATUS_NO_MEMORY.
Assuming
that all is well, the LoadCount field in the newly built ModuleItem is
zeroed out and ModuleFlags gets initialized. (LdrpLoadDll.h provides the
possible values for this field.) The two Unicode string fields
containing the full path to the DLL and the file name are filled in.
Then a call to the small helper routine,
LdrpFetchAddressOfEntryPoint, fills in the structure's EntryPoint field.
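To make the header lookups concrete, here is a sketch that pulls the same three values out of a mapped image held in a byte buffer. The offsets follow the PE layout in WinNt.h (e_lfanew at 0x3C, the file header after the 4-byte signature, the optional header 20 bytes later); the structure and function names are hypothetical, and the entry point is returned as an RVA that still has to be added to the image base.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct ModuleItemFields {
    std::uint32_t timeDateStamp;   // IMAGE_FILE_HEADER.TimeDateStamp
    std::uint32_t sizeOfImage;     // IMAGE_OPTIONAL_HEADER.SizeOfImage
    std::uint32_t entryPointRva;   // IMAGE_OPTIONAL_HEADER.AddressOfEntryPoint
};

// Little-endian 32-bit read from the buffer.
static std::uint32_t read_u32(const std::vector<std::uint8_t>& b,
                              std::size_t off) {
    return  static_cast<std::uint32_t>(b[off])
         | (static_cast<std::uint32_t>(b[off + 1]) << 8)
         | (static_cast<std::uint32_t>(b[off + 2]) << 16)
         | (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

// Returns false if the buffer does not look like a PE image.
bool read_module_fields(const std::vector<std::uint8_t>& image,
                        ModuleItemFields& out) {
    if (image.size() < 0x40) return false;
    if (image[0] != 'M' || image[1] != 'Z') return false;
    const std::size_t pe = read_u32(image, 0x3C);        // e_lfanew
    const std::size_t optional = pe + 4 + 20;            // signature + file header
    if (image.size() < optional + 60) return false;
    if (read_u32(image, pe) != 0x00004550) return false; // "PE\0\0"
    out.timeDateStamp = read_u32(image, pe + 8);
    out.entryPointRva = read_u32(image, optional + 16);
    out.sizeOfImage   = read_u32(image, optional + 56);
    return true;
}
```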
With
the ModuleItem partially initialized, LdrpMapDll calls
LdrpInsertMemoryTableEntry to insert the entry into the process's module
list located in the PEB. At offset 0x0C in the PEB, which can usually
be found at the address of 0x7FFDF000, you will find the pointer to a
structure I have named MODULELISTHEADER. Windows NT and Windows 2000
maintain three doubly linked lists that describe the load, memory, and
initialization order for the modules in a process.
If
you have ever used the PSAPI routines or the newly available ToolHelp32
functions in Windows 2000 to enumerate the modules loaded in a
process's address space, you should be having an epiphany. You have an
alternate method to gather this information, in three different flavors,
by using ReadProcessMemory and these undocumented structures. These
structures have remained stable from Windows NT 4.0 through Windows
2000, and through the betas for Windows XP that were available when I
did the research for this article. LdrpInsertMemoryTableEntry adds the
ModuleItem structure to the end of the load and memory chains and fixes
up the links appropriately. In my experience, I have found that the load
and memory chains possess the same order; only the initialization list
varies with its own order of the modules. See Matt Pietrek's column for
more information on the initialization chain.
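The shape of these chains is easy to reproduce. The sketch below uses the same two-pointer layout as LIST_ENTRY in WinNt.h, but everything else (the structure names with the Model suffix, the field names, and the helper functions) is hypothetical and works on local objects rather than on another process's memory.

```cpp
// Same shape as the LIST_ENTRY structure from WinNt.h.
struct ListEntry {
    ListEntry* Flink;
    ListEntry* Blink;
};

// Hypothetical header holding the three list heads.
struct ModuleListHeaderModel {
    ListEntry loadOrder;
    ListEntry memoryOrder;
    ListEntry initOrder;
};

// Hypothetical module item; each chain threads through its own links.
struct ModuleItemModel {
    ListEntry   loadLinks;
    ListEntry   memoryLinks;
    ListEntry   initLinks;
    const char* name;
};

static void init_head(ListEntry& head) { head.Flink = head.Blink = &head; }

static void insert_tail(ListEntry& head, ListEntry& entry) {
    entry.Flink = &head;
    entry.Blink = head.Blink;
    head.Blink->Flink = &entry;
    head.Blink = &entry;
}

void init_header(ModuleListHeaderModel& h) {
    init_head(h.loadOrder);
    init_head(h.memoryOrder);
    init_head(h.initOrder);
}

// Append to the load and memory chains; the initialization chain is
// left alone at this stage.
void insert_memory_table_entry(ModuleListHeaderModel& h, ModuleItemModel& m) {
    insert_tail(h.loadOrder, m.loadLinks);
    insert_tail(h.memoryOrder, m.memoryLinks);
}

// The load-order links are the first member, so the cast is valid.
ModuleItemModel* from_load_links(ListEntry* e) {
    return reinterpret_cast<ModuleItemModel*>(e);
}
```

Walking Flink from a list head until the pointer returns to the head visits every module in that chain's order.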
Returning
to LdrpMapDll, you can now see the ModuleFlags field receiving some
additional attention. If the executable is not
IMAGE_FILE_LARGE_ADDRESS_AWARE and it is not of type IMAGE_DLL, the
EntryPoint field gets zeroed out. LdrpMapDll tests the return value from
NtMapViewOfSection to determine if the image was loaded at its
preferred image base. Unfortunately, scenarios in which your DLL is
parked in a different location are beyond the scope of this discussion,
but now you know the way, so you can investigate this phenomenon on your
own.
Finally,
after a validation of the image on multiprocessor systems, LdrpMapDll
closes the section handle obtained from LdrpCreateDllSection and returns
with the results of its work. The validation only proceeds if your DLL
contains data in the directory IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG and the
LockPrefixTable field in this directory is nonzero. (NTDLL and Kernel32
contain the IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG directory.) You can use
PEBrowse to examine this directory, and those of you fortunate enough to
have a multiprocessor machine can continue your own exploration down
this path.
LdrpWalkImportDescriptor
You
may think that the route through LdrpLoadDll has been relatively
straight and uncomplicated so far. But there still may be a maze of
passages ahead. Your attempt to load a DLL through LdrpLoadDll may have generated the
need for additional modules, and this is where LdrpWalkImportDescriptor
comes in. (In order to better understand my pseudocode, it will help to
have some knowledge of the PE header definitions found in WinNt.h,
especially those relevant to imports and exports. You can take a look at
"Inside Windows: An In-Depth Look into the Win32 Portable Executable File Format" by Matt Pietrek in the February 2002 issue of MSDN Magazine.)
Your
typical module load takes you through the twists and turns of
LdrpWalkImportDescriptor. However, if you specify
DONT_RESOLVE_DLL_REFERENCES in your call to LoadLibraryExW, then you
will avoid the upcoming maze. You should read the SDK documentation
carefully to make certain that this is what you want. There is also a
mechanism, which I'll explain later, to help you avoid the loops and
recursion that are part of LdrpSnapIAT and LdrpSnapThunk. But if you
don't take either of these alternatives, then you need to know what
happens with LdrpWalkImportDescriptor.
Orient yourself once again by reexamining Figure 3,
locating LdrpWalkImportDescriptor. LdrpWalkImportDescriptor has two
subroutines: LdrpLoadImportModule and LdrpSnapIAT. This does not seem so
bad, but one tip-off that this code will soon become interesting is
that there are four nesting levels in the routines for LdrpSnapIAT. The
number and depth of nested functions is one metric that indicates the
complexity of code. You should take note that recursion is possible in
not one, but two locations in LdrpSnapIAT. You may recall that in the
section on APIs exported by NTDLL.DLL I mentioned the apparent
simplicity of the call to LdrpLoadDll and the fifth parameter that took a
0 or a 1. LdrpSnapIAT can also be recursive inside
LdrpGetProcedureAddress. Finally, to make things even more complex than
they already were, it's possible that a typical DLL may import other
modules that start a cascade of additional library loads. The loader
will need to loop through each module, checking to see if it needs to be
loaded and then checking its dependencies.
With
that in mind, let's take a look at the pseudocode found in
LdrpWalkImportDescriptor.cpp. (If you are following along with the
debugger, change the test program to load the Forwarded.DLL module and
restart the debugger.) Execution starts with two calls to
RtlImageDirectoryEntryToData to locate the Bound Imports Descriptor and
the regular Import Descriptor tables. For the moment, ignore the call
for the bound imports except to notice that the code checks for
their presence first. (I'll discuss binding later.) In Forwarded.DLL,
LdrpWalkImportDescriptor detects two imported modules, User32.DLL and
Kernel32.DLL, and now calls upon LdrpLoadImportModule for assistance.
LdrpLoadImportModule
constructs a Unicode string for each DLL found in the import table and
then employs LdrpCheckForLoadedDll, using the hash table in NTDLL that
was mentioned earlier to see if they have already been loaded. Note that
the call here is made with only the file name (no fully qualified path)
and the process's search path. If you have ever had your application
complain that it cannot find a DLL, you should realize that it may not
be the loading module's fault. Check to see that all of its dependent
modules can be found. If a module is found and already loaded, life is
good because there is one less module to worry about. If it has not been
loaded, then you've found an instance of recursion. A call to
LdrpMapDll to bring the DLL into the process's address space is followed
by a call to LdrpWalkImportDescriptor. Now you're in the middle of the
twisting mazes I mentioned.
The IAT and Forwarded API Processing
Once
LdrpWalkImportDescriptor knows that the module is in memory (either
through a call to LoadLibraryExW way back at the beginning or via the
LdrpMapDll call in LdrpLoadImportModule), the next step is to examine
each and every API referenced in Forwarded's imported module list with
the call to LdrpSnapIAT. Why is this necessary? There are two reasons.
First, you want to replace the placeholders in the Import Address Table
(IAT) with real entry points. Second, you need to locate and process
APIs that may have been forwarded on to another DLL. (Forwarding is one
technique Microsoft employs to expose a common Win32 API set and to hide
the low-level differences between the Windows NT and Windows 9x platforms.)
You
may have observed either in the debug output strings or by carefully
stepping through the loader code that at some point during Kernel32
processing, a check on NTDLL.DLL was made. Since just about any DLL with
some executable code contains references to Kernel32, you may wonder
why this is happening at all. You already know that NTDLL.DLL is loaded
into every process. The answer is that Kernel32 contains APIs that are
"forwarded" to NTDLL. Refer to Figure 4 for a complete list of these APIs for Kernel32 (version 5.0.2195.1600).
You
may also notice that the list contains functions that are common
requirements for just about any kind of serious programming. Remember,
LdrpLoadDll needs to load every module referenced by the loading DLL,
including those "hidden" references contained in forwarded APIs. Thus, a
reasonable conclusion from all of this is that an import table walk
will take place every time you load a DLL—at least for Kernel32's
forwarded APIs and for any additional forwarded APIs you may have
decided to include in your application!
To
allow you to experiment with this concept, I have included
Forwarder.DLL and Forwarded.DLL in the download, as I've mentioned
previously. The code for Forwarder.DLL is extremely simple—a DllMain
with an export pragma. If you run DumpBin or PEBrowse on this module,
you will see that it exports only one routine, and that the routine is
marked with a designation that it is forwarded on to
Forwarded.ShowMessage.
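A forwarder designation such as Forwarded.ShowMessage is simply a string of the form Module.Function. The helper below is a hypothetical sketch of how such a string can be taken apart; splitting at the first dot is an assumption, and note that the module part carries no extension.

```cpp
#include <string>

// Splits "Module.Function" at the first dot. Returns false when either
// half would be empty or no dot is present.
bool split_forwarder(const std::string& forwarder,
                     std::string& module, std::string& function) {
    const std::string::size_type dot = forwarder.find('.');
    if (dot == std::string::npos || dot == 0 ||
        dot + 1 == forwarder.size()) {
        return false;
    }
    module   = forwarder.substr(0, dot);
    function = forwarder.substr(dot + 1);
    return true;
}
```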
Now,
turn your attention to Forwarded.DLL and take a look at what it
exports. You will see a typical DLL with a DllMain and the single API,
ShowMessage. However, if you changed the test program to load Forwarder
and then observed what happened in the debugger, you might have wondered
why no reference to Forwarded appeared. You will encounter this
confusion unless you also included the GetProcAddress statement in your
compile. Make certain that you include the GetProcAddress line and you
will now see that both DLLs are loaded. You're observing another form of
delay-load occurring and, perhaps, another example of the optimization
that has been done on the loading engine in Windows.
Returning
to LdrpWalkImportDescriptor, you'll find that it tests several items
before calling LdrpSnapIAT. (For the purposes of following the next few
steps, change the sample back to use Forwarded.DLL.) If you look up
Section 6.4.1 of the Portable Executable Specification (found on the
MSDN Library CD under Specifications), you will read that the time/date
stamp field of an import directory will contain a value of 0 unless the
DLL has been bound. One of the fields tested is the time/date stamp
field, which is 0 in Forwarded.DLL, so let's step into LdrpSnapIAT.
LdrpSnapIAT
first wraps its execution in a __try/__except block, then locates
the IAT in Forwarded.DLL and hunts for the export directory in
the module that Forwarded is attempting to load (assume it is Kernel32
for now). It then changes the memory protection on the IAT of
Forwarded.DLL to PAGE_READWRITE and proceeds to examine each entry in
the IAT. (If you are able to examine the protection for this chunk of
memory, you will see that it is normally PAGE_READONLY for your
executables.) Going a bit further, you'll encounter LdrpSnapThunk.
LdrpSnapThunk
requires an ordinal to locate an entry point and to determine whether
or not the API is forwarded. If the hint value in Forwarded.DLL's import
directory is correct, LdrpSnapThunk uses it (in my experience, the hint
is usually not the correct value). Otherwise, LdrpSnapThunk calls on the
services of the helper routine, LdrpNameToOrdinal, to look up the
correct value. Observe that LdrpNameToOrdinal uses a binary search on
the export table to quickly locate the ordinal (more optimization in the
loader) and note that the export names must be sorted lexically for
the search to work.
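The hint-then-search logic just described can be sketched in a few lines of portable C. This is only an illustration: the ExportNames structure and the name_to_ordinal function are hypothetical stand-ins of my own, not the loader's real data layout or the actual LdrpNameToOrdinal, and the sketch assumes the names are sorted in strcmp order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of an export directory:
   names[] is sorted in strcmp order and ordinals[i] belongs to names[i]. */
typedef struct {
    const char *const *names;
    const unsigned short *ordinals;
    size_t count;
} ExportNames;

/* Returns the ordinal for `name`, or -1 if the name is not exported.
   A correct hint short-circuits the search; a wrong one is ignored. */
int name_to_ordinal(const ExportNames *exports, const char *name, size_t hint)
{
    assert(exports != NULL && name != NULL);

    if (hint < exports->count && strcmp(exports->names[hint], name) == 0)
        return exports->ordinals[hint];

    size_t lo = 0, hi = exports->count;   /* half-open range [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, exports->names[mid]);
        if (cmp == 0)
            return exports->ordinals[mid];
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

If the name table were not sorted, the halving step could discard the half that actually holds the name, which is why the ordering requirement matters.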
Now
that you have an ordinal, you can look up the entry point for the API
in Kernel32. LdrpSnapThunk first plugs the loading module's IAT entry
with an address derived from the export table for Kernel32; see Section
6.4.4 of the PE specifications (which can be found on the October 2001
MSDN CD under Specifications | Microsoft Portable Executable and Common
Object File Format) for more information. (This explains why the page
protection was changed in LdrpSnapIAT.)
Section 6.3.2 of the PE specifications says:
If the address (the entry-point) is not within the export
section (as defined by the address and length indicated in the Optional
Header), the field is an Export RVA: an actual address in code or data.
Otherwise, the field is a Forwarder RVA, which names a symbol in another
DLL.
So
now you can finally decide whether or not the API has been forwarded.
In the vast majority of cases, the API is not forwarded, but let's
assume you are looking at Forwarded's reference to HeapAlloc. (Check the
math first. Kernel32's image base (0x77e80000) + HeapAlloc's entry
point (0x0005b658) is 0x77edb658, which is inside the range for the
export table, 0x77ed5c20 to 0x77edb770.)
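The range test from Section 6.3.2 is easy to express in C. The sketch below is my own illustration, not code lifted from NTDLL; is_forwarder_rva is a hypothetical name, and I am assuming the end of the range quoted above is exclusive, which gives an export directory RVA of 0x00055c20 and a size of 0x00005b50 for this build of Kernel32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when an export's RVA lands inside the export directory itself,
   which the PE specification treats as a forwarder string rather than
   an address in code or data. */
bool is_forwarder_rva(uint32_t export_rva, uint32_t dir_rva, uint32_t dir_size)
{
    assert(dir_size > 0);
    /* Written as a subtraction so that dir_rva + dir_size cannot overflow. */
    return export_rva >= dir_rva && export_rva - dir_rva < dir_size;
}
```

Calling is_forwarder_rva(0x0005b658, 0x00055c20, 0x00005b50) with the HeapAlloc values from the text reports a forwarder, which matches the arithmetic worked out above.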
LdrpSnapThunk
now proceeds to break apart the forwarded reference for HeapAlloc,
which will have the format NTDLL.RtlAllocateHeap, and then calls
LdrpLoadDll to obtain NTDLL's image base—hmm, this looks like you're
back at the beginning. But note that the fifth parameter is passed with a
value of 0. Also note that the DLL name that was constructed before
making the call to LdrpLoadDll lacks the .DLL extension. Fortunately,
the call to LdrpLoadDll will succeed when LdrpCheckForLoadedDll figures
out that NTDLL.DLL is already loaded.
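Breaking apart a forwarder string such as NTDLL.RtlAllocateHeap amounts to splitting at a dot. Here is a minimal sketch under my own assumptions: split_forwarder is a hypothetical helper, and splitting at the last dot is my choice for the illustration, not something I verified in the loader. Note that the module half comes back without an extension, just as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Splits "MODULE.ApiName" into its two halves.
   Returns false if the string is malformed or a buffer is too small. */
bool split_forwarder(const char *fwd, char *module, size_t module_len,
                     char *api, size_t api_len)
{
    assert(fwd != NULL && module != NULL && api != NULL);

    const char *dot = strrchr(fwd, '.');
    if (dot == NULL || dot == fwd || dot[1] == '\0')
        return false;                 /* no dot, empty module, or empty API */

    size_t mlen = (size_t)(dot - fwd);
    size_t alen = strlen(dot + 1);
    if (mlen >= module_len || alen >= api_len)
        return false;                 /* would not fit with the terminator */

    memcpy(module, fwd, mlen);
    module[mlen] = '\0';
    memcpy(api, dot + 1, alen + 1);   /* copies the terminating NUL too */
    return true;
}
```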
But
do you remember that experiment where I changed the extension for
Forwarded to .cpl? Try this again and you will see that LdrpLoadDll now
fails on the LdrpMapDll call with a STATUS_DLL_NOT_FOUND return code.
Now I have an explanation for the earlier results. With the module name
out of the way, LdrpSnapThunk grabs the API name, RtlAllocateHeap, and
forges on to LdrpGetProcedureAddress.
At
this point, I am getting close to the end of this forwarded API
processing, I promise. LdrpGetProcedureAddress is another routine that
wraps its processing around a __try/__except block. The routine
determines what type of API information it has been handed, either an
ordinal if the API name parameter is null, or the name itself. The test
is needed in order to properly set up parameters for an upcoming call to
LdrpSnapThunk. But wait a moment. Didn't LdrpSnapThunk just bring us to
this point? The two parameters are a pointer to the API's entry in the
IAT and an overloaded item that contains either the image base of the
loading DLL or a pointer to an IAT entry. If the flag, LdrpInLdrInit, is
turned on, the process's critical section is entered. And now let's
really dive in deep and step into LdrpCheckForLoadedDllHandle.
Fortunately,
the functionality here is pretty simple to describe and understand. I
need a MODULEITEM before I can continue. LdrpCheckForLoadedDllHandle
first examines a handle cache residing at LdrpLoadedDllHandleCache to
see if the image base there is the same as its input parameter, hDll. If
not, the routine perseveres by walking the LoadOrder list, searching
for the MODULEITEM whose image base matches hDll and whose linkage in
the memory order list has been established. Once it finds an entry
matching these criteria, it updates the cache with it and hands back to
LdrpGetProcedureAddress the MODULEITEM that it found, or returns a 0 to
indicate failure.
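The cache-then-walk pattern described here looks roughly like the following C sketch. The ModuleItem and ModuleTable structures are hypothetical, pared-down stand-ins that I invented for illustration; the real MODULEITEM carries many more fields, and the real cache and list head are loader globals rather than members of a structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, pared-down module record. */
typedef struct ModuleItem {
    struct ModuleItem *next_in_load_order;
    void *image_base;
    int in_memory_order_list;   /* nonzero once linked into that list */
} ModuleItem;

typedef struct {
    ModuleItem *load_order_head;
    ModuleItem *handle_cache;   /* last successful lookup, or NULL */
} ModuleTable;

/* Returns the record whose image base equals h_dll, or NULL on failure. */
ModuleItem *find_module_by_handle(ModuleTable *table, void *h_dll)
{
    assert(table != NULL);

    /* Fast path: the one-entry cache already holds the answer. */
    if (table->handle_cache != NULL && table->handle_cache->image_base == h_dll)
        return table->handle_cache;

    /* Slow path: walk the load-order list. */
    for (ModuleItem *m = table->load_order_head; m != NULL;
         m = m->next_in_load_order) {
        if (m->image_base == h_dll && m->in_memory_order_list) {
            table->handle_cache = m;   /* remember it for the next caller */
            return m;
        }
    }
    return NULL;
}
```

A one-entry cache pays off because consecutive lookups tend to ask about the same module, for example when several APIs are resolved against one DLL in a row.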
With
a MODULEITEM in hand, LdrpGetProcedureAddress now calls an old friend,
RtlImageDirectoryEntryToData, to locate the item's export directory and
starts that twisty, recursive call to LdrpSnapThunk. What is going on
here? Why is this call necessary? The answer, in part, is that this new
API itself may be forwarded! I am not aware of such a situation in
Windows 2000, but the possibility certainly exists. Happily, HeapAlloc's
processing ends with RtlAllocateHeap inside of NTDLL, and LdrpSnapThunk
returns an IAT entry with the entry point to this API.
LdrpGetProcedureAddress frees up any work areas it might have created,
exits the critical section (if it was acquired), and returns. Whew!
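To see why the recursion is needed, it helps to model the export tables as a flat list and follow a forwarder until it lands on real code. The sketch below is a toy model of my own: the Export structure, the resolve_export function, and the depth limit are all hypothetical, names are compared case-sensitively for simplicity, and the first dot is taken as the separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy export record: either a real address or a "MODULE.Api" forwarder. */
typedef struct {
    const char *module;
    const char *api;
    const char *forward_to;   /* NULL when the export is not forwarded */
    unsigned address;         /* stand-in for an entry point */
} Export;

/* Follows forwarders until a real address is found.
   Returns 0 if the API is missing or the chain exceeds max_depth. */
unsigned resolve_export(const Export *table, size_t count,
                        const char *module, size_t module_len,
                        const char *api, int max_depth)
{
    assert(table != NULL && module != NULL && api != NULL);
    if (max_depth < 0)
        return 0;                       /* guards against forwarder loops */

    for (size_t i = 0; i < count; i++) {
        const Export *e = &table[i];
        if (strlen(e->module) != module_len ||
            strncmp(e->module, module, module_len) != 0 ||
            strcmp(e->api, api) != 0)
            continue;
        if (e->forward_to == NULL)
            return e->address;          /* the chain ends in real code */

        const char *dot = strchr(e->forward_to, '.');
        if (dot == NULL)
            return 0;                   /* malformed forwarder string */
        /* The target may itself be forwarded, so recurse. */
        return resolve_export(table, count, e->forward_to,
                              (size_t)(dot - e->forward_to), dot + 1,
                              max_depth - 1);
    }
    return 0;
}
```

With a table in which KERNEL32's HeapAlloc forwards to NTDLL.RtlAllocateHeap, a single recursive step reaches the real entry point, which mirrors the HeapAlloc case walked through above.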
Next,
LdrpSnapThunk checks the return code and returns
STATUS_ENTRYPOINT_NOT_FOUND if the API was not found. Otherwise, it
replaces the entry in the IAT with the API's entry point and continues
on. Study Section 6.4.4 in the PE specifications and especially the
references to binding for a more complete picture of what is happening.
Now
let's return to LdrpSnapIAT and move on to the next API imported from
Kernel32 (or break from the loop if the LdrpSnapThunk call failed). Once
all of the entries imported from Kernel32 have been processed,
LdrpSnapIAT restores the memory protection it changed at the beginning
of its work, calls NtFlushInstructionCache to force a cache refresh on
the memory block containing the IAT, and returns to
LdrpWalkImportDescriptor. The cache refresh might be a little
surprising, but in many executables the IAT can be found in the .text
section where code is found. If LdrpSnapIAT did not flush
the memory block containing the updated IAT, then all of the previous
work could be for naught because the processor might continue to use
the old version of the memory block. (For more information read the SDK
documentation for the Kernel32 API FlushInstructionCache, which is just
a thin wrapper around NtFlushInstructionCache.)
If
you want to see the results of run-time binding via LdrpSnapIAT and
LdrpSnapThunk on the IAT for Forwarded.DLL (SP1), take a look at Figure 5.
Bound DLL Processing
You
may recall that LdrpWalkImportDescriptor tested for the existence of
two directories or descriptors, the regular Import Descriptor and
something called the Bound Imports Descriptor, and tried to use the
Bound Imports Descriptor first if it was available. Also, LdrpSnapIAT
examined the time/date stamp in the Import Directory Table for a special
value of -1 before moving on to LdrpSnapThunk. Pre-binding your DLL just
may be an alternative to all of this import table munging.
Now is the time to change my test project to load TestDll and see what
happens in the Windows 2000 loader with a module that has been bound
ahead of time.
If
you have not created the environment variable "MSSdk" as I mentioned in
the readme.txt file, and set it equal to the root directory where your
Platform SDK is located, do so now and rebuild TestDll (a post-link step
should kick in that performs the binding operation). Or, from a command
line you can enter and run the following command:
bind -u testdll.dll
When
you examine the resulting executable, you should see that a new
directory in the optional header array has been filled: the slot
corresponding to IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT. Launching the
result in the debugger, you will also observe that the tests for the
Bound Imports Descriptor will succeed. Try the test without a call to
GetProcAddress on "fnTestDll" and you will see that when
LdrpWalkImportDescriptor issues its call to LdrpLoadImportModule, the
check for an already loaded module (Kernel32.DLL) succeeds. That
means you avoid the nasty bumps and turns that made the code so
complex earlier in the discussion of LdrpWalkImportDescriptor. My DLL
loads faster because there is no looping through the APIs I imported
from Kernel32; the fix-ups have already been done (including the
forwarded references). I feel like I've found the Holy Grail.
But
this old cup loses some of its shine when I change the sample to call
fnTestDll. Before continuing, see if you can foretell why you will be
walking that twisty maze again. The reason is that Forwarded.DLL, the
module that contains the real code for my forwarded API, fnTestDll, was
not itself bound. Run the bind utility on Forwarded.DLL and the
brilliance of my newfound treasure returns. The moral of this little
exercise is that in order to gain the full measure of efficiency that
pre-binding a module provides, make certain that all the subordinate
modules have been bound, too.
There
is a slight downside to pre-binding a DLL, though. What happens when
the next version of the operating system appears with a new version of
Kernel32 and new locations for the exported functions? Or consider the
consequences of another module loading and occupying the slot reserved
in memory for that DLL you have bound your executable to. Your bindings,
the hardcoded addresses in the IAT, will be incorrect and considered
stale by the loader. Under these conditions, the binding is effectively
ignored, the LdrpWalkImportDescriptor processing takes place, and you
are no better off than you were before.
On
the other hand, if you can manage to keep your DLLs in sync with the
current versions of the system DLLs and any others you may use, you
should see an improvement in your module loads. As the SDK documentation
states: "You can minimize load time by using Bind to bypass this
lookup." (For more discussion concerning BIND.EXE and other load issues,
see Under the Hood in the May 2000 issue of MSDN Magazine. Also, note that the system DLLs, Kernel32, GDI32, User32, AdvApi32, and so on have been pre-bound.) Figure 6 shows the results of pre-binding using the SDK Bind utility on the IAT for TestDll.DLL (SP1).
LdrpUpdateLoadCount
If
you take stock of what you have seen and learned so far, you will
realize that the first three parts in LdrpLoadDll's processing have been
completed. The last part of LdrpLoadDll to explore involves an update
to module reference counts. That is the job for LdrpUpdateLoadCount.
LdrpUpdateLoadCount
is a dual-purpose routine; it is called both when a DLL is loading
and when it is unloading. It attempts to walk either the Bound Imports table or the
Imports table, and it will recurse on itself for any subordinate
modules. The result is code that will likely be difficult for you to
follow, but LdrpUpdateLoadCount.cpp contains my attempt to write
pseudocode for this procedure. Distilling the essence of the pseudocode
leaves the following: LdrpUpdateLoadCount walks through either the Bound
Imports Descriptor or the Imports Descriptor looking for imported
modules using LdrpCheckForLoadedDll and the NTDLL hash table. If the
module was newly loaded by a LoadLibraryExW call, then
LdrpUpdateLoadCount updates its reference count and walks its tables for
any imports.
You
can easily imagine a tree structure that describes the relationships
between DLLs and their imports, and LdrpUpdateLoadCount must walk the
tree completely to update everyone's reference count correctly. Some
modules enter the process with a reference count of -1 and are skipped
by this update. I leave here a question for future exploration: why do
some DLLs have a reference count of -1 and the others contain an actual
count?
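The tree walk can be modeled with a short recursive routine. What follows is only a sketch of the idea, not a transcription of my pseudocode: the Module structure and update_load_count are hypothetical, the import graph is assumed to be free of cycles, and I am assuming that a module with a count of -1 is neither updated nor descended into.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IMPORTS 4

/* Hypothetical module record for the sketch. */
typedef struct Module {
    const char *name;
    int load_count;                      /* -1 marks a module updates skip */
    struct Module *imports[MAX_IMPORTS];
    size_t import_count;
} Module;

/* Adjusts the count of every module reachable through `module`'s imports.
   delta is +1 when loading and -1 when unloading; the caller is assumed
   to handle the count of `module` itself. */
void update_load_count(Module *module, int delta)
{
    assert(module != NULL && (delta == 1 || delta == -1));

    for (size_t i = 0; i < module->import_count; i++) {
        Module *dep = module->imports[i];
        if (dep->load_count == -1)
            continue;                    /* skipped by the update */
        dep->load_count += delta;
        update_load_count(dep, delta);   /* recurse into its imports */
    }
}
```

Using one routine with a signed delta is what makes it dual-purpose: the same walk serves both the load path and the unload path.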
Longtime
readers of Under the Hood may recall a handy utility Matt Pietrek wrote
named NukeDll. Back in the ancient days of 16-bit Windows 3.x,
an application that General Protection Faulted had the nasty habit of
leaving wreckage strewn about in memory in the form of orphaned
DLLs: their reference counts never reached 0, so they were never released from
the common address space that was part of Windows. Using NukeDll, you
could nuke these orphans out of existence and free precious resources.
The need for a utility like that has been virtually eliminated with
Windows NT and its successors because of the compartmentalization
imposed upon processes by the operating system. Still, the reference
count is important; your application could be loading and unloading
modules but leaking HINSTANCEs because of a mismatch in the
LoadLibrary/FreeLibrary pairs. But this will only hurt your own buggy
application and you will only chomp through your own resources; other
processes in Windows NT and Windows 2000 will be isolated from this
behavior.
Once
LdrpUpdateLoadCount has finished its work and has returned, LdrpLoadDll
checks the return code from LdrpWalkImportDescriptor. If the code is
STATUS_SUCCESS, processing continues on to DLL initialization (which was
described in Matt Pietrek's September 1999 Under the Hood column) and
is followed by leaving the process's critical section. But if there was a
problem in LdrpWalkImportDescriptor, then LdrpLoadDll must back out all
of the work done up until now, mostly by invoking LdrpUnloadDll. An
image base is sent back to LoadLibraryExW in the form of an HMODULE.
The LOAD_WITH_ALTERED_SEARCH_PATH Option
There
is one scenario that I have not described yet, and that is when a call
to LoadLibraryExW is made with dwFlags equal to
LOAD_WITH_ALTERED_SEARCH_PATH. Now that you have grown somewhat
accustomed to wandering around DLLs, you might want to experiment on
your own with this small wrinkle. Change the Test sample program to
issue this version of a LoadLibraryExW call and pay particular attention
to the first parameter for LdrpLoadDll and note any differences.
You
might also want to create two copies of TestDll, store one copy in
your \TEMP directory, and observe what happens. With a minimum amount
of effort you will be able to manufacture a situation where two copies
of TestDll are loaded, one from the current working directory and the
second from the temp directory. That would demonstrate how the .NET CLR
support for side-by-side execution of different versions of the same
assembly might be implemented (although this side-by-side capability has
been available since at least Windows NT 4.0).
What You've Learned
Next
time the debugger displays a DLL load notification, you will know with
some degree of confidence the state that the module is in: it has been
mapped into memory, it has not been added to the PEB's housekeeping
area, and, most importantly, it has not been initialized yet.
You
also have learned that the PEB contains not one, but three lists
enumerating loaded modules in load, memory, and initialization order.
(There are also many other fields in the PEB worthy of examination since
they lead to other vital pieces of information on your process.) As
Matt Pietrek has pointed out, the order of the DLLs you see displayed
inside the debugger is not the order in which DLLs are initialized, as
many people mistakenly believe.
Probably
the most important fact to hold onto is that a simple call to
LoadLibrary results in many more things occurring under the covers than
might initially be apparent. The loader must examine each and every API
that the DLL imports from other DLLs in order to calculate a real address in
memory, perhaps load additional DLLs, and check whether an API has
been forwarded on to another procedure housed in another DLL.
A loading DLL may also bring in additional modules, and the process just described is repeated for each of them.
The
overhead that all of this processing brings to your application may be
reduced by investigating the use of the SDK utility, BIND.EXE. The
loader still checks the reference to each DLL contained in your program,
but as long as the entries are not stale (in other words, the entries
are still correct), the address calculation and forwarded API processing
will be safely bypassed.
Finally, you have seen that DLLs are reference-counted, just as they were in the ancient Windows 3.x
days. Although this count does not have the same systemwide effect that
it once had, a DLL that you are trying to manage dynamically will still
produce resource leaks if you have not properly matched up FreeLibrary
calls with each LoadLibrary call.
Conclusion
You
should prepare yourself for additional trips into NTDLL.DLL during
future debugging sessions because, like LoadLibraryEx, many Kernel32
APIs lead inevitably to undocumented routines that reside in NTDLL. If
you end your investigation prematurely because of a reluctance to enter
this uncharted territory, you may miss the real cause of your bug or, at
least, a better understanding of your problem. I plan to maintain the
pseudocode at my Web site, http://www.smidgeonsoft.com, so if you find any errors or improvements, please pass them along and I will incorporate them into the code listings.