When the EXE calls one of the redirected functions,
control goes to the CALL instruction in the stub. The
DelayLoadProfileDLL_UpdateCount routine in DelayLoadProfileDLL.CPP
simply increments the value of the count field of the stub. After that
CALL returns, the JMP instruction transfers control to the original
address that was stored in the IAT before I bashed it. Figure 2 shows the big picture after the IAT has been redirected to the stubs.
Assembler
junkies might be wondering how the DelayLoadProfileDLL_UpdateCount
function knows where the stub's count field is in memory. A quick look
at the code shows that DelayLoadProfileDLL_UpdateCount finds the return
address pushed on the stack by the CALL instruction. The return address
points to the JMP XXXXXXXX instruction following the call. Since the
CALL instruction is always five bytes, some pointer arithmetic yields
the stub's starting address and easy access to the stub's count field.
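To make the pointer arithmetic concrete, here is a minimal sketch of how the return address leads back to the count field. The struct layout and the names IatStub, StubFromReturnAddress, and UpdateCount are assumptions chosen for illustration (a five-byte CALL rel32, a five-byte JMP rel32, then the count). The real DLPD_IAT_STUB in DelayLoadProfileDLL.CPP may be laid out differently, and the real routine obtains the return address from the stack rather than as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
// Assumed layout of one redirection stub on 32-bit x86 (illustrative only).
struct IatStub {
    uint8_t  callOpcode;  // 0xE8: CALL rel32
    uint32_t callRel32;   // displacement to the update-count routine
    uint8_t  jmpOpcode;   // 0xE9: JMP rel32
    uint32_t jmpRel32;    // displacement to the original IAT target
    uint32_t count;       // number of calls made through this stub
};
#pragma pack(pop)

// The CALL occupies exactly five bytes, so the JMP sits at offset 5.
static_assert(offsetof(IatStub, jmpOpcode) == 5, "CALL rel32 must be five bytes");

// The return address pushed by the stub's CALL points at the JMP.
// Backing up five bytes yields the start of the stub.
inline IatStub* StubFromReturnAddress(void* returnAddress) {
    return reinterpret_cast<IatStub*>(static_cast<uint8_t*>(returnAddress) - 5);
}

// Increment the count field of the stub that the return address belongs to.
inline void UpdateCount(void* returnAddress) {
    ++StubFromReturnAddress(returnAddress)->count;
}
```

Packing the struct to one-byte alignment matters here: without it the compiler would pad after the opcode bytes and the five-byte assumption would no longer hold.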
I
had one problem using the DelayLoadProfileDLL_UpdateCount code that's
worth mentioning. Originally, the function didn't have the PUSHAD and
POPAD instructions to save and restore all of the regular CPU registers.
The code worked fine on many programs, but just blew up on others.
Finally, I narrowed it down to programs that imported __CxxFrameHandler
and _EH_prolog from MSVCRT.DLL. Both of these APIs expect the EAX
register to be set to a given value, and DelayLoadProfileDLL_UpdateCount
was trashing EAX.
Since
the trashed EAX was the problem, I added PUSHAD and POPAD. Alas, the
problem remained. In frustration, I examined the compiler-generated
code, and then smacked my forehead. Normally when generating code for a
debug build, the Visual C++ 6.0 compiler inserts code in the function
prolog to set all local variables to the value 0xCC. This code was
trashing EAX before my PUSHAD got a chance to execute. To get around
this, I had to remove the /GZ option from the debug build settings for
DelayLoadProfileDLL.
Reporting Results
As
your process shuts down, the system sends the DLL_PROCESS_DETACH
notification to all loaded DLLs. DelayLoadProfileDLL uses
this opportunity to harvest the information collected during the run.
In a nutshell, this means scanning through all the stub arrays, counting
the number of calls that were made through the stubs, and reporting
what it finds.
During
the setup phase when DelayLoadProfileDLL was redirecting the IATs, it
stashed away the address of the EXE's first import descriptor in a global
variable (g_pFirstImportDesc). At shutdown time, ReportProfileResults uses this
pointer to walk through the imports section again. For each imported
DLL, it retrieves the address of the DLL's first IAT entry. If this is
an IAT that I've redirected, the first pointer in the IAT should point
to the first of the DLPD_IAT_STUB stubs allocated for that DLL. Of
course, the code does some sanity checking to ensure that this is the
case. If something doesn't look right, DelayLoadProfileDLL ignores that
particular imported DLL.
Generally
though, everything looks fine, and the first IAT entry points to my
stubs. The code then iterates through all the stubs for the DLL. At each
stub, the value of the stub's count field is added to a running total
for the DLL. When the iteration completes, ReportProfileResults formats a
string with the name of the DLL and how many calls were made through
the stubs. The code uses OutputDebugString to broadcast its findings.
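The sanity check and the per-DLL tally described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from DelayLoadProfileDLL.CPP: the CountedStub layout, the LooksLikeStub and FormatDllReport names, the explicit stub count parameter, and the wording of the report line are all hypothetical. The real code would hand the formatted string to OutputDebugString instead of returning it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

#pragma pack(push, 1)
// Assumed stub layout: CALL rel32, JMP rel32, then the call count.
struct CountedStub {
    uint8_t  callOpcode;  // expected to be 0xE8
    uint32_t callRel32;
    uint8_t  jmpOpcode;   // expected to be 0xE9
    uint32_t jmpRel32;
    uint32_t count;
};
#pragma pack(pop)

// Sanity check: a redirected IAT entry should point at a CALL followed by a JMP.
inline bool LooksLikeStub(const CountedStub* stub) {
    return stub != nullptr && stub->callOpcode == 0xE8 && stub->jmpOpcode == 0xE9;
}

// Totals the counts for one imported DLL and formats a report line.
// Returns an empty string when the first entry fails the sanity check,
// which corresponds to ignoring that imported DLL.
inline std::string FormatDllReport(const char* dllName,
                                   const CountedStub* stubs,
                                   std::size_t stubCount) {
    if (stubCount == 0 || !LooksLikeStub(stubs)) {
        return std::string();
    }
    unsigned long total = 0;
    for (std::size_t i = 0; i < stubCount; ++i) {
        total += stubs[i].count;
    }
    char line[256];
    std::snprintf(line, sizeof(line),
                  "DelayLoadProfile: %s was called %lu times", dllName, total);
    return line;
}
```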
Loading and Injection
The
program that loads your EXE and injects DelayLoadProfileDLL.DLL is
called, you guessed it, DelayLoadProfile.EXE (the source code is available
from the MSJ Web site at http://www.microsoft.com/msj). This code
mainly drives the CDebugInjector class, which I'll describe shortly.
Function main obtains the target EXE's command line and passes it to
CDebugInjector::LoadProcess. If the process is created successfully,
function main tells CDebugInjector which DLL it wants injected. In this
case, it's DelayLoadProfileDLL.DLL, which should be located in the same
directory as DelayLoadProfile.EXE.
The
last step before letting the target run wild is to call
CDebugInjector::SetOutputDebugStringCallback. When DelayLoadProfileDLL
reports its results via OutputDebugString, CDebugInjector sees them and
passes them to the callback you registered. This callback just printfs
the strings to the console. Finally, function main calls
CDebugInjector::Run. This call lets the target process begin and, when
the time is right, injects the DLL into it.
Figure 3
shows the CDebugInjector class. This is where all the good stuff
happens. CDebugInjector::LoadProcess creates the specified process as a
debuggee process. The ramifications of running as a debuggee process have
been discussed in many articles and in the MSDN documentation, so I
won't go into all the details here.
For
the purposes of this column, it's sufficient to say that the debugger
process (in this case, DelayLoadProfile) has to enter a loop that calls
WaitForDebugEvent and ContinueDebugEvent until the debuggee terminates.
Every time WaitForDebugEvent returns, something has happened in the
debuggee. This might be an exception (including breakpoints), a DLL load,
a thread creation, or other event. The WaitForDebugEvent documentation
covers all the events that might occur. The CDebugInjector::Run method
contains the code for this loop.
So
how does running the target process as a debuggee help you inject a DLL?
A debugger process has excellent control over the debuggee process's
execution. Every time a significant event occurs in the debuggee, it is
suspended until the debugger calls ContinueDebugEvent. Knowing this, a
debugger process can add code to the debuggee's address space and
temporarily change the debuggee's registers so that the added code
executes.
In
more specific terms, CDebugInjector synthesizes a small code stub that
calls LoadLibrary. The DLL name parameter to LoadLibrary points to the
name of the DLL to inject. CDebugInjector writes the stub (and the
associated DLL name) to the debuggee's address space. It then calls
SetThreadContext to change the debuggee's instruction pointer (EIP) to
execute the LoadLibrary stub. All of this dirty work occurs within the
CDebugInjector::PlaceInjectionStub method.
Immediately
following the LoadLibrary call in the stub is a breakpoint instruction
(INT 3). This stops the debuggee and gives control back to the debugger
process. The debugger then uses SetThreadContext again to restore the
instruction pointer and other registers to their original values.
Another call to ContinueDebugEvent and the debuggee is on its way with
the DLL injected, none the wiser that anything has happened.
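As a rough illustration of what such a stub might contain, the sketch below assembles the bytes for a 32-bit x86 sequence: PUSH the address of the DLL name, CALL LoadLibraryA, INT 3, followed by the name itself. The instruction sequence, the BuildLoadLibraryStub and AppendU32 names, and the parameters are assumptions made for this example. The stub that CDebugInjector::PlaceInjectionStub actually writes may use different instructions or a different layout.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: append a 32-bit value in little-endian byte order.
static void AppendU32(std::vector<uint8_t>& out, uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
    }
}

// Builds the stub image for the address where it will live in the target:
//   stubBase + 0   PUSH imm32   (address of the DLL name)
//   stubBase + 5   CALL rel32   (LoadLibraryA)
//   stubBase + 10  INT 3        (hands control back to the debugger)
//   stubBase + 11  DLL name, NUL terminated
std::vector<uint8_t> BuildLoadLibraryStub(uint32_t stubBase,
                                          uint32_t loadLibraryAddr,
                                          const std::string& dllName) {
    const uint32_t kCodeSize = 5 + 5 + 1;
    std::vector<uint8_t> stub;

    stub.push_back(0x68);                    // PUSH imm32
    AppendU32(stub, stubBase + kCodeSize);   // name follows the code

    stub.push_back(0xE8);                    // CALL rel32
    // rel32 is measured from the instruction after the CALL (offset 10).
    AppendU32(stub, loadLibraryAddr - (stubBase + 10));

    stub.push_back(0xCC);                    // INT 3

    stub.insert(stub.end(), dllName.begin(), dllName.end());
    stub.push_back(0);                       // terminating NUL
    return stub;
}
```

Because the PUSH operand and the CALL displacement both depend on where the stub ends up, the bytes have to be generated after the destination address in the target process is known.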
If
you don't think too hard, this injection process doesn't sound too
messy. Nonetheless, a few interesting problems crop up that complicate
things. For example, when is the proper time to create the stub code and
redirect control to it? You can't do this immediately after the
CreateProcess call because, among other reasons, the imported DLLs
haven't been mapped into memory at this point and the EXE's IAT hasn't
been fixed up by the Win32 loader. In other words, it's too early.
The
solution I ultimately decided on was to let the debuggee run until it
encounters its first breakpoint. Then I set a breakpoint of my own at
the entry point of the EXE. When this second breakpoint triggers,
CDebugInjector knows that DLLs in the target process (including
KERNEL32.DLL) have initialized, but no code in the EXE has run. This is
the perfect time for injecting DelayLoadProfileDLL.DLL.
Incidentally,
where does the first breakpoint come from? By definition, a Win32
process that's being debugged calls DebugBreak (also known as INT 3)
very early in its execution. In my ancient APISPY32 code, I used the
initial DebugBreak as the occasion to do the injection. Unfortunately in
Windows 2000, this DebugBreak occurs before KERNEL32.DLL is
initialized. CDebugInjector therefore sets its own breakpoint to go off
when the EXE is about to get control, at which point it knows that
KERNEL32.DLL has been initialized.
Earlier,
I mentioned a breakpoint that occurs after the LoadLibrary call
returns. This is a third breakpoint for CDebugInjector to handle. All of
the mechanics for handling the different breakpoints can be seen in
CDebugInjector::HandleException.
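The three breakpoints can be thought of as a small state machine. The sketch below is a simplified model, not the logic in CDebugInjector::HandleException: the BreakpointAction and BreakpointSequencer names are hypothetical, and the model decides purely by arrival order, whereas real code would presumably also compare the exception address against the breakpoints it planted.

```cpp
#include <cassert>

// What the debugger should do in response to each breakpoint exception.
enum class BreakpointAction {
    SetEntryPointBreakpoint,  // 1st: the system's initial breakpoint
    PlaceInjectionStub,       // 2nd: our breakpoint at the EXE's entry point
    RestoreOriginalContext,   // 3rd: the INT 3 that follows LoadLibrary
    PassToDebuggee            // anything later is not ours to handle
};

class BreakpointSequencer {
public:
    // Call once per breakpoint exception, in the order they arrive.
    BreakpointAction OnBreakpoint() {
        switch (++seen_) {
            case 1:  return BreakpointAction::SetEntryPointBreakpoint;
            case 2:  return BreakpointAction::PlaceInjectionStub;
            case 3:  return BreakpointAction::RestoreOriginalContext;
            default: return BreakpointAction::PassToDebuggee;
        }
    }

private:
    int seen_ = 0;  // number of breakpoint exceptions observed so far
};
```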
Another
interesting problem to address with DLL injection is where to write the
LoadLibrary stub. Under Windows NT 4.0 and later you can allocate space
in another process with VirtualAllocEx, so I took that route. That
leaves out Windows 9x, which doesn't support VirtualAllocEx. For this
scenario, I took advantage of a unique property of Windows 9x
memory-mapped files. These files are visible in all address spaces, and
at the same address. I simply create a small memory-mapped file using
the system page file as backing, and blast the LoadLibrary stub into it.
The stub is implicitly accessible in the debuggee process. For the
details, see the code listing for
CDebugInjector::GetMemoryForLoadLibraryStub at the link at the top of
this article.
Using DelayLoadProfile
DelayLoadProfile
is a command-line program that writes its results to standard output.
From a command prompt, run DelayLoadProfile, specifying the target
program and any arguments it needs, such as:
|