In the December 1998 issue of MSJ,
Jeffrey Richter and I wrote dueling columns on the DelayLoad feature of the Microsoft® Visual C++®
6.0 linker. The fact that both Jeff and I jumped on this topic is an
indication of how cool this feature is. Unfortunately, I still find people
who don't know anything about DelayLoad, or who think it's some feature
that's available only in the latest version of Windows NT®.
For starters, let me scream from the highest rooftop that DelayLoad is not an operating system feature. It works on any Win32®-based system. With that off my chest, I'll demonstrate this month's utility, DelayLoadProfile, which makes it almost trivial to determine whether your program can benefit from DelayLoad. As I'll show, even some of Microsoft's own programs can benefit from it.
A Quick Review
SHELL32.LIB /DELAYLOAD:SHELL32.DLL DELAYIMP.LIB
Unfortunately, the Visual Studio®
6.0 IDE doesn't have an easy way for you to specify DelayLoading for
DLLs. In Visual Studio 6.0, you'll have to add the /DELAYLOAD:XXX
command-line fragment manually to the Project Settings | Link | Project
Options edit field.
When to Use DelayLoad
DelayLoadProfile: The Big Picture
Into the Trenches
CALL    DelayLoadProfileDLL_UpdateCount
JMP     XXXXXXXX            // original IAT address
DWORD   count
DWORD   pszNameOrOrdinal
When the EXE calls one of the redirected functions,
control goes to the CALL instruction in the stub. The
DelayLoadProfileDLL_UpdateCount routine in DelayLoadProfileDLL.CPP
simply increments the value of the count field of the stub. After that
CALL returns, the JMP instruction transfers control to the original
address that was stored in the IAT before I bashed it. Figure 2 shows the big picture after the IAT has been redirected to the stubs.
Assembler junkies might be wondering how the DelayLoadProfileDLL_UpdateCount function knows where the stub's count field is in memory. A quick look at the code shows that DelayLoadProfileDLL_UpdateCount finds the return address pushed on the stack by the CALL instruction. The return address points to the JMP XXXXXXXX instruction following the call. Since the CALL instruction is always five bytes, some pointer arithmetic yields the stub's starting address and easy access to the stub's count field.
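The pointer arithmetic described above can be sketched in portable C++. This is an illustration only, not the code from DelayLoadProfileDLL.CPP: the byte offsets assume a packed 32-bit stub in which both the CALL and the JMP are five-byte instructions (opcode plus 32-bit displacement), and the names StubFromReturnAddress and BumpCount are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Assumed packed layout of one stub, following the listing above:
//   CALL DelayLoadProfileDLL_UpdateCount   5 bytes (assumed E8 + rel32)
//   JMP  <original IAT address>            5 bytes (assumed E9 + rel32)
//   DWORD count                            4 bytes
//   DWORD pszNameOrOrdinal                 4 bytes
constexpr std::size_t kCallSize    = 5;
constexpr std::size_t kJmpSize     = 5;
constexpr std::size_t kCountOffset = kCallSize + kJmpSize;
constexpr std::size_t kNameOffset  = kCountOffset + sizeof(std::uint32_t);
constexpr std::size_t kStubSize    = kNameOffset + sizeof(std::uint32_t);
static_assert(kStubSize == 18, "layout assumption: 5 + 5 + 4 + 4 bytes");

// The return address pushed by CALL points at the JMP, so stepping back
// over the five-byte CALL yields the first byte of the stub.
inline std::uint8_t* StubFromReturnAddress(std::uint8_t* returnAddress) {
    assert(returnAddress != nullptr);
    return returnAddress - kCallSize;
}

// Increment the stub's count field and return the new value.
// memcpy is used because the field is not guaranteed to be DWORD aligned.
inline std::uint32_t BumpCount(std::uint8_t* returnAddress) {
    std::uint8_t* stub = StubFromReturnAddress(returnAddress);
    std::uint32_t count = 0;
    std::memcpy(&count, stub + kCountOffset, sizeof(count));
    ++count;
    std::memcpy(stub + kCountOffset, &count, sizeof(count));
    return count;
}
```

In the real DLL the return address comes off the stack inside the update routine; here it is passed in as a parameter so the arithmetic can be exercised against an ordinary byte buffer.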
I had one problem using the DelayLoadProfileDLL_UpdateCount code that's worth mentioning. Originally, the function didn't have the PUSHAD and POPAD instructions to save and restore all of the regular CPU registers. The code worked fine on many programs, but just blew up on others. Finally, I narrowed it down to programs that imported __CxxFrameHandler and _EH_prolog from MSVCRT.DLL. Both of these APIs expect the EAX register to be set to a given value, and DelayLoadProfileDLL_UpdateCount was trashing EAX.
Since the trashed EAX was the problem, I added PUSHAD and POPAD. Alas, the problem remained. In frustration, I examined the compiler-generated code, and then smacked my forehead. Normally when generating code for a debug build, the Visual C++ 6.0 compiler inserts code in the function prolog to set all local variables to the value 0xCC. This code was trashing EAX before my PUSHAD got a chance to execute. To get around this, I had to remove the /GZ option from the debug build settings for DelayLoadProfileDLL.
Loading and Injection
DelayLoadProfile notepad c:\autoexec.bat
Here are the results of running DelayLoadProfile against CALC.EXE from Windows 2000 Release Candidate 2:
[d:\column\col66\debug]delayloadprofile calc
DelayLoadProfile: SHELL32.dll was called 0 times
DelayLoadProfile: MSVCRT.dll was called 9 times
DelayLoadProfile: ADVAPI32.dll was called 0 times
DelayLoadProfile: GDI32.dll was called 60 times
DelayLoadProfile: USER32.dll was called 691 times
I simply started CALC and immediately shut it down. Note
that SHELL32.DLL and ADVAPI32.DLL both had no calls to them. These two
DLLs are prime candidates for CALC to DelayLoad.
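If you capture the report to a file, picking out the zero-call DLLs is easy to automate. The following is a minimal sketch, assuming every report line has exactly the form shown above ("DelayLoadProfile: <name> was called <N> times") and that module names contain no spaces. ZeroCallDlls is a hypothetical helper, not part of DelayLoadProfile.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns the module names that a DelayLoadProfile report lists with a
// call count of zero, in the order they appear. Lines that do not match
// the assumed "DelayLoadProfile: <name> was called <N> ..." shape
// (such as the command prompt line) are ignored.
std::vector<std::string> ZeroCallDlls(const std::string& report) {
    std::vector<std::string> result;
    std::istringstream lines(report);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string tag, name, was, called;
        unsigned long count = 0;
        if (fields >> tag >> name >> was >> called >> count &&
            tag == "DelayLoadProfile:" && was == "was" &&
            called == "called" && count == 0) {
            result.push_back(name);
        }
    }
    return result;
}
```

Fed the CALC output above, this would return SHELL32.dll and ADVAPI32.dll. A zero count only nominates a DLL; it still has to pass the dependency check before it is worth delay-loading.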
You may be wondering why CALC loads SHELL32.DLL, yet doesn't call it. It would be easy enough to run DumpBin /IMPORTS or Depends.EXE against CALC. In doing so, you'd see that the only function CALC imports from SHELL32.DLL is ShellAboutW. Simply put, unless you select the Help | About Calculator menu item in CALC, it's a complete waste of time and memory to load SHELL32.DLL. This is a fabulous example of where /DELAYLOAD can really show its worth. Incidentally, SHELL32.DLL implicitly links against SHLWAPI.DLL and COMCTL32.DLL, two additional DLLs that are brought into memory and initialized for no reason.
Just because DelayLoadProfile reports that a DLL is receiving few or no calls at all doesn't mean you should automatically DelayLoad it. Be sure to consider whether one of your implicitly linked DLLs also links against the DLL you're considering using DelayLoad with. If this is the case, it's not worth using /DELAYLOAD in your EXE since the DLL is still going to be loaded and initialized because of some other dependency. Depends.EXE from the Platform SDK is a great tool for quickly determining the scope of a DLL's usage.
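The dependency check described above amounts to a reachability test over the import graph. Here is a minimal sketch, assuming you have already collected each module's implicit imports (for example by hand from DumpBin /IMPORTS output) into a table with consistently cased names. The ImportTable type and the StillLoadedImplicitly function are hypothetical, and the sketch does not read PE files itself.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Maps a module name to the modules it implicitly links against.
using ImportTable = std::map<std::string, std::vector<std::string>>;

// Returns true if `candidate` would still be pulled in at load time
// through some other implicitly linked DLL, even if `exe` itself
// stopped importing it directly (that is, delay-loaded it).
bool StillLoadedImplicitly(const ImportTable& imports,
                           const std::string& exe,
                           const std::string& candidate) {
    assert(!candidate.empty());
    auto root = imports.find(exe);
    if (root == imports.end()) return false;

    // Start from every direct import of the EXE except the candidate.
    std::vector<std::string> pending;
    for (const std::string& dll : root->second)
        if (dll != candidate) pending.push_back(dll);

    // Walk the remaining dependency graph looking for the candidate.
    std::set<std::string> visited;
    while (!pending.empty()) {
        std::string module = pending.back();
        pending.pop_back();
        if (module == candidate) return true;
        if (!visited.insert(module).second) continue;  // already seen
        auto found = imports.find(module);
        if (found == imports.end()) continue;          // no data for it
        for (const std::string& dep : found->second) pending.push_back(dep);
    }
    return false;
}
```

A false result only means the table you supplied shows no other path to the DLL; an incomplete table can hide a dependency, so Depends.EXE remains the authoritative check.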
Another thing to consider when using DelayLoadProfile is how much of your app you'll exercise during your test. Obviously, if you exercise all aspects of your app, all the DLLs you import in the EXE will be invoked. Personally, I think minimal load time is a good target to shoot for. This might mean just starting your program and then closing it down. By spreading the work of loading and initializing your DLLs throughout your application as it runs, you can speed the initial load sequence. Users often subjectively judge the speed of your application by its startup time.
I've found a few DLLs that will benefit from using /DELAYLOAD. As you saw earlier, SHELL32.DLL is one of them. Another is WINSPOOL.DRV, which is used for printing support. Since most users don't print frequently, it's a good candidate, as are OLE32.DLL and OLEAUT32.DLL. In addition, a variety of programs use COM and OLE in some minimal capacity, making those DLLs possible candidates, too. For example, the Windows 2000 CDPLAYER.EXE links against OLE32.DLL and imports the CreateStreamOnHGlobal API from it. Yet in ordinary usage, I didn't observe this function being called.
DelayLoadProfile is not without its faults (literally). While I've tested it successfully with a large number of applications, you may still run into the occasional program that doesn't work so well when DelayLoadProfileDLL interfaces with its IAT. Trying to find and locate all these odd scenarios is beyond the scope of this column. However, if you locate and fix one of these problems, please let me know. I may update DelayLoadProfile at some future date.
I know that programs that import MFC42.DLL and MFC42U.DLL can crash with DelayLoadProfile. For that reason I've provided an escape hatch. In DelayLoadProfileDLL.CPP it's the IsModuleOKToHook function. I've placed MFC42.DLL, MFC42U.DLL, and KERNEL32.DLL in it. (You can't use /DELAYLOAD with KERNEL32.DLL anyhow, so it's no loss.) If a particular DLL seems to be giving you problems, first try adding it to IsModuleOKToHook.
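To give a feel for the shape of such an escape hatch, here is a minimal sketch of an exclusion check. It is not the actual IsModuleOKToHook from DelayLoadProfileDLL.CPP: the signature, the std::string parameter, and the EqualsNoCase helper are assumptions made for illustration. Only the three module names come from the column.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Case-insensitive comparison of two ASCII module names.
static bool EqualsNoCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Hypothetical stand-in for IsModuleOKToHook: returns false for any
// module whose imports should be left alone, true otherwise.
bool IsModuleOKToHook(const std::string& moduleName) {
    static const char* const kDoNotHook[] = {
        "MFC42.DLL",
        "MFC42U.DLL",
        "KERNEL32.DLL",
        // Add the name of a misbehaving DLL here.
    };
    for (const char* skipped : kDoNotHook) {
        if (EqualsNoCase(moduleName, skipped)) return false;
    }
    return true;
}
```

The comparison ignores case because import tables do not spell module names consistently, as the mixed-case names in the CALC report show.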
I hope DelayLoadProfile's ease of use will inspire you to tune your applications to make use of /DELAYLOAD. I certainly had a good time updating some classic code, and I'd enjoy hearing your success stories, too.
Have a suggestion for Under The Hood? Send it to Matt at email@example.com or http://www.wheaty.net.
From the February 2000 issue of Microsoft Systems Journal.