How were DLL functions imported in 16-bit Windows?

Date: July 17, 2006 / year-entry #236
Tags: history
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20060717-13/?p=30503
Comments: 16

Last time, we looked at the way functions were exported from 16-bit DLLs. Today, we'll look at how they were imported.

When each segment is loaded into memory, the raw contents are loaded from disk, and then relocation fixups are applied. A fixup for an imported function consists of the name of the target DLL, the target function (either a name or ordinal), and the position of the first location in the segment where the fixup needs to be applied. All imported addresses are far addresses since they reside in another segment. (If they were in the same segment, then they would be in the same DLL, so you wouldn't be importing it!) On 16-bit Windows, a far address is four bytes (a two-byte selector and a two-byte offset), and since the target address is not known when the DLL is generated, those four bytes are just placeholders, waiting to be filled in with the actual target address when the import is resolved. And it is those placeholder bytes that serve double duty.
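The pieces described above can be sketched in C. This is only an illustration: the `ImportFixup` struct is a conceptual view of what a fixup carries, not the on-disk record layout, and every type and function name here is hypothetical. The one detail taken from the x86 architecture rather than from the article is the byte order of the four-byte slot: a far pointer stores the two-byte offset first and the two-byte selector second, both little-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conceptual view of one import fixup (NOT the on-disk record layout). */
typedef struct {
    const char *module;       /* target DLL, e.g. "KERNEL" */
    const char *name;         /* function name, or NULL when imported by ordinal */
    uint16_t    ordinal;      /* function ordinal, used when name is NULL */
    uint16_t    first_offset; /* segment offset of the first placeholder */
} ImportFixup;

/* A 16:16 far address: two-byte selector plus two-byte offset. */
typedef struct {
    uint16_t selector;
    uint16_t offset;
} FarAddress;

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Store a far address into a four-byte slot: offset first, then selector. */
static void write_far_address(uint8_t *slot, FarAddress addr)
{
    assert(slot != NULL);
    slot[0] = (uint8_t)(addr.offset & 0xFF);
    slot[1] = (uint8_t)(addr.offset >> 8);
    slot[2] = (uint8_t)(addr.selector & 0xFF);
    slot[3] = (uint8_t)(addr.selector >> 8);
}
```

Until the import is resolved, the four bytes of such a slot carry no address at all, which is what frees them up for other use.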

All the calls within a segment that import the same function are chained in a linked list, where the relocation record points to the first entry. The items in the linked list? The four-byte placeholders. And the "next" pointer in the linked list? The placeholder itself! For example, suppose we have a segment that requires two fixups for the function GetPrivateProfileInt, which happens to be kernel function 127. The relocation table entry would say "This segment needs function 127 from KERNEL; start at offset 01D1". The on-disk copy of the segment might go something like this:

...
01D0 9A
01D1 FE
01D2 01
01D3 00
01D4 00
...
01FD 9A
01FE FF
01FF FF
0200 00
0201 00
...

To apply the fixup, we first call GetProcAddress to get the address of kernel function 127. Then we go to the first fixup location (0x01D1), write the address there, then look at the value we overwrote. That value was 0x01FE, so we now go to offset 0x01FE and write the address there, too. The value we overwrote was 0xFFFF, which marks the end of the fixup chain.
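The walk just described can be sketched roughly as follows. This is not the actual loader code: the function names, the bounds check, and the runaway-chain guard are assumptions added for the illustration, and the caller is expected to have already looked up the target address. The important detail is that the "next" link is read out of the placeholder before the placeholder is overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_END 0xFFFFu /* link value that terminates a fixup chain */

/* Load a little-endian 16-bit value. */
static uint16_t load_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Store a little-endian 16-bit value. */
static void store_u16le(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/*
 * Patch every placeholder on the chain starting at offset `first` with the
 * far address target_selector:target_offset.
 * Returns the number of placeholders patched, or -1 if the chain leaves the
 * segment or appears to loop (both checks are additions for this sketch).
 */
static int apply_fixup_chain(uint8_t *segment, size_t size, uint16_t first,
                             uint16_t target_selector, uint16_t target_offset)
{
    int patched = 0;
    uint16_t at = first;

    assert(segment != NULL);
    while (at != CHAIN_END) {
        if ((size_t)at + 4 > size || (size_t)patched > size / 4) {
            return -1; /* malformed chain */
        }
        /* The placeholder's first word is the link to the next placeholder. */
        uint16_t next = load_u16le(segment + at);

        /* Overwrite the placeholder: offset word first, then selector word. */
        store_u16le(segment + at, target_offset);
        store_u16le(segment + at + 2, target_selector);

        at = next;
        patched++;
    }
    return patched;
}
```

Run against the example segment with a starting offset of 0x01D1, this patches the slot at 0x01D1, follows the link 0x01FE, patches the slot there, reads 0xFFFF, and stops after two placeholders. The 0x9A opcode bytes at 0x01D0 and 0x01FD are never touched.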

But what if the call to GetProcAddress fails? (Say, there is no such function 127 in KERNEL.) Then instead of writing the address of the target function, the loader wrote the address of a function that displayed the "Call to Undefined Dynalink" fatal error dialog.
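The fallback amounts to substituting a different address before the chain is patched. Here is a minimal sketch, assuming a hypothetical lookup callback in place of the loader's GetProcAddress step. All names and all selector:offset values are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A 16:16 far pointer (hypothetical type name). */
typedef struct {
    uint16_t selector;
    uint16_t offset;
} FarPtr16;

/*
 * Hypothetical lookup callback standing in for the GetProcAddress step.
 * Returns nonzero and fills *out on success, zero on failure.
 */
typedef int (*ImportLookup)(const char *module, uint16_t ordinal,
                            FarPtr16 *out);

/* Return the import's address, or the error stub if the lookup fails. */
static FarPtr16 resolve_or_stub(ImportLookup lookup, const char *module,
                                uint16_t ordinal, FarPtr16 error_stub)
{
    FarPtr16 target;

    assert(lookup != NULL && module != NULL);
    if (lookup(module, ordinal, &target)) {
        return target;
    }
    return error_stub; /* calls through this import reach the error stub */
}

/* Toy export table: only KERNEL ordinal 127 exists (address is made up). */
static int demo_lookup(const char *module, uint16_t ordinal, FarPtr16 *out)
{
    if (strcmp(module, "KERNEL") == 0 && ordinal == 127) {
        out->selector = 0x011F;
        out->offset = 0x2C40;
        return 1;
    }
    return 0;
}
```

Because the substitution happens before the chain is walked, a missing import costs nothing at load time. The error surfaces only if the program actually calls the function.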

Okay, that's a quick introduction to how functions are imported and exported on 16-bit Windows. Next time, we'll look at the transition to 32-bit Windows and the design decisions that went into the new model.


Comments (16)
  1. “Okay, that’s a quick introduction to how functions are imported and exported on 16-bit Windows. Next time, we’ll look at the transition to 32-bit Windows and the design decisions that went into the new model.”

    Please explain why you decided to perform relocations for global data references, rather than use a TOC register/local variable (at least for x86; I don’t know about other architectures). I really want to know.

    [I already discussed this. -Raymond]
  2. orcmid says:

    Wow, now that’s an old new thing.  I worked on a rewrite of a relocating loader implemented by Bill Lynch in 1962 and that is exactly how he did it and how the relocating-loader format did it from then on.  I think his design may have been inspired by a Control Data compiler system, but I can’t be sure now.  

    My first work on relocating loaders was to adapt ideas from the original Binary Symbolic Subroutine (BSS) loader of the first Fortran II compiler for working with an assembler on the IBM 650.  In the assembler three of us hacked at the University of Washington in 1959, we didn’t know about fixup chains, so we missed a chance.

    In the little Fortran compiler that Don Knuth wrote and for which Lynch’s loader was designed, Don knew all about fixup chains.  One-pass assemblers (and compilers) often used them to handle forward references as well as externals.

    Now that’s a blast from the past!

  3. Neil says:

    >the loader wrote the address of a function

    KERNEL:120 UndefDynlink (yes, you can link to it…)

    >that displayed the "Call to Undefined Dynalink" fatal error dialog.

    More precisely, NTVDM displays that fatal error dialog; Windows 95 displays "Program Error" "Your program is making an invalid dynamic link call to a .DLL file". I forget whether Windows 3.1 had a different message.

  4. Jules says:

    > One-pass assemblers (and compilers) often used them to handle forward references as well as externals.

    When you tell that to kids these days, they say "Rubbish.  One-pass assemblers can’t handle forward references at all."

    (You have to write an assembler before you can even consider guessing how many people think that.  A huge proportion of all programmers, probably in excess of 90%, by my estimation)

  5. BryanK says:

    Well, I won’t discount that forward references are possible in a single-pass assembler, but I’m not really seeing how you’d do them.  Even with a fixup chain, you’d have to fix up the chain eventually; wouldn’t that be considered another pass?  Or is the fixing up done at runtime, by the program loader (as in win16 here)?

  6. Tom says:

    @BryanK

    A one pass assembler makes one pass through the source code. The second pass is just patching the chained forward references in the binary.

    I guess back in the old days, reading the source code from disk was slow, and even strcmp and the like were not exactly speedy. But following and patching a linked list was blazingly fast, so you could sell this as ‘one pass assembly’, even though it’s more like a 1.1 pass assembler.

    Of course on a modern processor with caches and memory mapped files, you have to wonder whether fixup chains and the like make sense. The problem is that each chain requires a pass through the file – accesses are highly non local, which messes with both caches and memory mapping, since both optimise for sequential reading and writing.

    Worst of all, each page you patch will be copied, since the exe file is mapped as ‘copy on write’. And on a server OS like Windows NT was meant to be from the start, that’s bad because it means that they can’t be shared by different instances of the same process. So if you have 10 instances of services.exe, each instance will need to have a private copy of all the patched pages.

    I wonder if this sort of architectural change in processors, and the fact that Win32 was supposed to run on big iron RISC servers rather than nasty little desktop boxes, affected how they decided to do it on Win32 ;-)

  7. BryanK says:

    Ah, OK, that makes sense then.  Thanks!  ;-)

  8. Mike Dimmick says:

    Tom: that’s why the ‘rebase’ and ‘bind’ tools exist. We set the base address of the DLL so that it doesn’t clash with any other DLL and therefore doesn’t need fixing up at runtime. We then use ‘bind’ to set the default addresses in the import table to be the correct addresses assuming that the DLL loads at its preferred base address. The loader detects that the DLL is the correct version (using link checksum I believe) and that it loaded at the correct base address and therefore skips the importing step since it knows the addresses are correct. Therefore these pages aren’t touched and no copying is done.

    DLLs shipped with Windows are correctly rebased and bound. People complain about IE’s fast start times assuming that Microsoft are doing something internal to the operating system to allow it to load quickly (presumably making a corollary that this is to the detriment of other applications), but this is false: it simply uses the OS facilities as far as possible (therefore mainly touching code that is already ‘hot’), and the DLLs are rebased and bound so the loader doesn’t need to do any fixups. There have been hints that Microsoft have some special tools to break apart functions into mainline/error handling paths and put the error handling paths onto different pages – I’ve experienced this odd jumping around when trying to debug in kernel32 before! – and to then reorganise these blocks so that startup code is all on the same page, reducing the I/O needed to load a process. There’s no reason that Microsoft’s competitors can’t do the same.

    Security patches or other hotfixes obviously have to add extra code, so functions will move around a bit, making the binding less effective.

    Windows Vista will have a feature called "Address Space Layout Randomization" or ASLR for short. See http://blogs.msdn.com/michael_howard/archive/2006/05/26/608315.aspx for details. I think this will require rebasing at runtime, once, the first time a system DLL is loaded; subsequent instantiations of a DLL will use the same base address so there’ll only be one copy-on-write operation made.

    Windows CE completely ignores base addresses so binding is pointless on that platform. The copy-on-write is prevented because the DLLs are always loaded at the same address in every process. This remains the same in Windows CE 6.0 (now in beta) which changes the process address space model from a side-by-side 32MB address space to a completely overlapping 2GB model as the desktop has.

  9. Norman Diamond says:

    Tuesday, July 18, 2006 7:07 AM by Tom

    > A one pass assembler makes one pass through
    > the source code. The second pass is just
    > patching the chained forward references in
    > the binary.

    Sure but so what ^_?

    > I guess back in the old days, reading the
    > source code from disk

    Back in the old days, if you had a disk, it wasn’t big enough for source code.  If you had a disk then you didn’t have to load the OS in from cards or tape, and probably you didn’t have to load in the assembler from cards or tape, but your source code you sure did.

    > was slow, and even strcmp and the like were
    > not exactly speedy. But following and
    > patching a linked list was blazingly fast,

    But the CPU time was drowned out by the elapsed time spent reading the first version of the object code from cards or tape and punching the patched version on cards or tape.

    In less old days you could sell a 1.1 pass assembler.

  10. Norman Diamond says:

    Sorry for two in a row, I just belatedly noticed this:

    Tuesday, July 18, 2006 11:38 AM by Mike Dimmick

    > Windows CE completely ignores base addresses
    > so binding is pointless on that platform.
    > The copy-on-write is prevented because the
    > DLLs are always loaded at the same address
    > in every process.

    Therefore the DLL loader in WinCE is completely different from the DLL loader in Win32.  Therefore is there some possibility that maybe the first argument to DllMain really is a HANDLE in WinCE, really different from the HINSTANCE that Win32 has?  If these OSes really differ in this way then there’s a chance that some of Microsoft’s code and tools aren’t broken the way they seem to be and maybe Microsoft’s explanations were broken instead.  This makes a huge difference in figuring out what is correct.  Even though the bug (or part thereof) is already resolved as "won’t fix", it would still be informative to figure out exactly what the bug is.

  11. Preserving the spirit while accommodating separate address spaces and new processors.

  12. Neil says:

    In fact, you could probably have used UndefDynlink as a poor man’s weak reference. Take the example of EnableScrollBar which IIRC is a 3.1 API not present in 3.0; rather than write something along the lines of

    FARPROC esbProc = GetProcAddress(hUser, MAKEINTRESOURCE(482));
    if (esbProc) {
     /* call it */
    }

    you could import it from your .DEF file and write

    if (EnableScrollBar != UndefDynlink) {
     /* call it */
    }

  13. Wherein the compiler doesn’t know what’s going on.

  14. Norman Diamond says:

    I asked the following in response to Mike Dimmick’s comment:

    > Therefore is there some possibility that
    > maybe the first argument to DllMain really
    > is a HANDLE in WinCE, really different from
    > the HINSTANCE that Win32 has?

    I’ve seen some hints that the answer is no, the first argument has a value that’s really an HINSTANCE even in WinCE, and it just gets forced through a wrongly typed parameter.

  15. I found this list of article on Raymond's blog . Raymond's blog is one of the more interesting



DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog. It may have slight formatting modifications for consistency and to improve readability.
