Date: | July 17, 2006 / year-entry #236 |
Tags: | history |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20060717-13/?p=30503 |
Comments: | 16 |
Summary: | Last time, we looked at the way functions were exported from 16-bit DLLs. Today, we'll look at how they were imported. When each segment is loaded into memory, the raw contents are loaded from disk, and then relocation fixups are applied. A fixup for an imported function consists of the name of the target DLL,... |
Last time, we looked at the way functions were exported from 16-bit DLLs. Today, we'll look at how they were imported.
When each segment is loaded into memory, the raw contents are loaded from disk, and then relocation fixups are applied. A fixup for an imported function consists of the name of the target DLL, the target function (either a name or ordinal), and the position of the first location in the segment where the fixup needs to be applied.
All imported addresses are far addresses since they reside in another segment. (If they were in the same segment, then they would be in the same DLL, so you wouldn't be importing it!) On 16-bit Windows, a far address is four bytes (a two-byte selector and a two-byte offset), and since the target address is not known when the DLL is generated, those four bytes are just placeholders, waiting to be filled in with the actual target address when the import is resolved.
And it is those placeholder bytes that serve double duty. All the calls within a segment that import the same function are chained in a linked list, where the relocation record points to the first entry. The items in the linked list? The four-byte placeholders. And the "next" pointer in the linked list? The placeholder itself!
For example, suppose we have a segment that requires two fixups for the function …
To apply the fixup, we first call …
But what if the call to …
Okay, that's a quick introduction to how functions are imported and exported on 16-bit Windows. Next time, we'll look at the transition to 32-bit Windows and the design decisions that went into the new model.
Comments (16)
“Okay, that’s a quick introduction to how functions are imported and exported on 16-bit Windows. Next time, we’ll look at the transition to 32-bit Windows and the design decisions that went into the new model.”
Please explain why you decided to perform relocations for global data references, rather than use a TOC register/local variable (at least for x86; I don’t know about other architectures). I really want to know.
Wow, now that’s an old new thing. I worked on a rewrite of a relocating loader implemented by Bill Lynch in 1962 and that is exactly how he did it and how the relocating-loader format did it from then on. I think his design may have been inspired by a Control Data compiler system, but I can’t be sure now.
My first work on relocating loaders was to adapt ideas from the original Binary Symbolic Subroutine (BSS) loader of the first Fortran II compiler for working with an assembler on the IBM 650. In the assembler three of us hacked at the University of Washington in 1959, we didn’t know about fixup chains, so we missed a chance.
In the little Fortran compiler that Don Knuth wrote and for which Lynch’s loader was designed, Don knew all about fixup chains. One-pass assemblers (and compilers) often used them to handle forward references as well as externals.
Now that’s a blast from the past!
Well, I won’t discount that forward references are possible in a single-pass assembler, but I’m not really seeing how you’d do them. Even with a fixup chain, you’d have to fix up the chain eventually; wouldn’t that be considered another pass? Or is the fixing up done at runtime, by the program loader (as in win16 here)?
@BryanK
A one pass assembler makes one pass through the source code. The second pass is just patching the chained forward references in the binary.
I guess back in the old days, reading the source code from disk was slow, and even strcmp and the like were not exactly speedy. But following and patching a linked list was blazingly fast, so you could sell this as ‘one pass assembly’, even though it’s more like a 1.1 pass assembler.
Of course, on a modern processor with caches and memory-mapped files, you have to wonder whether fixup chains and the like make sense. The problem is that each chain requires a pass through the file – accesses are highly non-local, which messes with both caches and memory mapping, since both optimise for sequential reading and writing.
Worst of all, each page you patch will be copied, since the exe file is mapped as ‘copy on write’. And on a server OS like Windows NT was meant to be from the start, that’s bad because it means the patched pages can’t be shared by different instances of the same process. So if you have 10 instances of services.exe, each instance will need to have a private copy of all the patched pages.
I wonder if these sorts of architectural changes in processors, and the fact that Win32 was supposed to run on big iron RISC servers rather than nasty little desktop boxes, affected how they decided to do it on Win32 ;-)
Ah, OK, that makes sense then. Thanks! ;-)
Tom: that’s why the ‘rebase’ and ‘bind’ tools exist. We set the base address of the DLL so that it doesn’t clash with any other DLL and therefore doesn’t need fixing up at runtime. We then use ‘bind’ to set the default addresses in the import table to be the correct addresses, assuming that the DLL loads at its preferred base address. The loader detects that the DLL is the correct version (by comparing the DLL’s link timestamp, I believe) and that it loaded at the correct base address, and therefore skips the importing step since it knows the addresses are correct. Therefore these pages aren’t touched and no copying is done.
DLLs shipped with Windows are correctly rebased and bound. People complain about IE’s fast start times, assuming that Microsoft are doing something internal to the operating system to allow it to load quickly (presumably with the corollary that this is to the detriment of other applications), but this is false: it simply uses the OS facilities as far as possible (therefore mainly touching code that is already ‘hot’), and the DLLs are rebased and bound so the loader doesn’t need to do any fixups. There have been hints that Microsoft have some special tools to break apart functions into mainline/error handling paths and put the error handling paths onto different pages – I’ve experienced this odd jumping around when trying to debug in kernel32 before! – and to then reorganise these blocks so that startup code is all on the same page, reducing the I/O needed to load a process. There’s no reason that Microsoft’s competitors can’t do the same.
Security patches or other hotfixes obviously have to add extra code, so functions will move around a bit, making the binding less effective.
Windows Vista will have a feature called "Address Space Layout Randomization" or ASLR for short. See http://blogs.msdn.com/michael_howard/archive/2006/05/26/608315.aspx for details. I think this will require rebasing at runtime, once, the first time a system DLL is loaded; subsequent instantiations of a DLL will use the same base address so there’ll only be one copy-on-write operation made.
Windows CE completely ignores base addresses, so binding is pointless on that platform. The copy-on-write is prevented because the DLLs are always loaded at the same address in every process. This remains the same in Windows CE 6.0 (now in beta), which changes the process address space model from a side-by-side 32MB address space to a completely overlapping 2GB model, as on the desktop.
Tuesday, July 18, 2006 7:07 AM by Tom
> A one pass assembler makes one pass through
> the source code. The second pass is just
> patching the chained forward references in
> the binary.
Sure but so what ^_?
> I guess back in the old days, reading the
> source code from disk
Back in the old days, if you had a disk, it wasn’t big enough for source code. If you had a disk then you didn’t have to load the OS in from cards or tape, and probably you didn’t have to load in the assembler from cards or tape, but your source code you sure did.
> was slow, and even strcmp and the like were
> not exactly speedy. But following and
> patching a linked list was blazingly fast,
But the CPU time was drowned out by the elapsed time spent reading the first version of the object code from cards or tape and punching the patched version on cards or tape.
In less old days you could sell a 1.1 pass assembler.
Sorry for two in a row, I just belatedly noticed this:
Tuesday, July 18, 2006 11:38 AM by Mike Dimmick
> Windows CE completely ignores base addresses
> so binding is pointless on that platform.
> The copy-on-write is prevented because the
> DLLs are always loaded at the same address
> in every process.
Therefore the DLL loader in WinCE is completely different from the DLL loader in Win32. Therefore is there some possibility that maybe the first argument to DllMain really is a HANDLE in WinCE, really different from the HINSTANCE that Win32 has? If these OSes really differ in this way then there’s a chance that some of Microsoft’s code and tools aren’t broken the way they seem to be and maybe Microsoft’s explanations were broken instead. This makes a huge difference in figuring out what is correct. Even though the bug (or part thereof) is already resolved as "won’t fix", it would still be informative to figure out exactly what the bug is.
Preserving the spirit while accommodating separate address spaces and new processors.
In fact, you could probably have used UndefDynlink as a poor man’s weak reference. Take the example of EnableScrollBar, which IIRC is a 3.1 API not present in 3.0; rather than write something along the lines of
/* hUser: USER's module handle, e.g. from GetModuleHandle("USER") */
FARPROC esbProc = GetProcAddress(hUser, MAKEINTRESOURCE(482));
if (esbProc) {
/* call it */
}
you could import it from your .DEF file and write
if (EnableScrollBar != UndefDynlink) {
/* call it */
}
Wherein the compiler doesn’t know what’s going on.
I asked the following in response to Mike Dimmick’s comment:
> Therefore is there some possibility that
> maybe the first argument to DllMain really
> is a HANDLE in WinCE, really different from
> the HINSTANCE that Win32 has?
I’ve seen some hints that the answer is no, the first argument has a value that’s really an HINSTANCE even in WinCE, and it just gets forced through a wrongly typed parameter.
For reference.
I found this list of articles on Raymond's blog. Raymond's blog is one of the more interesting