<-- Articles Index / Article: Things They Didn't Tell You About MS LINK and the PE Header
Article: Things They Didn't Tell You About MS LINK and the PE Header
: things they didn't tell you about ms link and the pe header : lw, 7 july 2004 :
The linkers from microsoft store information about used compiler versions
to create the object and library files in the EXE files they produces. This
information is stored right after the DOS stub, and before the start of the
actual PE header.
Appearantly they wanted to hide it, since all this stuff is encrypted using
checksums and other weird ways. I must say that I don't understand much of
the way they built up the structure, it is inefficient and simply weird.
Also I don't see much use of it, unless in some strange lawsuit or something
where the question is: is this .exe file created by this compiler+linker?
Or: are these .lib's used to create this exe file? Still then there is no
good evidence, because only the used compiler versions are stored, compilers
which are used by thousands of other people too. And why does microsoft use
this strange encryption and such?
Well, as you might see, enough questions about the reason why it exists -I can't
tell you much about the use of it- but maybe I can tell you something in this
article about the structure of this stored data though.
* The Rich-Structure
The name "rich" is used because of one field of the structure, which contains
the ASCII values that form "Rich". After the DOS stub the "rich" structure is
stored. This structure is created by the ms linker and consists mainly of compiler
id's which are gathered by the linker from the used .obj and .lib files. These
compiler id's are stored in the files by the ms compiler in the 'comp.id' fields,
and contain the version number of the compiler. Newer linkers from ms also add
their linker id to the exe file.
The "rich" structure is in the following format:
a, b, b, b -- identification block / header?
compid^b, r^b -- from 0
.. -- :
compid^b, r^b -- to n
'Rich', b -- terminator
Where all variables are dwords. b is the checksum i'll describe later, and
a=b^0x536e6144. This value is a hardcoded value and appearantly always used.
compid is the compiler id and b is the number of times it was encountered
over all the lib/obj files (that is an assumption, i'm not 100% sure). And
n is number of stored compid's. compid's are dwords too, the lower word is
the minor version number (0-9999 decimal), the high word is the major
number. i don't know how the high word is encoded, but 13.10 appears is
encoded as 0x60, and 7.10 as 0x5a. and yes, i see that 0x60-0x5a is the same
as 13-7 decimal, but where did the 0x53 (0x60-13decimal) came from? and where
is the 10 from the verison number stored?
The size of the "rich" structure is ((b/32)%3 + n)*8 + 0x20 bytes. the unused
space is padded with zeroes.
b is calculated in these steps:
b=sizeof(dos_stub) // (almost always 0x80)
then the checksum of the dos_stub, with the pointer to the PE zeroed out, is
calculated in the following way:
for(int i=0; i<sizeof(dos_stub); i++)
b += dos_stub[i] ROL i; // ROL is the x86 rotate over left operation.
when the default dos stub of 0x80 bytes is used, b contains now 0x884f3421
next, a checksum over the various compiler id's is calculated in this way:
for(int i=0; i<n; i++)
b += compid[i] ROL r[i];
as stated above, r appears to be the number of times that compid is
encountered in the libs/objs.
The linker doesn't store your the MAC address of your NIC nor your DNA profile,
but better remove it anyway ;). You can write a very simple tool that will
zero out the rich structure given an exe file, or patch your linker so that it
won't get written at all. For my investigation I used the Microsoft Visual C++
Toolkit 2003 with the "same compiler and linker that ship with Visual Studio
.NET 2003 Professional!" which you can download for free from microsoft.com,
google for "VCToolkitSetup.exe". You can locate the interesting parts of code
by searching link.exe for the string 'Rich' or in the function starting at
* Some additional last minute notes:
Disavowed told me on the RCE board that the constant 0x536e6144 is equal to
Dan^ in ASCII, and Silver had an some additional information about "Dan":
Dan Ruder, Mechanics of Dynamic Linking, Microsoft, MSDN Library 1993, as
referenced by patent 6253258 filed by Symantec for "Subclassing system for
computer that operates with portable-executable (PE) modules". Well, that must
be "our" magic Dan, shouldn't he?
Thanks goes out to my good friend malfunction for making me interested in the
Rich stuff, to StarZer0 for a lot of things, and to a lot of other guys who
make the scene a nice place to be.
If you have anything to tell me, don't hesitate to contact me.
:lifewire / ikx - lifewire$mail.ru - ikx.cjb.net: