The x86 architecture is the weirdo

Date: September 14, 2004 / year-entry #336
Tags: other
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20040914-00/?p=37873
Comments: 67
Summary:The x86 architecture does things that almost no other modern architecture does, but due to its overwhelming popularity, people think that the x86 way is the normal way and that everybody else is weird. Let's get one thing straight: The x86 architecture is the weirdo. The x86 has a small number (8) of general-purpose registers;...

The x86 architecture does things that almost no other modern architecture does, but due to its overwhelming popularity, people think that the x86 way is the normal way and that everybody else is weird.

Let's get one thing straight: The x86 architecture is the weirdo.

The x86 has a small number (8) of general-purpose registers; the other modern processors have far more. (PPC, MIPS, and Alpha each have 32; ia64 has 128.)

The x86 uses the stack to pass function parameters; the others use registers.

The x86 forgives access to unaligned data, silently fixing up the misalignment. The others raise a misalignment exception, which can optionally be emulated by the supervisor at an amazingly huge performance penalty.
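
To see what that means in code: the cast below is the kind of access x86 quietly tolerates, while the memcpy form is the portable way to do the same read on alignment-strict machines. (A small C sketch; compilers generally turn the memcpy into a single load where the hardware allows it.)

    #include <stdint.h>
    #include <string.h>

    /* Read a 32-bit value from a possibly misaligned position in a buffer. */
    uint32_t read_u32(const unsigned char *p)
    {
        uint32_t v;
        memcpy(&v, p, sizeof v);         /* portable: works on every architecture */
        /* return *(const uint32_t *)p;     direct cast: x86 forgives it,
                                            alignment-strict CPUs may fault */
        return v;
    }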

The x86 has variable-sized instructions. The others use fixed-sized instructions. (PPC, MIPS, and Alpha each have fixed-sized 32-bit instructions; ia64 has fixed-sized 41-bit instructions. Yes, 41-bit instructions.)

The x86 has a strict memory model, where external memory access matches the order in which memory accesses are issued by the code stream. The others have weak memory models, requiring explicit memory barriers to ensure that issues to the bus are made (and completed) in a specific order.
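
The portable way to express the ordering you need, rather than relying on the x86's strong model, is an explicit acquire/release pairing, sketched here with C11 atomics (which of course postdate this post):

    #include <stdatomic.h>

    int        payload;      /* ordinary data                */
    atomic_int ready = 0;    /* flag that guards the payload */

    void producer(void)
    {
        payload = 42;
        /* release: the payload store cannot be reordered after the flag store */
        atomic_store_explicit(&ready, 1, memory_order_release);
    }

    int consumer(void)
    {
        /* acquire: once the flag is seen, the payload is guaranteed visible */
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;
        return payload;
    }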

The x86 supports atomic load-modify-store operations. None of the others do.
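
In portable C (again using the later C11 interface as a sketch), the same source line lands very differently: on x86 an atomic increment can be a single locked instruction, while on a load-linked/store-conditional architecture the compiler has to emit a retry loop.

    #include <stdatomic.h>

    atomic_long counter = 0;

    void hit(void)
    {
        /* x86: can compile to one `lock xadd`; PPC/MIPS/Alpha: an LL/SC retry loop */
        atomic_fetch_add(&counter, 1);
    }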

The x86 passes function return addresses on the stack. The others use a link register.

Bear this in mind when you write what you think is portable code. Like many things, the culture you grow up with is the one that feels "normal" to you, even if, in the grand scheme of things, it is one of the more bizarre ones out there.


Comments (67)
  1. DrPizza says:

    "The x86 architecture does things that almost no other modern architecture does"

    x86 is not a modern architecture, though. It should be compared to things like VAX and 68K. Through a quirk of history it’s happened to live on, but it’s still not modern. Its implementations might be, of course, but the ISA is prehistoric.

    "The x86 supports atomic load-modify-store operations. None of the others do. "

    What is fetchadd on Itanium?

  2. Adrian says:

    Very enlightening. I wonder what the security implications of some of these differences are. For example, if return addresses are in registers rather than on the stack, then would a buffer overflow bug be much harder to exploit?

  3. DrPizza says:

    "The x86 uses the stack to pass function parameters; the others use registers."

    How would you pass more than 32 (less the number of registers you can’t use; there’s a zero register for instance) arguments to a function?

    x86 spills to stack more often than other architectures (not enough architectural registers), but it’s not mandatory.

  4. Raymond Chen says:

    Rats I missed fetchadd.

    For the details on parameter passing, I refer readers to my earlier series on calling conventions. Typically, excess parameters are spilled onto the stack.

  5. blahblahblah says:

    "How would you pass more than 32…arguments to a function?"

    Maybe one register could hold the starting location of the argument list and another holds the number of arguments…

  6. Ben Cooke says:

    I was about to post about how some older processors I dealt with in my younger days supported the things you say that only x86 does, but DrPizza already did that.

    Of course, the system I spent most of my time worrying about these kinds of things on — the Commodore 64 with a Motorola 6502 chip — was 8-bit anyway, so alignment was never much of a problem! It did (sort of) come up with indirect addressing, because then it needed a 16-bit value which IIRC couldn’t straddle a page boundary since the address would overflow and you’d end up with the value at the start of the current page rather than the next one. Indexed indirect made things even more interesting, because even if you were 16-bit aligned the index offset couldn’t cross a page boundary. (Apologies if I got any of this wrong. It’s been a long time since I’ve had to think about this stuff.)

    These days I tend to stick to high-level languages (with an appropriate amount of concern for issues like alignment) so I don’t know a great deal about more modern CPUs, but it’s still interesting so it’s nice to hear from someone who does.

  7. There are situations on x86 where an explicit memory barrier of some sort is needed.

    http://www.microsoft.com/whdc/driver/kernel/MPmem-barrier.mspx

  8. Ben Hutchings says:

    DrPizza: Calling conventions generally allow only 4-6 arguments to be passed in registers; beyond that, they’re placed on the stack.

    Adrian: The return address may still have to be stored on the stack in a non-leaf function, since there is only one link register. However, the link register and the abundance of registers to which the return address can be moved reduce the number of target functions somewhat.

  9. Ben: The 6502 was produced by MOS Technology/Commodore Semiconductor Group, not by Motorola. MOS was the company that made mask fixing commonplace.

    I’ve written on both MOS and CBM on Everything2. Perhaps not the most accurate articles I’ve written, and not on the best site for info, but dead-tree information sources pretty much back me up.

  10. James Curran says:

    Though, many of the odd behaviors of the x86 are improvements over the even odder behaviors of even older machines which these "modern" chips are going back to.

    IBM 360s passed parameters via registers and used a return link register because they just didn’t have a stack. (Return link registers make nested calls tough, and recursive calls very tough)

  11. Ben: are you sure that wasn’t the 6509 or the 6510? :)

  12. Cooney says:

    Though, many of the odd behaviors of the x86 are improvements over the even odder behaviors of even older machines which these "modern" chips are going back to.

    Also remember that the x86, with its myriad addressing modes, reflected a different philosophy from the RISC architectures. I seem to recall something about the relative speed of memory favoring CISC more heavily back when the x86 was first designed.

  13. Ben Hutchings says:

    Cooney: And now x86 assembly language turns out to be a reasonably good compact byte-code for a hardware JITter feeding a RISC-like core.

  14. Ben Hutchings says:

    (Uh, machine code, not assembly language.)

  15. DrPizza says:

    "DrPizza: Calling conventions generally allow only 4-6 arguments to be passed in registers; beyond that, they’re placed on the stack. "

    Which makes them much like x86, then.

    Spilling to the stack is used by pretty much all architectures. And presumably the return value must be stored somewhere other than a register, because you want to make calls more than one function deep.

  16. dce says:

    I don’t know how many are currently used, but back in the early days, MIPS machines passed the first 3 function arguments in registers, and all the rest were passed on the stack. I remember having to modify a number of Unix commands and libraries to handle varargs properly (well, "properly" isn’t really correct, since what the early MIPS compilers did was not ANSI standard).

  17. DrPizza says:

    That’s the problem. In pre-prototype (K&R) C, all functions must be assumed to be varargs functions, and that’s hard if you don’t pass arguments on the stack. Newer architectures have the benefit of prototypes, so can default to calling conventions that pass on the stack only when they have to (through spillage) or when using varargs.
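
    A varargs callee is the concrete case: va_arg has to walk the arguments in order, which is trivial when they already sit on the stack and requires a register save area when they don't. (A small C sketch of the callee side.)

        #include <stdarg.h>

        /* Sum `count` ints passed as variable arguments.  With stack-passed
           arguments, va_arg just walks the caller's argument area; with
           register-passed arguments, the prologue first spills the argument
           registers into a save area so the same walk still works. */
        long sum(int count, ...)
        {
            va_list ap;
            long total = 0;

            va_start(ap, count);
            for (int i = 0; i < count; i++)
                total += va_arg(ap, int);
            va_end(ap);
            return total;
        }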

  18. David the Formerly Irked says:

    DrPizza: It’s been forever and a day since I’ve done any asm, but that doesn’t seem like a problem to me. Since only one function is returning at any one time, all functions can store their return values in the same register without trouble.

    I seem to recall that one common calling convention for real-mode x86 involves storing the return value in AX.

  19. lowercase josh says:

    How do you do locks without atomic load-modify-store?

  20. Ben Hutchings says:

    josh: There are two alternatives I know of.

    The most common is the CAS (Compare And Set) instruction. Its parameters are a memory address, an expected value and a new value. It atomically loads from the memory address and then stores the new value if the current value matches the expected value. It sets a condition flag indicating what happened.

    The other is LL/SC (Load Linked, Store Conditional). These are two separate instructions. LL atomically loads from a memory address and remembers the address. SC stores a new value to that address so long as it is known not to have been modified since the LL. (False positives are possible; the granularity of modification tracking is unspecified but is likely to match the cache line size.)

    You can implement a generalised load-modify-store by looping through atomic-load, modify, CAS or LL, modify, SC until the CAS or SC succeeds.
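
    The retry loop looks like this in portable terms (a sketch using C11's atomic_compare_exchange_weak as a stand-in for a CAS instruction; the weak form may fail spuriously, much like a store-conditional):

        #include <stdatomic.h>

        /* Generalised load-modify-store built from CAS: retry until it sticks. */
        long atomic_double(atomic_long *p)
        {
            long old = atomic_load(p);
            while (!atomic_compare_exchange_weak(p, &old, old * 2))
                ;   /* on failure, `old` is reloaded with the current value */
            return old * 2;
        }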

  21. Ben Cooke says:

    Clearly everyone else has better memories than me. Writing Motorola was a brain fart since I was thinking at the same time about the 68000 series that DrPizza mentioned. I have here somewhere data sheets from Commodore Semiconductor Group about the CPU in the Commodore 64 although they’re buried somewhere in my big heap of old stuff in my attic.

    Simon Cooke (too many similar names around here – I overlap with two people!) mentioned two other model numbers that I remember from somewhere, so now I’m left trying to remember what I remember each one from.

    I know I’ve dealt with a 6502 and a 6510 at some point, but I can’t remember which numbers go with which things. (or maybe I’m confusing myself with non-CPU chip numbers from the same period)

  22. Wasn’t 6510 used in the C64? And isn’t 6502 just an older version of it?

  23. James Summerlin says:

    Raymond,

    I would like to state that I think we are witnessing one of those moments where it is good to be the weirdo.

    James

  24. Isaac Lin says:

    6502 was used on VIC-20 (and Apple II, and a number of others, as I recall). 6510 was used on Commodore 64.

  25. Don’t forget SEH. On the x86, because of the unreliable unwinding, SEH requires registrations (small objects allocated on the stack and chained in a singly-linked list) to associate frame handlers. On all other architectures, there’s a single calling convention, so unwinding is always reliable and the system can get the frame handler simply by matching the program counter against a table.

  26. Keith Moore [exmsft] says:

    Years ago I read a (DDJ? Byte?) magazine article in which the author referred to the x86 architecture as a "code museum" for its layers of "architectural history".

    I always liked this term.

  27. Mike Dunn says:

    Don’t forget the 8502 in the C=128, with its fast mode. "Fast" being a blazing 2 MHz. ;)

  28. asdf says:

    Don’t forget the FPU stack *shudder*.

  29. mpz says:

    On the other hand, with x86-64 x86 is getting rid of some of the worst stuff. For example the low number of general purpose registers. Also, I remember reading that at least in Windows, usage of x87/MMX/3DNow! has been obsoleted in favor of SSE/SSE2 (only SSE registers are saved in context switches or something?) so assuming x86-64 programs become popular, x86 chip makers can decrease the amount of resources for obsolete specialty instructions in the future.

  30. DrPizza says:

    "DrPizza: It’s been forever and a day since I’ve done any asm, but that doesn’t seem like a problem to me. Since only one function is returning at any one time, all functions can store their return values in the same register without trouble. "

    But what do they put back into the register when they return?

    Say you’re at 0x100 and you call a function foo at 0x200 and you have 4 byte instructions. The return value will be presumably 0x104. Say foo+0x08 then calls a function bar. You duly set the return value register to 0x20c and execute bar. bar returns to 0x20c. Now what gets put into the return value register? How does it know where to return to?

    The caller’s return value must be preserved across function calls, which means it’s got to be put somewhere other than a register, and surely that place is the stack.

  31. Raymond Chen says:

    DrPizza: I think you’re confusing return address and return value.

  32. Steven C. says:

    A nitpick:

    * PPC has variable length instructions (VLE — 2-byte instructions).

    * MIPS is god’s revenge on debuggers, as there are 16-, 32-, and 64-bit modes (although the latter is just a register issue and still has 32-bit instructions). IIRC, functions can be either 16-bit or 32-bit ABI mode, and both can appear in the same compiled item.

  33. Raymond Chen has a nice entry about the weird x86 architecture. Don’t miss the references to memory barriers in the feedback!…

  34. Norman Diamond says:

    The x86 architecture is the weirdo.

    Yes, but that’s because of 30 years of backwards compatibility to the days when micros were micros. Single-chip or double-chip CPUs didn’t act as orthogonally as multi-board CPUs did, because 2-dimensional layouts didn’t yet allow hardware designers to do everything they wanted to do.

    > The x86 forgives access to unaligned data

    Now that is actually a benefit which other architectures would do well to copy. Architectures which require alignment are often slower. When exceptions do not occur and do not need fixing up, it is because programs or compiler-generated code have called memcpy() to make aligned copies of operands and will call memcpy() again to copy the results to where they’re needed.

    Until the day that internet packets automatically get their contents aligned differently depending on what kind of architecture is going to read them, and disk files get their contents realigned, and the layouts of .BMP and other structures get dynamic realignments, unaligned data will have to be worked with one way or another. Hardware can do it faster.

    By the way, mainframes also provided a limited amount of atomic operations in hardware before the x86 existed.

  35. Tim Smith says:

    Norman,

    I have run into very few applications where I needed to use memcpy to deal with poorly aligned structures. When the code is properly written, alignment is usually not an issue.

  36. Anonymous Coward says:

    Although Raymond’s description of the instruction set modern x86 processors accept is correct, it isn’t actually what the chips have. For example the underlying hardware has quite a few more registers and is far more like a RISC chip. The x86 instructions are converted into the "RISC" instructions. Additionally by analysing register usage and data dependencies it is possible to make use of the extra registers.

    There is an excellent talk by Bob Colwell that I highly recommend watching. It is the 7th item on http://www.stanford.edu/class/ee380/winter-schedule-20032004.html

    (Click on the camera thingy on the right)

  37. Tony Cox [MS] says:

    In response to some of the comments in favor of x86, in particular the observations that automatic alignment fixup is handy and that under the hood the x86 has a multi-register architecture, I would make the following observation in reply:

    Sure. But it wastes a ton of silicon making all that happen.

    More modern architectures can have smaller dies for the same computing power, which means lower power consumption and lower cooling requirements. If those processors were produced in the same quantities as x86’s, they’d be cheaper too.

    There is a reason that most embedded applications don’t use x86.

  38. Anonymous Coward says:

    With every process shrink the amount of chip real estate devoted to the x86 decoder goes down as a proportion of the whole chip. It also turns out that many of the modern RISC chips take a similar approach as well (they also have decades of legacy instructions) although it isn’t quite to the same degree.

    As for power/cooling, it is perfectly possible to make cheaper chips that do x86 instructions and meet those targets. See Via and Transmeta. Intel just happen to optimise for performance and marketing at the expense of power and cooling. (They spent $300m on the initial marketing for the Pentium 4 in 2001.)

    Even looking at the ARM (a favourite in the embedded space), there are other design goals (eg code density with the thumb extensions).

    For anyone who is interested in this low level chip stuff, I highly recommend the comp.arch newsgroup.

  39. Dave Stokes says:

    James Curran: "IBM 360s passed parameters via registers and used a return link register because they just didn’t have a stack. (Return link registers make nested calls tough, and recursive calls very tough)"

    "Passed"? 360 architecture is still alive and well, although it’s now called zArchitecture (via S/370 and ESA/390). You can still happily run OS/360 Programs on the latest z/OS, that’s compatibility that Windows programmers can only dream of. But it’s true there was no stack, until S/370 and all later generations. The stack however is a bit different to what most programmers expect, since you can’t allocate areas for local variables, parameter lists and so on, only hardware registers and return addresses are saved, and normally instructions do not reference the stack. OTOH it has a few tricks up its sleeve like allowing subroutine linkage over calls which transfer control synchronously to code in other address spaces.

    But we never had problems calling nested or recursive routines anyway since most programs use a simple call convention established by the OS.

  40. Kyle Oppenheim says:

    In regards to Tony’s comment that a lot of silicon is wasted on alignment and register renaming, there is actually an interesting trade-off in x86’s favor. Additional registers can be added to the micro-architecture without changing the ISA. So, newer generation chips (when silicon real estate is cheaper) can reduce register contention w/o the need of recompiling any code. Also, as pipeline depth, branch prediction, and other micro-architectural changes are made between chip generations, recompiles aren’t necessary to avoid register dependency conflicts. The hardware can find more instruction-level parallelism than the compiler could (since the compiler was years older).

  41. Steve P says:

    DrPizza: Yes, you are right in that something needs to store the return address *somewhere*. That may be the stack, or you could have register windows.

    Register windows give you access to only a certain number of registers at a time – say, 32 out of 512. When you enter a function you tell it you need ‘n’ registers, and they get shifted to make room.

    Eventually, you may spill out the other end, but the worst case for that is as bad as storing on the stack anyway.

    Now, a disclaimer.

    I don’t know whether any processor actually uses this technique – this is merely a vague recollection from my university days. But it sounds pretty clever. :)

  42. DrPizza says:

    "DrPizza: I think you’re confusing return address and return value. "

    No, I just miswrote "value" when I meant "address".

  43. Mike says:

    KJK::Hyperion: the mixture of calling conventions on x86 isn’t a fundamental property of the architecture, it’s just how Windows does things (because back in the day, it increased performance). Modern Linux uses only one calling convention for instance, and has a table based exception dispatch ABI.

  44. Tom says:

    Quote:

    Until the day that internet packets automatically get their contents aligned differently depending on what kind of architecture is going to read them, and disk files get their contents realigned, and the layouts of .BMP and other structures get dynamic realignments, unaligned data will have to be worked with one way or another. Hardware can do it faster.

    Even on an architecture without alignment fixups, it’s still possible to access unaligned structure members. The compiler might need an __unaligned qualifier or might do it automatically – you just end up breaking an unaligned access into two aligned accesses and some shifts, which is probably as fast as hardware could do it.
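
    The two-aligned-loads-and-shifts expansion Tom mentions looks roughly like this (a little-endian sketch; it assumes the word after the value is also readable):

        #include <stdint.h>

        /* Read an unaligned 32-bit little-endian value using only aligned loads:
           fetch the two aligned words that straddle it and shift the pieces together. */
        uint32_t load_u32_unaligned(const uint8_t *p)
        {
            uintptr_t addr  = (uintptr_t)p;
            unsigned  shift = (unsigned)(addr & 3u) * 8u;
            const uint32_t *base = (const uint32_t *)(addr & ~(uintptr_t)3u);

            uint32_t lo = base[0];
            if (shift == 0)
                return lo;                       /* already aligned */
            uint32_t hi = base[1];
            return (lo >> shift) | (hi << (32 - shift));
        }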

  45. Tom Kirby-Green says:

    For all its ugliness, x86 survives because, as someone once put it, "x86 *owns* the binary".

    On a related note, anyone clocked this?

    http://www.wired.com/news/technology/0,1282,64914,00.html

  46. Ben Hutchings says:

    Steve P: Register windows aren’t just theoretical. SPARC has fixed-size register windows. IA64 has variable-size register windows, as described in http://weblogs.asp.net/oldnewthing/archive/2004/01/13/58199.aspx

  47. DavidK says:

    Greetings,

    Quote:

    Until the day that internet packets automatically get their contents aligned differently depending on what kind of architecture is going to read them, and disk files get their contents realigned, and the layouts of .BMP and other structures get dynamic realignments, unaligned data will have to be worked with one way or another. Hardware can do it faster.



    I think internet data packets are an orthogonal issue.

    1) As far as I know, alignment issues are only for certain primitive data types, for example, integers. You can read a character on any alignment.

    2) Packing of C structures is handled for you automatically. So if you have a structure with a 3-byte character array followed by an integer, the compiler will pad the structure so that the integer is placed on the proper alignment.

    Therefore binary data that you write out to disk will be properly aligned when you read it back.

    3) As far as I know, unaligned data access on x86 is slower than aligned access. So in fact, if your goal is speed, alignment is a good thing. As for internet data packets, if you look at many of the low-level TCP/IP sockets structures, they are padded with dummy bytes to make alignment work.

    Also, when I do low-level TCP/IP communications, as data is received on a socket, it gets copied into the appropriate data structures. If it was on a machine that required alignment, since my structures are aligned for me by the compiler, everything works.

    Regards,

    Dave
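
    Point 2 is easy to see for yourself; on a typical ABI the compiler leaves a one-byte gap so the integer lands on its natural boundary (a small sketch):

        #include <stddef.h>
        #include <stdio.h>

        struct record {
            char tag[3];   /* bytes 0..2                                   */
            int  value;    /* typically starts at byte 4, after 1 pad byte */
        };

        int main(void)
        {
            /* On most ABIs this prints "offset 4, size 8". */
            printf("offset %zu, size %zu\n",
                   offsetof(struct record, value), sizeof(struct record));
            return 0;
        }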

  48. DavidK says:

    Data Alignment, one more thing

    I forgot to mention that the C library routine, malloc, also pads out any memory it returns so that you are guaranteed alignment.

    If you request one byte, the chunk you get will be larger than one byte; it will be padded to the next alignment size. That is why lots of small malloc requests are inefficient.

    Take a look at the back of the K&R C book where it discusses malloc. Your requested memory chunk size is multiplied and rounded to ensure proper alignment.

    Regards,

    Dave
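
    The rounding itself is one line of arithmetic (a sketch, assuming `align` is a power of two); a 1-byte request with 16-byte alignment still consumes a full 16 bytes, which is exactly why many tiny allocations are wasteful:

        #include <stddef.h>

        /* Round a request up to the next multiple of `align` (a power of two),
           the way a simple allocator sizes blocks so every pointer it hands
           out stays aligned. */
        size_t round_up(size_t size, size_t align)
        {
            return (size + align - 1) & ~(align - 1);
        }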

  49. Tony Cox [MS] says:

    Also observe that the SSE extensions to x86 have alignment restrictions, and the instructions raise exceptions if used on unaligned data. There is a special unaligned move instruction if you really need it, but it’s slower than the regular aligned move instruction, and you have to explicitly choose to use it.
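
    In intrinsic form the distinction is explicit in the source (a sketch using the SSE intrinsics from <xmmintrin.h>):

        #include <xmmintrin.h>

        /* _mm_load_ps/_mm_store_ps require 16-byte-aligned pointers and fault
           otherwise; _mm_loadu_ps is the explicit (and slower) unaligned form. */
        void add4(float *dst16, const float *a16, const float *b_any)
        {
            __m128 a = _mm_load_ps(a16);            /* must be 16-byte aligned */
            __m128 b = _mm_loadu_ps(b_any);         /* may be unaligned        */
            _mm_store_ps(dst16, _mm_add_ps(a, b));  /* must be 16-byte aligned */
        }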

  50. lowercase josh says:

    Ben Hutchings: I was assuming CAS would count as load-modify-store, which apparently it doesn’t. :/ (I mean you’re loading a value, performing an arithmetic operation, and then potentially storing a different value back…) LL/SC is an interesting concept that I was not aware of. Thanks for the explanation. :)

    mpz: I don’t know about SSE, but x87/MMX/3DNow! registers can be swapped on demand. You can skip saving them on a context switch and only save when another thread actually tries to use them.

  51. Ben Cooke says:

    Isaac, Mike:

    Ahh, the VIC-20. That would have been where I encountered the 6502. I’m glad there are lots of people with good memories around here to set me straight: I won’t make *that* mistake again! :)

    I never had the pleasure of owning a C128, but I remember that they were quite nutty machines. Didn’t they also have a second processor inside for running CP/M or something? I remember a friend showing me how he could boot either the Commodore 128’s own kernel, the C64’s kernel (and presumably use some other hardware borrowed from the C64) and with the help of a floppy disk also boot CP/M and change into a high-resolution text mode to make it more usable. (40×25 isn’t very much for a real command-line OS, of course.)

    I jumped straight from C64 to Amiga, though, and aside from a small amount of tinkering I never did much "real programming" of the 68k chip in my A500. Instead, I learned a bunch of high-level languages, eventually learning C for the first time. Those were the days. :)

  52. Andrew says:

    Quote:

    That may be the stack, or you could have register windows… I don’t know whether any processor actually uses this technique.

    According to Raymond’s earlier entries on calling conventions, it sounds like IA-64 does something along those lines:

    http://weblogs.asp.net/oldnewthing/archive/2004/01/13/58199.aspx

    Regarding SSE instructions, Tony Cox already mentioned that they have alignment restrictions that are not fixed up by the hardware. Specifically, the data needs to be 16-byte aligned. I’ve found _aligned_malloc to be quite handy in this regard. As far as favoring SSE due to context switching rules, I don’t know. And since there’s been a lot of talk about register usage, I think the VC++ .NET compiler will make use of SSE registers to pass parameters if certain optimizations are enabled (not sure though).

  53. Skywing says:

    SSE/2 registers need ‘special’ operating system support to be saved across context switches (fxsave). MMX overlaps with the floating point registers so anything that saves floating point states across context switches (fsave/fnsave) will automagically save MMX states.

  54. Fred says:

    Just wanted to add: the 6502 was used in the original C64, the 6510 was used for the slightly revamped C64-C.

  55. David says:

    A minor nit: a typical malloc implementation reserves extra memory if needed to ensure that the *next* request is correctly aligned.

    It might be argued that on an architecture like x86 which allows unaligned memory access, a conforming malloc implementation could ignore alignment entirely. I’m not much of a language lawyer, but "correctly aligned" could be taken to mean "aligned so as to be accessible without crashing". In practice, I’d be shocked to see a malloc implementation which imposes such a performance penalty.

  56. js says:

    Point of information: The PowerPC handles misaligned reads into and writes out of general purpose registers without exceptions (modulo page-crossing reads) but with some performance penalty.

    Someone asked how you can implement atomic operations without load-modify-store; PowerPC uses the reservation instructions, lwarx and stwcx (http://www.go-ecs.com/ppc/ppctek1.htm, http://publibn.boulder.ibm.com/doc_link/en_US/a_doc_lib/aixassem/alangref/lwarx.htm). The special store instruction only succeeds if nobody else modified the location since the special load instruction, and reports whether the store succeeded.

  57. Norman Diamond says:

    9/15/2004 7:26 PM David

    > It might be argued that on an architecture like x86 which allows unaligned memory access, a conforming malloc implementation could ignore alignment entirely.

    That is correct. Furthermore, the compiler can ignore alignment when generating object code. The reason ordinary compilers do not ignore alignment is exactly as you said: ordinary compilers are written with at least some degree of respect for performance (execution efficiency), which is obtainable except for structure layouts that are imposed by external requirements.

    Even on CPUs which impose alignment in hardware, compilers and their associated malloc’s would be free to ignore alignment in laying out data though of course they would have to provide fixups to copy the data to aligned locations when necessary. In fact when external requirements impose layouts different from what the CPU requires, compilers and/or runtimes already have to provide fixups this way (or else C programs have to call memcpy() etc.). This is the reason why it is often faster for fixups to be done in the CPU instead of in software.

  58. What do you mean 41-bit instructions? How can an instruction span 5 bytes and an extra bit?

  59. Raymond Chen says:

    The ia64 encodes 3 instructions in 128 bits. Subtract 5 bits of overhead and you get 41 bits per instruction.

  60. Jan_Klaassen says:

    > Just wanted to add: the 6502 was used in the original C64, the 6510 was used for the slightly revamped C64-C.

    No, the difference between the two is that the 6510 has an I/O port at address 1 (with an associated data direction register at address 0). The C64 uses it for banking some memory areas and controlling the tape recorder (yes, I have a misspent childhood and a rather musty-smelling C64 reference guide to prove it).

  61. Michael J Smith says:

    I think we may have things backwards.

    Parameter passing and return addresses are determined by the compiler, not the architecture. Because the x86 processors have few registers, the compiler’s choices are restricted – but it is still up to the compiler. If an IA64 compiler wants to pass parameters on the stack, there is nothing to stop it.

  62. Raymond Chen says:

    The Win32 ABI specifies the valid calling conventions. You can’t just make up a new one – it has to play friendly with SEH.

  63. Michael J Smith says:

    > The Win32 ABI specifies the valid calling conventions. You can’t just make up a new one – it has to play friendly with SEH.

    Certainly, your calling convention has to be consistent with APIs, system calls and other things that you use – but none of this is imposed by the architecture.

    When porting Windows to IA64, I’m sure nobody said "well, it’s IA64 so we must use these conventions". You (or your colleagues) would have chosen conventions that you believed would work well in the architecture.

    Another compiler designer might decide on a different set of conventions. They would then need to add some "glue" around the edges to interface with the rest of the OS, but it could be done.

  64. Raymond Chen says:

    Note that causing an exception counts as "interfacing with the rest of the OS", so anywhere that can raise an exception needs to have glue. In order for the OS to be able to unwind exception frames, there are very specific rules about function prologues and epilogues so that the OS’s exception dispatcher can unwind a partially-executed function properly.

    Sure, you can "glue" it, but since every memory access can potentially result in an exception (STATUS_IN_PAGE_ERROR for example), you’re going to have to erect glue around every memory access. That’s an awful lot of glue.

  65. Instead of doing it en masse, Windows 95 did it incrementally.

  66. Belated answers to exercises and other questions.

Comments are closed.

