How can a program survive a corrupted stack?

Date: January 16, 2004 / year-entry #22
Tags: code
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20040116-00/?p=41023
Comments: 10

Continuing from yesterday:

The x86 architecture traditionally uses the EBP register to establish a stack frame. A typical function prologue goes like this:

  push ebp       ; save old ebp
  mov  ebp, esp  ; establish new ebp
  sub  esp, nn*4 ; local variables
  push ebx       ; must be preserved for caller
  push esi       ; must be preserved for caller
  push edi       ; must be preserved for caller

For, say, a __stdcall function that takes two parameters, this establishes a stack frame that looks like this (higher addresses at the top, each slot four bytes):


.. rest of stack ..
param2
param1
return address
saved EBP <- EBP
local1
local2
...
local-nn
saved EBX
saved ESI
saved EDI <- ESP

Parameters can be accessed with positive offsets from EBP; for example, param1 is [ebp+8]. Local variables have negative offsets from EBP; for example, local2 is [ebp-8].
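To make the offsets concrete, here is a minimal sketch in C that models the stack as an array of 4-byte slots and walks through the prologue above. Everything in it is made up for illustration: the Machine type, the helper names, the placeholder values, and the choice of three locals. It is a toy model of the layout, not what a compiler generates.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 32, NUM_LOCALS = 3 };
enum { PARAM1 = 0x1111, PARAM2 = 0x2222, RETURN_ADDR = 0xAAAA };
enum { CALLER_EBP = 0xEB9, LOCAL_BASE = 0x1000 };

typedef struct {
    uint32_t mem[STACK_WORDS]; /* one element per 4-byte stack slot */
    uint32_t esp, ebp;         /* slot indices; the stack grows toward index 0 */
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp > 0);
    m->mem[--m->esp] = v;
}

/* Read the slot at [ebp + byte_offset]; offsets are multiples of 4. */
static uint32_t at_ebp(const Machine *m, int byte_offset) {
    return m->mem[(int)m->ebp + byte_offset / 4];
}

/* Simulate the call and the EBP-framed prologue. */
static Machine enter_ebp_function(void) {
    Machine m = {0};
    m.esp = STACK_WORDS;
    m.ebp = CALLER_EBP;            /* caller's EBP (arbitrary value) */
    push(&m, PARAM2);              /* caller pushes param2... */
    push(&m, PARAM1);              /* ...then param1 */
    push(&m, RETURN_ADDR);         /* call */
    push(&m, m.ebp);               /* push ebp */
    m.ebp = m.esp;                 /* mov  ebp, esp */
    m.esp -= NUM_LOCALS;           /* sub  esp, nn*4 */
    for (uint32_t k = 1; k <= NUM_LOCALS; k++)
        m.mem[m.ebp - k] = LOCAL_BASE + k;  /* local-k lives at [ebp - 4*k] */
    return m;
}
```

In this model, at_ebp(&m, 8) reads param1, at_ebp(&m, 4) reads the return address, and at_ebp(&m, -8) reads local2, matching the offsets in the text.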

Now suppose that a calling convention or function declaration mismatch occurs and extra garbage is left on the stack:


.. rest of stack ..
param2
param1
return address
saved EBP <- EBP
local1
local2
...
local-nn
saved EBX
saved ESI
saved EDI
garbage
garbage <- ESP

The function doesn't really feel any damage yet. The parameters are still accessible at the same positive offsets and the local variables are still accessible at the same negative offsets.

The real damage doesn't occur until it's time to clean up. Look at the function epilogue:

  pop  edi       ; restore for caller
  pop  esi       ; restore for caller
  pop  ebx       ; restore for caller
  mov  esp, ebp  ; discard locals
  pop  ebp       ; restore for caller
  retd 8         ; return and clean stack

In a normal stack, the three "pop" instructions match the values actually on the stack and nobody gets hurt. But on the garbage stack, the "pop edi" loads garbage into the EDI register, as does the "pop esi". And the "pop ebx", which thinks it's restoring the original value of EBX, actually loads the original value of the EDI register into EBX. But then the "mov esp, ebp" instruction reloads ESP from EBP, which the garbage never disturbed, so the "pop ebp" and "retd" are executed with a repaired stack.
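The same sequence can be traced in a small C simulation. This is a sketch under stated assumptions: the Machine type, the register values, and run_ebp_function are hypothetical names invented for the illustration, and the stray words are simply pushed by hand to stand in for whatever a mismatched call left behind.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 32, NUM_LOCALS = 3 };
enum { RETURN_ADDR = 0xAAAA, GARBAGE = 0xBAD };
enum { CALLER_EBP = 0xEB9, CALLER_EBX = 0xB0B0,
       CALLER_ESI = 0x5151, CALLER_EDI = 0xD1D1 };

typedef struct {
    uint32_t mem[STACK_WORDS];   /* one element per 4-byte stack slot */
    uint32_t esp, ebp;           /* slot indices; the stack grows toward index 0 */
    uint32_t ebx, esi, edi;
    uint32_t eip;                /* the address that "retd" jumped to */
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp > 0);
    m->mem[--m->esp] = v;
}

static uint32_t pop(Machine *m) {
    assert(m->esp < STACK_WORDS);
    return m->mem[m->esp++];
}

/* Run the EBP-framed prologue and epilogue with stray words left in between. */
static Machine run_ebp_function(int garbage_words) {
    Machine m = {0};
    m.esp = STACK_WORDS;
    m.ebp = CALLER_EBP; m.ebx = CALLER_EBX;
    m.esi = CALLER_ESI; m.edi = CALLER_EDI;

    push(&m, 2); push(&m, 1);    /* caller pushes param2, then param1 */
    push(&m, RETURN_ADDR);       /* call */

    push(&m, m.ebp);             /* push ebp */
    m.ebp = m.esp;               /* mov  ebp, esp */
    m.esp -= NUM_LOCALS;         /* sub  esp, nn*4 */
    push(&m, m.ebx);             /* push ebx */
    push(&m, m.esi);             /* push esi */
    push(&m, m.edi);             /* push edi */

    m.ebx = m.esi = m.edi = 0;   /* the function body reuses these registers */
    for (int i = 0; i < garbage_words; i++)
        push(&m, GARBAGE);       /* leftovers from a mismatched call */

    m.edi = pop(&m);             /* pop  edi */
    m.esi = pop(&m);             /* pop  esi */
    m.ebx = pop(&m);             /* pop  ebx */
    m.esp = m.ebp;               /* mov  esp, ebp: this is what repairs ESP */
    m.ebp = pop(&m);             /* pop  ebp */
    m.eip = pop(&m);             /* retd 8: pop the return address... */
    m.esp += 2;                  /* ...and discard the two parameters */
    return m;
}
```

With run_ebp_function(0), every register comes back intact. With run_ebp_function(2), EDI and ESI hold garbage and EBX holds the caller's EDI, yet EBP, the return address, and the final ESP are all correct, which is why the bug can go unnoticed.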

What happened here? Things sort of got put back on their feet. Well, except that the ESI, EDI, and EBX registers got corrupted. If you're lucky, the values in ESI, EDI and EBX weren't important and could have survived corruption. Or all that was important was whether the value was zero or not, and you were lucky and replaced one nonzero value with another. For whatever reason, the corruption of those three registers is not immediately apparent, and you end up never realizing what you did wrong.

Maybe the corruption has a subtle effect (say, you changed a value from zero to nonzero, causing the caller to go down the wrong codepath), but it's subtle enough that you don't notice, so you ship it, throw a party, and start the next project.

But then a new compiler comes along, say one that does FPO optimization.

FPO stands for "frame pointer omission"; the function dispenses with the EBP register as a frame register and instead just uses it like any other register. On the x86, which has comparatively few registers, an extra arithmetic register goes a long way.

With FPO, the function prologue goes like this:

  sub  esp, nn*4 ; local variables
  push ebp       ; must be preserved for caller
  push ebx       ; must be preserved for caller
  push esi       ; must be preserved for caller
  push edi       ; must be preserved for caller

The resulting stack frame looks like this:


.. rest of stack ..
param2
param1
return address
local1
local2
...
local-nn
saved EBP
saved EBX
saved ESI
saved EDI <- ESP

Everything is now accessed relative to the ESP register. For example, local-nn is [esp+0x10], because the four saved registers (16 bytes) sit between it and the top of the stack.

Under these conditions, garbage on the stack is far more likely to be fatal. The function epilogue goes like this:

  pop  edi       ; restore for caller
  pop  esi       ; restore for caller
  pop  ebx       ; restore for caller
  pop  ebp       ; restore for caller
  add  esp, nn*4 ; discard locals
  retd 8         ; return and clean stack

If there is garbage on the stack, the four "pop" instructions will restore the wrong values, as before, but this time, the cleanup of local variables won't fix anything. The "add esp, nn*4" will adjust the stack by what the function believes to be the correct amount, but since there was garbage on the stack, the stack pointer will be off.


.. rest of stack ..
param2
param1
return address
local1
local2 <- ESP (oops)

The "retd 8" instruction now attempts to return to the caller, but ESP is two slots short of the return address (one slot for each word of garbage), so instead it returns to whatever is in local2, which is probably not valid code.
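Here is the FPO case traced in a small C simulation. As before, this is only a sketch: the Machine type, the register values, run_fpo_function, and the choice of three locals holding the placeholder values 0x1001 through 0x1003 are all assumptions made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 32, NUM_LOCALS = 3 };
enum { RETURN_ADDR = 0xAAAA, GARBAGE = 0xBAD, LOCAL_BASE = 0x1000 };
enum { CALLER_EBP = 0xEB9, CALLER_EBX = 0xB0B0,
       CALLER_ESI = 0x5151, CALLER_EDI = 0xD1D1 };

typedef struct {
    uint32_t mem[STACK_WORDS];   /* one element per 4-byte stack slot */
    uint32_t esp;                /* slot index; the stack grows toward index 0 */
    uint32_t ebp, ebx, esi, edi; /* EBP is just another register here */
    uint32_t eip;                /* the address that "retd" jumped to */
} Machine;

static void push(Machine *m, uint32_t v) {
    assert(m->esp > 0);
    m->mem[--m->esp] = v;
}

static uint32_t pop(Machine *m) {
    assert(m->esp < STACK_WORDS);
    return m->mem[m->esp++];
}

/* Run the FPO prologue and epilogue with stray words left in between. */
static Machine run_fpo_function(int garbage_words) {
    Machine m = {0};
    m.esp = STACK_WORDS;
    m.ebp = CALLER_EBP; m.ebx = CALLER_EBX;
    m.esi = CALLER_ESI; m.edi = CALLER_EDI;

    push(&m, 2); push(&m, 1);    /* caller pushes param2, then param1 */
    push(&m, RETURN_ADDR);       /* call */
    uint32_t ret_slot = m.esp;   /* remember where the return address lives */

    m.esp -= NUM_LOCALS;         /* sub  esp, nn*4 */
    for (uint32_t k = 1; k <= NUM_LOCALS; k++)
        m.mem[ret_slot - k] = LOCAL_BASE + k;  /* local-k holds 0x1000 + k */
    push(&m, m.ebp);             /* push ebp */
    push(&m, m.ebx);             /* push ebx */
    push(&m, m.esi);             /* push esi */
    push(&m, m.edi);             /* push edi */

    m.ebp = m.ebx = m.esi = m.edi = 0;  /* the function body reuses these */
    for (int i = 0; i < garbage_words; i++)
        push(&m, GARBAGE);       /* leftovers from a mismatched call */

    m.edi = pop(&m);             /* pop  edi */
    m.esi = pop(&m);             /* pop  esi */
    m.ebx = pop(&m);             /* pop  ebx */
    m.ebp = pop(&m);             /* pop  ebp */
    m.esp += NUM_LOCALS;         /* add  esp, nn*4: nothing repairs ESP here */
    m.eip = pop(&m);             /* retd 8: pop whatever ESP points at... */
    m.esp += 2;                  /* ...and discard two more slots */
    return m;
}
```

With run_fpo_function(0), the function returns to RETURN_ADDR with all registers restored. With run_fpo_function(2), the four pops each read two slots too low, EBX and EBP receive the caller's EDI and ESI, and "retd" jumps to 0x1002, the contents of local2.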

So this is an example of where optimizing your code reveals other people's bugs.

Monday, I'll give a much more subtle example of something that can go wrong if you use the wrong function signature for a callback.


Comments (10)
  1. Pramod says:

    Absolutely amazing! and extremely useful for students like me!

    Btw..in these days of OOPS, SOAP and buzzword-compliance, I can’t believe that it’s necessary to know these nitty-gritties! Or do u work in this particularly low level stuff?

  2. Larry Osterman says:

    I don’t work in the nitty gritties, but I’m constantly being brought into other people’s offices to look at mysterious problems (I called this routine and it corrupted a local variable) simply because I DO know how all this stuff works.

    Of course the fact that I figuratively cut my development teeth on MS-DOS 4.0 back in 1984 helps :)

  3. Raymond Chen says:

    If you work in C or C++, you pretty much have no choice but to learn these nitty-gritties because there is no framework around to do this for you. (And even if you are using a framework like VB or the CLR you still need to know this if you intend to interop with C/C++.) When your program crashes mysteriously, you’re stuck staring at nitty-gritties.

  4. MilesArcher says:

    This is one of my pet peeves. That someone writing business apps in C++ has to worry about these details. I am hoping that C# takes off.

  5. asdf says:

    That’s not really true, you can program in C++ and not even have to worry about something like *memory management* if you use smart pointers and classes that take care of themselves. Though I agree, C# is a better language than C++ for mediocre programmers to use. Not knowing basic assembly or how the architecture is implemented is like not knowing calculus or linear algebra. You can get away with not knowing it in life but if you ever come across a problem that can be solved using it, you’ve just saved yourself a lot of time and trouble.

  6. Raymond Chen says:

    True, you can get the compiler to do a lot of the work for you, but you have to do it right. A mistake as simple as declaring the same (extern "C") function two different ways will go unnoticed by the compiler and lead to all sorts of strange crashes.

    (Plus of course there are fun things like returning a pointer to a stack-allocated object and using it after the function returns…)

  7. Phaeron says:

    It is very seldom that a C++ programmer needs to keep low-level internals in mind when coding, but that doesn’t mean you shouldn’t know of them. Just knowing how the underlying code generation model works gives you a lot of insight as to why code (mis)behaves as it does, and that is true whether you are using C++ or C#.

    You can get very nasty bugs by mismatching calling conventions. One time I accidentally declared a VFW callback as __cdecl instead of __stdcall, and the result was stack hemorrhaging in the video capture loop. Normally the capture went OK because the garbage was removed by the frame pointer, but if I went above a certain time limit the thread ran out of stack and crashed.

    Sometimes declaration errors can be caught in a program by compiling it as managed C++ (/clr), even if you never intend to actually run or ship it that way. In particular, One Definition Rule (ODR) violations are caught, such as declaring a structure differently in two modules with the same name.

  8. floyd says:

    MilesArcher: "This is one of my pet peeves. That someone writing business apps in C++ has to worry about these details. I am hoping that C# takes off."

    I have a different perspective on the issue: If a programming language makes it too easy on the developer chances are that you’ll wind up with hordes of poor programmers flooding the business. Of course that is not to say that a Java programmer, for example, is a bad programmer per se, just because (s)he doesn’t have to care about as many details as a C++ programmer ideally would. From personal experience though, I have seen few good programmers that never got in touch with their machines on a low level. Many just never even cared, and it is lack of interest that separates the boys from the men ;)

    So my opinion is that knowledge never hurts. With low-level languages you’re forced to get to know those details under the hood. This, however, doesn’t disqualify a language as the tool of choice for a given project.

    The problem with C++ is not so much that you have to know about certain details. It’s more that it was old already at the time it was brand new. There is too much C legacy and those who read D&E know how great C++ would have turned out if it didn’t have to care to attract C coders.

    .f



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
