What can go wrong when you mismatch the calling convention?

Date:January 15, 2004 / year-entry #19
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20040115-00/?p=41043
Comments:    75

Believe it or not, calling conventions are one of the things that programs frequently get wrong. The compiler yells at you when you mismatch a calling convention, but lazy programmers will just stick a cast in there to get the compiler to "shut up already".

And then Windows is stuck having to support your buggy code forever.

The window procedure

So many people misdeclare their window procedures (usually by declaring them as __cdecl instead of __stdcall) that the function that dispatches messages to window procedures contains extra protection to detect incorrectly declared window procedures and perform the appropriate fixup. This is the source of the mysterious 0xdcbaabcd on the stack. After the window procedure returns, the dispatcher checks whether this value is on the stack in the correct place. If not, then it checks whether the window procedure popped one dword too many off the stack (if so, it fixes up the stack; I have no idea how a window procedure this messed up could have existed), or whether the window procedure was mistakenly declared as __cdecl instead of __stdcall (if so, it pops the parameters off the stack that the window procedure was supposed to pop).
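If you want to picture what that check looks like, here is a toy model. This is a minimal sketch and not the actual Windows code: the stack is simulated with a vector, and SimStack, Dispatch, Verdict, and the three fake window procedures are all hypothetical names. The only things taken from the description above are the 0xdcbaabcd marker and the three outcomes.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of an x86 stack: the back of the vector is the top of the stack.
using SimStack = std::vector<std::uint32_t>;

constexpr std::uint32_t kSentinel = 0xdcbaabcd;  // marker value from the article
constexpr std::size_t kArgCount = 4;             // hwnd, uMsg, wParam, lParam

enum class Verdict { Ok, CdeclFixedUp, PoppedOneTooMany, Unknown };

// Hypothetical "window procedures". Each one models only how many dwords
// it removes from the stack when it returns.
void StdcallProc(SimStack& s) { s.resize(s.size() - kArgCount); }      // correct: pops 4
void CdeclProc(SimStack&)     { /* wrong: leaves all 4 arguments */ }
void GreedyProc(SimStack& s)  { s.resize(s.size() - kArgCount - 1); }  // wrong: pops 5

// Assumed shape of the dispatcher's check: push the marker, push the
// arguments, call, then look at where the marker ended up.
Verdict Dispatch(SimStack& s, void (*proc)(SimStack&)) {
    const std::size_t base = s.size();
    s.push_back(kSentinel);
    for (std::size_t i = 0; i < kArgCount; ++i) {
        s.push_back(static_cast<std::uint32_t>(0x1000 + i));  // dummy arguments
    }
    proc(s);

    Verdict v = Verdict::Unknown;
    if (s.size() == base + 1 && s.back() == kSentinel) {
        v = Verdict::Ok;                 // marker is exactly where it should be
    } else if (s.size() == base + 1 + kArgCount && s[base] == kSentinel) {
        v = Verdict::CdeclFixedUp;       // arguments still present: pop them for the callee
    } else if (s.size() == base) {
        v = Verdict::PoppedOneTooMany;   // the marker itself was popped
    }
    s.resize(base);                      // put the caller's stack back in every case
    return v;
}
```

Calling Dispatch(stack, CdeclProc) on any starting stack reports CdeclFixedUp and leaves the stack at its original depth, which is the whole point of the fixup: the buggy program never finds out.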

DirectX callbacks

Many DirectX functions use callbacks, and people once again misdeclared their callbacks as __cdecl instead of __stdcall, so the DirectX enumerators have to do special stack cleanup for those bad functions.

IShellFolder::CreateViewObject

I remember there was one program that decided to declare their CreateViewObject method incorrectly, and somehow they managed to trick the compiler into accepting it!

class BuggyFolder : public IShellFolder ... {
 ...
 // wrong function signature! The real method is
 // HRESULT CreateViewObject(HWND hwndOwner, REFIID riid, void **ppv)
 HRESULT CreateViewObject(HWND hwnd) { return S_OK; }
};

Not only did they get the function signature wrong, they returned S_OK even though they failed to do anything! I had to add extra code to clean up the stack after calling this function, as well as verify that the return value wasn't a lie.

Rundll32.exe entry points

The function signature required for functions called by rundll32.exe is documented in this Knowledge Base article. That hasn't stopped people from using rundll32 to call random functions that weren't designed to be called by rundll32, like user32 LockWorkStation or user32 ExitWindowsEx.

Let's walk through what happens when you try to use rundll32.exe to call a function like ExitWindowsEx:

The rundll32.exe program parses its command line and calls the ExitWindowsEx function on the assumption that the function is written like this:

void CALLBACK ExitWindowsEx(HWND hwnd, HINSTANCE hinst,
       LPSTR pszCmdLine, int nCmdShow);

But it isn't. The actual function signature for ExitWindowsEx is

BOOL WINAPI ExitWindowsEx(UINT uFlags, DWORD dwReserved);

What happens? Well, on entry to ExitWindowsEx, the stack looks like this:

.. rest of stack ..
nCmdShow
pszCmdLine
hinst
hwnd
return address <- ESP

However, the function is expecting to see

.. rest of stack ..
dwReserved
uFlags
return address <- ESP

What happens? The hwnd passed by rundll32.exe gets misinterpreted as uFlags and the hinst gets misinterpreted as dwReserved. Since window handles are pseudorandom, you end up passing random flags to ExitWindowsEx. Maybe today it's EWX_LOGOFF, tomorrow it's EWX_FORCE, the next time it might be EWX_POWEROFF.
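To see why this is a lottery, here is a rough sketch that decodes a few made-up handle values as if they were uFlags. The EWX_* constants are copied from winuser.h so that the snippet builds without the Windows headers; DescribeFlags is a hypothetical helper, and it only names the bits that are set. It does not claim to reproduce how ExitWindowsEx prioritizes them.

```cpp
#include <cstdint>
#include <string>

// Values as defined in winuser.h, repeated here so this compiles anywhere.
constexpr std::uint32_t EWX_LOGOFF   = 0x00000000;
constexpr std::uint32_t EWX_SHUTDOWN = 0x00000001;
constexpr std::uint32_t EWX_REBOOT   = 0x00000002;
constexpr std::uint32_t EWX_FORCE    = 0x00000004;
constexpr std::uint32_t EWX_POWEROFF = 0x00000008;

// Hypothetical helper: names the documented flag bits present in uFlags.
std::string DescribeFlags(std::uint32_t uFlags) {
    std::string out;
    auto add = [&out](const char* name) {
        if (!out.empty()) out += '|';
        out += name;
    };
    if (uFlags & EWX_SHUTDOWN) add("EWX_SHUTDOWN");
    if (uFlags & EWX_REBOOT)   add("EWX_REBOOT");
    if (uFlags & EWX_POWEROFF) add("EWX_POWEROFF");
    if (out.empty())           add("EWX_LOGOFF");  // EWX_LOGOFF is zero: no action bit set
    if (uFlags & EWX_FORCE)    add("EWX_FORCE");
    return out;
}
```

With the made-up "handle" 0x000A0364 this yields EWX_LOGOFF|EWX_FORCE, and with 0x0005028A it yields EWX_REBOOT|EWX_POWEROFF. Same command line, different day, different outcome.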

Now suppose that the function manages to return. (For example, the exit fails.) The ExitWindowsEx function cleans two parameters off the stack, unaware that it was passed four. The resulting stack is

.. rest of stack ..
nCmdShow (garbage not cleaned up)
pszCmdLine <- ESP (garbage not cleaned up)

Now the stack is corrupted and really fun things happen. For example, suppose the thing at ".. rest of stack .." is a return address. The original code is going to execute a "return" instruction to return through that return address, but with this corrupted stack, the "return" instruction will instead pop the pszCmdLine pointer, jump to the command line string, and attempt to execute it as if it were code.
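The bookkeeping in this walkthrough can be replayed with labels instead of real dwords. A minimal sketch, assuming (as the example above does) that the slot just above the arguments is the caller's return address; LabeledStack, StackOnEntry, StdcallReturn, and PlainReturn are hypothetical names invented for the illustration.

```cpp
#include <string>
#include <vector>

// Toy stack of labeled dwords; back() is the slot ESP points at.
using LabeledStack = std::vector<std::string>;

// What rundll32 sets up: four arguments pushed right to left, then the
// return address pushed by the call instruction.
LabeledStack StackOnEntry() {
    return {
        "caller's return address",  // stands in for ".. rest of stack .."
        "nCmdShow", "pszCmdLine", "hinst", "hwnd",
        "return address",
    };
}

// Models "ret N" in a __stdcall function that believes it has argCount
// parameters: pop the return address, then discard argCount dwords.
std::string StdcallReturn(LabeledStack& stack, int argCount) {
    std::string target = stack.back();
    stack.pop_back();
    for (int i = 0; i < argCount; ++i) stack.pop_back();
    return target;
}

// Models a plain "ret": whatever is at ESP is treated as the return address.
std::string PlainReturn(LabeledStack& stack) {
    std::string target = stack.back();
    stack.pop_back();
    return target;
}
```

After StdcallReturn(stack, 2), which is what ExitWindowsEx effectively does, the top of the stack is the pszCmdLine slot, so the next PlainReturn hands back "pszCmdLine" rather than the caller's return address. With StdcallReturn(stack, 4) the same sequence returns to the right place.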

Random custom functions

An anonymous commenter exported a function as __cdecl but treated it as if it were __stdcall. This will seem to work, but on return, the stack will be corrupted (because the caller is expecting a __stdcall function that cleans the stack, but what it gets is a __cdecl function that doesn't), and bad things will happen as a result.

Okay, enough with the examples; I think you get the point. Here are some questions I'm sure you're asking:

Why doesn't the compiler catch all these errors?

It does. (Well, not the rundll32 one.) But people have gotten into the habit of just inserting the function cast to get the compiler to shut up.

Here's a random example I found:

LRESULT CALLBACK DlgProc(HWND hWnd, UINT Msg,
   WPARAM wParam, LPARAM lParam);

This is the incorrect function signature for a dialog procedure. The correct signature is

INT_PTR CALLBACK DialogProc(HWND hwndDlg, UINT uMsg,
    WPARAM wParam, LPARAM lParam);

You start with

DialogBox(hInst, MAKEINTRESOURCE(IDD_CONTROLS_DLG),
          hWnd, DlgProc);

but the compiler rightly spits out the error message

error C2664: 'DialogBoxParamA' : cannot convert parameter 4
from 'LRESULT (HWND,UINT,WPARAM,LPARAM)' to 'DLGPROC'

so you fix it by slapping a cast in to make the compiler shut up:

DialogBox(hInst, MAKEINTRESOURCE(IDD_CONTROLS_DLG),
          hWnd, reinterpret_cast<DLGPROC>(DlgProc));

"Aw, come on, who would be so stupid as to insert a cast to make an error go away without actually fixing the error?"

Apparently everyone.

I stumbled across this page that does exactly the same thing, and this one in German which gets not only the return value wrong, but also misdeclares the third and fourth parameters, and this one in Japanese. It's as easy to fix (incorrectly) as 1-2-3.
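For contrast, here is what fixing the declaration instead of the call site looks like. This is a portable sketch, not real Win32 code: DlgProcPtr, WrongDlgProc, RightDlgProc, and ShowDialog are hypothetical stand-ins, and the parameter and return types are plain int and long chosen to mirror how INT_PTR and LRESULT are defined on 32-bit Windows, so that the example builds without the Windows headers.

```cpp
#include <type_traits>

// Stand-in for DLGPROC: returns int (like INT_PTR on 32-bit Windows).
using DlgProcPtr = int (*)(void* hwnd, unsigned msg, unsigned wParam, long lParam);

// Stand-in for the misdeclared procedure: returns long (like LRESULT).
long WrongDlgProc(void*, unsigned, unsigned, long) { return 0; }

// The declaration fixed at the source: the return type matches DlgProcPtr.
int RightDlgProc(void*, unsigned, unsigned, long) { return 0; }

// The compiler already knows these pointer types do not mix. This is the
// same complaint as error C2664, expressed as a compile-time check.
static_assert(!std::is_convertible<decltype(&WrongDlgProc), DlgProcPtr>::value,
              "a mismatched procedure must not convert implicitly");
static_assert(std::is_convertible<decltype(&RightDlgProc), DlgProcPtr>::value,
              "a correctly declared procedure needs no cast");

// Hypothetical stand-in for DialogBox: it just calls the procedure once.
int ShowDialog(DlgProcPtr proc) { return proc(nullptr, 0, 0, 0); }
```

ShowDialog(RightDlgProc) compiles with no cast at all. ShowDialog(reinterpret_cast<DlgProcPtr>(WrongDlgProc)) would also compile, but calling through the converted pointer is undefined behavior, and the cast would go on hiding any other mistake in the declaration (wrong parameter count, wrong calling convention) just as happily.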

How did programs with these bugs ever work at all? Certainly these programs worked to some degree or people would have noticed and fixed the bug. How can the program survive a corrupted stack?

I'll answer this question tomorrow.


Comments (75)
  1. Olsson says:

    I’m pretty new to Windows API programming so i may have gotten this wrong, but…

    I read this:

    LRESULT CALLBACK DlgProc(HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);

    And it looked very familiar. In VC6 when you create a "Win32 Application" from the New Wizard and choose the "Hello World application" it uses the faulty(?) callback procedure. It even contains a call to:

    "DialogBox(hInst, (LPCTSTR)IDD_ABOUTBOX, hWnd, (DLGPROC)About);"

    So why does VC6 get it so wrong?

  2. Mike Dimmick says:

    The DlgProc one will work _as written_ because the return types are compatible – LRESULT maps to LONG_PTR, and on both x86 and Itanium, INT_PTR and LONG_PTR are the same size.

    However, if you perform the cast rather than correcting the code, you miss any other mistakes in the declaration (e.g. too many or too few parameters, or the wrong size for parameters).

    Unfortunately it’s a general trend of ‘what’s the least amount of work I need to do to shut the compiler up?’ rather than working out the root cause of the error/warning, and repairing that. A cast – especially a C-style cast – allows the programmer to bodge any value into any location, without having to think. I’ve seen it quite a lot on programming forums.

  3. Mike Hearn says:

    These stories always make me laugh. I sometimes tried to imagine what the Windows code might look like based on what Wine looks like, but now I know just how far wrong I was…. I never thought the code would be riddled with inline assembly to clean up the stack after random function calls!

  4. Centaur says:

    In Russian, we have an idiomatic expression that can be literally translated as “bear’s service”. It is used when you do something to someone in the hope that it will be useful, but it actually brings harm.

    This is just what happens here. Someone misdeclares the calling convention, and you detect this and silently fix their error instead of loudly killing the offending application and asking the user to type in the name of the application and the address of its author, so that it could be posted on the Internet in the Official List of Misbehaving Programs.

  5. James Curran says:

    OK, My "corrupt stack" story.

    This was about 18 years ago, when I was working in Z-8000 Assembler (Ya’want "corrupt stack" stories, ya gotta talk assembler…..)

    I was modifying an existing production system. Every time I called one particular subroutine in my modification, the program crashed on return. However, the production version called it all the time and it worked fine.

    I checked & checked my code, and everything seemed fine. (Recall that this was an embedded system in the mid-80’s, so debugging tools were very primitive)

    Now, in Z8K asm, there are 16 16-bit registers (R0 thru R15), which could also be used as 32-bit registers: Even numbered RR0 through RR14 (RR0 was R0 & R1).

    The standard practice in this program was that every subroutine would preserve any register it used. So, that routine, like most others, began with

    PUSH RR0
    PUSH RR2
    PUSH RR4

    However, I eventually noticed that it ended with:

    POP RR4
    POP RR2
    POP R0

    Which means it was pushing 6 words onto the stack at the start, but only popped 5 of them off at the end. BUT IT WORKS IN PRODUCTION!

    I figured out that the production system only called that subroutine from one place, and when it made the call, R1 (the value being left on the stack) would always have 0000 in it. In the device’s ROM, address 0000 had a RET. So, the bad instruction would leave a 0000 on the top of the stack, it would then pop that as the return address and jump to 0000, see the RET, pop the next value (the correct return) off the stack as the return address and go on its merry way, as the original caller didn’t care what ended up in R1, until I came along and called it with some other value in R1.

    It was estimated that the device had been making that extra jump on every keystroke for about three years without any programmer noticing it.

  6. Raymond Chen says:

    "loudly killing the offending application…" -While we all wish we could do this, we also realize that this doesn’t help the customer one bit.

    "Hi, I upgraded to Windows XP, and now my accounts management program keeps crashing. It puts up this dialog box asking me to report the program. How do I fix it? I am losing thousands of dollars a day because of this!"

  7. Alan Hecht says:

    It looks like generic.c in the "Generic Sample Application" in the Windows API documentation gets it wrong. The function signature for the dialog procedure is:

    LRESULT WINAPI AboutDlgProc( HWND, UINT, WPARAM, LPARAM );

    and it is called with:

    DialogBox(ghInstance, "AboutDlg", hWnd,(DLGPROC)AboutDlgProc );

  8. Raymond Chen says:

    Thanks, Alan. I’ve notified the MSDN doc folks.

  9. Matt says:

    I tend to agree with Centaur on this one (what is that idiomatic expression, by the way? Could you post the transliteration?). Why should the Windows code be forced to "fix" the errors in the applications? Shouldn’t the application developers just be expected to program correctly for their intended OS? I’m not a Windows programmer, but the thought of OS code being written to handle poorly written applications boggles my mind. Is this common?

  10. Mike Dunn says:

    The "silently fix bad apps" is a sticky situation as it reinforces bad coding practices. Anyone who participates in programming discussion boards has certainly seen questions from newbies like "how do I cast an int to a CString?" or "I added a cast from an int to a CString but now my program crashes. Why? It compiled!"

    Bottom line is, casting in C++ is a Bad Thing unless you know what you’re doing. C++ data types (especially C strings) are just hard for some newbies to grasp, and some seem to throw in casts thinking the compiler will magically understand their intention. Sometimes it works, and that reinforces the misperception that casts are the right way to do things or (even worse) fix compiler errors. Automatically fixing up the stack after a bug caused by a bad cast also reinforces it.

    [Not that I’m arguing against the OS doing such fix-ups – I completely understand why it’s done – I’m just pointing out some downsides.]

  11. Raymond Chen says:

    Remember, serving our customers is our #1 goal. Breaking Program Q is a disservice to our customers who use Program Q, even if program Q was poorly written. Now you might say, "Well those customers deserve to be screwed for buying such crappy software." But I wouldn’t say that.

  12. This is great read! Thanks for enlightening us, I *shame* have to admit that I too have written some dialog-code like that…but I promise I will never do it again!

    And I agree that type-casting (especially C-casting) is bad unless you’re really sure on what you’re doing.

    The problem is that most programmers want to get the reward, a running program. A compiler-error is like someone standing in the way of that reward, trying to stop you. So you feel like: "Hey you! Get out of my way or I’m gonna blast you with my type-casting-gun!"

    Ok, maybe I overexaggerated a bit…

  13. asdf says:

    It’s not healthy to keep the rage against people that should be programming visual basic instead of C++ bottled up so I suggest you post more stories like this.

    P.S. Microsoft is guilty of botching __stdcall/__cdecl too like the callback functions passed to some glu/glut functions. They’re also notorious for not being const safe but that’s another article.

    P.P.S. I like how Microsoft changed the return type of DLGPROC from BOOL (in the VC6 headers) to INT_PTR (in the platform sdk headers) all of a sudden. If you search all of msdn for "(DLGPROC)" you will see tons of questionable code.

  14. Larry Osterman says:

    The reason that the dialog box return code was changed from LRESULT to INT_PTR was for 64 bit issues.

    Apps (and windows) were relying on the fact that they could return pointers into the storage that was reserved for a return code, which works for 32 bit platforms, but not for 64 bit platforms. So the parameter was changed.

    For 32 bit platforms, there is no effective difference between LRESULT and INT_PTR, which means that 32 bit code isn’t broken, however when the code is recompiled for 64bits it will be broken.

    The good news is that when you recompile your broken code, YOUR code won’t compile – the compiler will warn you that you’re truncating the bits when you try to return a 64bit pointer into a 32bit LRESULT.

  15. Larry Osterman says:

    There is also a HUGE caveat if you’re the author of a library.

    You MUST (without exception) declare the calling convention that your library was compiled with in the published headers for the library.

    If you don’t, then when someone new tries to use your library, they WILL experience mysterious failures. This has bitten me more times than I care to think of.

    For example, consider a project I worked on. It had a function ScpAppIf_Do_Startup that was declared in the header as:

    extern BOOL
    ScpAppIf_Do_Startup
    (
    xScpAppIf_this_only
    );

    In order to save space in the resulting binary, the project was compiled specifying the stdcall calling convention.

    So we released our beta DDK with the header as described above, some of our customers came along and started writing code for the project.

    And they immediately got on the phone with us to complain that their application was mysteriously crashing – when they called into our functions, their local variables all of a sudden got corrupted.

    The problem was that they didn’t specify a calling convention in their makefiles, which meant that they assumed that our functions were declared with the default calling convention, __cdecl.

    But remember – the libraries were compiled with __stdcall.

    The fix was to rerelease the headers with the prototype changed to:

    extern BOOL SCP_API
    ScpAppIf_Do_Startup
    (
    xScpAppIf_this_only
    );

    and we defined SCP_API to __stdcall for x86 platforms.

    This problem occurs whether you are producing a statically linked library (as in our case) or if you’re producing a DLL.

    Bottom line: If you don’t explicitly specify your calling convention, you WILL have troubles down the line.

  16. Raymond Chen says:

    The BOOL -> INT_PTR change happened during the Win64 effort. On Win32, BOOL and INT_PTR are the same thing, so Win32 code is unaffected. But if you want your code to be ready for Win64, change your dialog procedure BOOLs to INT_PTRs.

    When I was porting code to Win64, the first thing I did was grep for (DLGPROC) and (WNDPROC), remove the casts, and then fix all the bugs that the casts were hiding.

  17. Raymond Chen says:

    Actually, Larry, the dialog box return code was changed from *BOOL* to INT_PTR. It was never LPARAM. BOOL is the same as INT.

    The Win64 changes were carefully made so that valid Win32 code remained valid. (Of course, invalid Win32 code remained invalid and could possibly get worse, thanks to issues like this.)

  18. Larry Osterman says:

    Btw, you wrote the following above (which is where I got my LRESULT comment).

    LRESULT CALLBACK DlgProc(HWND hWnd, UINT Msg,
       WPARAM wParam, LPARAM lParam);

    So the code in question was DOUBLY bad…

  19. Larry Osterman says:

    My bad, teach me to try to answer for the master :)

  20. Alex Feinman says:

    Matt: it is "medvezhiya usluga"

    Raymond: Are you saying all these hacks became a part of Windows code or were they put into "shims"?

  21. Peter Montgomery says:

    After reading this, I panicked and went into my code to see how I handled dialog boxes. I saw (as an example) code like this:

    BOOL CALLBACK LoadLoopDlgProc(HWND hDlg, UINT iMsg, WPARAM wParam, LPARAM lParam);

    Cool! So, Raymond, do I get a gold star? Frankly, I’m sure I just cut and pasted something from the MS docs which must have been correct at the time I read them. Of course, MS docs have a long history of being flat out wrong and forcing programmers to spend countless hours trying to get their code to work. It’s little wonder there isn’t more bad code out there as a result. Any idea how the code you were porting was written incorrectly? I mean, didn’t those folks just follow the docs? Where did all these people get the idea to use LRESULT? I’m just your average API programmer, but I got it right, so how hard can it be?

    Now then, this whole change from BOOL to INT_PTR – how come this is the first I’ve heard of this? It seems like this sort of info needs to be disseminated in a less ad-hoc fashion. Discovering critical info like this by reading a blog or accidentally discovering it while checking out docs is a pretty lame means of getting the word out. For example, I have working code. When I need to write a new dialog proc, I won’t go into the docs again, I’ll just use a known example I have here.

    You complain about how many people get it wrong, but don’t address the root problem which is how come MS doesn’t disseminate critical info better. The code you were porting, was it written at MS? If so, then how can you expect outside developers to get it right when MS can’t?

    Thanks,

    PeterM

  22. Raymond Chen says:

    If a problem is isolated to only a few programs, we use a shim. But if a problem is widespread, then it goes into the core.

  23. Raymond Chen says:

    Peter: Your BOOL code is still good today. You only have to worry about INT_PTR when you decide to port to Win64, at which point you will find lots of documentation listing the various parameter changes that were made for Win64 purposes (but which remain backwards compatible with Win32).

  24. Raymond Chen says:

    Windows NT used the Alpha in 32-bit mode. There was a "Win64 for Alpha AXP" project well under way but the Alpha died before it could ship.

  25. asdf says:

    Wasn’t the DEC Alpha 64 bits? How did NT work on that without all these INT_PTR and GetWindowLongPtr changes?

  26. Mike Dunn says:

    IIRC on the Alpha, all pointers are sign-extended, so you still have 4GB of process space covering addresses:

    0x00000000_00000000 to 0x00000000_7FFFFFFF

    and

    0xFFFFFFFF_80000000 to 0xFFFFFFFF_FFFFFFFF

    The OS (or maybe the hardware) did the sign-extension so that The Right Thing happens when you use a 32-bit address.

  27. CW User says:

    So it happens that my post (http://weblogs.asp.net/oldnewthing/archive/2004/01/08/48616.aspx#58017) was cited by Raymond as an example for this article.

    I am flattered.

    Lot of other guys said why these types of mistakes happen.

    Seems like documentation (MSDN and other resources) is always to blame. When I wrote the code from the link above, at first GetProcAddress() didn’t give anything. So I checked in Petzold. Then in MSDN, then in Jeffrey Richter’s Advanced Windows. (At the time I didn’t have Wine on my disk). Then I looked into some other books. Then I looked into headers to see what is really CALLBACK. Then in desperation I loaded compiled DLL into my editor and this is when I saw that strange name with @ in it. Then I remembered that Petzold was talking something about name mangling, but he related to it in terms of C++ so I thought I shouldn’t bother with that.

    After all these books and MSDN site and DVD and after six hours I’ve spent on GetProcAddress() no wonder that when it finally returned something and it all seemed to work well I went away from computer and it was natural I forgot to look into my header file which had original definition of proc type I’ve used to cast value returned from GetProcAddress().

    As Raymond writes next article, I think he would somehow come to one simple conclusion – all the programmers in the world need his cell phone number.

    Just kidding – there must be some other way of solving this mess. Open sourcing Windows? OK. Idea with the phone number was maybe more realistic.

    At least, we have this blog. World is a better place now.

  28. Petr Prazak says:

    For Matt and Centaur: this idiom in English is "a shot in the eye".

    How many more hacks are there in Windows for the sake of Compatibility? I hope Raymond will tell us. :)

  29. Raymond Chen says:

    I could talk about one compatibility hack a day and I would never run out of material. (There are hundreds of compatibility hacks just in the shims database. That’s not counting the ones that are incorporated into the core.) But it would make for a very boring blog.

  30. 4nd3r$ says:

    no that would be great, one compatibility hack every day

  31. Suppose you follow instructions in MSDN to code a TimerProc() function and pass that function’s address to the SetTimer() function. In order to get the result to compile, you WILL have to use the evil buggy kind of cast that is described in this Blog entry. After getting the result to compile, I don’t remember if it sometimes works or not.

    Visual Studio 6 includes enough source code so that I could change the declarations and compile the thing without an evil buggy cast. For the moment the thing runs. But I am violating the instructions which Microsoft published in MSDN.

    What will happen in the future when Microsoft starts conforming to instructions published in MSDN? Will you put hacks in Windows so that my incorrect program will continue to run?

    Sorry to repeat, but I really do wish that Microsoft would take this effort and put it into making valid programs work instead. Fix Windows. Fix MSDN. Fix Visual Studio. AFTER that, start thinking about your interesting hacks.

  32. Andreas Häber says:

    To fix the problem "in the core" for the end-user is very nice, from the end-user point of view. But for the developer this only makes it harder to find&fix errors.

    IMHO it would be very helpful if Windows checks if the program is running in debug-mode before it fixes the stack, and if it is then gives out a debug message (the OutputDebugString way) and explains the problem to the poor developer (or at least a link to your blog :))

  33. Raymond Chen says:

    ? I just followed the docs for SetTimer/TimerProc and it compiled okay:

    void CALLBACK MyTimerProc(HWND hwnd,
        UINT uMsg, UINT_PTR idEvent, DWORD dwTime);

    void foo(HWND hwnd)
    {
        SetTimer(hwnd, 1, 1000, MyTimerProc);
    }
    // no errors

  34. Raymond Chen says:

    You can run the checked version of Windows, which will tell you about all sorts of bad things you’re doing. Application Verifier will catch even more things.

    But some app compat fixes have to be done pre-emptively. For example, the fourth parameter to IExtractIcon::Extract

    http://msdn.microsoft.com/library/en-us/shellcc/platform/shell/reference/ifaces/iextracticon/Extract.asp

    is documented as "may be NULL" but it turns out that if you pass NULL, some icon extractors crash. So Explorer is careful never to pass NULL.

    Explorer isn’t clairvoyant. It doesn’t know whether passing NULL *would* crash, so it can’t print a debug message, "Your shell extension would have crashed if I passed NULL as the fourth parameter." It just has to play it safe and always pass a non-NULL fourth parameter.

  35. asdf says:

    With all these problems, I wonder why you guys haven’t bundled an app verifier like utility with visual studio (with an option to download current data like windbg’s symbol server stuff). The debug build then should have an option checked by default to spew out warnings like the DX debug runtime does and a message box for blatantly incorrect errors. But to make it more useful than DX’s you should have a user defined filter function instead of a tiny slider. Well I guess it doesn’t make sense anymore since you guys pretty much canned the Win api for .net.

  36. Florian says:

    I still have problems grokking the change from BOOL to INT_PTR for the Win64 move. While Raymond said that the return type never was anything else but BOOL, Larry said that apps and windows relied on being able to store *pointers* in the return code. Now, usually in the programming world BOOL means Boolean and not some "type that is true or false or may also hold a pointer to something". What I mean is that BOOL and INT_PTR are vastly different concepts. Is this yet another story about creative use of the WinAPI by third party programs or did actually even Windows rely on being able to return pointers? Why would anyone return a pointer from a BOOL function?

  37. Raymond Chen says:

    The only reason BOOL changed to INT_PTR is that the WM_CTLCOLOR messages require you to return a brush handle (a pointer) cast to BOOL. If it weren’t for that, dialog procedures would have stayed BOOL.

  38. tom says:

    Re: Generic Sample Application – is it just the forward declaration that’s bad – the actual function is OK?

    Raymond, if it were your responsibility to vet the sample code after it already contained the offending cast, how would you have found it? Is there any way other than to know the signature of a dialog procedure? None of the tools mentioned in this post would find this mismatch.

    Thanks.

  39. Raymond Chen says:

    If you change the return value from LRESULT to BOOL then the dialog procedure would be fine.

    When I review code, I view every function cast with great suspicion. It usually means you’re trying to pull a fast one. The only exceptions I can think of are (1) casting the return value of GetProcAddress and (2) casting the return of GetWindowLongPtr(GWLP_WNDPROC). The second can be wrapped inside the SubclassWindow macro. The first you just have to stare at carefully, make sure the declaration is correct down to the last detail.

    There may be a version of lint that catches this sort of thing but I’m not aware of it.

  40. Martin Webrant says:

    I ran into a gotcha with the timer callback the other day.

    The callback is surrounded with try/catch and is potentially hiding errors that can happen in your timer function.

    VOID CALLBACK MyTimerProc(HWND hwnd, UINT message, UINT idTimer, DWORD dwTime)
    {
        OutputDebugString("I'm doing some stuff here\n");
        int* crash = 0;
        *crash = 0; // deliberate access violation
        OutputDebugString("You never knew this code wasn't running - hiding a potential disaster :-)");
    }

    Our crash usually happened in release code so we didn’t see the "access violation" message in the output window…

    I ended up writing a __try/__except around all the timer code so I could warn about the error and write a minidump myself.

  41. asdf says:

    Is there a reason you guys didn’t just change the BOOL type to 64 bits and leave DLGPROC alone? I could have sworn there was a lot of code (in the MFC/WTL/ATL source even?) written that assumed sizeof(BOOL) == sizeof(LRESULT) because DLGPROC returned a BOOL. Or maybe they just made the (all too common it seems) mistake of returning an LRESULT and the two wrongs made it right.

  42. Mike Dunn says:

    Ack, I’m guilty of assuming non-NULL pointers in IExtractIcon::Extract. But in my defense, the Oct 2001 MSDN does not say that those parameters may be NULL.

    Since I do not use VC 7.x, and I don’t like the new MSDN help viewer, I usually use the Oct 2001 MSDN (the last one that works with VC 6) for all my docs needs, unless it’s some really new API or interface that’s only in the Feb 2003 PSDK.

    Raymond, that reminds me of a question that stumped me… In one of my apps I have an icon handler that reads the icon for a file type with SHGetFileInfo and returns it for my own custom file type. I was unable to find how to retrieve the 48×48 icon for a file type from the system imagelist. AFAICT SHGetFileInfo can only give you the 16×16 and 32×32 ones.

  43. Tom Seddon says:

    STRICT seems to have vanished. But with older VC++, unless you did -DSTRICT all the function pointer types were just the same. So you were obliged to cast in order to get Win32 and C++ (and possibly modern C, too) to coexist nicely.

    I remember this being a problem in VC5, and it was ugly. (I found out about STRICT by looking at the header in shock/desperation.) Looks like STRICT has gone from VS.NET, thankfully; not sure about VC6.

    Anyway, when apportioning blame for these mucky casts, this should be borne in mind :)

  44. Tom Seddon says:

    I should add, in case it’s unclear — STRICT has gone, because it seems you only get the STRICT prototypes.

  45. Andreas Häber says:

    "You can run the checked version of Windows, which will tell you about all sorts of bad things you’re doing. Application Verifier will catch even more things. "

    As said before, there should be a very easy way to use Application Verifier from Visual Studio. I don’t think as many people know that Application Verifier exists as should :/. The same goes for the checked build of Windows – and to get it you’ve got to have an MSDN subscription. And since it’s mostly for kernel development, I don’t think the usual application developer knows about that one either. (Maybe I’m too pessimistic? But after reading all the application compatibility errors on your blog, why should I be optimistic? ;))

    With regard to displaying debug messages, there are always ‘grey cases’. I know the Windows source code is huge (30+ million lines from what I’ve read). But what’s your opinion about the specific case you described above: "the function that dispatches messages to window procedures"?

    IMHO when the dispatch function detects that something is wrong with the stack, you can assume that the program wasn’t very well written from the start. Therefore performance shouldn’t matter, so what’s the downside of giving the developer this information?

  46. Raymond Chen says:

    asdf: None of the fundamental data types changed size because doing so would have broken file formats.

    Mike: SHGetImageList lets you get the 48×48 imagelist.

    Andreas: And then when you run a program that has a buggy window procedure your machine slows to a crawl because of all the debug spew. We had this problem in Windows 95 – our internal bug-tracking program was itself buggy and generated boatloads of debugging warnings if you ran it on a checked build, so much that you couldn’t really use it.

    Though the specific case of the wndproc dispatcher could be fixed by say printing the message only once per program. But the general case is much harder.
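
A minimal sketch of the "only once per program" idea mentioned above, in portable C. The helper name `warn_bad_wndproc_once` and the message text are hypothetical, and nothing here reflects how the real Windows dispatcher is written. The flag is also not thread-safe as shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical per-process flag: set after the first warning is printed. */
static bool g_wndproc_warning_shown = false;

/* Prints the warning the first time it is called and stays silent afterwards,
   so a buggy program cannot flood the debug output.
   Returns true if this call actually printed the warning. */
bool warn_bad_wndproc_once(const char *detail)
{
    assert(detail != NULL);
    if (g_wndproc_warning_shown) {
        return false;
    }
    g_wndproc_warning_shown = true;
    fprintf(stderr, "warning: window procedure left the stack unbalanced: %s\n", detail);
    return true;
}
```

This handles the single wndproc case only; as noted above, the general case is much harder.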

  47. Mike Dimmick says:

    Tom Seddon: STRICT is still there, but now if you don’t define NO_STRICT, WinDef.h defines STRICT for you.
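
As a rough illustration of the default described above, the guard below turns STRICT on unless NO_STRICT is defined. The `demo_` names are hypothetical and the typedefs are simplified stand-ins for illustration, not the real Windows declarations.

```c
/* STRICT is on by default; defining NO_STRICT before this point opts out. */
#ifndef NO_STRICT
#ifndef STRICT
#define STRICT 1
#endif
#endif

#ifdef STRICT
/* Fully prototyped callback type: a mismatched callback is diagnosed
   by the compiler instead of being silently accepted. */
typedef long (*demo_WNDPROC)(void *hwnd, unsigned msg,
                             unsigned long wparam, long lparam);
#else
/* One generic, unprototyped pointer type shared by every callback,
   which is what pushed people toward casts. */
typedef int (*demo_WNDPROC)();
#endif

/* Reports which branch was compiled in: 1 for STRICT, 0 otherwise. */
int demo_strict_enabled(void)
{
#ifdef STRICT
    return 1;
#else
    return 0;
#endif
}
```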

  48. Tim Robinson says:

    So wrap the debugging messages with if (IsDebuggerPresent()). The NTDLL heap routines do this to check heap validity, and the DLL loader in NT4 did this when it had to relocate something.

    Incidentally, that LDR relocation message in NT4 was really useful. I was disappointed to see it go when I started developing on Windows 2000. Now I’m faced with running my app in Depends and looking for red numbers. Or I could just not bother and have my programs start up a lot more slowly.

    </moan>
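
The `IsDebuggerPresent()` guard suggested above could look roughly like this. `IsDebuggerPresent` and `OutputDebugStringA` are real Win32 calls; the wrapper names (`debugger_attached`, `emit_debug_text`, `debug_warn`) and the non-Windows stand-ins are assumptions added so the sketch also builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#ifdef _WIN32
#include <windows.h>
static int debugger_attached(void) { return IsDebuggerPresent() ? 1 : 0; }
static void emit_debug_text(const char *msg) { OutputDebugStringA(msg); }
#else
/* Stand-ins for other platforms: pretend no debugger is attached
   and route any text to stderr. */
static int debugger_attached(void) { return 0; }
static void emit_debug_text(const char *msg) { fputs(msg, stderr); }
#endif

/* Emits the diagnostic only when a debugger is attached, so end users
   running the program normally pay nothing for it.
   Returns 1 if the message was emitted, 0 if it was skipped. */
int debug_warn(const char *msg)
{
    assert(msg != NULL);
    if (!debugger_attached()) {
        return 0;
    }
    emit_debug_text(msg);
    return 1;
}
```

The design choice is that the check happens before any formatting or output work, which keeps the cost near zero for users who never attach a debugger.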

  49. Andreas Häber says:

    Yeah, something like IsDebuggerPresent is what I thought of above. Like I said, this is only useful for the developer, not the poor end-user (who’s stuck with the buggy program. Nice that Windows takes care of her/him).

    Ok, the program is going really slow. Now it’s up to the developer – does (s)he start to optimize code or look at all the debug messages? :)

  50. Raymond Chen wrote on 1/15/2004 5:13 PM :

    >? I just followed the docs for
    >SetTimer/TimerProc and it compiled okay:
    >
    >void CALLBACK MyTimerProc(HWND hwnd,
    >UINT uMsg, UINT_PTR idEvent, DWORD dwTime);
    >
    >SetTimer(hwnd, 1, 1000, MyTimerProc);

    My God. Code used by Microsoft internally matches MSDN, but code shipped from Microsoft to victims is completely different. Here’s what victims get when we buy VS6 and download SP5:

    >/*
    > * Windows Functions
    > */
    >
    >WINUSERAPI
    >UINT
    >WINAPI
    >SetTimer(
    > HWND hWnd ,
    > UINT nIDEvent,
    > UINT uElapse,
    > TIMERPROC lpTimerFunc);

    When victims read MSDN and code UINT_PTR, we get compile errors. We have to change it to UINT in order to match the declaration contained in C:\Program Files\Microsoft Visual Studio\VC98\Include\WINUSER.H

    But Microsoft’s internal distributions have UINT_PTR exactly as documented. So you could obey MSDN and not get compilation errors.

    No wonder Microsoft thinks that customers are lying when we report Microsoft bugs. We see the facts but Microsoft doesn’t even see the facts internally.

    Also by the way Mr. Chen, when we talk about Windows 95 crashing and Windows 2000 built-in drivers causing blue screens and Windows XP Windows Explorer misbehaving and stuff like that, we’re talking about the versions that customers get. If Microsoft will try putting on some of its test machines the same versions of these products that actually get sold to victims, maybe Microsoft will see that we’re telling the truth.

  51. I quoted the declaration of SetTimer() but forgot to quote the following declaration from C:\Program Files\Microsoft Visual Studio\VC98\Include\WINUSER.H (still VS6 SP5):

    typedef VOID (CALLBACK* TIMERPROC)(HWND, UINT, UINT, DWORD);

  52. 1/15/2004 10:54 PM Raymond Chen:

    > The only reason BOOL changed to INT_PTR is
    > that the WM_CTLCOLOR messages require you to
    > return a brush handle (a pointer) cast to
    > BOOL. If it weren’t for that, dialog
    > procedures would have stayed BOOL.

    MSDN October 2001 (which integrates with VS6) says that the WM_CTLCOLOR messages require programmers to return a brush handle cast to LRESULT.

    How many places does MSDN say LRESULT when the actual requirement is BOOL?

    (By the way there’s some other place where BOOL means 0, 1, or 2 for FALSE, TRUE, and OTHER. I forgot where that was though.)

  53. Raymond Chen says:

    UINT_PTR was added last year for Win64. MSDN always describes the latest version of the header files. If you’re still using header files from 1998 then it’s not surprising that they won’t match today’s MSDN.

    As for the CTLCOLOR messages: For Win32, LRESULT and BOOL are the same size, so even though it is technically incorrect, the end result is the same – no harm done. For Win64 you need to be more careful.
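
To make the size argument concrete, here is a small sketch using fixed-width integers in place of the Windows types. The widths are assumptions for illustration (a BOOL-sized value is 32 bits, an LRESULT-sized value is 64 bits on Win64), the `demo_` names are hypothetical, and unsigned types are used so the narrowing is well defined in C. Whether a real brush handle ever has nonzero upper bits is a separate question; this only shows what the narrowing does.

```c
#include <stdint.h>

/* Assumed widths for the sketch, not the real Windows typedefs. */
typedef uint32_t demo_bool_bits;     /* BOOL-sized: 32 bits everywhere    */
typedef uint64_t demo_lresult_bits;  /* LRESULT-sized: 64 bits on Win64   */

/* Passes a 64-bit handle value through a 32-bit return type and widens it back. */
uint64_t demo_return_via_bool(uint64_t handle_bits)
{
    demo_bool_bits narrowed = (demo_bool_bits)handle_bits; /* upper 32 bits dropped */
    return (uint64_t)narrowed;
}

/* Passes the same value through a pointer-sized return type. */
uint64_t demo_return_via_lresult(uint64_t handle_bits)
{
    demo_lresult_bits same_width = (demo_lresult_bits)handle_bits; /* nothing lost */
    return (uint64_t)same_width;
}
```

A value that fits in 32 bits survives both paths, which is the "no harm done" case on Win32. A value with bits set above bit 31 comes back changed from the first function and intact from the second.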

  54. Raymond Chen says:

    As for your Win95/Win2000 comments, I’m not sure what you want me to do. I can’t go into the past and fix bugs retroactively. All we can do is fix them for the future.

  55. asdf says:

    I think that’s the tri-state checkbox you’re referring to.

  56. 1/18/2004 5:29 PM Raymond Chen:

    > UINT_PTR was added last year for Win64.

    That means 2003, but I don’t mind if you meant 2002.

    > MSDN always describes the latest version of

    > the header files.

    No. I use the October 2001 issue of the MSDN library, which integrates with VS6. Even if VS6SP5 dates from 2001, public releases of VS .NET don’t. In 2001, anyone who was using the October 2001 MSDN library and VS6SP5 either had to code a buggly function cast or else had to violate the MSDN library. My question still stands: are you going to put shims in future versions of Windows to support programs where victims were forced to choose between putting in buggly function casts or violating the MSDN library?

    (As for my comments on Windows 95 and Windows 2000, I meant only to inform you that even if Microsoft doesn’t see bugs internally due to Microsoft using patches internally, victims are telling the truth when victims report their suffering from released versions. However, since you ask what to do, here are a couple of suggestions. SERVICE PACKS ARE STILL BEING ISSUED FOR WINDOWS 2000 AND BUGS CAN STILL BE FIXED. For Windows 95, if Microsoft still doesn’t want to release bug fixes that it made for Windows 95 while it was also fixing them in Windows 98 development, then Microsoft can give Windows 98 first edition to victims who suffered from those bugs.)

    (By the way, Windows XP and Windows Server 2003 are still supported products, and Microsoft ought to be capable of fixing Windows Explorer in these products. This is a case where Windows 95 even outperforms Windows Server 2003. Windows 95 Windows Explorer understood long filenames on hard disk drives that were attached after the system was booted. In Windows 95, attach a SCSI hard drive through a PCMCIA SCSI adapter and you can do cut-and-paste to move a file from one directory to another. In Windows 98 or Windows 2000, the hard drive can be either SCSI (via PCMCIA) or USB. In Windows XP and Windows 2003, you can’t do it. Attach a SCSI (via PCMCIA) or USB hard drive, and Windows Explorer says that the hard drive’s FAT32 partition can’t store long filenames, so you can’t even cut-and-paste an existing long filename from one directory to another.)

    (Sorry for going way off topic. But when your tests with Microsoft internal versions produced different results from what customers get with released versions, I needed to inform you that customers’ complaints are genuine.)

  57. Raymond Chen says:

    "when your tests with Microsoft internal versions produced different results from what customers get with released versions"

    I do all my testing for these blog entries on released versions of Windows – the version that customers have.

    Yes bugs can be fixed in service packs, but service packs are very selective in what gets fixed and what isn’t.

  58. "I do all my testing for these blog entries on released versions of Windows – the version that customers have."

    And released versions of VS6SP5?

    How is it possible that you didn’t get compilation errors when you declared your MyTimerProc() with a UINT_PTR parameter?

    When customers with VS6SP5 obey MSDN and declare a TimerProc function with a UNIT_PTR parameter, we get compilation errors.

  59. I misspelled UINT_PTR in the above posting, but spelled it correctly in code when I got compilation errors. Sigh. Anyway the point is, with the released version of VS6SP5, customers cannot obey MSDN October 2001 and use UINT_PTR in our TimerProc definitions.

  60. Raymond Chen says:

    I use the latest Platform SDK headers, which include basetsd.h, and basetsd.h is the header file that defines the Win64 types.

    http://msdn.microsoft.com/library/en-us/win64/win64/the_tools.asp

  61. Back to the BOOL tangent, there’s a famous case where BOOL can be zero for FALSE, nonzero for TRUE, or -1 for OTHER. GetMessage().
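
A sketch of why the three-way return value matters for a message loop. `fake_get_message` is a hypothetical stand-in that replays scripted results using the convention described above (nonzero for a message, 0 to stop, -1 for an error); it is not the real GetMessage and takes different parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scripted queue used in place of a real message queue. */
typedef struct {
    const int *results;  /* values the stand-in will return, in order */
    int count;
    int next;
} fake_queue;

/* Stand-in for GetMessage: nonzero = message, 0 = quit, -1 = error. */
static int fake_get_message(fake_queue *q)
{
    if (q->next >= q->count) {
        return 0;
    }
    return q->results[q->next++];
}

/* Pumps until quit. Returns the number of messages dispatched,
   or -1 if the stand-in reported an error. A plain while (get(...))
   loop would treat -1 as "got a message" and keep dispatching. */
int pump_messages(fake_queue *q)
{
    int dispatched = 0;
    int ret;

    assert(q != NULL);
    while ((ret = fake_get_message(q)) != 0) {
        if (ret == -1) {
            return -1;  /* handle the error instead of dispatching */
        }
        dispatched++;
    }
    return dispatched;
}
```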

  62. 1/18/2004 5:29 PM Raymond Chen: "MSDN always describes the latest version of the header files."

    Could be. But let’s try to figure out which versions of Windows OSes it describes. Here are two quotations from

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vclib/html/_mfc_CWinApp.3a3a.RegisterShellFileTypes.asp

    "Call this member function to register all of your application’s document types with the Windows File Manager."

    "This allows the user to open a data file created by your application by double-clicking it from within File Manager."

  63. Raymond Chen says:

    Allow me to clarify. The "Platform SDK" always corresponds to the latest public header files.

  64. calling convention: The history of calling conventions, part 1; The history of calling conventions, part 2; The history of calling conventions, part 3; The history of calling conventions, part 4: ia64; Why do member functions need to be…

  65. Oh neat, here’s a case where Microsoft made a patch to break a previously working feature of Visual Studio .NET 2002. Presumably in Visual Studio .NET 2003 it’s already broken from the start, but I’m not in the mood to test. Oddly they treat it the same as some of their patches which really fix real bugs in Windows, by requiring support calls. The fee might be canceled if Microsoft determines that the patch will resolve a problem, but the only possible effect of this patch is to encourage further breakage.

    http://support.microsoft.com/default.aspx?scid=kb;ja;813340&Product=vsnet

  66. Flier's Sky says:

    The history of calling conventions

  67. So for the past couple of posts, I’ve been walking through a psychic debugging experience I had over

  68. Channel 9 says:

    You could in fact have the same set of API’s in a variety of library formats.

  69. LonTonG says:

    Yesterday, when I delivered a session at MIC UI on Win32 IO operation, we hit C2664 error. (this is a

Comments are closed.


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.