Non-psychic debugging: Why you're leaking timers

Date:	October 1, 2010 / year-entry #279
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20101001-00/?p=12663
Comments:	22

I was not involved in this debugging puzzle, but I was informed of its conclusions, and I think it illustrates both the process of debugging and a common type of defect. I've written it up in the style of a post-mortem.

A user reported that if they press and hold the F2 key for about a minute, our program eventually stops working. According to Task Manager, our User object count has reached the 10,000 object limit, and closer inspection revealed that we had created over 9000 timer objects.

We ran the debugger and set breakpoints on SetTimer and KillTimer to print to the debugger each timer ID as it was created and destroyed. Visual inspection of the output revealed that all but one of the IDs being created were matched with an appropriate destruction. We re-ran the scenario with a conditional breakpoint on SetTimer set to fire when that bad ID was set. It didn't take long for that breakpoint to fire, and we discovered that we were setting the timer against a NULL window handle.
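
The visual inspection step can also be automated once the debugger output has been captured. Here is a minimal sketch, assuming the logged calls have already been parsed into records. The names TimerEvent and FindUnmatchedTimers are hypothetical, invented for this illustration, and are not taken from the actual investigation.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical record of one logged call.
struct TimerEvent {
    std::uintptr_t hwnd;  // window handle value as logged (0 means NULL)
    std::uintptr_t id;    // timer ID as logged
    bool created;         // true for SetTimer, false for KillTimer
};

using TimerKey = std::pair<std::uintptr_t, std::uintptr_t>;

// Tallies creations against destructions for each (hwnd, id) pair and
// returns only the pairs that were created more often than destroyed.
std::map<TimerKey, int> FindUnmatchedTimers(const std::vector<TimerEvent>& log)
{
    std::map<TimerKey, int> outstanding;
    for (const TimerEvent& e : log) {
        outstanding[std::make_pair(e.hwnd, e.id)] += e.created ? 1 : -1;
    }
    // Drop every pair whose calls balanced out.
    for (auto it = outstanding.begin(); it != outstanding.end(); ) {
        if (it->second <= 0) {
            it = outstanding.erase(it);
        } else {
            ++it;
        }
    }
    return outstanding;
}
```

The tally counts calls rather than modelling how SetTimer replaces an existing timer, so a nonzero count is a lead to investigate, not proof of a leak.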

A different developer on the team arrived at the same conclusion by a different route. Instead of watching timers being created and destroyed, the developer dumped each timer message before it was dispatched and observed that most of the entries were associated with NULL window handles.

Two independent analyses came to the same conclusion: We were creating a bunch of thread timers and not destroying them.

A closer inspection of the code revealed that thread timers were not intended in the first place. Each time the user presses F2, the code calls SetTimer and passes a window handle it believes to be non-NULL. The timer is destroyed in the window procedure's WM_TIMER handler, but since the timer was registered against the wrong window handle, the WM_TIMER is never received by the intended target's window procedure, and the timer is never destroyed.
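
Why does a NULL window handle produce thousands of timers rather than just one? As documented for SetTimer, reusing an ID with the same non-NULL window replaces the existing timer, whereas a NULL window combined with an ID that matches no existing thread timer causes the ID to be ignored and a brand new timer to be created. The following is a toy model of those rules, not the real implementation. TimerTable, Handle, kNullWindow, kF2TimerId, and the starting value for generated IDs are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>

using Handle = std::uintptr_t;            // stand-in for HWND
constexpr Handle kNullWindow = 0;         // plays the role of a NULL HWND
constexpr std::uintptr_t kF2TimerId = 1;  // hypothetical ID used by the F2 code

// Toy model of a thread's timer table, keyed by (window, timer ID).
class TimerTable {
public:
    // Returns the ID under which the timer ends up registered.
    std::uintptr_t Set(Handle hwnd, std::uintptr_t id) {
        const bool exists = timers_.count(std::make_pair(hwnd, id)) != 0;
        if (hwnd == kNullWindow && !exists) {
            // Thread timer with an unknown ID: the caller's ID is ignored
            // and a fresh one is generated, so a new timer is created.
            id = nextGeneratedId_++;
        }
        // For an existing (hwnd, id) this is a no-op, modelling replacement.
        timers_.emplace(hwnd, id);
        return id;
    }

    // Returns true if a matching timer existed and was removed.
    bool Kill(Handle hwnd, std::uintptr_t id) {
        return timers_.erase(std::make_pair(hwnd, id)) != 0;
    }

    std::size_t Count() const { return timers_.size(); }

private:
    std::set<std::pair<Handle, std::uintptr_t>> timers_;
    std::uintptr_t nextGeneratedId_ = 0x7000;  // arbitrary value for the model
};
```

In this model, calling Set(kNullWindow, kF2TimerId) once per key repeat adds a new entry every time, while the same number of calls against a real window handle leaves exactly one entry. Had the handle been valid, even a missed KillTimer would have cost only a single timer.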

The window handle is NULL due to a defect in the code which handles the F2 keypress: The handle that the code wanted to use for the timer had not yet been set. (It was set by a later step of F2 processing.) The timer was being set by a helper function which is called both before and after the code that sets the handle, but it obviously was written on the assumption that it would only be called after.

To reduce the likelihood of this type of defect being introduced in the future, we're going to introduce a wrapper function around SetTimer which asserts that the window handle is non-NULL before calling SetTimer. (In the rare case that we actually want a thread timer, we'll have a second wrapper function called SetThreadTimer.)

I haven't seen the wrapper function, but I suspect it goes something like this:

inline UINT_PTR SetWindowTimer(
    __in HWND hWnd, // NB - not optional
    __in UINT_PTR nIDEvent,
    __in UINT uElapse,
    __in_opt TIMERPROC lpTimerFunc)
{
    assert(hWnd != NULL);
    return SetTimer(hWnd, nIDEvent, uElapse, lpTimerFunc);
}

inline UINT_PTR SetThreadTimer(
    __in UINT uElapse,
    __in_opt TIMERPROC lpTimerFunc)
{
    return SetTimer(NULL, 0, uElapse, lpTimerFunc);
}

__declspec(deprecated)
WINUSERAPI
UINT_PTR
WINAPI
SetTimer(
    __in_opt HWND hWnd,
    __in UINT_PTR nIDEvent,
    __in UINT uElapse,
    __in_opt TIMERPROC lpTimerFunc);

There are a few interesting things here.

First, observe that the annotation for the first parameter to SetWindowTimer is __in rather than __in_opt. This indicates that the parameter cannot be NULL. Code analysis tools can use this information to attempt to identify potential defects.

Second, observe that the SetThreadTimer wrapper function omits the first two parameters. For thread timers, the hWnd passed to SetTimer is always NULL, and the nIDEvent is ignored unless it matches an existing thread timer, in which case that timer is replaced.

Third, after the two wrapper functions, we redeclare SetTimer, but mark it as deprecated so the compiler will complain if somebody tries to call the original function instead of one of the two wrappers. (The __declspec(deprecated) extended attribute is a nonstandard Microsoft extension.)
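
On compilers that support C++14, the standard [[deprecated]] attribute offers a portable way to try the same "wrappers first, then deprecate the original" ordering. The sketch below uses a stand-in function, RawSetTimer, instead of the real Win32 declaration. Handle, kNullWindow, RawSetTimer, and its fake ID generation are all hypothetical, and the sketch does not verify whether the attribute combines cleanly with the WINUSERAPI and WINAPI decorations in the SDK headers.

```cpp
#include <cassert>
#include <cstdint>

using Handle = std::uintptr_t;     // stand-in for HWND
constexpr Handle kNullWindow = 0;  // plays the role of a NULL HWND

// Stand-in for the raw API so the example can be compiled anywhere.
// Window timers keep the caller's ID; thread timers get a generated one.
inline std::uintptr_t RawSetTimer(Handle hwnd, std::uintptr_t id,
                                  unsigned /*elapseMs*/) {
    static std::uintptr_t nextGenerated = 0x7000;  // arbitrary for the sketch
    return hwnd != kNullWindow ? id : nextGenerated++;
}

// Wrapper for window timers: a NULL handle is treated as a bug.
inline std::uintptr_t SetWindowTimer(Handle hwnd, std::uintptr_t id,
                                     unsigned elapseMs) {
    assert(hwnd != kNullWindow);
    return RawSetTimer(hwnd, id, elapseMs);
}

// Wrapper for the rare case where a thread timer is really wanted.
inline std::uintptr_t SetThreadTimer(unsigned elapseMs) {
    return RawSetTimer(kNullWindow, 0, elapseMs);
}

// Redeclared after the wrappers, so only code that follows this point
// is steered away from the raw function.
[[deprecated("use SetWindowTimer or SetThreadTimer instead")]]
inline std::uintptr_t RawSetTimer(Handle hwnd, std::uintptr_t id,
                                  unsigned elapseMs);
```

Any call to RawSetTimer that appears after the redeclaration should draw a deprecation warning, while the two wrappers, which were defined earlier, compile quietly. The attribute also accepts a message, which gives the developer a pointer to the replacement functions.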

Exercise: Why did I use __declspec(deprecated) instead of #pragma deprecated(SetTimer)?

Comments (22)

Anonymous says:

October 1, 2010 at 7:28 am

The API they ended with is what the API should have been since the beginning – no magic values selecting a different magic meaning, and each function doing just one thing.
Henning Makholm says:

October 1, 2010 at 8:01 am

"Why did I use __declspec(deprecated) instead of #pragma deprecated(SetTimer)?"

Not being psychic, I can only guess: It must have something to do with the fact that __declspec is much better suited for preprocessor tricks that you might need in order to push the code through different compilers, analysis tools, and the like.
Saveddijon says:

October 1, 2010 at 8:06 am

What I'd like to know: given the application in question, would it be reasonable for a user to hold down the F2 key for an entire minute? How was this even discovered? (Perhaps it is reasonable; I don't know what this application is, or what it does. If it's a racing car game where F2 maps to the gas pedal then maybe it's very reasonable to have it pressed for a full minute.)
Chris Taylor says:

October 1, 2010 at 8:30 am

"Why did I use __declspec(deprecated) instead of #pragma deprecated(SetTimer)"

I prefer using __declspec(deprecated) because you have the option to provide a message to the developer informing them of the alternate API functions. But it does not seem that you are taking advantage of that in this case.
Chris Taylor says:

October 1, 2010 at 8:34 am

Ah, it just came to me. Using the pragma would mark all overloads deprecated which might be a problem if other libraries or utility functions provide a function with the same name but different signature.

[True, but there's another reason which is probably more likely… -Raymond]
A Guy Somewhere Cold says:

October 1, 2010 at 8:54 am

@Saveddjion: Thorough application testing means not only testing the expected code paths, but the unexpected ones. Users can do some really bizarre things for reasons you might never have expected, and I think it would be rather arrogant to suggest that a given bug is a user's fault because they were using the application wrong.
Kujo says:

October 1, 2010 at 9:20 am

The pragma doesn't care about type or scope, so referencing Foo::SetTimer() or enum { SetTimer } will be considered deprecated.

[That's what I was looking for. A lot of classes have a method called SetTimer. -Raymond]
Anonymous says:

October 1, 2010 at 9:50 am

It's over 9000…!
PhiSmi says:

October 1, 2010 at 10:33 am

The F2 for a whole minute is not very relevant. Every F2 press/repeat is going to leak a timer resource. It just took a minute to exhaust them all. Don't get distracted by how the problem was discovered; the problem is valid and ought to be resolved. If the application ran for a long time the F2 will eventually be pressed enough times.
Timothy Byrd says:

October 1, 2010 at 11:10 am

PhiSmi's answer is better, but I was going to say that having the corner of a book or notebook accidentally hold a key down is not unheard of in my experience.
asf says:

October 1, 2010 at 2:55 pm

Why no IsWindow in the assert?
DWalker says:

October 1, 2010 at 8:56 am

Don't press and hold the F2 key for a whole minute! (Although it might be appropriate, in the context of the program.)

Doctor, it hurts when I do *this*. "Then, don't do that."
DWalker says:

October 1, 2010 at 4:03 pm

Ah, yes, a book on the keyboard could certainly cause that problem. Or a cat, which happens at my house often enough. Once, our cat bought an expensive piece of art from EBay. My partner learned not to leave the screen up with the "buy now" button visible and leave the computer.
Dave says:

October 2, 2010 at 3:25 am

Code analysis tools can use this information to attempt to identify potential defects.

Unfortunately PREfast's ability to detect use of NULL pointers isn't very good, and conversely it produces large numbers of false positives for these, so I wouldn't rely on this.

assert(hWnd != NULL);

… which will only work in the non-release version of the code, thus making it anything from "completely useless" through to "only marginally effective".
Neil says:

October 2, 2010 at 3:31 am

(In the rare case that we actually want a thread timer, we'll have a second wrapper function called SetThreadTimer.)

SetThreadTimer will still need the nIDEvent parameter in case you need to reset an existing timer. (Or create a ResetThreadTimer wrapper…)
Mark Steward says:

October 2, 2010 at 11:56 am

Dave: why on earth would you turn a resource leak into a fatal error on a release product? If passing a null value will cause serious problems during normal usage, don't use an assert.

asf: IsWindow is not thread safe, and you're more likely to pass a valid (but wrong) handle than a non-existent one. At best, you'd catch a corrupt variable just before SetTimer does.
Gabe says:

October 4, 2010 at 9:36 am

While it's certainly an interesting question as to how they discovered that holding down F2 for a minute makes the app stop working, you really have to ask why nobody noticed that the timer wasn't working! I'm sure the test plan is much more likely to include "Press F2. Verify that X happens after Y seconds." than "Hold down F2 for 1 minute. Verify app still works."!

[My guess (total guess) is that the "set up the F2 timer" function was called a second redundant time, at which point the handle was valid and the timer ran as expected. The only consequence was the leaked timer from the first call. -Raymond]
Tergiver says:

October 6, 2010 at 5:08 pm

The other flaw that is missed in the analysis is that the developer used the WM_KEYDOWN event to trigger an action (which is subject to repeat), rather than WM_KEYUP, which is the 'more correct' choice. Mouse/key down is the predicate to an action (e.g. selection); up is where actions should trigger.
Gabe says:

October 6, 2010 at 11:41 pm

Tergiver: I can't think of any key action that triggers on WM_KEYUP instead of WM_KEYDOWN. Can you name some?
Tergiver says:

October 7, 2010 at 7:36 am

Frankly, I think most programmers simply get it wrong. The only case that comes easily to mind where it is correct is Shift+F10 (or the context menu key). Of course this only works correctly if the programmer responds to WM_CONTEXTMENU instead of WM_RBUTTONUP (or if you're really green you think it should be WM_RBUTTONDOWN). WM_CONTEXTMENU is fired on WM_KEYUP (I believe TranslateMessage does the job of sending WM_CONTEXTMENU).
GregM says:

October 7, 2010 at 10:39 am

Tergiver, it's only "wrong" if you don't want key repeat to trigger the action multiple times.
Tergiver says:

October 7, 2010 at 10:55 am

Which is what I said the first time: "trigger an action (which is subject to repeat)"


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
