Psychic debugging: Why your thread is spending all its time processing meaningless thread timers

I was looking at one of those "my program is consuming 100% of the CPU and I don't know why" bugs, and upon closer investigation, the proximate reason the program was consuming 100% CPU was that one of the threads was being bombarded with WM_TIMER messages where the MSG.hwnd is NULL. The program was dispatching them as fast as it could, but the messages just kept on coming. Curiously, the LPARAM for these messages was zero.

This should be enough information for you to figure out what is going on.

First, you should refresh your memory as to what a null window handle in a WM_TIMER message means: These are thread timers, timers which are associated not with a window but with a thread. You create a thread timer by calling the SetTimer function and passing NULL as the window handle. Thread timer messages arrive in the message queue, and the DispatchMessage function calls the timer procedure specified by the message LPARAM. If the LPARAM of a thread timer message is zero, then dispatching the message consists merely of throwing it away. (If there were a window handle, then the message would be delivered to the window procedure, but there isn't one, so there's nothing else that can be done.)
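The dispatch rule described above can be sketched as a small portable model. This is not the real DispatchMessage implementation: the Toy* types, toy_dispatch_timer, and toy_count_ticks are hypothetical names invented for illustration, and the model stores the timer procedure directly as a function pointer, whereas the real MSG carries it in an integer LPARAM.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the Win32 types. These are NOT the real definitions. */
typedef void *ToyHwnd;
typedef void (*ToyTimerProc)(ToyHwnd hwnd, unsigned msg, size_t id,
                             unsigned long time);

enum { TOY_WM_TIMER = 0x0113 };  /* numeric value of WM_TIMER */

typedef struct {
    ToyHwnd       hwnd;     /* NULL for a thread timer */
    unsigned      message;
    size_t        wParam;   /* timer identifier */
    ToyTimerProc  lParam;   /* timer procedure, or NULL if none was given */
    unsigned long time;
} ToyMsg;

typedef enum {
    TOY_CALLED_TIMERPROC,   /* lParam was a timer procedure; it was called */
    TOY_SENT_TO_WNDPROC,    /* no procedure, but there is a window to notify */
    TOY_DISCARDED           /* no procedure and no window: nothing to do */
} ToyDispatchResult;

/* Hypothetical timer procedure that just counts how often it runs. */
int g_toy_ticks = 0;

void toy_count_ticks(ToyHwnd hwnd, unsigned msg, size_t id, unsigned long time)
{
    (void)hwnd; (void)msg; (void)id; (void)time;
    ++g_toy_ticks;
}

/* Models what dispatching a WM_TIMER message amounts to. */
ToyDispatchResult toy_dispatch_timer(const ToyMsg *msg)
{
    assert(msg != NULL && msg->message == TOY_WM_TIMER);

    if (msg->lParam != NULL) {
        /* A timer procedure was supplied: call it. */
        msg->lParam(msg->hwnd, msg->message, msg->wParam, msg->time);
        return TOY_CALLED_TIMERPROC;
    }
    if (msg->hwnd != NULL) {
        /* Real code would call the window procedure here. */
        return TOY_SENT_TO_WNDPROC;
    }
    /* Thread timer with no procedure: the message is simply thrown away. */
    return TOY_DISCARDED;
}
```

In this model, a message with a null window handle and a null timer procedure always lands in the last branch, which is the case the stuck thread was hitting over and over.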

The program was spending all its time retrieving WM_TIMER messages from its queue and throwing them away. The real question is how all these thread timers ended up on the thread when they don't do anything. Who would create a timer that didn't do anything? And who would create dozens of them?

One of the more common patterns for creating a window timer is to write SetTimer(hwnd, idTimer, dwTimeout, NULL). This creates a window timer whose identifier is idTimer. Since the timer procedure is NULL, the WM_TIMER message is dispatched to the window procedure, which in turn will have a case WM_TIMER statement followed by a switch (wParam) to handle the timer message.

But what if hwnd is NULL, say because you forgot to check the return value of a function like CreateWindow? Well, then you just created a thread timer by mistake. And if you make this mistake several times in a row, you've just created several thread timers. Now you might think that the code that created the thread timer by mistake will also destroy the thread timer by mistake when it finally gets around to calling KillTimer(hwnd, idTimer) and passes NULL for the hwnd. But it doesn't.

One reason is that in many cases, it's the timer that turns itself off. In other words, the KillTimer happens inside the WM_TIMER message handler. But if the WM_TIMER message isn't associated with that window, then that window procedure never gets a chance to turn off the timer.

Another reason is more insidious. Recall that the idTimer parameter to the SetTimer function is ignored when you create a thread timer. Since you can't predict what other thread timers may exist, you can't know which timer identifiers are in use and which are free. Instead, the SetTimer function creates a unique thread timer identifier and returns it, and it is that timer identifier you must use when destroying the thread timer. Of course, the code that accidentally created the thread timer thought it was creating a window timer (which uses the timer identifier you specify), so it didn't bother saving the return value. Result: Thread timer is created and becomes orphaned.
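To make the orphaning concrete, here is a toy model of the identifier rules as described above. It is a sketch under stated assumptions, not the real SetTimer or KillTimer: every toy_* name is hypothetical, the table size and the starting value 1000 for system-chosen identifiers are arbitrary, and timeouts are left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *ToyHwnd;          /* stand-in for HWND, not the real type */

#define TOY_MAX_TIMERS 16       /* arbitrary capacity for the model */

typedef struct {
    int     in_use;
    ToyHwnd hwnd;               /* NULL means "thread timer" */
    size_t  id;
} ToyTimer;

typedef struct {
    ToyTimer slots[TOY_MAX_TIMERS];
    size_t   next_thread_id;    /* source of system-chosen identifiers */
} ToyTimerTable;

void toy_init(ToyTimerTable *t)
{
    memset(t, 0, sizeof(*t));
    t->next_thread_id = 1000;   /* arbitrary; real values are unpredictable */
}

/* Window timer: uses the caller's id and returns a nonzero value.
 * Thread timer (hwnd == NULL): ignores the caller's id, picks a fresh one,
 * and returns it. Returns 0 if the table is full. */
size_t toy_set_timer(ToyTimerTable *t, ToyHwnd hwnd, size_t requested_id)
{
    size_t id = (hwnd == NULL) ? t->next_thread_id++ : requested_id;
    size_t result = (hwnd == NULL) ? id : 1;
    size_t i;

    /* Setting a window timer that already exists just replaces it. */
    for (i = 0; i < TOY_MAX_TIMERS; ++i) {
        ToyTimer *s = &t->slots[i];
        if (s->in_use && s->hwnd == hwnd && s->id == id)
            return result;
    }
    for (i = 0; i < TOY_MAX_TIMERS; ++i) {
        ToyTimer *s = &t->slots[i];
        if (!s->in_use) {
            s->in_use = 1;
            s->hwnd = hwnd;
            s->id = id;
            return result;
        }
    }
    return 0;
}

/* Succeeds only if a timer with exactly this (hwnd, id) pair exists. */
int toy_kill_timer(ToyTimerTable *t, ToyHwnd hwnd, size_t id)
{
    size_t i;
    for (i = 0; i < TOY_MAX_TIMERS; ++i) {
        ToyTimer *s = &t->slots[i];
        if (s->in_use && s->hwnd == hwnd && s->id == id) {
            s->in_use = 0;
            return 1;
        }
    }
    return 0;
}

size_t toy_live_timers(const ToyTimerTable *t)
{
    size_t i, n = 0;
    for (i = 0; i < TOY_MAX_TIMERS; ++i)
        n += t->slots[i].in_use ? 1 : 0;
    return n;
}

/* The pattern from the article: assume a window timer and ignore the
 * return value. Returns how many timers are still alive afterwards. */
size_t toy_start_then_stop(ToyTimerTable *t, ToyHwnd hwnd, size_t id)
{
    (void)toy_set_timer(t, hwnd, id);
    (void)toy_kill_timer(t, hwnd, id);
    return toy_live_timers(t);
}
```

With a valid window handle, toy_start_then_stop leaves no timers behind. With a null handle, each call leaves one more timer in the table, because the kill looks for the caller's identifier while the timer was registered under the system-chosen one. Only the value returned by toy_set_timer can remove it.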

The machine I was asked to look at was running a stress scenario, so it was entirely likely that a low memory condition caused a function like CreateWindow to fail, and the program most likely neglected to check the return value. I never did hear back to find out if that indeed was the source of the problem, but seeing as they didn't come back for more help, I suspect I put them on the right track.

Nathan_works says:

October 16, 2008 at 2:16 pm

Stress scenarios.. Yuk.

Driver Verifier had such tools, IIRC, in the win2k ddk.. Problem is, while it’s always good to do, you can quickly triple or more your coding to test for failures of functions that should always work.

Of course, when you’re the OS, you don’t have that nice assumption of, "if this function fails, we have bigger problems" (almost an exact quote I’ve seen in a comment in code..)

Igor Levicki says:

October 16, 2008 at 6:26 pm

I guess they didn’t expect CreateWindow() to fail :)

barrkel says:

October 16, 2008 at 8:17 pm

And this is one reason why exceptions are better than error codes.

Reginald Wellington III says:

October 16, 2008 at 8:26 pm

People write empty catch(…) blocks just as often as they don’t check error codes.

Siebe Tolsma says:

October 16, 2008 at 9:21 pm

@Reginald: Perhaps, but it is much easier to spot empty catch blocks than error codes which aren’t being handled.

John says:

October 17, 2008 at 2:21 am

Is using SetTimer() to create a thread timer better than using WaitForSingleObject() with timeout (waiting on an event which kills the timer) in a loop?

Dean Harding says:

October 17, 2008 at 2:42 am

John: Um, yes?

I don’t see under what circumstance you’d expect the answer to be "no"…

Worf says:

October 17, 2008 at 3:31 am

@Dean: depends on the app.

If the thread is handling messages, then WM_TIMER is great. But if the thread isn’t handling messages, the Wait* functions are better. (Perhaps it’s a thread that does some background work and responds to signals and events – a server thread, perhaps).

@Siebe: The programmers who do empty catch blocks also often use the design pattern "exception flow", where try-catch is used as normal flow control constructs.

e.g.

try
{
    int count = 10;
    int dummy;
    int result;
    while (1)
    {
        count--;
        // do stuff
        dummy = 1 / count; // abort when done (relies on divide-by-zero raising an exception)
    }
}
catch (...)
{
    // do more stuff on last iteration
}
// continue with rest of code

This is somewhat contrived, but there are plenty of examples of using try-catch as a program-flow-control structure at the dailywtf.com…

October 17, 2008 at 5:32 am

Worf: but that was kind of my point, you would use WaitForSingleObject when SetTimer was not appropriate. But if SetTimer is appropriate, you wouldn’t use WaitForSingleObject…

It’s like asking "should I use a tennis racket to play tennis, or a baseball bat to play baseball?"

AndyB says:

October 17, 2008 at 8:23 am

"plenty of examples of using try-catch as a program-flow-control"

Yes, the EndOfStreamException and EndOfFile exceptions, to name just two.

October 17, 2008 at 10:58 pm

Dean: You may want to read between the lines, where the question is: I want to create a worker thread that does some work once per second. Is SetTimer more appropriate than WaitForSingleObject, and if so, why?

October 18, 2008 at 3:39 am

@Andy: Perhaps I wasn’t being explicit enough – my example was common – you deliberately throw an exception (either using throw, or by causing one, such as a divide by zero or a null pointer dereference) to break from the loop.

There are catch blocks that test the exception for deliberate loop-breaking throws and other exceptions they could catch (which they then re-throw).

The exception isn’t really an exception, basically; it’s ordinary flow control (despite the fact that exception handling is extremely expensive compared with the proper way).

Date:	October 16, 2008 / year-entry #342
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20081016-00/?p=20543
Comments:	12