A thread waiting on a synchronization object could be caught napping

Date: March 13, 2006 / year-entry #91
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20060313-06/?p=31953
Comments: 14
Summary: If you have a synchronization object, say a semaphore, and two threads waiting on the semaphore, and you then release two semaphore tokens with a single call to ReleaseSemaphore, you would expect that each of the waiting threads would be woken, each obtaining one token. And in fact, that's what happens—most of the time. Recall...

If you have a synchronization object, say a semaphore, and two threads waiting on the semaphore, and you then release two semaphore tokens with a single call to ReleaseSemaphore, you would expect that each of the waiting threads would be woken, each obtaining one token. And in fact, that's what happens—most of the time.

Recall in our discussion of why the PulseEvent function is fundamentally flawed that a thread in a wait state could be momentarily woken to service a kernel APC and therefore miss out on the pulse. The same thing might happen to the thread waiting for the semaphore. If the ReleaseSemaphore happens to occur while a thread has been taken out of the wait state to service a kernel APC, it will not claim the token immediately but rather will attempt to claim the token when the kernel APC completes and the thread is about to be returned to the wait state.

Normally this is not a problem, because the token will still be there waiting for the thread. But if you have multiple threads all competing for the token, there is a small probability that in the time it took the thread to service the kernel APC, that other thread which was also waiting for a token not only got the first token, but managed to complete whatever work was associated with the token and issue a new WaitForSingleObject which claims the second token! The first thread was caught napping and missed out on both tokens.

Fortunately, the cases where you have this sort of rampant competition for semaphore tokens are typically producer/consumer scenarios where it doesn't really matter who consumes the token, as long as somebody does.

Exercise: If there are multiple threads waiting on an auto-reset event and the event is signalled, can you predict which thread will be released?

Comments (14)
  1. Bart says:

    In case of multiple threads waiting for an autoreset semaphore, i hope the thread that started blocking on the semaphore most recently is continued, in the hope that the stack it uses is still in the cpu cache.

  2. KJK::Hyperion says:

    The solution, of course, is to ditch events and to use condition variables – natively implemented for the first time in Windows Vista (could probably be backported all the way back to Windows XP, IIRC it’s all about basic operations plus keyed events), and independently implemented against the POSIX standard by RedHat as part of the Win32 POSIX threading library (http://sourceware.org/pthreads-win32/)

  3. J says:

    "In case of multiple threads waiting for an autoreset semaphore, i hope the thread that started blocking on the semaphore most recently is continued, in the hope that the stack it uses is still in the cpu cache."

    What if that thread is servicing a kernel APC at the time?

  4. pcooper says:

    You answer your exercise at the end of the linked-to "the PulseEvent function is fundamentally flawed" article.

  5. JJ says:

Why would you want to predict something like this?!

  6. mikeb says:

    Even if you take out of consideration that a thread might be pulled out the wait queue to service an APC, I see nothing in the documentation (SDK or DDK) that indicates the system guarantees fairness or a particular order for releasing threads waiting on an event (or other waitable object).

    One commenter said that the most recent waiting thread should be dispatched in order to help maintain cache coherency.  However, this would introduce a potential starvation problem for other waiting threads, so you might want to have a FIFO ordering.

    Then again, the algorithm might be more complex, tending toward releasing the most recent waiting thread (to preserve cache coherency) but not do this all the time to prevent starvation.

    Also, you might want to factor in the priority of the waiting threads in the decision.

    I don’t know how Windows decides which thread to release when more than one is waiting, and I believe that code should not be written to expect such a promise (I’d guess it’s a FIFO or prioritized FIFO, but code should not depend on this).

  7. 8 says:

    JJ, exactly. Trying to predict it only results in nastiness. (The "WFM" bugs)

    Mikeb, I think the cache will already have had several misses by the time execution continues for any given released thread on a uniprocessor machine.

  8. offby1@blarg.net says:

    How much would they have to pay you to write Win32 documentation?  Why can’t the official Win32 docs be one-tenth as clear and engaging as your writing?  Grr.

  9. Dan Farino says:

    I once did a kernel debug session to see what happened inside of KeWaitForMultipleObjects when multiple threads were waiting on a single auto-reset event. I was surprised to see all of the threads were actually made runnable as a result of the event being set. (I would have expected only one of the threads to become runnable.)

    Each thread, when it was eventually scheduled, would look at the current state of the kernel object it had just waited on. If it was not still signalled, it would simply loop back and begin waiting on the same object again. (The first thread would see the object as signalled, and, since the event was configured to auto-reset, would then set the event to the non-signalled state).

So it seems to me that the answer to the question is: (of the threads that were actually blocked waiting for the event inside of KeWaitForMultipleObjects) the first one to reach the code path in KeWaitForMultipleObjects that "unsignals" the event. In the case of multiple threads waiting, it seems like the decision is up to the Windows scheduler to determine which of the newly-runnable threads will be run on the CPU(s).

    I was surprised to see this, as it means that each thread waiting for an auto-reset event will incur a context switch (even though it will not be runnable for any longer than it takes to see the event is unsignalled).

    I’m not a Windows scheduler expert by any stretch, but I was surprised that I didn’t see a "queue" of some sort attached to the event. Auto-reset events just seem to let every thread go knowing that all but one will be put back to sleep in short order.

  10. Jam says:

Do semaphores incur the same context switch penalties that Dan observed for auto-reset events?

    Do critical sections incur similar context switch penalties? In http://msdn.microsoft.com/msdnmag/issues/03/12/CriticalSections/default.aspx we find out that at the heart of every critical section there’s an auto-reset event (see the description of the LockSemaphore field).

    I’ve been toying around with building my own critical sections to learn more about them, and after reading the above article I used an event coupled to a basic spinlock. Maybe it’s time to move to semaphores.

  11. Dan Farino says:

    I’ve recreated my experiment and here’s what I found:

    My original test was done using System.Threading.AutoResetEvent from a .NET application. I was calling WaitOne() on that event from several different threads. It looks like the CLR calls WaitForMultipleObjectsEx with nCount = 1 (instead of just calling WaitForSingleObjectEx). In this case, every thread waiting was made runnable when the event was signalled.

    I repeated the test with a C++ app that calls WaitForSingleObjectEx (and, in the kernel, KeWaitForSingleObject). This behaved as expected: only one thread was made runnable after the event was signalled.

So it looks like I got bit by a CLR implementation detail on this one. I guess the optimizations that could be made would be either 1) having the CLR call WaitForSingleObjectEx, or 2) having the kernel optimize the case of WaitForMultipleObjectsEx with nCount = 1.

    My apologies for incorrectly sending up the red flag on this one!

  12. memet says:

    Do you know which kernel you were running on? (NT 4, 5, 5.1?/ sp?).

    I am so surprised by what you wrote because it seems that it is a very simple optimization that could result in *huge* benefits. Just imagine the number of times threads block on events every second.

    Anyways, can you share more details on what you did? Thanks.

  13. Peter Ritchie says:

    Sounds to me that the OS is fundamentally flawed, not semaphores or events (PulseEvent) in particular.

    If the OS didn’t do what it does, then these things would work fine.

Comments are closed.

*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.

