How do I fix the problem of a long-running task running on the thread pool persistent thread?

Last time, we diagnosed a problem caused by a long-running task on the thread pool persistent thread, which prevented other tasks that target the persistent thread from running. But what motivated the developers to write code that put a long-running task on the persistent thread in the first place?

My theory is that the developers saw this sentence in the documentation for RegNotifyChangeKeyValue:

With the exception of RegNotifyChangeKeyValue calls with REG_NOTIFY_THREAD_AGNOSTIC set, this function must be called on persistent threads.

The reasoning was probably, "Well, the documentation says that this must be called on a persistent thread, so we have to schedule it with the WT_EXECUTEINPERSISTENTTHREAD flag."

That's not what the documentation is trying to say, but I can't fault them for misinterpreting it that way.

What the documentation is trying to say is, "If you know what's good for you, you will call this function on a thread that will not exit until you close the registry key. If the thread exits prematurely, then the notification will stop working (in a specific way described below, though you would do best to just avoid the problem entirely)."

The documentation is using the term "persistent thread" here in a generic sense, meaning any thread that does not exit (until the thing you care about is over). It doesn't have to run on the thread pool's persistent thread; any persistent thread will do.

The callback function registered by the application does not close the registry key handle until it has finished with the change notification, so it's fine to run this code on any thread; it doesn't have to run on the thread pool's persistent thread.

Therefore, one fix for the problem is to remove the WT_EXECUTEINPERSISTENTTHREAD flag. This runs the work item on a thread pool thread with no special attributes. You still want the WT_EXECUTELONGFUNCTION flag because the work item runs long: It runs indefinitely until the monitoring is stopped.

However, running a long function with indefinite lifetime isn't really the thread pool's bread and butter. The thread pool is really for running large numbers of short items. After all, scheduling a work item on the thread pool that runs indefinitely isn't really all that different from running a dedicated thread for the work item. The purpose of the thread pool is to amortize the cost of starting up a thread over many work items. Otherwise, the system will be spending more time starting up threads than it does actually performing work.

This benefit doesn't really help work items with indefinite running time. From a percentage standpoint, the added cost of starting up a new thread is not significant if the thread is going to be running for minutes.
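
To put rough numbers on that, here is a back-of-the-envelope calculation. The function name is made up for this example, and the thread startup cost you feed it is an assumption, not a measurement.

```cpp
// Percentage of a thread's total lifetime that is spent starting it up.
// Both arguments are in milliseconds.
double StartupOverheadPercent(double startupMs, double workMs)
{
    return 100.0 * startupMs / (startupMs + workMs);
}
```

If you assume, purely for illustration, that starting a thread costs 1 ms, then a 1 ms work item spends 50% of its thread's lifetime on startup, whereas a work item that runs for five minutes spends less than 0.001% on it.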

But if you look at the work item that the customer was scheduling, it's not really doing work most of the time anyway. It spends most of its time waiting! Next time, we'll look at another way of designing the code so that instead of burning a thread for each active monitoring request, it pools the requests.

In other words, we're going to use the thread pool as a thread pool. Tune in next time for the exciting conclusion.

DWalker07 says:

February 16, 2017 at 9:46 am

Exactly; the documentation is unclear, and the flag (in hindsight) could have been named WT_EXECUTEINTHETHREADPOOLSONEANDONLYPERSISTENTHREAD.

IChrisI says:

February 16, 2017 at 11:35 am

How about WT_EXECUTEBLOCKING?

Raymond Chen - MSFT says:

February 16, 2017 at 12:32 pm

“Why yes, my task is blocking. I’ll set this flag so the thread pool knows that my task will block for a long time.”

Joshua says:

February 16, 2017 at 4:00 pm

WT_HOGENTIREPOOL

Piotr says:

February 16, 2017 at 12:50 pm

Why does the execution time matter in a thread pool? Shouldn’t it be irrelevant? Shouldn’t a thread pool be just an array of pre-created threads waiting for work no matter how long it takes? Like a thread pool waiting for HTTP requests – some may finish in less than a second, while some may stall due to a long SQL query.

Wear says:

February 16, 2017 at 1:46 pm

Because if everyone used the thread pool to run long tasks there would be no threads in the thread pool. You also wouldn’t get any benefit over creating your own threads. The purpose of thread pool is to avoid the overhead of creating a new thread when you only need one for a short time. Rent-a-Thread.

Brian says:

February 16, 2017 at 2:55 pm

Thread pools work best when the work dispatched to the pool threads all takes roughly the same amount of time (give or take, for some definition of roughly). Consider the case where you have two generators of work that dispatch to the same thread pool, but the work from each generator is off by, say, three orders of magnitude (some work takes 1 second and some work takes 1000 seconds). Eventually, all (for some definition of all) of the threads will be occupied doing the 1000 second tasks.

Now, in real life, you get a mix of work dispatched to the thread pool, at a mix of frequencies. But, if you start getting really long work (where really long is measured in comparison to the other tasks dispatched), bad things start to happen.

skSdnW says:

February 16, 2017 at 3:39 pm

I know this is slightly off topic but it would be fun if you could talk a bit about the original shlwapi threadpool from Win2000, it has several interesting reserved parameters (Id, Tag, Priority) that did not survive the move to the new threadpool in XP. And then the threadpool was changed again in Vista.

Will says:

February 17, 2017 at 5:05 am

I saw this in the doc for RegNotifyChangeKeyValue, so what is it trying to say? “For the original thread pool API, specify WT_EXECUTEINPERSISTENTTHREAD using the QueueUserWorkItem function.”

JDG says:

February 21, 2017 at 12:02 pm

It does indeed appear to be incorrectly documented on the MSDN page. So, the client was probably doing it that way because they were told to, simple as that. :-P

Date:	February 16, 2017 / year-entry #40
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20170216-00/?p=95455
Comments:	10
Summary:	Hey, you, get off of my thread.