Date: | April 3, 2007 / year-entry #116 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20070403-00/?p=27393 |
Comments: | 32 |
Summary: | The thread pool is about reducing thread creation/termination overhead by consolidating work that would normally go onto separate threads into a small number of threads. In a sense, you shouldn't be surprised that the thread pool is using only one thread; instead, you should be happy! I switched to using the thread pool, and I'm... |
The thread pool is about reducing thread creation/termination overhead by consolidating work that would normally go onto separate threads into a small number of threads. In a sense, you shouldn't be surprised that the thread pool is using only one thread; instead, you should be happy!
The purpose of the thread pool, as I noted above, was to reduce the overhead of creating and terminating threads by running multiple tasks on a thread. For example, suppose you have three short tasks, say 1ms each. If you put each one on its own thread, you have
Now suppose, for the purpose of this discussion, that creating and terminating a thread take 1ms each. If you create a separate thread for each task, you've spent 6ms on thread overhead and only 3ms doing actual work. What if we could run multiple tasks on a single thread? That way, the cost of creating and terminating the thread could be amortized over all the tasks.
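The arithmetic can be tallied in a few lines. This is a minimal sketch in Python rather than the Win32 API; the function names are hypothetical, and the 1ms creation and termination costs are the assumptions stated above, not measurements.

```python
def overhead_thread_per_task(tasks, create_ms=1, terminate_ms=1):
    """Thread overhead when every task gets its own thread."""
    return tasks * (create_ms + terminate_ms)


def overhead_shared_thread(tasks, create_ms=1, terminate_ms=1):
    """Thread overhead when all tasks run back to back on one thread."""
    return (create_ms + terminate_ms) if tasks > 0 else 0


def overhead_fraction(tasks, task_ms=1, shared=True):
    """Fraction of total thread time spent on overhead instead of work."""
    if tasks == 0:
        return 0.0
    if shared:
        overhead = overhead_shared_thread(tasks)
    else:
        overhead = overhead_thread_per_task(tasks)
    work = tasks * task_ms
    return overhead / (overhead + work)
```

For three 1ms tasks, `overhead_thread_per_task(3)` is 6 and `overhead_shared_thread(3)` is 2. As the task count grows, the shared-thread overhead stays fixed, so its fraction of the total shrinks.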
Ah, now we have only 2ms of overhead for 3ms of work. Not great, but certainly better than what we had before. If we can pack more tasks into the thread pool, the fixed overhead of creating and terminating the thread becomes proportionally less.
The thread pool is designed for handling a collection of brief tasks, since those are the tasks that would best benefit from thread pooling. If you had a task that ran for ten seconds, putting it on the thread pool wouldn't yield much in the way of savings; that 2ms overhead you avoided is just noise compared to your ten seconds of running time. (Last year, we saw another case of a series of tasks ill-suited to thread pooling.)
As an accommodation for people who will put the occasional long-running task onto the thread pool (perhaps because it simplifies the program logic by treating everything as a work item), the thread pool allows you to give it a heads-up by passing the |
Comments (32)
Once I found the thread pool I never looked back. I’ve used it for handling long running tasks, timed waits, waiting on objects and alertable I/O. I even use it with WT_EXECUTELONGFUNCTION when I want a thread but don’t want to call _beginthreadex(). The WIN32 thread pool has helped me to become the multi-threaded programming god that I am today. ;)
Just make sure to use -D_WIN32_WINNT=0x0500 if you want to use the thread pool on Windows 2000!
It seems Raymond’s example is just shy of the break-even point where it would become worthwhile to start a second thread on a dual-core machine.
The first thread starts: +1 ms.
The second thread starts in parallel with the first thread running the first task: +1 ms.
The second and third tasks now run in parallel: +1 ms.
Two threads are taken down serially: +2 ms.
Total wall-clock time is 6 ms, half of which is overhead, only a millisecond more than Raymond’s example. As more tasks are added this is amortised AND those tasks can potentially run in parallel. With just a couple more tasks, you’re ahead of the game with two threads on a dual-core machine.
Obviously you don’t want to create a thread for each task, but it seems like it could make sense to create one for each processor core in all but the most limited circumstances (in which case, the extra overhead is probably unnoticed anyway).
I wrote my own thread pooling that works this way (one thread per core) for a compute-intensive project. I’ve had great success with it on a dual-core machine running XP. On Vista, however, the machine becomes quite sluggish if you try to interact with another app while my program is running in the background.
My problem with the thread pool pre-Vista is that there is no way to be 100% sure that a task assigned to the thread pool is complete. The documentation is sorely lacking when talking about lifetime management of threads in the pool.
@JeffCurless
I would recommend using an event or some other flag (possibly contained in a struct passed as a parameter to the thread) to indicate when the task is complete. Using some other synchronization primitive like waiting on the thread handle is not guaranteed to work as you have no idea when that thread will actually be terminated. Heck, the same thread that starts processing the task might not be the same one that finishes it.
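The suggestion above can be sketched as follows. This is an illustration in Python rather than Win32 code: `WorkContext`, `work_item`, and `run_and_wait` are hypothetical names, and `threading.Event` stands in for the event object the comment describes. (Python's executor also returns a future that reports completion; the explicit event is used here only to mirror the struct-plus-event approach.)

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class WorkContext:
    """Hypothetical per-task state handed to the work item."""

    def __init__(self, payload):
        self.payload = payload
        self.result = None
        self.done = threading.Event()  # set by the task when it finishes


def work_item(ctx):
    try:
        ctx.result = ctx.payload * 2  # stand-in for the real work
    finally:
        ctx.done.set()  # signal completion even if the work raised


def run_and_wait(payload, timeout=5.0):
    ctx = WorkContext(payload)
    with ThreadPoolExecutor(max_workers=2) as pool:
        pool.submit(work_item, ctx)
        # Wait on the task's own event, not on any pool thread.
        finished = ctx.done.wait(timeout)
    return finished, ctx.result
```

The caller never learns which pool thread ran the task and never needs to; `run_and_wait(21)` returns `(True, 42)` once the event is set.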
I think the point is – why does the thread "pool" only use a -single- worker thread? Why not call it something different, and leave the name "thread pool" to something where you actually can have a pool of threads waiting to be assigned tasks?
That way, 2 of the 3 tasks could run concurrently (on a dualcore processor that is), and all 3 tasks would finish in 2ms (and since you could use an application-wide thread pool, the initial thread creation overhead would be close to 0, hidden somewhere in the application startup time).
Or am I missing something?
I’m not sure that you really addressed the original question – which was asking why the thread pool only used a single work thread for compute tasks.
Was it because they had only a few large tasks (that weren’t marked as ‘long’)?
What if there were 5000 “brief tasks”?
Just curious.
Re: The documentation is sorely lacking when talking about lifetime management of threads in the pool.
That might be because the thread pool is NOT there to solve threading lifetime management problems. It has a very specific purpose: to allow programmers an efficient way of running multiple small tasks in separate thread(s) without having to create and tear down threads constantly. If I remember correctly, the thread pool is also smart enough to know when you’ve given it the same task (section of code) multiple times and use the same thread to execute that code, gaining some efficiency.
Personally, I’ve used the thread pool in .Net to call event delegates since I don’t care what happens and don’t want the event consumer running on MY threads possibly choking my application. In that case, I really, really don’t care about lifetime management of those threads.
Re: That might be because the thread pool is NOT there to solve threading lifetime management problems.
<snip>
Think about this in non-managed C++ code then. I start a long running operation. Then something causes the application to close, which causes some memory to be cleaned up, which may or may not cause the long running operation to crash, since there is no way to do a WaitForSingleObject on the thread pool to ensure it has completed. Now if Microsoft explicitly stated when and how the thread pool was cleaned up, I could design the rest of my application accordingly. Right now, I have absolutely no knowledge of how they cleanup, so therefore I can’t in good conscience use the threadpool.
Thanks for the response Raymond
I certainly don’t envy the group that manages the heuristics for the thread pool implementation, as the optimum strategy is different depending on whether the tasks are cpu heavy or i/o heavy.
And heaven help you if you have tasks that are both cpu and i/o heavy.
Jeff,
You’re still assuming that it’s the thread pool’s job to help you with thread lifetime management. That’s not what it is designed to do regardless of whether the code is managed or unmanaged. Managed code can and does have the same issues. I don’t understand why you want the thread pool to inform YOU of whether YOUR code is complete or not. It’s your job to manage your thread lifetime and communicate with it if necessary. I would suggest, though, that if your threading scenario is that complicated then the thread pool is not the right tool to use. In fact, the documentation (http://msdn2.microsoft.com/en-us/library/ms686756.aspx) seems fairly clear on that subject:
"To use thread pooling, the work items and all the functions they call must be thread-pool safe. A safe function does not assume that the thread executing it is a dedicated or persistent thread. In general, you should avoid using thread local storage or making an asynchronous call that requires a persistent thread, such as the RegNotifyChangeKeyValue function."
<< In a sense, you shouldn’t be surprised that the thread pool is using only one thread; instead, you should be happy! >>
Uhhh. No. I’d be happy if it was a single core processor. If it was a dual or quad core system, I’d be peeved at all of the CPU time being squandered. Especially when you change this example to be 10 or 100 or 10,000 “tasks”, the overhead for creating a thread is completely marginalized.
Why doesn’t the thread pool start with a number of threads equal to the number of apparent CPUs?
If it’s already virtualizing threads, why not amortize the cost of creating threads by keeping a harem of threads for the lifetime of the pool. If something is scaling perfectly you’d need one thread per core, which is a very small number anyways, and just means a few megs of unused stack if you don’t bother using less.
I know this doesn’t have anything to do with you Ray, but… how many MSN Messenger’s threads does it take to change a light bulb? Seriously, why the heck does Messenger need so many threads? (sorry, I needed to vent)
Anonymous Coward: check out how many threads ActiveSync uses: http://codeka.com/blogs/index.php/dean/2006/08/02/activesync_and_the_number_of_threads_in_
And to those people wondering why the thread pool only creates one thread even if you have more than one processor, you just need to realize that thread pools are meant to be used in a server environment. In a server, the wall time required for an operation to complete (that is, the "response time") is not (usually) as important as the throughput (that is, the number of requests processed per second). Even though on multiple CPUs, creating two threads takes the same amount of "wall time" as creating one thread, you’re still using twice as many CPU cycles to create them both – CPU cycles that could be doing something else.
Besides, as Raymond says, the thread pool WILL eventually create another thread if the number of requests increases.
Wednesday, April 04, 2007 3:06 AM by Dean Harding
That reinforces the question. Servers make it so much more likely that there will be a heavy load on the thread pool. Servers make it so much more likely that the number of thread pool threads will grow as large as the number of processors (and maybe even more). So much more the reason to create them to begin with, instead of leaving cores idle in the meantime.
Just for interest, there are two .net CLR thread pools. One for ‘work items’ and one to manage IO Completions. Both thread pools start out with as many threads as there are cores.
The CLR will grow the pools, but not immediately – it’s quite prepared to let remoting calls block at the server waiting for an available thread, and the pool isn’t grown nearly as quickly as I’d expect (which is probably a fault of my expectations).
Socket accept calls are each done on their own non-pooled thread. Makes sense, as these could block forever…
(I mention all this because the CLR thread pools probably use the underlying windows thread pool and there’s a 1:1 mapping between CLR threads and real kernel threads).
Nar: A ‘harem of threads’ – is this the official collective noun for threads? Are Harems kept in Pools? Is there an undocumented Harem api somewhere deep in Vista? I look forward to the MSDN page (I’d be willing to write it for that matter).
"That reinforces the question. Servers make it so much more likely that there will be a heavy load on the thread pool. Servers make it so much more likely that the number of thread pool threads will grow as large as the number of processors (and maybe even more). So much more the reason to create them to begin with, instead of leaving cores idle in the meantime."
This assumes that there is only ever one process running on the server, and therefore that process should be given all of the CPU time it can take. It is very unlikely that this is true.
I’d go with the assertion that a "pool" is a bad term for this object. Database connection pools usually have a bunch (or harem, I like that) of connections ready for use by the program, the entire purpose being to eliminate as much as possible the connection setup/teardown time.
Perhaps we have all read
http://www-128.ibm.com/developerworks/java/library/j-jtp0730.html
http://www.cs.wustl.edu/~schmidt/PDF/OM-01.pdf
It seems kind of counter-intuitive that a "pool" should only contain one thing.
"S": there specifically is NOT an 1:1 mapping between native and managed threads. See Thread.BeginThreadAffinity/EndThreadAffinity, HostProtectionResource.ExternalThreading, etc. and the ICLRTask hosting interface
I do hope nobody posts the current algorithm for deciding when to create another thread in the pool.
Because, you just *know* someone will assume that behaviour when programming, and then the algorithm will never be able to change again.
KJK::Hyperion – I meant for .net 1.0/1.1 only (some of us have to remain in the dawn, whilst others bask in the sun). 2.0+ adds all the goodness you describe.
(and in the .net world, shouldn’t you be KJK.Hyperion?)
Hayden: Read the whole article. There is NOT "only" one thread. There is "only" one thread when the number of queued tasks is small.
And my own "nitpicker’s corner":
s/small/too small to warrant creating another thread/
Wednesday, April 04, 2007 10:55 AM by GregM
>
No, the possibility that cores might be left idle assumes that there’s only one process running on the server, but the reason to create them is because of the possibility that there might be only one nontrivial process running. Sure if you’re running Exchange and SQL on the same 4-core server then you might only want 1, I mean 2, initial threads in each pool, if you’re sure you want to initially restrict each to 50% of the CPU.
Nitpicker’s corner:
s/one/fewer than the number of apparent cores/
s/leaving cores idle/sometimes leaving cores idle/
Wednesday, April 04, 2007 8:33 PM by Dean Harding
Yes nitpickers’ corners are too small to warrant creating another thread, so we put them in a subthread pool.
So:
Why doesn’t the thread pool use the same number of threads as the number of physical cores?
And of course, are the threads from the pool really persistent, or are we still dealing with the overhead of creating threads for each QueueWorkItem() call?
I have a feeling that this behavior could be fixed by a hotfix.
I don’t know precisely how the native threadpool works, but from the documents about the CLR threadpool (which is likely managed by the CLR itself), threads are created dynamically in response to demand.
You don’t want to spin up threads until they’re asked for. If you make threads beforehand, then every program gets NCPUS threads at startup, even if they’re some little console app that does all calculations and work on the main thread. It makes a lot more sense to make threads lazily. If you’re building an app where you know explicitly that you want to peg all of the cores with your work items, then by all means write your own thread pool.
Why not spin up the threads at the first use of the threadpool? After that, make a new thread whenever a task is blocked up to the number of cores (or whatever). Beyond that, you might make new threads when it looks like some of the tasks are long running.
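A toy version of the "create threads lazily, grow on demand up to a cap" policy discussed in these comments might look like the sketch below. It is written in Python with a hypothetical `LazyPool` class and does not represent the heuristics Windows or the CLR actually use: no thread exists until the first task arrives, an idle worker is reused when one is available, and a new worker is started only when none is free and the cap (the core count by default) has not been reached.

```python
import os
import queue
import threading


class LazyPool:
    """Toy pool: no threads until work arrives, then grow on demand."""

    def __init__(self, max_threads=None):
        self.max_threads = max_threads or os.cpu_count() or 1
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self._idle = 0         # workers waiting with nothing assigned
        self._backlog = 0      # queued tasks no worker has claimed yet
        self.thread_count = 0  # workers created so far

    def submit(self, fn, *args):
        with self._lock:
            if self._idle > 0:
                self._idle -= 1         # an existing idle worker takes it
            elif self.thread_count < self.max_threads:
                self.thread_count += 1  # grow only because nobody is free
                threading.Thread(target=self._worker, daemon=True).start()
            else:
                self._backlog += 1      # at the cap: task waits its turn
            self._tasks.put((fn, args))

    def _worker(self):
        while True:
            fn, args = self._tasks.get()
            try:
                fn(*args)
            except Exception:
                pass  # toy pool: swallow errors so the worker survives
            with self._lock:
                if self._backlog > 0:
                    self._backlog -= 1  # claim a waiting task next
                else:
                    self._idle += 1     # nothing queued: become idle
            self._tasks.task_done()

    def wait_all(self):
        """Block until every submitted task has finished."""
        self._tasks.join()
```

Under this policy a stream of brief tasks submitted one after another keeps reusing a single worker, while a task that blocks causes the next submission to start a second worker.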
nks said: "then by all means write your own thread pool"
So basically you are suggesting that we reinvent the wheel, not reuse existing code, and do the work the OS is supposed to do for us? Nice logic. Too bad it’s broken.
Raymond, actually I was interested in WT_EXECUTEINPERSISTENTTHREAD flag which you haven’t discussed. MSDN says:
"Note that currently no worker thread is truly persistent"
Why is that so and has it changed in Vista?
[I don’t know what to say when people suggest that something be changed to work the way it already does. Perhaps I’ll just say "That’s interesting." That’s interesting. -Raymond]
I’m suggesting that this is a reasonable implementation, not that it’s different from that.
The "new" thread pool, as available for Vista and Longhorn, does include a SetThreadpoolThreadMinimum() function, see http://msdn2.microsoft.com/en-us/library/ms686268.aspx