Date: | March 2, 2009 / year-entry #68 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20090302-01/?p=18973 |
Comments: | 8 |
Summary: | Do QueryProcessCycleTime and QueryThreadCycleTime include cycles spent in kernel mode? Yes, they do. They count cycles spent both in user mode and in kernel mode. |
Do QueryProcessCycleTime and QueryThreadCycleTime include cycles spent in kernel mode? Yes, they do. They count cycles spent both in user mode and in kernel mode. |
Comments (8)
Well, that is just dandy … now if I only knew when I would ever use that.
For performance testing, CPU cycle counts are a lot more interesting than CPU time. The alternative is to use the RDTSC instruction (http://en.wikipedia.org/wiki/RDTSC ) but then you usually have to modify the code you’re testing, and you also have to know the exact speed of the CPU during the measured interval. Which can suck on laptops.
With CPU cycle counts, it doesn’t matter which power state the CPU is in.
With QueryProcessCycleTime and QueryThreadCycleTime you can just profile the process. Of course, in the real world, nothing is ever quite that easy.
Using the CPU cycle count as a unit of time in this century is not smart. If you’re on a laptop the clock speed varies based on power usage. If you’re on a multi-CPU or multi-core machine, your thread can get switched to a different core in between measurements, rendering your last read of the cycle count useless.
The cycle count for any given instruction stream on a particular CPU will always be the same, regardless of the CPU’s speed or power state (let’s ignore pipelining issues for the moment). Also, both Intel and AMD have published articles explaining how to avoid Time Stamp Counter drift between cores (or between CPUs on multi-CPU systems). The OS does all the bookkeeping for you now, so you don’t have to.
"Using the CPU cycle count as a unit of time in this century is not smart."
But using memory cycle count might be. Increasingly, the rate limiting factor on a process is the time spent waiting for data to reach the CPU. The actual computation is pretty much free.
Yes, using the cycle count to measure time is a bad idea, but it all depends on what you are trying to measure. Usually you are trying to determine which part of your program is the bottleneck; for that, measuring the number of CPU cycles consumed (not the number of instructions retired, because branching and cache effects may dominate cost) is better than measuring wall clock time, because of dynamic clock rates. Using the cycle count as your unit for the cost of performing operations is not a bad idea at all, as long as you understand that you aren’t measuring wall clock time.
Btw, those registry entries for CPU frequency are well off. They should be calibrated properly.
Categorising it as a bad idea pretty much renders the Query*** APIs a bad idea too. Of course, you have to read up on the caveats and implement it correctly.
But on an NT OS, where the resolution of the tick count is usually around 16 ms (and please do not say it is 1 ms for multimedia timers and the like; that is a hack), you have no other choice if you need resolution below 10 ms or 1 ms.
Why, then, are QPC and QPF any better than a bad idea? What are the alternatives, and in particular what are the efficient alternatives? Unix and other OSes do not have this problem. Thus it is rather a necessity for lots of things, with lots of caveats (like everything MS and Intel).
Heck, you could not even retrieve a correct number of cores or CPUs on XP and 2003.