Date: | December 2, 2005 / year-entry #370 |
Tags: | other |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20051202-27/?p=33123 |
Comments: | 24 |
Summary: | Sometimes psychic debugging consists merely of seeing the bigger picture. On one of our internal bug-reporting mailing lists, someone asked, "How come when I do XYZ, my CPU usage goes to 50%?" My psychic answer: "Because you have two processors." The response was genuine surprise and amazement. How did I know they had two processors?... |
Sometimes psychic debugging consists merely of seeing the bigger picture. On one of our internal bug-reporting mailing lists, someone asked, "How come when I do XYZ, my CPU usage goes to 50%?" My psychic answer: "Because you have two processors." The response was genuine surprise and amazement. How did I know they had two processors? Simple: If they had only one processor, the CPU usage would be 100%. This seems unhelpful on its face, but it actually does help diagnose the problem, because now they can search the bug database for bugs in the XYZ feature tagged "100% CPU" to see if any of those apply to their situation. (And in this case, it turns out that one did.) |
Comments (24)
Comments are closed. |
Why was the processor load so evenly distributed? And why wasn't it 100% on both CPUs?
Single threaded application.
This also happens on a single processor machine with hyperthreading enabled.
Won’t dual core open a box of new problems when it comes to benchmarking and monitoring?
Whereas currently monitoring software only considers a server under "heavy load" when above 90% is achieved for X period of time, will monitoring software have to assume that if you have 50% nonstop there is a problem, because in theory a thread could be jammed and burning away one of the cores?
Or will monitoring software have to move towards trend-based analysis, where things are red-flagged according to how your server normally performs? It's very possible on a dual-core server to have nonstop 60% usage. Outside of monitoring things on a per-core basis, how else could you address the question of "Is 60% a jammed thread, or is it normal operation?"
90% is heavy load because it’s close to 100%, and once you reach 100% everybody suffers. If you’re worried about a single process, you should probably be monitoring response time, rather than CPU load.
Also, for monitoring purposes, you shouldn’t be looking at a magic CPU load, other than the obvious 100%. I’ve used servers with 2, 3 and 4 processors. A runaway process on each of those boxes would peg the CPU usage at a different amount.
Guest: It can’t go above 50% because the thread cannot run on both processors simultaneously. That violates the definition of a thread.
Presumably the 50% value was observed in Task Manager's Processes display. In this display, the CPU column is the percentage of CPU time used versus CPU time available. With two (logical) processors there's twice as much CPU time available as clock time. Therefore even with this thread using all of one processor's resources, it's only using half the available CPU time.
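For illustration, here is a minimal sketch (not Task Manager's actual code, just the same arithmetic) of computing a process's CPU percentage as CPU time consumed divided by wall-clock time times the number of logical processors, using the documented GetProcessTimes and GetSystemInfo APIs. By this measure a single busy thread on a two-processor machine tops out at 50%:

```c
#include <windows.h>
#include <stdio.h>

/* Convert a FILETIME (100-nanosecond units) to a 64-bit tick count. */
static ULONGLONG FileTimeToTicks(FILETIME ft)
{
    ULARGE_INTEGER u;
    u.LowPart = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return u.QuadPart;
}

int main(void)
{
    SYSTEM_INFO si;
    GetSystemInfo(&si);                 /* si.dwNumberOfProcessors = logical CPUs */

    FILETIME creationTime, exitTime, kernel1, user1, kernel2, user2, wall1, wall2;
    GetSystemTimeAsFileTime(&wall1);
    GetProcessTimes(GetCurrentProcess(), &creationTime, &exitTime, &kernel1, &user1);

    Sleep(1000);                        /* stand-in for the workload being measured */

    GetSystemTimeAsFileTime(&wall2);
    GetProcessTimes(GetCurrentProcess(), &creationTime, &exitTime, &kernel2, &user2);

    ULONGLONG cpuUsed = (FileTimeToTicks(kernel2) - FileTimeToTicks(kernel1))
                      + (FileTimeToTicks(user2)   - FileTimeToTicks(user1));
    ULONGLONG elapsed = FileTimeToTicks(wall2) - FileTimeToTicks(wall1);

    /* CPU time available = elapsed wall-clock time x number of logical processors,
       so one fully busy thread on a dual machine comes out at roughly 50%. */
    double percent = 100.0 * (double)cpuUsed /
                     ((double)elapsed * si.dwNumberOfProcessors);
    printf("CPU usage: %.1f%%\n", percent);
    return 0;
}
```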
The thread will not always stay on the same processor. The OS will try to keep a thread running on the same processor as far as possible, to benefit from cache effects (and possibly non-uniform memory architecture), but there will be situations where the preferred processor is unavailable, so the OS shifts it to another processor. It won’t be the ‘best’ processor for that thread to run on, but it will get some CPU time where it otherwise might have been stalled.
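As a small sketch of the affinity behavior described above (using the documented SetThreadIdealProcessor and SetThreadAffinityMask APIs, and intended only as a demo), one could pin a busy loop to processor 0 and watch exactly one CPU get pegged in Task Manager, roughly 50% overall on a dual machine:

```c
#include <windows.h>

/* Busy loop restricted to processor 0; it consumes all of one CPU,
   which Task Manager reports as about 50% of a dual-processor machine. */
static DWORD WINAPI BusyLoop(LPVOID unused)
{
    (void)unused;

    /* Hint to the scheduler which processor this thread prefers... */
    SetThreadIdealProcessor(GetCurrentThread(), 0);
    /* ...or force the issue: only allow the thread to run on CPU 0. */
    SetThreadAffinityMask(GetCurrentThread(), 1);   /* bit 0 = processor 0 */

    volatile unsigned long counter = 0;
    for (;;) {
        counter++;                                  /* spin */
    }
    return 0;
}

int main(void)
{
    HANDLE h = CreateThread(NULL, 0, BusyLoop, NULL, 0, NULL);
    WaitForSingleObject(h, 10 * 1000);  /* watch Task Manager for ten seconds */
    TerminateThread(h, 0);              /* acceptable only in a throwaway demo */
    CloseHandle(h);
    return 0;
}
```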
Please tell me a programmer didn’t ask this question.
I had a co-worker (programmer) look over my shoulder at task manager and say "It’s barely using the CPU."
"Dude, it’s a dual-proc dual-core machine. It’s pegging a processor."
If we start seeing more and more processors in systems, we’re going to need a better indicator than the system-tray taskman icon to know if a thread is proc-limited.
Raymond’s right about the psychic answers, though. Just being experienced with systems and having more situational awareness makes me a much more effective debugger than I used to be. It also makes me one of our few go-to guys for heisenbugs. Positives and negatives, I guess…
On most *nix systems, processor usage is in percentage of a single processor core. So my dual G4 box, when it manages to peg both processors (not often), says 200%.
Vorn
7:44 AM – maybe Raymond is a real person, after all!
I’m sure that performance management software already uses performance counters in ways that are more intelligent than "total system processor usage", and can either shrug off or be tweaked for SMP.
"On most *nix systems, processor usage is in percentage of a single processor core. So my dual G4 box, when it manages to peg both processors (not often), says 200%."
I’ve seen the load level (average number of processes in the run state over a period of time) as a more common metric for processor usage on *nix systems.
Load level is a decimal number, e.g. 1.23. Anything over the number of processors you have in the system is generally bad. So 1.23 on a dual-processor or larger system is OK; 1.23 on a uniprocessor system needs some investigating.
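For what it's worth, here is a minimal sketch (assuming a Linux or BSD system that provides the getloadavg() call) of reading the 1/5/15-minute load averages and comparing them against the processor count, along the lines of the rule of thumb above:

```c
#include <stdio.h>
#include <stdlib.h>     /* getloadavg() on glibc and the BSDs */
#include <unistd.h>     /* sysconf() */

int main(void)
{
    double avg[3];
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);   /* processors currently online */

    if (getloadavg(avg, 3) != 3) {
        fprintf(stderr, "getloadavg failed\n");
        return 1;
    }

    printf("load averages: %.2f %.2f %.2f (%ld CPUs)\n",
           avg[0], avg[1], avg[2], ncpu);

    /* Rule of thumb from the comment above: a load average above the
       number of processors is usually worth a closer look. */
    if (avg[0] > (double)ncpu)
        printf("1-minute load exceeds the CPU count; worth investigating\n");
    return 0;
}
```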
Actually it's not bad to have loadavg > num cpus: the load average includes processes blocked in uninterruptible sleep. This is traditionally used to include IO load (IO-bound processes are waiting for the disk), but in practice it can give pretty confusing results. In addition, loadavg normally doesn't include interrupt load, so it's in fact often pretty useless.
Anecdote: I worked on porting a protocol stack from Windows to Linux. It consisted of lots of threads for the individual protocol layers (it was a textbook design ;-) that communicated using messages. To wake up another thread when a message was ready, it used simple semaphores. When the thing was just loaded, the single-CPU machine had a load average of 10 but was 100% idle. Why? Each of the threads blocked on a semaphore counted toward the load average.
It would probably be best to replace the Unix load average with something else that is actually useful, and use a separate number for processes blocked in IO, but Unix admins tend to be fairly conservative and resist such changes.
Disclaimer: I'm not a programmer and I'm not technically adept.
What I know: how to open Task Manager and look at the CPU usage numbers.
Matthew Chaboud writes:
"It’s barely using the CPU."
"Dude, it’s a dual-proc dual-core machine. It’s pegging a processor."
Oh my. That's scared me; my superficial trouble-shooting strategies blown right out of the water… Can I just have one whacking-great-big processor rather than a sneaky two that need psychic dev troubleshooting?
Oh me oh my
As soon as I saw the 50% I knew where you were going with this.
I love love love my dual processor machine specifically because it gets pegged at 50% instead of 100% when a thread misbehaves. It’s so much easier to kill a bad thread when you have cycles to spare. There are, of course, many other advantages to having dual processors, and this dual core trend is a very, very good one, in my opinion.
On the other side, you may see CPU usage at more than 100% with dual processors. Virtual Dub does this. I believe its estimate of CPU usage comes from getting an estimate for each CPU and adding them together. (This is merely a guess as to how.)
"If we start seeing more and more processors in systems, we’re going to need a better indicator than the system-tray taskman icon to know if a thread is proc-limited. "
I’ve never used a machine with more than 2 Processors/Cores, but on dual-proc machines, Task Manager allows you to view individual usage charts for each CPU.
I’m guessing this is the same for 4, 8 and 16 way processors.
Sure, if you’ve got 16 dual core processors, it’s going to get pretty busy on the task manager display, but hey – that’s why they give you the Performance admin tool, right?
My favourite psychic call was when a large firm in London rang (for the first time in a month) demanding to know why we'd messed up their network connection; the caller knew it wasn't a problem on their end because everyone in the office had been cut off at the same time.
I suggested they ask the workmen I could hear in the background to unplug their power tools and reconnect the router.
I have *never*, on a dual-CPU machine (dual CPU, hyperthreaded, or dual core), seen a heavy CPU process stick to a single CPU. There is *always* enough contention that, in Task Manager at the highest refresh rate, I always see a relatively flat 60/40 split of CPU time.
Dewi, if you had sent that story to the Shark Tank instead of posting it here, you would probably now have a nice Sharky t-shirt :-) . Very funny.
And related to the post subject, I was intrigued for a few weeks after observing that one of the computers at work sometimes ran at exactly 50% when doing heavy processing, until I realized that it was a hyperthreaded machine.
One funny thing about HT is that it is represented as two processors, but usually both show exactly the same workload (both being at 50% when some process is using all the CPU cycles).
Typically when I'm doing a link in DevStudio, Task Manager shows cl.exe using ~48% and the System Idle Process also at ~48%, with the difference split fractionally between the virtual Interrupts, DPCs and System processes, msdev.exe, taskman.exe and explorer.exe. CPU0 shows 70 to 77% usage, CPU1 shows 28 to 32% usage.
hmmm.
Monday, December 05, 2005 2:31 AM by Chris Becke
> I have *never* on a dual CPU machine […]
> seen a heavy CPU process stick to a single
> CPU.
I have. Maybe you see a 50/10 split because Task Manager itself takes 10% if you don't lower the refresh rate. But that's on a scale of 0 to 50 for each CPU. If you scale it 0 to 100 then maybe the split is 100/10? I still don't quite understand 60/40; why isn't it 100/40?
An example of a 100/10 split, the 10 being a CPU which is available but unused, is installing .Net Framework 2 runtime from Windows Update.
Because you have two processors.
PingBack from http://polymathprogrammer.com/2008/07/01/solution-by-proximity/
PingBack from http://blog.wisefaq.com/2009/03/07/its-running-at-50-of-cpu/