Should I be concerned that WaitForSingleObject is taking a large percentage of my performance test's execution time?

A customer designed a client-server system where the client and the server ran on the same machine (but with different security contexts). They were doing some performance tests of the data transfer portion of the system, and one of the tests consisted of the following:

Client opens a unit of content on the server.
Repeat 100,000 times:
- Client seeks to the start of the content.
- Client reads entire content a line at a time until the end of the content is reached.
Client closes the connection with the server.
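
For concreteness, the shape of that test is roughly the following minimal sketch. The article doesn't show the customer's client API, so an in-memory `io.StringIO` stands in for the server-side content, and `run_test`, its parameters, and the sample content are all hypothetical.

```python
import io

def run_test(content: str, iterations: int = 100_000) -> int:
    """Re-read the same content a line at a time; return total lines read."""
    # Stand-in for "client opens a unit of content on the server".
    handle = io.StringIO(content)
    lines_read = 0
    try:
        for _ in range(iterations):
            handle.seek(0)          # client seeks to the start of the content
            for _line in handle:    # one small read per line until the end
                lines_read += 1
    finally:
        handle.close()              # client closes the connection
    return lines_read

if __name__ == "__main__":
    print(run_test("alpha\nbeta\ngamma\n", iterations=1000))
```

Note that after the first pass nothing new is ever fetched. Every later iteration repeats the same tiny reads against data that is already in memory.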

When they ran this test, they found that the WaitForSingleObject function was consuming 6% of the total CPU time. "We expected overhead of calling the WaitForSingleObject function to be negligible. I suspect most developers don't take into account the full cost of WaitForSingleObject."

If you read the same content 100,000 times, it will quickly become fully cached, and the entire exercise becomes CPU-bound rather than I/O-bound.

CPU-bound operation is CPU-bound.

"Yes, we understand that this is an artificial scenario and that real-world applications will not use the server in this manner. The issue is that the WaitForSingleObject function is using a lot of CPU in this trace, whereas most developers probably consider WaitForSingleObject to be effectively free."

What you've done is taken what is presumably a complex operation and stripped it down to just overhead. Caching the entire workload in memory means that the actual work of reading the data is reduced to a series of memcpy operations, and since you're reading the data a line at a time, you're not copying a lot of data at each round trip. It's like using a large cardboard box to ship an index card. All of the cost is in getting the cardboard box, setting it up, packing it, sealing it, and mailing it.

Continuing the analogy: If you construct a sample test of your company's shipping system by shipping 100,000 cardboard boxes, each of which contains a single index card, then you're going to draw conclusions like "Tape and shipping labels constitute 5% of the weight!"

Well yeah, if you're shipping empty boxes, then tape and shipping labels are going to take up a measurable percentage of the weight, seeing as you are shipping boxes that are practically empty.

What you've got there, my friend, is a WaitForSingleObject stress test.
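
One way to see why is a toy cost model: each round trip pays a fixed cost no matter how little data it carries, plus a cost proportional to the payload. The sketch below assumes that model, and every number in it is made up for illustration. None of them come from the customer's trace.

```python
def overhead_fraction(fixed_cost_us: float,
                      cost_per_byte_us: float,
                      payload_bytes: int) -> float:
    """Fraction of one round trip's cost that is fixed overhead."""
    total = fixed_cost_us + cost_per_byte_us * payload_bytes
    return fixed_cost_us / total

# Hypothetical figures: 50 us of fixed cost per round trip,
# 0.001 us to copy each byte.
for size in (80, 4096, 1_048_576):
    share = overhead_fraction(50.0, 0.001, size)
    print(f"{size:>9} bytes per round trip: {share:.1%} overhead")
```

With a line-sized payload, nearly all of the cost is the box. The fixed cost only fades into the background once each round trip carries a lot of data.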

What is more interesting from a performance standpoint is not the percentage of overhead that goes to WaitForSingleObject. What you should be worrying about is the actual amount of time spent by WaitForSingleObject. From the customer's own data, it appeared that around 3 microseconds of the per-operation cost was being spent in calls to WaitForSingleObject.¹ That's the number you should be putting into your calculations to decide whether that overhead is preventing you from reaching your performance targets.
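
As a sketch of that calculation: the 3 microseconds and the 6% are the figures quoted above, and if they describe the same trace, they imply roughly 50 microseconds of CPU per operation in total. The target of 10,000 operations per second is a hypothetical number picked for illustration, as are the function names.

```python
def per_operation_total_us(wait_us: float, wait_share: float) -> float:
    """Total CPU time per operation implied by the wait cost and its share."""
    return wait_us / wait_share

def wait_ms_per_second(wait_us: float, ops_per_second: float) -> float:
    """CPU milliseconds per wall-clock second spent waiting at a given rate."""
    return wait_us * ops_per_second / 1000.0

WAIT_US = 3.0       # from the customer's data, per operation
WAIT_SHARE = 0.06   # from the customer's data
TARGET_OPS = 10_000 # hypothetical performance target

print(f"total per operation: {per_operation_total_us(WAIT_US, WAIT_SHARE):.0f} us")
print(f"wait cost at target: {wait_ms_per_second(WAIT_US, TARGET_OPS):.0f} ms per second")
```

The second number is the one to compare against your budget. Whether it matters depends on your target, not on what percentage it happens to be of a synthetic test.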

If you think about it some more, you may notice that the customer is worried that the WaitForSingleObject cost is too large a percentage of the CPU time, when in fact they should be worried that it is too little of the CPU time. Look at it another way: 94% of the CPU time is spent inside the application code. For example, when the WaitForSingleObject call returns, the server that was waiting on the handle has to figure out what the signaled handle means, determine which client is issuing the request and which piece of content is being requested, and route the request to the appropriate handler. The handler then performs any applicable security checks and parameter checks, and then it can do actual work: Calculate how many bytes of data need to be returned, locate those bytes, and copy those bytes. Finally, it has to do whatever is necessary to transfer that data back to the client.

Assuming that the actual work of determining what memory needs to be transferred to the client is, say, 1% of CPU (this is probably being generous), then that means 93% of the CPU is being spent in application overhead.

In the above analogy, the thing you should be noticing is not that tape and shipping labels take up 5% of the weight. What you should be noticing is that cardboard boxes take up 95% of the weight. That's the thing that's determining your shipping weight overhead. If you want to lower your weight overhead for shipping a single index card, don't try to get thinner shipping labels. Work on switching from cardboard boxes to envelopes.
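
To put numbers on that, here is a small sketch using the standard Amdahl's-law style calculation: if a component accounts for some share of the time and you cut that component by some fraction, the overall speedup is 1 / (1 - share × reduction). The 6% and 93% shares are the figures discussed above; the function name and the choice of "halve the overhead" are just for illustration.

```python
def speedup(share: float, reduction: float) -> float:
    """Overall speedup when a component taking `share` of the time
    has its cost cut by `reduction` (both between 0 and 1)."""
    return 1.0 / (1.0 - share * reduction)

# Thinner shipping labels: make WaitForSingleObject completely free.
print(f"eliminate the 6% wait cost: {speedup(0.06, 1.0):.2f}x")   # 1.06x

# Envelopes instead of boxes: merely halve the application overhead.
print(f"halve the 93% overhead:     {speedup(0.93, 0.5):.2f}x")   # 1.87x
```

Even an impossible, zero-cost wait buys about 6%, whereas a modest improvement to the other 93% nearly doubles throughput.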

¹ The customer didn't say how many times WaitForSingleObject was called per operation, so I don't know what the per-call WaitForSingleObject cost was.

John Doe says:

April 22, 2016 at 10:51 am

Now here’s one of the best computer-related analogies ever!

Brian_EE says:

April 22, 2016 at 1:04 pm

Raymond’s moonlighting job is apparently working for Amazon, optimizing their Prime shipping.

Antonio Rodríguez says:

April 22, 2016 at 1:11 pm

My guess is that the box’s blame was theirs, and the label’s blame is somebody else’s (in this case, Microsoft’s). And we all know that it is easier to shift blame onto the 5% than to do something useful about the remaining 95% you are responsible for. So let’s try to shave that 5%!

Master Programmer says:

April 23, 2016 at 10:03 am

So switching from cardboard boxes to envelopes means switching from WaitForSingleObject to something more lightweight?

Raymond Chen - MSFT says:

April 25, 2016 at 11:17 am

WaitForSingleObject is the tape. The cardboard box/envelope is the rest of the IPC infrastructure.

Azarien says:

April 24, 2016 at 3:30 am

I get sick of hearing things like “10 percent of car fatalities are caused by drunk driving” or “20 percent of people die of heart attacks” which is then used as an argument to take some not always well-advised actions.

smf says:

April 24, 2016 at 11:06 am

Death is different to optimisation. It is easy to justify ignoring an optimisation that would save someone an extra five seconds of their day when it takes two hours for the software to do its job. It’s not easy to justify ignoring something that could save a single life.
1. French Guy says:
  
  April 25, 2016 at 2:45 am
  
  It depends on the probability associated with that “could” and what the corrective action entails (and what other corrective actions might be taken and what they entail).
2. Simon Farnsworth says:
  
  April 25, 2016 at 3:39 am
  
  It rather depends on the cost/benefit tradeoffs of the change.
  
  Making it illegal to use a private car on a journey of less than 30 km would massively reduce deaths in cities (both from pollution and from collisions); and yet I see very little movement towards this, even though it’s trivial to justify the benefits on the numbers, because the cost of the change is too high.
  
  Same applies to optimization, with the difference that the benefits from saving 5 seconds in a 2 hour operation are much, much lower than the benefits of saving 1,000 lives per year per city.
  1. Lexx says:
    
    April 25, 2016 at 3:49 am
    
    Actually, a ban on using cars for journeys of less than 30 km is not really possible to enforce. If you switch it to, for example, preventing cars from entering cities (or their centers), you will easily find several places where such a rule is already enforced, and many movements actively pushing for similar ones in their own cities.
    1. Simon Farnsworth says:
      
      April 25, 2016 at 3:19 pm
      
      It’s trivial to enforce – just require all cars to have a reporting black box that transmits the latest OBDII state back to the police every minute, and track vehicles that travel less than 30 km between two extended time periods (say 5 minutes) with the engine off.
    2. Simon Farnsworth says:
      
      April 25, 2016 at 3:20 pm
      
      It’s trivial to enforce – just require all cars to have a reporting black box that transmits the latest OBDII state back to the police every minute, and track vehicles that travel less than 30 km between two extended time periods (say 5 minutes) with the engine off.
      
      Then correlate this with blanket ANPR coverage of the road network; you can prevent speeding at the same time, by busting anyone who manages to exceed the designated limit between two ANPR cameras.
      
      The cost is rather high, but it saves lives.
  2. French Guy says:
    
    April 25, 2016 at 6:35 am
    
    Enforcing such a law would be a nightmare (to know how far people drive would require spying on them). Also, to make people change their behavior, you need to provide them with an alternative at least as good (and probably better to offset the cost of changing). If public transportation is inconvenient, people will want to use private vehicles, regardless of pollution or risk of collision (by the way, collisions in cities are less deadly than collisions on open roads, because speed matters a lot). If environment-friendly vehicles are expensive, people will keep their polluting cars.

Mc says:

April 25, 2016 at 2:22 am

I know I’ve ordered a MicroSD card and it’s turned up in a huge box (also stuffed with other marketing material etc.). I do wonder if someone wanted to use a small padded envelope but was overruled because they couldn’t fit the other junk in at the same time.

DWalker says:

April 25, 2016 at 8:15 am

In a well-tuned single-computer system, CPU should be at 100% and disk busy time should also be at 100%. It’s hard to achieve that in practice….

Scarlet Manuka says:

April 27, 2016 at 1:29 am

I think that would only be true if you always had a constant CPU and disk load over all your tasks. In real life, you want your computer to be able to handle its maximum reasonably common load fairly comfortably, which normally means that the rest of the time it will be cruising on idle.

Date:	April 22, 2016 / year-entry #85
Tags:	code
Orig Link:	https://blogs.msdn.microsoft.com/oldnewthing/20160422-00/?p=93335
Comments:	16
Summary:	What you've got there, my friend, is a WaitForSingleObject stress test.
