Should I be concerned that WaitForSingleObject is taking a large percentage of my performance test’s execution time?

Date:April 22, 2016 / year-entry #85
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20160422-00/?p=93335
Comments:    16
Summary:What you've got there, my friend, is a WaitForSingleObject stress test.

A customer designed a client-server system where the client and the server ran on the same machine (but with different security contexts). They were doing some performance tests of the data transfer portion of the system, and one of the tests consisted of the following:

  • Client opens a unit of content on the server.
  • Repeat 100,000 times:
    • Client seeks to the start of the content.
    • Client reads entire content a line at a time until the end of the content is reached.
  • Client closes the connection with the server.
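
The shape of this test can be sketched in a few lines. This is only an illustration under stated assumptions: an in-memory buffer stands in for the connection to the server, so there is no real cross-process hop, and the names `run_read_test`, `content`, and `iterations` are hypothetical rather than taken from the customer's system.

```python
import io


def run_read_test(content: bytes, iterations: int) -> int:
    """Re-read the same content line by line, `iterations` times.

    Returns the total number of line reads performed. io.BytesIO is a
    stand-in (an assumption) for the connection to the server.
    """
    stream = io.BytesIO(content)      # "client opens a unit of content"
    line_reads = 0
    try:
        for _ in range(iterations):
            stream.seek(0)            # "client seeks to the start"
            while stream.readline():  # empty bytes means end of content
                line_reads += 1
    finally:
        stream.close()                # "client closes the connection"
    return line_reads


if __name__ == "__main__":
    sample = b"first line\nsecond line\nthird line\n"
    # Scaled down from 100,000 iterations to keep the sketch quick.
    print(run_read_test(sample, 1000))
```

Every trip through the inner loop is one small request carrying a single line, so whatever fixed cost each request has gets paid once per line.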

When they ran this test, they found that the Wait­For­Single­Object function was consuming 6% of the total CPU time. "We expected overhead of calling the Wait­For­Single­Object function to be negligible. I suspect most developers don't take into account the full cost of Wait­For­Single­Object."

If you read the same content 100,000 times, it will quickly become fully cached, and the entire exercise becomes CPU-bound rather than I/O-bound.

CPU-bound operation is CPU-bound.

"Yes, we understand that this is an artificial scenario and that real-world applications will not use the server in this manner. The issue is that the Wait­For­Single­Object function is using a lot of CPU in this trace, whereas most developers probably consider Wait­For­Single­Object to be effectively free."

What you've done is taken what is presumably a complex operation and stripped it down to just overhead. Caching the entire workload in memory means that the actual work of reading the data is reduced to a series of memcpy operations, and since you're reading the data a line at a time, you're not copying a lot of data at each round trip. It's like using a large cardboard box to ship an index card. All of the cost is in getting the cardboard box, setting it up, packing it, sealing it, and mailing it.

Continuing the analogy: If you construct a sample test of your company's shipping system by shipping 100,000 cardboard boxes, each of which contains a single index card, then you're going to draw conclusions like "Tape and shipping labels constitute 5% of the weight!"

Well yeah, if you're shipping empty boxes, then tape and shipping labels are going to take up a measurable percentage of the weight, seeing as you are shipping boxes that are practically empty.

What you've got there, my friend, is a Wait­For­Single­Object stress test.

What is more interesting from a performance standpoint is not the percentage of overhead that goes to Wait­For­Single­Object. What you should be worrying about is the actual amount of time spent by Wait­For­Single­Object. From the customer's own data, it appeared that around 3 microseconds of the per-operation cost was being spent in calls to Wait­For­Single­Object.¹ That's the number you should be putting into your calculations to decide whether that overhead is preventing you from reaching your performance targets.
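
As a rough worked example, the two figures quoted in this article (6% of CPU time, and around 3 microseconds per operation) can be turned back into absolute time and compared against a budget. The sketch assumes both figures describe the same per-operation basis and that, because the test is CPU-bound, CPU time approximates elapsed time. The target throughput is a made-up number for illustration, not something the customer stated, and the function names are hypothetical.

```python
def total_cost_per_operation_us(wait_cost_us: float, wait_fraction: float) -> float:
    """Total per-operation cost implied by the wait cost and its share of the time."""
    return wait_cost_us / wait_fraction


def budget_per_operation_us(target_ops_per_second: float) -> float:
    """Time available per operation at a target throughput on one core."""
    return 1_000_000 / target_ops_per_second


if __name__ == "__main__":
    wait_us = 3.0          # from the article: ~3 microseconds per operation
    wait_share = 0.06      # from the article: 6% of total CPU time
    total_us = total_cost_per_operation_us(wait_us, wait_share)
    print(f"implied total per operation: {total_us:.0f} us")

    target = 10_000        # hypothetical target, operations per second
    budget_us = budget_per_operation_us(target)
    print(f"budget at {target} ops/s: {budget_us:.0f} us")
    print(f"wait uses {wait_us / budget_us:.1%} of that budget")
```

Under these assumptions the whole operation costs about 50 microseconds, of which the wait is 3. Whether 3 microseconds matters depends entirely on the budget you plug in.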

If you think about it some more, you may notice that the customer is worried that the Wait­For­Single­Object cost is too large a percentage of the CPU time, when in fact they should be worried that it is too little of the CPU time. Look at it another way: 94% of the CPU time is spent inside the application code. For example, when the Wait­For­Single­Object call returns, the server that was waiting on the handle has to figure out what the signaled handle means, determine which client is issuing the request and which piece of content is being requested, and route the request to the appropriate handler. The handler then performs any applicable security checks and parameter checks, and then it can do actual work: Calculate how many bytes of data need to be returned, locate those bytes, and copy those bytes. Finally, it has to do whatever is necessary to transfer that data back to the client.

Assuming that the actual work of determining what memory needs to be transferred to the client is, say, 1% of CPU (this is probably being generous), then that means 93% of the CPU is being spent in application overhead.
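
To put numbers on why the 93% matters more than the 6%, here is a small sketch that applies Amdahl's law to the breakdown above. Amdahl's law is a framing added here for illustration; the article does not invoke it. The 1% figure for actual work is the article's own guess, and halving the application overhead is a hypothetical improvement, not a claim about what is achievable.

```python
def speedup(fraction: float, reduction: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of the time
    is reduced by `reduction` (1.0 = eliminated, 0.5 = halved)."""
    remaining = 1.0 - fraction * reduction
    return 1.0 / remaining


if __name__ == "__main__":
    wait_share = 0.06       # WaitForSingleObject, from the article
    overhead_share = 0.93   # application overhead, the article's estimate

    # Best case for the wait: make it cost nothing at all.
    print(f"eliminate wait:     {speedup(wait_share, 1.0):.2f}x")
    # A more modest change to the larger slice: cut it in half.
    print(f"halve app overhead: {speedup(overhead_share, 0.5):.2f}x")
```

Even a free wait buys only about a 1.06x improvement, while halving the application overhead would give roughly 1.87x.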

In the above analogy, the thing you should be noticing is not that tape and shipping labels take up 5% of the weight. What you should be noticing is that cardboard boxes take up 95% of the weight. That's the thing that's determining your shipping weight overhead. If you want to lower your weight overhead for shipping a single index card, don't try to get thinner shipping labels. Work on switching from cardboard boxes to envelopes.

¹ The customer didn't say how many times Wait­For­Single­Object was called per operation, so I don't know what the per-call Wait­For­Single­Object cost was.


Comments (16)
  1. John Doe says:

    Now here’s one of the best computer-related analogies ever!

  2. Brian_EE says:

    Raymond’s moonlighting job is apparently working for Amazon, optimizing their Prime shipping.

  3. Antonio Rodríguez says:

    My guess is that the box’s blame was theirs, and the label’s blame is somebody else’s (in this case, Microsoft’s). And we all know that it is easier to switch blame on the 5% than to make something useful on the remaining 95% you are responsible of. So let’s try to shave that 5%!

  4. So switching from cardboard boxes to envelopes means switching from WaitForSingleObject to something more lightweight?

    1. WaitForSingleObject is the tape. The cardboard box/envelope is the rest of the IPC infrastructure.

  5. Azarien says:

    I get sick of hearing things like “10 percent of car fatalities are caused by drunk driving” or “20 percent of people die of heart attacks” which is then used as an argument to take some not always well-advised actions.

    1. smf says:

      Death is different to optimisation. It is easy to justify ignoring an optimisation that would save someone an extra five seconds of their day when it takes two hours for the software to do its job. It’s not easy to justify ignoring something that could save a single life.

      1. French Guy says:

        It depends on the probability associated with that “could” and what the corrective action entails (and what other corrective actions might be taken and what they entail).

      2. It rather depends on the cost/benefit tradeoffs of the change.

        Making it illegal to use a private car on a journey of less than 30 km would massively reduce deaths in cities (both from pollution and from collisions); and yet I see very little movement towards this, even though it’s trivial to justify the benefits on the numbers, because the cost of the change is too high.

        Same applies to optimization, with the difference that the benefits from saving 5 seconds in a 2 hour operation are much, much lower than the benefits of saving 1,000 lives per year per city.

        1. Lexx says:

          Actually, a ban on using cars for trips of less than 30 km is not really possible to enforce. If you change it to, for example, keeping cars out of cities (or their centers), you will easily find several places where such a restriction is already in force, and many movements actively pushing for similar ones in their own cities.

          1. It’s trivial to enforce – just require all cars to have a reporting black box that transmits the latest OBDII state back to the police every minute, and track vehicles that travel less than 30 km between two extended time periods (say 5 minutes) with the engine off.

            Then correlate this with blanket ANPR coverage of the road network; you can prevent speeding at the same time, by busting anyone who manages to exceed the designated limit between two ANPR cameras.

            The cost is rather high, but it saves lives.

        2. French Guy says:

          Enforcing such a law would be a nightmare (to know how far people drive would require spying on them). Also, to make people change their behavior, you need to provide them with an alternative at least as good (and probably better to offset the cost of changing). If public transportation is inconvenient, people will want to use private vehicles, regardless of pollution or risk of collision (by the way, collisions in cities are less deadly than collisions on open roads, because speed matters a lot). If environment-friendly vehicles are expensive, people will keep their polluting cars.

  6. Mc says:

    I know I’ve ordered a MicroSD card and it’s turned up in a huge box (also stuffed with other marketing material etc.). I do wonder if someone wanted to use a small padded envelope but was overruled because they couldn’t fit the other junk in at the same time.

  7. DWalker says:

    In a well-tuned single-computer system, CPU should be at 100% and disk busy time should also be at 100%. It’s hard to achieve that in practice….

    1. Scarlet Manuka says:

      I think that would only be true if you always had a constant CPU and disk load over all your tasks. In real life, you want your computer to be able to handle its maximum reasonably common load fairly comfortably, which normally means that the rest of the time it will be cruising on idle.



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
