The "symmetric" in symmetric multiprocessing really means "symmetric"

Date: February 15, 2006 / year-entry #58
Tags: other
Orig Link: https://blogs.msdn.microsoft.com/oldnewthing/20060215-10/?p=32283
Comments: 19

The Windows NT family of operating systems supports symmetric multiprocessing. And symmetric really means symmetric. All the processors have to be the same speed, the same stepping, the same manufacturer. They must be identical in every way. If you break any of these rules, you will get strange results. Strange results from QueryPerformanceCounter will be the least of your problems. Code that checks for processor capabilities will get the results from whichever processor happens to be running. If you have one processor that supports SSE and one that doesn't, a program may detect SSE (if the detection code runs on the processor that supports it), and then crash later (when the SSE code is run on the processor that doesn't).
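
The detect-then-crash scenario can be sketched with a toy model. This is only an illustration, not a real hardware query: the `Cpu` class, the `outcome` function, and the processor names are all made up for the example, and the "scheduler" is reduced to choosing which processor each step runs on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cpu:
    """A made-up model of one processor; not a real hardware query."""
    name: str
    features: frozenset


class IllegalInstruction(Exception):
    """Stands in for the fault raised by an unsupported instruction."""


def detect_sse(cpu: Cpu) -> bool:
    # A capability check only sees the processor it happens to execute on.
    return "sse" in cpu.features


def run_sse_code(cpu: Cpu) -> None:
    if "sse" not in cpu.features:
        raise IllegalInstruction(f"{cpu.name} has no SSE")


def outcome(detect_on: Cpu, run_on: Cpu) -> str:
    """Detect on one processor, then execute on (possibly) another."""
    if not detect_sse(detect_on):
        return "scalar fallback"
    try:
        run_sse_code(run_on)
    except IllegalInstruction:
        return "crash"
    return "sse path"


# Hypothetical mismatched pair: only cpu0 supports SSE.
with_sse = Cpu("cpu0", frozenset({"sse"}))
without_sse = Cpu("cpu1", frozenset())

if __name__ == "__main__":
    for detect_on in (with_sse, without_sse):
        for run_on in (with_sse, without_sse):
            result = outcome(detect_on, run_on)
            print(f"detect on {detect_on.name}, run on {run_on.name}: {result}")
```

In this model only one of the four combinations crashes (detect on the SSE processor, run on the other), which is one way to see why the failure would look intermittent rather than consistent.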

Be cool; don't be a fool. Keep your processors symmetric.


Comments (19)
  1. BKuker says:

    I have a machine with 2 mismatched Pentium IIs (different speeds, same features). The clock multipliers on the chips were different, and locked. Linux would actually report a very different ‘bogomips’ calibration for each of them in /proc/cpuinfo, and the only weird thing was Netscape’s dialog boxes would appear 0 x 0 pixels (yes that went away if I put in a matched pair!).

    The machine runs XP to this day, a little quirky but pretty well considering…

  2. Why doesn’t Windows detect this during startup and fail with a BSOD if the CPUs are not identical?

  3. Rob says:

    Brian: I’d guess because if Windows spent all its time nursemaiding people, there’d be thousands more code paths to test, and people would probably complain "it’s not fair that M$ won’t let me use mismatched chips, XYZ distro lets me!"

    As somebody who manages a testing team, the thought of more code paths, simply to nursemaid people, which means more testing, fills me with dread.

  4. Matt Sayler says:

    A discussion over in RWT recently touched on the issues involving different speeds in a multi-cpu system:

    http://realworldtech.com/forums/index.cfm?action=detail&PostNum=4060&Thread=91&entryID=62815&roomID=11

    Do you have anything to say about the impact of VFM/Foxton on the Windows kernel?

  5. Miral says:

    Windows doesn’t properly support QueryPerformanceCounter even on symmetric MP machines. I’ve lost count of the number of dual-core machines that give randomly offset QPC values (typically offset by ~200ms to 1400ms; the offset changes each boot and slowly gets worse over time).

  6. theorbtwo says:

    It’s interesting to compare this to RAID, where using hard drives from the same mfgr, at the same speed, from the same stepping is /bad/, because it raises the risk that they will fail at the same time.

    What it comes down to is that the CPU isn’t nearly as abstracted as a hard drive — CPUs vary in all sorts of ways.  Hard drives can be faster or slower, but there isn’t much difference in features, which is what will /really/ mess you up with SMP.

    (This was a timely warning for me; I was just pondering a bit ago making an unbalanced SMP system by underclocking the faster processor.)

  7. Miral: QPC is implemented in the HAL, so that’s where you need to investigate.

  8. Nick Lamb says:

    Miral, that’s not really a Windows bug. Each core really does lose/gain cycles relative to the other in the hardware performance counters. You’ll see the same symptoms on FreeBSD, Linux, Windows, probably OS X too.

    It’s very annoying, but my guess is that adding a fix to the CPU just so that developers get reliable cycle counters is not a priority for the manufacturer (so far I only have reports about one chip maker).

  9. Jeff says:

    > Netscape’s dialog boxes would appear 0 x 0 pixels

    Man… I can’t even *imagine* a code path where CPU speed would affect dialog boxes. That’s just all kinds of wrong.

  10. Matt Sayler says:

    > Man… I can’t even *imagine* a code path where CPU speed would affect dialog boxes. That’s just all kinds of wrong.

    If you’re racing to begin with, changing the CPU speed could easily trigger breakage (or fix it!).

  11. silkio says:

    > Man… I can’t even *imagine* a code path where CPU speed would affect dialog boxes. That’s just all kinds of wrong.

    Maybe they use some FP to get the width/height, and one CPU doesn’t support it so the error handling results in the default values: 0.

  12. Nonanonymous Brave Person says:

    AC – that might work if

    a) there are no issues during boot

    b) you can really be bothered with all that hassle

    c) you can tell in advance with 100% certainty that the apps that query the abilities of the CPU figure out which one they’re running on and query just that one

    d) you can identify which startup processes ran on which cpu before you got to the desktop, and retroactively keep them assigned to that one

    …and so on.

    Still seems a dumb idea to me. One of the main points of having multiple CPUs is so that work needing to be done while one CPU is busy can be performed on another. Having a high maintenance, performance limited, still quirky system doesn’t seem like it’s worth even 1/1000th of the effort to get there.

  13. Anonymous Coward says:

    Couldn’t this be painstakingly avoided?

    If *every* process is assigned a processor affinity by the user, will this problem ever manifest itself?  Is it only a problem with the scheduler?

  14. 8 says:

    The problem with SMP really being SMP is that memory is synchronised (afaik).

    The good thing about SMP is that memory is synchronised (afaik).

    This means that if several threads use the same global variable, there is a huge performance penalty. Each CPU has its own cache, and can normally read/write the variable in the cache, but in this case, the cache would become useless.

    As for RAID arrays, I thought it would be better if the drives are exactly the same because that would be optimal for performance. (So one hard drive wouldn’t constantly have to wait for another)

  15. Frederik Slijkerman says:

    So, for proper benchmarking on an SMP system, you’d need to set the process affinity so you’re absolutely sure the process to be benchmarked always executes on the same processor?

    Does this also apply to dual-core processors, or do they share the same clock and will therefore always return the same tick count?

  16. 8 says:

    I’ve never heard of a dual core CPU with different internal clockrates. It doesn’t make sense either.

    But I think for really proper benchmarking, you’d need to run it outside of Windows, or any OS. Except maybe FreeDOS.

  17. J Peters says:

    As Matt Sayler’s link to RealWorldTech alluded, even matched CPUs have issues.

    HP DL360 G4 will not run OptimizeIT on RH AS 3 due to TSC problems. Reference http://www.x86-secret.com/index.php?option=newsd&nid=846

    Linux has also been having issues with processor affinity as given by OpenMPI Portable Linux Processor Affinity, http://svn.open-mpi.org/svn/plpa/trunk/README

  18. Rune says:

    8 wrote:

    "Each CPU has it’s own cache, and can normally read/write the variable in the cache, but in this case, the cache would become useless"

    I believe AMD Opteron is one exception. Each CPU socket usually has its own memory node. To access memory that is physically connected to another CPU’s socket, the CPU will have to go through the other CPU’s memory controller (using the HyperTransport bus connecting the CPU sockets). See "NUMA".

    Since it does that, the other CPU should be able to redirect the memory access to its own internal cache. (I think I read somewhere that this is what happens with Opteron)

    Intel’s implementation OTOH is fubar, but we all knew that already. Forcing every memory access through a single bus is so last century.

    Rune

  19. 8 says:

    Great! Too bad we always have to code for the lowest common denominator (that’s still in use).

    But very interesting, Rune! I heard Sun uses these CPUs in their latest servers, and they also re-did the mainboard architecture. No more north and south bridge! But something /fast/.

Comments are closed.
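
The benchmarking question in comment 15 (pin the process so every timestamp comes from the same processor) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a recommendation from the original post: `pin_to_one_cpu` and `best_time` are hypothetical helper names, the Windows branch calls `SetThreadAffinityMask` through `ctypes`, the Linux branch uses `os.sched_setaffinity`, and on any other platform the sketch simply skips pinning. On Windows, Python's `time.perf_counter` is built on QueryPerformanceCounter. Whether pinning is actually needed on a given dual-core machine depends on the hardware and the HAL, which this sketch does not attempt to determine.

```python
import ctypes
import os
import sys
import time


def pin_to_one_cpu() -> bool:
    """Try to restrict the calling thread to a single CPU; return True on success."""
    if sys.platform == "win32":
        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentThread.restype = ctypes.c_void_p
        kernel32.SetThreadAffinityMask.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
        kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t
        # Mask 1 selects logical processor 0; a return value of 0 means failure.
        return kernel32.SetThreadAffinityMask(kernel32.GetCurrentThread(), 1) != 0
    if hasattr(os, "sched_setaffinity"):  # available on Linux
        try:
            cpu = min(os.sched_getaffinity(0))  # lowest CPU we are allowed to use
            os.sched_setaffinity(0, {cpu})
            return True
        except OSError:
            return False
    return False  # no affinity API known to this sketch


def best_time(fn, repeats: int = 5) -> float:
    """Return the smallest elapsed time in seconds over several runs of fn."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("pinned:", pin_to_one_cpu())
    print("best of 5:", best_time(lambda: sum(range(100_000))), "seconds")
```

Taking the minimum over several runs is a design choice for this sketch: it discards runs that were disturbed by other activity, but it does nothing about counters that drift between processors, which is the part that pinning is meant to sidestep.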


*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
