Date: | February 15, 2006 / year-entry #58 |
Tags: | other |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20060215-10/?p=32283 |
Comments: | 19 |
Summary: | The Windows NT family of operating systems supports symmetric multiprocessing. And symmetric really means symmetric. All the processors have to be the same speed, the same stepping, the same manufacturer. They must be identical in every way. If you break any of these rules, you will get strange results. Strange results from QueryPerformanceCounter will be the... |
The Windows NT family of operating systems supports symmetric multiprocessing. And symmetric really means symmetric. All the processors have to be the same speed, the same stepping, the same manufacturer. They must be identical in every way. If you break any of these rules, you will get strange results. Strange results from QueryPerformanceCounter will be the... Be cool; don't be a fool. Keep your processors symmetric. |
Comments (19)
I have a machine with 2 mismatched Pentium IIs (different speeds, same features). The clock multipliers on the chips were different, and locked. Linux would actually report a very different ‘bogomips’ calibration for each of them in /proc/cpuinfo, and the only weird thing was Netscape’s dialog boxes would appear 0 x 0 pixels (yes that went away if I put in a matched pair!).
The machine runs XP to this day, a little quirky but pretty well considering…
Why doesn’t Windows detect this during startup and fail with a BSOD if the CPUs are not identical?
Brian: I’d guess because if Windows spent all its time nursemaiding people, there’d be thousands more code paths to test, and people would probably complain "it’s not fair that M$ won’t let me use mismatched chips, XYZ distro lets me!"
As somebody who manages a testing team, the thought of more code paths simply to nursemaid people, which means more testing, fills me with dread.
A discussion over in RWT recently touched on the issues involving different speeds in a multi-cpu system:
http://realworldtech.com/forums/index.cfm?action=detail&PostNum=4060&Thread=91&entryID=62815&roomID=11
Do you have anything to say about the impact of VFM/Foxton on the Windows kernel?
Windows doesn’t properly support QueryPerformanceCounter even on symmetric MP machines. I’ve lost count of the number of dual-core machines which give randomly offset QPC values (typically offset by ~200ms to 1400ms, which changes each boot and gets slowly worse over time).
It’s interesting to compare this to RAID, where using hard drives from the same mfgr, at the same speed, from the same stepping is /bad/, because it raises the risk that they will fail at the same time.
What it comes down to is that the CPU isn’t nearly as abstracted as a hard drive — CPUs vary in all sorts of ways. Hard drives can be faster or slower, but there isn’t much difference in features, which is what will /really/ mess you up with SMP.
(This was a timely warning for me; I was just pondering a bit ago making an unbalanced SMP system by underclocking the faster processor.)
Miral: QPC is implemented in the HAL, so that’s where you need to investigate.
Miral, that’s not really a Windows bug. Each core really does lose/gain cycles relative to the other in the hardware performance counters. You’ll see the same symptoms on FreeBSD, Linux, Windows, probably OS X too.
It’s very annoying, but my guess is that adding a fix to the CPU just so that developers get reliable cycle counters is not a priority for the manufacturer (so far I only have reports about one chip maker).
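For code that has to live with counters that drift between cores, one common defensive pattern is to never hand out a timestamp smaller than the last one returned. The sketch below is only an illustration of that idea, not something from the post or from Windows itself: the class name `MonotonicGuard`, its `read_raw` parameter, and the `backward_steps` counter are all hypothetical. It hides backward jumps; it does not correct the underlying offset.

```python
import threading
import time


class MonotonicGuard:
    """Wrap a raw timer so callers never see time go backwards.

    Hypothetical helper: `read_raw` is any zero-argument callable returning
    a number (by default time.perf_counter, which is backed by
    QueryPerformanceCounter on Windows).
    """

    def __init__(self, read_raw=time.perf_counter):
        self._read_raw = read_raw
        self._last = None
        self._lock = threading.Lock()  # threads may run on different cores
        self.backward_steps = 0        # how many backward jumps were hidden

    def now(self):
        raw = self._read_raw()
        with self._lock:
            if self._last is not None and raw < self._last:
                # The raw counter stepped backwards (for example, the thread
                # moved to a core whose counter lags). Repeat the last value.
                self.backward_steps += 1
                return self._last
            self._last = raw
            return raw
```

A non-zero `backward_steps` after a long run is a hint that the raw counter is not consistent across the cores the threads ran on.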
> Netscape’s dialog boxes would appear 0 x 0 pixels
Man… I can’t even *imagine* a code path where CPU speed would affect dialog boxes. That’s just all kinds of wrong.
> Man… I can’t even *imagine* a code path where
> CPU speed would affect dialog boxes. That’s
> just all kinds of wrong.
If you’re racing to begin with, changing the CPU speed could easily trigger breakage (or fix it!).
> Man… I can’t even *imagine* a code
> path where CPU speed would affect dialog
> boxes. That’s just all kinds of wrong.
Maybe they use some FP to get the width/height, and one CPU doesn’t support it so the error handling results in the default values: 0.
AC – that might work if
a) there are no issues during boot
b) you can really be bothered with all that hassle
c) you can tell in advance with 100% certainty that the apps that query the abilities of the CPU figure out which CPU they’re running on and query just that one
d) you can identify which startup processes ran on which cpu before you got to the desktop, and retroactively keep them assigned to that one
…and so on.
Still seems a dumb idea to me. One of the main points of having multiple CPUs is so that work needing to be done while one CPU is busy can be performed on another. Having a high-maintenance, performance-limited, still quirky system doesn’t seem like it’s worth even 1/1000th of the effort to get there.
Couldn’t this be painstakingly avoided?
If *every* process is assigned a processor affinity by the user, will this problem ever manifest itself? Is it only a problem with the scheduler?
The problem with SMP really being SMP is that memory is synchronised (afaik).
The good thing about SMP is that memory is synchronised (afaik).
This means that if several threads use the same global variable, there is a huge performance penalty. Each CPU has its own cache, and can normally read/write the variable in the cache, but in this case, the cache would become useless.
As for RAID arrays, I thought it would be better if the drives are exactly the same because that would be optimal for performance. (So one hard drive wouldn’t constantly have to wait for another)
So, for proper benchmarking on an SMP system, you’d need to set the process affinity so you’re absolutely sure the process to be benchmarked always executes on the same processor?
Does this also apply to dual-core processors, or do they share the same clock and will therefore always return the same tick count?
I’ve never heard of a dual core CPU with different internal clockrates. It doesn’t make sense either.
But I think for really proper benchmarking, you’d need to run it outside of Windows, or any OS. Except maybe FreeDOS.
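The affinity idea raised in the question above can be sketched roughly as follows: pin the benchmark to one logical CPU so every timer read comes from the same processor, then time the work. This is a minimal sketch under stated assumptions, not a recommended benchmarking harness. The function names `pin_to_one_cpu` and `best_time` are made up for illustration; the Linux branch uses `os.sched_setaffinity`, and the Windows branch calls `SetThreadAffinityMask` through `ctypes` and simply assumes CPU 0 is available to the process.

```python
import os
import sys
import time


def pin_to_one_cpu():
    """Try to restrict the caller to a single logical CPU.

    Returns True if pinning succeeded, False if it failed or is not
    supported on this platform.
    """
    if hasattr(os, "sched_setaffinity"):  # Linux
        try:
            cpu = min(os.sched_getaffinity(0))  # lowest CPU we may use
            os.sched_setaffinity(0, {cpu})
        except OSError:
            return False
        return True
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        kernel32.GetCurrentThread.restype = wintypes.HANDLE
        kernel32.SetThreadAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
        kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t
        # Assumption: CPU 0 (mask bit 0) is allowed for this process.
        # The call returns the previous mask, or 0 on failure.
        previous = kernel32.SetThreadAffinityMask(kernel32.GetCurrentThread(), 1)
        return previous != 0
    return False


def best_time(fn, repeats=5):
    """Run fn several times and return the shortest elapsed time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `pin_to_one_cpu()` once before `best_time(...)` keeps the start and stop readings on the same processor. It does nothing about other processes competing for that processor, which is the concern behind the suggestion to benchmark outside a full OS.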
As Matt Sayler’s link to RealWorldTech alluded, even matched CPUs have issues.
HP DL360 G4 will not run OptimizeIT on RH AS 3 due to TSC problems. Reference http://www.x86-secret.com/index.php?option=newsd&nid=846
Linux has also been having issues with processor affinity as given by OpenMPI Portable Linux Processor Affinity, http://svn.open-mpi.org/svn/plpa/trunk/README
8 wrote:
"Each CPU has its own cache, and can normally read/write the variable in the cache, but in this case, the cache would become useless"
I believe AMD Opteron is one exception. Each CPU socket usually has its own memory node. To access memory that is physically connected to another CPU’s socket, the CPU will have to go through the other CPU’s memory controller. (using the HyperTransport bus connecting the CPU sockets) See "NUMA".
Since it does that, the other CPU should be able to redirect the memory access to its own internal cache. (I think I read somewhere that this is what happens with Opteron)
Intel’s implementation OTOH is fubar, but we all knew that already. Forcing every memory access through a single bus is so last century.
—
Rune
Great! Too bad we always have to code for the lowest common denominator (that’s still in use).
But very interesting, Rune! I heard Sun uses these CPUs in their latest servers, and they also re-did the mainboard architecture. No more north and south bridge! But something /fast/.