Date: | November 14, 2006 / year-entry #385 |
Tags: | history |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20061114-15/?p=29003 |
Comments: | 25 |
Summary: | At the Windows 2000 Conference and Expo which coincided with the operating system's launch, I paid a visit to the emulators.com booth, where they were excitedly showing off SoftMac 2000, a Mac emulator that ran on Windows 2000. Emulator trivia: MacOS booted in five seconds under Windows 2000, which was faster than the real Mac, because the emulator simulated... |
At the Windows 2000 Conference and Expo which coincided with the operating system's launch, I paid a visit to the emulators.com booth, where they were excitedly showing off SoftMac 2000, a Mac emulator that ran on Windows 2000. Emulator trivia: MacOS booted in five seconds under Windows 2000, which was faster than the real Mac, because the emulator simulated a 1GB Mac so the Mac memory manager never had to do any paging. Now, the host computer didn't have 1GB of real RAM, so the host computer was still paging, but it turns out that you're better off letting the Windows 2000 kernel do the paging than the copy of MacOS running inside the emulator. Anyway, Darek Mihocka, the proprietor of emulators.com, has started posting his thoughts on Intel's new Core 2, and given the promo titles of his upcoming entries, it looks like he's going to start digging into running Vista on his Mac Pro. But all of this yammering about emulation is just a sideshow to the real issue: The picture of the hardware that Darek's retiring. I mean, look at it. He's retiring more computers than I own! I bet he's one of those people who relocates his computers during the winter in order to use them as space heaters. |
Comments (25)
Comments are closed. |
I only counted 5 systems in the picture. I’ve got 4 systems on my desk at home… and another 2 at school, and another 3 sitting on the entertainment center…
Granted, they’re all 8088s – Pentium 4s, but 5 systems is nothing to me.
"the real issue: The picture of the hardware that Darek’s retiring."
Hey can you upload that picture please?
Letting Windows 2000 do the paging will, of course, be faster because its paging is native x86 code running on x86 hardware.
If you let MacOS do the paging you’re emulating the 68000/PPC code that does the paging as well as the memory management hardware that makes it possible.
[)amien
I think you missed what he was saying, damieng. The boot was faster under emulation than on native hardware because Windows 2000 is better at paging.
But is Windows better at paging, or was it simply due to running it on faster hardware than the original Mac?
As I understand it, older Macs used a rather obscure page-replacement algorithm (FIFO with a second chance for pages used since the last time around), so it isn’t surprising that Win2K’s LRU replacement works somewhat better.
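For readers who haven’t seen it, here is a minimal C sketch of that FIFO-with-second-chance scheme (a clock-style sweep); the frame structure and names are hypothetical illustrations, not actual Mac OS internals:

    /* Minimal sketch of FIFO-with-second-chance page replacement:
       sweep the frames in FIFO order, but skip (and clear) any frame
       referenced since the last sweep. All names are hypothetical. */
    #include <stdbool.h>
    #include <stdio.h>

    #define NUM_FRAMES 8

    typedef struct {
        int  page;        /* which page occupies this frame */
        bool referenced;  /* set by (simulated) hardware on access */
    } Frame;

    static Frame  frames[NUM_FRAMES];
    static size_t hand;   /* FIFO pointer that sweeps the frames */

    static size_t choose_victim(void)
    {
        for (;;) {
            Frame *f = &frames[hand];
            if (f->referenced) {
                f->referenced = false;            /* second chance */
                hand = (hand + 1) % NUM_FRAMES;
            } else {
                size_t victim = hand;             /* evict this one */
                hand = (hand + 1) % NUM_FRAMES;
                return victim;
            }
        }
    }

    int main(void)
    {
        for (int i = 0; i < NUM_FRAMES; i++) {
            frames[i].page = i;
            frames[i].referenced = (i % 2 == 0);  /* evens were touched */
        }
        printf("evict frame %zu\n", choose_victim());  /* prints 1 */
        return 0;
    }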
The year being 2000 tells me that Raymond was witnessing a MacOS 9 installation. It wasn’t until OS X (released in March of 2001) that the Mac had a flat memory model and a decent memory manager. It didn’t even have true pre-emptive multitasking until OS X. That is why it worked faster in emulation with Windows 2000 doing the real paging.
Raymond, looks like you just got slashdotted…
anon for good reason: The pictures are shown at the blog that Raymond linked to.
I also guess it’s better to let Win2k do all the paging not because it does a better job, but because the two memory managers don’t “fight” each other.
Imagine that MacOS believes that a bunch of pages are in memory, but the Win2k memory manager has them paged out to disk. MacOS then uses those pages, causing Windows to pull them back into memory, while maybe paging out other stuff that the MacOS memory manager thinks is still in memory.
I didn’t say that is the reason why it runs faster. My comment was directed more at the logic of letting the emulator simulate 1GB of Mac memory and why it was better to let Win2k handle it.
Windows is optimized to swap; it’s a long tradition of all Windows versions. Other OSes don’t rely as heavily on swapping to disk; Windows starts swapping long before the physical memory is consumed. Is it even possible to run Windows without a swapfile?
Having used SoftMac, I know that it only emulates 680x0 Macs.
Seeing as you are emulating a ~1994 computer on a ~2000 computer, it’s highly likely that the emulated computer is faster than any real 680x0 Mac. Thus the observed results are unsurprising.
You’d be surprised how many people did find it surprising. You have to remember, when I was demoing SoftMac at Macworld Expo New York in 1999, a) the public perception of PCs in 1999 was that they were slower than Macs, and b) the perception was that emulation was a slow technology.
Back then it was really quite the uphill battle to convince people that they were witnessing actual unmodified Mac OS 8.1 bits booting on Intel hardware, and to educate them about those myths.
It’s not simply an issue of emulating the 68040 being fast. The 68040 is quite a difficult instruction set to emulate. That’s compounded by the fact that a lot of Mac code used self-modifying tricks, executing in the middle of instructions, and such. Getting that right is hard. When I did the PowerPC emulation on x86 a couple of years later, it actually turned out to be easier to emulate. PowerPC can be emulated at faster speeds than 68040. (Just look at the perf of Rosetta on Intel Macs: validation that Intel processors truly are faster than PowerPC and that emulation is not inherently a slow technology.)
It’s also not an issue of emulating 1994 hardware on 2000 hardware. At the 1999 Macworld, I was running a 1999-era PC (450 MHz Pentium III running Windows 98) booting a 1998 OS (Mac OS 8.1) on 1997 virtual hardware (Mac Quadra). Remember, new Mac Quadra models were still being released in 1997, and what blew people’s minds was seeing a $2000 PC emulating a $5000 Quadra, and faster than any real Quadra. After all, the main selling point of the Quadra was that it was the fast Mac, and thus the selling point of SoftMac at that time was that it was a software solution that gave them faster-than-real-time emulation of $5000 Mac hardware on $2000 PC hardware. Oh how I loved to rub that in to the die-hard Mac fanatics <grin>
Even users of PowerPC based Macs (which in 1999 would have been various G3 machines running Mac OS 8.x) were surprised to see Mac OS 8.1 boot up MUCH faster than they’d seen on a real Mac. It was not uncommon for people to look under the table to verify that I didn’t actually have a real Mac hidden there. At future demos I used a Gateway laptop to quash that, and at the 2001 Macworld Tokyo I used a handheld Sony VAIO to really drive the point home.
What non-technical people didn’t realize in the late 1990’s is that Windows PCs were far better suited for emulation than Macs were. Why?
As has been mentioned, virtual memory and swapping were MUCH faster on Windows than in the Mac OS. The Mac ran faster when VM was off, but most Macs were being run with VM on. So emulating a virtual Macintosh with up to 1 gigabyte of physical RAM and booting the Mac OS with its own VM disabled did in fact give huge speedups.
But the other main factor was the superior design of the P6 processor architecture, which appeared first in the Pentium Pro processor, then later in the Pentium II and Pentium III, and today in Core Duo based Macs. I can go into technical details if anyone cares, but it was really playing around with a Pentium Pro in 1996 that led me to decide to write a Mac emulator for Windows and to realize there was a profitable business opportunity there. For both 68040 and PowerPC emulation, the P6 architecture can effectively emulate those processors in software with only about a 2x to 4x perceived clock-cycle slowdown, which is generally much faster than when you go the other way around. Add to that the VM advantage, the faster memory bandwidth and FSB speeds, and the larger L1 and L2 caches on x86 processors, and it literally allowed for close to real-time emulation. Thus demoing a 500 MHz virtual Macintosh to people whose real hardware was a 33 to 300 MHz Macintosh did make a lot of jaws drop.
For the February 2000 demo at the Windows 2000 launch that Raymond referred to, I also had the benefit of a 600 MHz AMD Athlon based desktop; the Athlon briefly overtook the P6 architecture in speed until 2003, when Intel revamped P6 as the Pentium M processor ("Centrino") and, today, as the Core 2. The Core 2 is just a beautiful chip for emulation and virtual machine purposes.
*remembers rearranging my office at msft in winter to put warmer computers between me & the window*
Darek, thanks dude. Fascinating stuff.
Raymond, you really surprise me, I thought you would’ve had a garage/shed/whatever full of Trash80s and wot not… I’m sure someone famous said "never judge a man by his blog". Oh no, wait. That was me.
Darek, I believe you are misinformed.
First of all, no new Quadras were produced after 1995 (introduced in 1994 and cancelled in August 1995). These machines were not $5000; they were more like the introduction price of $1,200, a little higher with a CD-ROM drive. (Actually I got a Performa 630CD for $3000 with monitor, when monitors were $500 and CD-ROM drives were $400; hell, the floppy drive cost $140.) You have to realize that this was the same year that the PPC 601s came out, because the 603s didn’t have enough cache to emulate 68k instructions natively.
Second, PowerPC emulation has been amazingly difficult, and for nearly a decade after its introduction it remained all but unemulated. What emulators were around were slow and worthless. And I remember, because endianness made emulation from either side amazingly difficult. Even SheepShaver would only work on BeOS PPC in 1998; it did no emulation at the time. Just spend 15 minutes on PearPC and tell me that PPC emulation is easy. And I’m talking no SIMD or anything fancy; it is garbage.
However I completely agree on the fact that the 68040 is extremely complex. It was the first modern computing chip, and if anything the Amiga users showed all of us that it really was more than we ever needed in basic computing.
I have a somewhat offtopic question, although it’s for an emulator I’m writing, and it does use virtual memory :)
I need the mirroring behaviour that you can get with some tricks using MapViewOfFile, and the pagewise memory protection granularity you get with VirtualProtect. So far, using VirtualProtect on mapped memory works fine, but is this supported behaviour?
Also, why are the mapping functions 16-page-granular while the VirtualAlloc family are page-granular? Is it Alpha legacy that could be removed now that only x86/x64 and Itanium are supported, or are the risks too high?
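For illustration, here is a minimal C sketch of the mirroring trick being asked about: the reserve-release-map dance that places two views of one pagefile-backed section back to back. It shows the mechanics only (the reservation step is inherently racy, error handling is omitted, and nothing here should be read as a statement that mixing VirtualProtect with mapped views is officially supported):

    /* Sketch: mirror one 64K section at two adjacent addresses. */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        SIZE_T size = 0x10000;  /* one 64K allocation-granularity unit */

        /* Pagefile-backed section; no file on disk is needed. */
        HANDLE hMap = CreateFileMapping(INVALID_HANDLE_VALUE, NULL,
                                        PAGE_READWRITE, 0, (DWORD)size, NULL);

        /* Reserve 2*size to find a free range, release it, then hope
           nothing else grabs it before the two mappings land. */
        BYTE *base = VirtualAlloc(NULL, size * 2, MEM_RESERVE, PAGE_NOACCESS);
        VirtualFree(base, 0, MEM_RELEASE);

        BYTE *lo = MapViewOfFileEx(hMap, FILE_MAP_WRITE, 0, 0, size, base);
        BYTE *hi = MapViewOfFileEx(hMap, FILE_MAP_WRITE, 0, 0, size, base + size);

        lo[5] = 42;
        printf("%d\n", hi[5]);  /* prints 42: both views share the pages */

        /* Page-granular protection on a mapped view, as described above. */
        DWORD old;
        VirtualProtect(lo, 4096, PAGE_READONLY, &old);
        return 0;
    }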
Nice to see that I am not alone: 80486, P1 133MHz, P1 188MHz, and a Commodore 64. It may be considered an advertisement, but I am looking for old computers for my collection, dead or alive, parts or entire, it doesn’t matter…
Beware: shipping to the Czech Republic can be too expensive.
Klimax (danklima@gmail.com)
P.S.: I collect them for a new museum of computers!
"68040 is quite a difficult instruction set to emulate."
The addressing modes of the ’020+ CPUs are just completely over the top, but for a simple interpreting emulator the instruction set is actually very cool, as the more complex the instructions are, the less the overhead of the interpreter is :-) JIT might be a different issue…
I’m saying this after, too, having developed and maintained a commercial 68k emulator for well over 10 years. All in assembler. I thought it was fun. :-)
"That’s compounded by the fact that a lot of Mac code used self-modifying tricks and executing in middle of instructions and such."
Seeing that even the real 68040 chips (unlike Intel’s) had huge problems executing self-modifying code without a prior cache flush, was that really such a big problem? I did some sketches of a JIT version, and invalidating the compiled code on a cache flush seemed to be enough to keep things running fine.
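Returning to the interpreter-overhead point above, here is a toy C sketch assuming a classic fetch-and-dispatch loop (the opcode table and handlers are hypothetical, not from any real 68k emulator): the fetch and dispatch are a fixed cost per instruction, so the more work each handler does, the smaller the interpreter’s relative overhead.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct { uint32_t d0; uint32_t pc; } Cpu;
    typedef void (*Handler)(Cpu *);

    static void op_nop(Cpu *c)   { (void)c; }
    static void op_addq1(Cpu *c) { c->d0 += 1; }  /* a complex op would do far more work here */

    /* Toy 2-entry dispatch table; a real 68k interpreter would index
       by the full 16-bit opcode word. */
    static Handler table[]   = { op_nop, op_addq1 };
    static uint8_t program[] = { 1, 1, 1, 0 };

    int main(void)
    {
        Cpu cpu = { 0, 0 };
        for (int i = 0; i < 4; i++)
            table[program[cpu.pc++]](&cpu);     /* fixed fetch+dispatch cost per op */
        printf("d0 = %u\n", (unsigned)cpu.d0);  /* prints d0 = 3 */
        return 0;
    }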
emuauthor: Generally, requests for future articles are better put in the Suggestion Box post (see the Basics part of the navbar over on the right).
Second, Raymond has already talked about the 64K granularity issues before, and it is a holdover from one of the other processors that NT was written for. I don’t remember for sure if it was Alpha or MIPS, but one of them could only load an immediate value into a register 16 bits at a time, and it’s MUCH easier to do function relocation when you’re only doing one load per function call (instead of two).
That same architecture had some kind of issue with treating numbers as signed versus unsigned; this is why there’s a 64K hole just below the 2G limit. If those addresses were allowed, the function-call calculations would have been even *more* complicated, to handle the case where the page happened to fall in that range. It was much easier to just remove access to that 64K area.
As for removing that support now that the Alpha and MIPS processors are no longer supported, I doubt it’ll happen. It might be nice, but I’d bet that keeping the kernel core semi-portable makes it easier to do other kinds of ports later (like to x86-64).
"If you let MacOS do the paging you’re emulating the 68000/PPC code that does the paging as well as the memory management hardware that makes it possible."
*snicker* Microsoft lemmings are so funny. While it’s true that the emulator being discussed handles 68000/PPC code, Apple has been building on a UNIX core (EFI boot) for years now. Solid, fast, and superior. Looks like Vista continues Microsoft’s tradition of endlessly trying to polish their BIOS turd.
You kids are so cute.
"*snicker* Microsoft lemmings are so funny. While it’s true that the emulator being discussed handles 68000/PPC code, Apple has been building on a UNIX core (EFI boot) for years now. Solid, fast, and superior. Looks like Vista continues Microsoft’s tradition of endlessly trying to polish their BIOS turd."
That’s a parody, right? You’re trying to make us think that all Mac users are overzealous, self-righteous fanboys/fangirls out to actively evangelize, right?
Ignorant bravado is something we’ve all come to expect from a certain type of Mac evangelist. Heaven knows why they think it is an appealing quality that will convert average users. It’s so clearly a case of overcompensation.
Aaron, I disagree about the price points of 1995-era Quadras. The “Performa” and “LC” branded Macs were stripped-down versions of the Quadra, and while they were lower in price than the $5000 I mentioned (you quote $3000), they were also lower in CPU performance, memory size, and other specs compared to real Quadra-branded machines. Either price point exemplifies the selling point of using a Mac emulator on a Windows PC to emulate a Quadra at full Quadra speed and still do it for less money than the cost of the Quadra. Emulation and virtualization technology can save real money on hardware costs. It’s not just an issue of running legacy software.
Regarding the endianness issue, it in itself is not a reason for slower perf. PearPC’s slowness has much to do with the fact that it’s written in C++ for portability. SoftMac was not written to be portable to anything but Windows running on x86 processors, and as such is mostly written in hand-coded x86 assembly language so as to execute 100 million or more 680x0 instructions per second. You would think endianness penalties would be amplified in such a more efficient emulator, but this was designed into both my 680x0 and PowerPC emulation engines from day one.

Most emulators that have to deal with endianness differences naively perform a byte-swap operation on every memory read and write larger than a byte. This in itself is the wrong place to do the swap, but play along for a minute. Generally on x86 you can use the XCHG AH,AL instruction to byte swap a 16-bit quantity, and the BSWAP EAX instruction to byte swap a 32-bit quantity. The Fusion PC 3.0 Mac emulator for MS-DOS which I distribute uses this scheme, as it is written purely in 486-compatible assembly. Emulators written in C or C++ generally won’t have the luck of having the compiler emit the XCHG or BSWAP instructions, as those operations are generally not compiler intrinsics, and thus have to be compiled into an ugly sequence of logical shifts, ANDs, and ORs. Even in Fusion PC, the extra cost of the occasional one-cycle BSWAP EAX or XCHG AL,AH instruction accounts for an almost negligible amount of execution time since, as it turns out, most memory operations do not need them: byte operations to/from memory don’t need them, storing immediate constants to memory does not need them, and memory-to-memory MOVE instructions (such as memory copy loops) don’t need them. An emulator that uses XCHG and BSWAP and intelligently filters out the huge number of unnecessary byte swaps takes a negligible performance hit due to endianness. I did a test where I injected bogus BSWAP instructions into Fusion just to verify this, and it really is negligible. So your statement that “endianness made emulation from either side amazingly difficult” is just plain inaccurate.
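To make the compiler point concrete, here is a hedged C sketch of the naive per-access swap described above, written out as the shift/AND/OR sequence a compiler without a byte-swap intrinsic would emit (the function names and flat guest-RAM array are hypothetical illustrations):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* The "ugly sequence of logical shifts, ANDs, and ORs" that stands
       in for a single BSWAP when the compiler has no intrinsic. */
    static uint32_t swap32(uint32_t v)
    {
        return ((v & 0x000000FFu) << 24) |
               ((v & 0x0000FF00u) <<  8) |
               ((v & 0x00FF0000u) >>  8) |
               ((v & 0xFF000000u) >> 24);
    }

    /* Naive guest 32-bit read: every big-endian load from emulated RAM
       pays the swap unless the emulator filters the unnecessary ones. */
    static uint32_t guest_read32(const uint8_t *ram, uint32_t addr)
    {
        uint32_t raw;
        memcpy(&raw, ram + addr, sizeof raw);  /* host-endian load */
        return swap32(raw);
    }

    int main(void)
    {
        uint8_t ram[4] = { 0x12, 0x34, 0x56, 0x78 };  /* big-endian 0x12345678 */
        printf("%08X\n", (unsigned)guest_read32(ram, 0));  /* 12345678 on a little-endian host */
        return 0;
    }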
Regardless, back in 1990 when I was designing my first 68000 emulation engine for the 386 processor, I did not even have the luxury of using the BSWAP instruction. BSWAP was introduced in the 486 processor, but in 1990 (and even in 1995) many DOS and Windows users did not own a 486. So I delved into the issue further and realized that BSWAP and XCHG weren’t even necessary. There are x86 instruction sequences one can use which take care of the endianness without introducing any extra instructions, extra code bytes, or extra data-dependency stalls, as using XCHG and BSWAP does. I know for a fact that if x86 were the same endianness as 68040/PowerPC, my SoftMac emulator would not be any faster. The “cost” of endianness in SoftMac is effectively zero, and if people used my 386-compatible techniques in their emulators it would similarly be zero for them. There just happen to be emulators that handle endianness in a stupid manner, which might slow down their perf. But that’s an implementation mistake, not an issue that inherently limits the performance of an emulator that has to work with a different endianness.
PowerPC in itself is a very simple instruction set to emulate, and it can be emulated more efficiently than 68040. Ironically, it requires even less byte swapping, due to the lower percentage of PowerPC code that performs memory operations. So I stand by my statement. In the spring of 2001 I demonstrated at MacHack and at Macworld Expo a working PowerPC emulation on a Pentium III laptop that achieved something like the speed of the 100 MHz PowerPC 601 machines. Slow by today’s standards, but at the time a perfectly suitable speed for running something like Mac OS 8, as people did on that class of PowerMac. I would say the reason for the lack of PowerPC emulators in the 1990’s is simply that PowerPC processors and PowerPC-based Macs arrived over 10 years after the 68000 did, and so of course the emulators had more than a 10-year head start with 680x0. I started writing my PowerPC engine in the spring of 2000, a good 10 years after I started tackling 68000 emulation. I whipped out my first prototype PowerPC engine in about 3 weeks of C code in 2000, then rewrote it in assembly for the demos in 2001. The 68000 took me over a year to implement, and then another year in 1998-1999 to add 68020 and 68040 functionality. I would guess other Mac emulator developers similarly didn’t tackle PowerPC until much later, not realizing that it was in fact the easier CPU emulation problem to tackle.