Date: | January 12, 2015 / year-entry #8 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20150112-00/?p=43173 |
Comments: | 9 |
Summary: | Suppose you have a value in an XMM register and you want to limit the bottom byte to a particular value and set all the other bytes to zero. (Yes, I needed to do this.) One way to do this is to apply the two steps in sequence: ; value to truncate/limit is in xmm0... |
Suppose you have a value in an XMM register and you want to limit the bottom byte to a particular value and set all the other bytes to zero. (Yes, I needed to do this.) One way to do this is to apply the two steps in sequence:

    ; value to truncate/limit is in xmm0
    ; First, zero out the top 15 bytes
    pslldq xmm0, 15
    psrldq xmm0, 15
    ; Now limit the bottom byte to N
    mov al, N
    movd xmm1, eax
    pminub xmm0, xmm1

But you can do it all in one step by realizing that min(x, 0) = 0 for all unsigned values x. Loading N into a register with movd zeroes the upper 15 bytes of xmm1, so a single unsigned byte minimum both limits the bottom byte to N and forces every other byte to zero:

    ; value to truncate/limit is in xmm0
    mov eax, N
    movd xmm1, eax
    pminub xmm0, xmm1

In pictures: (the original post's byte-by-byte diagram is not reproduced here.)
In intrinsics (note that the unsigned byte minimum corresponding to pminub is _mm_min_epu8; _mm_min_epi8 is the signed pminsb):

    __m128i min_low_byte_and_set_upper_bytes_to_zero(__m128i x, uint8_t N)
    {
        return _mm_min_epu8(x, _mm_cvtsi32_si128(N));
    }
Comments (9)
Comments are closed. |
Raymond – what were you working on that needed all of this MMX stuff?
[Writing a CPU emulator, y'know, just for fun. -Raymond]
Well OK then.
One of my coworkers is implementing a filesystem in node.js, just for fun.
I should probably find something more geeky to work on in my spare time…
[Writing a CPU emulator, y'know, just for fun. -Raymond]
Oooh, I can speculate on which CPU it is. 3.2 GHz PowerPC Tri-Core Xenon. For Xbox 360 game emulation on XB1.
You go, Raymond. Show those CPUs who's the systems engineer
@sdf keyboard masher:
The tricky part of console emulation isn't emulating the CPU. It's emulating the GPU and timings between CPU/GPU/RAM communication that games can depend on very critically in order to eke out every last bit of performance possible. That's typically why most systems that provide full backwards compatibility include hardware from the previous generation (which even then can still break a few of the more sensitive games), and why most software high-level emulation often can only provide partial compatibility with previous-gen titles without specific fixes.
Yeah. It is pretty surprising how much CPU is needed to emulate even really old hardware like the C64 and Atari systems. The processor is easy. But then there's the sound and video chips, and all of their very precise timing behavior. Emulators even need to include things like the CRT beam scan position.
In that regard, current game consoles are pretty far away from "the metal". Pff, they don't even need to use a timing loop to flip sprite positions every 30 scan lines. Which was broken if your CPU wasn't running at exactly 1 MHz.
I thought it was 1.22 MHz? And that's assuming you're using NTSC monitors. PAL was a whole other story and required a completely different CPU.
In the case of the Atari 800, the clock frequency was 1/2 of colorburst or about 1.79 MHz.
I never thought emulation was too interesting (how hard can it be to translate some opcodes?) but then I ran across this grand experiment: andrewkelley.me/…/jamulator.html The author sets out to write a recompiler for NES games to run natively…