Limiting the bottom byte of an XMM register and clearing the other bytes

Date:January 12, 2015 / year-entry #8
Tags:code
Orig Link:https://blogs.msdn.microsoft.com/oldnewthing/20150112-00/?p=43173
Comments:    9

Suppose you have a value in an XMM register and you want to limit the bottom byte to a particular value and set all the other bytes to zero. (Yes, I needed to do this.)

One way to do this is to apply the two steps in sequence:

; value to truncate/limit is in xmm0

; First, zero out the top 15 bytes
    pslldq  xmm0, 15
    psrldq  xmm0, 15

; Now limit the bottom byte to N
    mov     al, N
    movd    xmm1, eax
    pminub  xmm0, xmm1
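For readers who prefer intrinsics, the two-step sequence above might be sketched as follows. The function name is hypothetical, and the sketch assumes an SSE2 target; `_mm_slli_si128` and `_mm_srli_si128` are the intrinsics for `pslldq` and `psrldq`, and `_mm_min_epu8` is the one for `pminub`.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: the two-step sequence written with intrinsics. */
static __m128i limit_low_byte_two_step(__m128i x, uint8_t n)
{
    /* Step 1: shift left, then right, by 15 bytes so only byte 0 survives. */
    x = _mm_slli_si128(x, 15);   /* pslldq xmm0, 15 */
    x = _mm_srli_si128(x, 15);   /* psrldq xmm0, 15 */

    /* Step 2: unsigned per-byte minimum against n placed in byte 0. */
    return _mm_min_epu8(x, _mm_cvtsi32_si128(n));   /* movd + pminub */
}
```

Unlike `mov al, N`, the `_mm_cvtsi32_si128(n)` call zero-extends `n`, so no stray bits from the rest of the register reach the vector.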

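But you can do it all in one step by realizing that min(x, 0) = 0 for all unsigned values x. The movd instruction zeroes the upper bytes of xmm1, so as long as N fits in a byte, the pminub clears the top 15 bytes of xmm0 as a side effect.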

; value to truncate/limit is in xmm0
    mov     eax, N
    movd    xmm1, eax
    pminub  xmm0, xmm1
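The identity depends on the comparison being unsigned, which is what pminub provides. A signed byte minimum would treat the values 0x80 through 0xFF as negative and let them through unchanged. A minimal scalar sketch of a single lane, with hypothetical helper names, shows the difference:

```c
#include <stdint.h>

/* One lane of pminub: unsigned 8-bit minimum. */
static uint8_t min_unsigned(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* One lane of a signed 8-bit minimum, shown only for contrast. */
static int8_t min_signed(int8_t a, int8_t b)
{
    return a < b ? a : b;
}
```

With these, `min_unsigned(0xC8, 0)` is 0, while `min_signed(-56, 0)` (the same bit pattern read as signed) is -56, so the byte would not be cleared.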

In pictures:

xmm0        xmm1        xmm0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  ?    min    0    =      0
  x    min    N    =    min(x, N)
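The same picture can be modeled in plain C, one array element per byte of the register. This is only a scalar sketch of what the instructions do, not how anyone would implement it; the function names are hypothetical, index 0 stands for the bottom byte (the last row of the picture), and N is assumed to fit in a byte.

```c
#include <stdint.h>

/* Model of "mov eax, N / movd xmm1, eax": N in byte 0, zeros elsewhere. */
static void movd_model(uint8_t reg[16], uint8_t n)
{
    for (int i = 0; i < 16; i++) {
        reg[i] = 0;
    }
    reg[0] = n;
}

/* Model of pminub: per-byte unsigned minimum, result written to dst. */
static void pminub_model(uint8_t dst[16], const uint8_t src[16])
{
    for (int i = 0; i < 16; i++) {
        if (src[i] < dst[i]) {
            dst[i] = src[i];
        }
    }
}
```

Running `movd_model` followed by `pminub_model` on any 16-byte value leaves min(x, N) in element 0 and zero everywhere else.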

In intrinsics:

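#include <emmintrin.h>
#include <stdint.h>

__m128i min_low_byte_and_set_upper_bytes_to_zero(__m128i x, uint8_t N)
{
 // pminub is the unsigned byte minimum, so the matching intrinsic is _mm_min_epu8
 return _mm_min_epu8(x, _mm_cvtsi32_si128(N));
}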
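If you want some reassurance that the one-step version really matches the two-step sequence, a brute-force comparison is cheap. The sketch below assumes an SSE2 target and uses hypothetical function names; it relies on `_mm_min_epu8`, the intrinsic for pminub. It tries every bottom byte against every limit, with the upper bytes deliberately filled with 0xFF so they are not already zero.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Two-step version: clear bytes 1..15, then limit byte 0. */
static __m128i two_step(__m128i x, uint8_t n)
{
    x = _mm_srli_si128(_mm_slli_si128(x, 15), 15);
    return _mm_min_epu8(x, _mm_cvtsi32_si128(n));
}

/* One-step version: the zero bytes of the second operand do the clearing. */
static __m128i one_step(__m128i x, uint8_t n)
{
    return _mm_min_epu8(x, _mm_cvtsi32_si128(n));
}

/* Returns 1 if both versions agree for every bottom byte and every limit. */
static int versions_agree(void)
{
    uint8_t bytes[16];
    for (int lo = 0; lo < 256; lo++) {
        for (int n = 0; n < 256; n++) {
            memset(bytes, 0xFF, sizeof bytes);   /* nonzero upper bytes */
            bytes[0] = (uint8_t)lo;
            __m128i x = _mm_loadu_si128((const __m128i *)bytes);
            __m128i a = two_step(x, (uint8_t)n);
            __m128i b = one_step(x, (uint8_t)n);
            if (memcmp(&a, &b, sizeof a) != 0) {
                return 0;
            }
        }
    }
    return 1;
}
```

The check only covers the cases where the upper bytes are all 0xFF, but since each byte is processed independently and 0xFF is the largest unsigned value, that is the case most likely to expose a mistake.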

Comments (9)
  1. Matt says:

    Raymond – what were you working on that needed all of this MMX stuff?

    [Writing a CPU emulator, y'know, just for fun. -Raymond]
  2. Joshua says:

    [Writing a CPU emulator, y'know, just for fun. -Raymond]

    Well OK then.

  3. Kevin says:

    One of my coworkers is implementing a filesystem in node.js, just for fun.

    I should probably find something more geeky to work on in my spare time…

  4. sdfsdfsdfasdfasdfasdfasdfasdf says:

    [Writing a CPU emulator, y'know, just for fun. -Raymond]

    Oooh, I can speculate on which CPU it is. 3.2 GHz PowerPC Tri-Core Xenon. For Xbox 360 game emulation on XB1.

    You go, Raymond. Show those CPUs who's the systems engineer

  5. @sdf keyboard masher:

    The tricky part of console emulation isn't emulating the CPU.  It's emulating the GPU and timings between CPU/GPU/RAM communication that games can depend on very critically in order to eke out every last bit of performance possible.  That's typically why most systems that provide full backwards compatibility include hardware from the previous generation (which even then can still break a few of the more sensitive games), and why most software high-level emulation often can only provide partial compatibility with previous-gen titles without specific fixes.

  6. Zan Lynx' says:

    Yeah. It is pretty surprising how much CPU is needed to emulate even really old hardware like the C64 and Atari systems. The processor is easy. But then there's the sound and video chips, and all of their very precise timing behavior. Emulators even need to include things like the CRT beam scan position.

    In that regard, current game consoles are pretty far away from "the metal".  Pff, they don't even need to use a timing loop to flip sprite positions every 30 scan lines. Which was broken if your CPU wasn't running at exactly 1 MHz.

  7. I thought it was 1.22 MHz?  And that's assuming you're using NTSC monitors.  PAL was a whole other story and required a completely different CPU.

  8. John Barton says:

    In the case of the Atari 800, the clock frequency was 1/2 of colorburst or about 1.79 MHz.

  9. Skyborne says:

    I never thought emulation was too interesting (how hard can it be to translate some opcodes?) but then I ran across this grand experiment: andrewkelley.me/…/jamulator.html  The author sets out to write a recompiler for NES games to run natively…



*DISCLAIMER: I DO NOT OWN THIS CONTENT. If you are the owner and would like it removed, please contact me. The content herein is an archived reproduction of entries from Raymond Chen's "Old New Thing" Blog (most recent link is here). It may have slight formatting modifications for consistency and to improve readability.
