Date: | May 16, 2014 / year-entry #123 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20140516-00/?p=973 |
Comments: | 21 |
Summary: | Some time ago, I noted that you can build other interlocked operations out of InterlockedCompareExchange. Here's an example: using System.Threading; public static int InterlockedMax(ref int location, int value) { int initialValue, newValue; do { initialValue = location; newValue = Math.Max(initialValue, value); } while (Interlocked.CompareExchange(ref location, newValue, initialValue) != initialValue); return initialValue; } (There's a corresponding... |
Some time ago, I noted that
you can build other interlocked operations out of
InterlockedCompareExchange. Here's an example:

```csharp
using System;
using System.Threading;

public static int InterlockedMax(ref int location, int value)
{
    int initialValue, newValue;
    do
    {
        initialValue = location;
        newValue = Math.Max(initialValue, value);
    } while (Interlocked.CompareExchange(ref location, newValue,
                                         initialValue) != initialValue);
    return initialValue;
}
```

(There's a corresponding C++ version, which I leave as an exercise.) This function atomically updates a "highest value seen so far" variable. It follows the usual pattern:

1. Capture the starting value.
2. Do a computation based on the starting value.
3. Atomically update the value, provided it has not changed in the meantime.
4. If the value did change in the meantime, then start over.
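One possible answer to that exercise (a sketch of mine, not Raymond's version, using `std::atomic` rather than the Win32 `InterlockedCompareExchange` API) looks like this:

```cpp
#include <algorithm>
#include <atomic>

// Sketch of the C++ counterpart: same capture / compute / compare-exchange
// loop, expressed with std::atomic.
int InterlockedMax(std::atomic<int>& location, int value)
{
    int initialValue = location.load();
    int newValue;
    do {
        newValue = std::max(initialValue, value);
        // On failure, compare_exchange_weak reloads the current value into
        // initialValue, so the next iteration recomputes the max from fresh data.
    } while (!location.compare_exchange_weak(initialValue, newValue));
    return initialValue; // previous value, matching the C# version
}
```

Note that `compare_exchange_weak` refreshing its expected argument on failure is exactly the "go back and try again" step of the pattern.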
(For bonus points, add an early-out if the operation should be abandoned.) You can make this function extensible by use of lambdas, so that you can update the old value with any computation you like.

```csharp
using System;
using System.Threading;

public static int InterlockedCombine(ref int location, Func<int, int> update)
{
    int initialValue, newValue;
    do
    {
        initialValue = location;
        newValue = update(initialValue);
    } while (Interlocked.CompareExchange(ref location, newValue,
                                         initialValue) != initialValue);
    return initialValue;
}

public static int InterlockedMax(ref int location, int value)
{
    return InterlockedCombine(ref location, v => Math.Max(v, value));
}

public static int InterlockedMultiply(ref int location, int value)
{
    return InterlockedCombine(ref location, v => v * value);
}

public static int InterlockedIncrementWithSaturation(
    ref int location, int maximum)
{
    return InterlockedCombine(ref location,
                              v => v < maximum ? v + 1 : maximum);
}

public static int InterlockedCompareExchangeIfNotEqual(
    ref int location, int newValue, int avoidValue)
{
    return InterlockedCombine(ref location,
                              v => v != avoidValue ? newValue : avoidValue);
}
```
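As for the bonus-point early-out, here is one hedged sketch in C++ (the name and shape are mine, not from the article): for a running maximum, a candidate no larger than the stored value can never change it, so the operation can be abandoned without ever issuing the compare-exchange.

```cpp
#include <atomic>

// Hypothetical early-out variant: skip the compare-exchange entirely when
// the update could not change the stored value anyway.
int InterlockedMaxWithEarlyOut(std::atomic<int>& location, int value)
{
    int initialValue = location.load();
    do {
        if (value <= initialValue) {
            return initialValue; // nothing to do; abandon the update
        }
        // On failure, initialValue is refreshed and the early-out rechecked.
    } while (!location.compare_exchange_weak(initialValue, value));
    return initialValue; // previous value
}
```

The recheck inside the loop matters: another thread may have raised the stored value past our candidate between attempts, at which point retrying would be wasted work.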
Comments (21)
Comments are closed.
A possible problem I see is the following: Two threads running on different processors can get stuck in an infinite loop if they attempt, for example, InterlockedMultiply(location, -1) at the same time. But making the latter possible is probably the reason for using interlocked functions in the first place.
@Robert: No they wouldn't, but a single thread running InterlockedMultiply(location, 1) would.
@Joshua
O.K., I got it now. I missed the "compare" part of CompareExchange. Though I don't see the problem with InterlockedMultiply(location, 1).
@Robert: what value do you get if you multiply it by one? How will the loop terminate in that case?
[The return value of CompareExchange is the previous value, not the updated value. -Raymond]
When multiplying by 1, previous = updated. I can't believe I had to write this.
If multiplying by 1, initialValue = location = newValue, CompareExchange() will return the initialValue, and the function will exit.
@Robert:
There will always be one contender who succeeds with the update.
"InterlockedIncrement has the same problem" Is this actually true? On Ix86 (and probably AMD64) the CPU is going to do whatever bus locking is required, increment the value, and then invalidate the cache on the other cores in one operation. There is no loop to get stuck in.
On Win95 (and Win98&ME?) InterlockedIncrement (Ix86) is just LOCK INC, while later versions of Windows seem to use LOCK XADD, which gets you the correct previous value and not just an "isZero" bool. Sadly I don't think the VC intrinsic will ever generate the LOCK INC version :(
[So what's the problem? The previous value matches, so the loop terminates. -Raymond]
Ok I screwed that one up.
I think we have to clarify how InterlockedCompareExchange actually works. Basically it does the following atomically:

```cpp
InterlockedCompareExchange(destination, newVal, prevVal) {
    curVal = *destination;
    if (curVal == prevVal) {
        *destination = newVal;
    }
    return curVal;
}
```
So neither a single thread incrementing, nor two threads multiplying by -1, nor one thread multiplying by 1 will be a problem. Well, there's the age-old ABA problem (LL/SC instead of CAS solves that one nicely, but we can fix it for CAS too – that needs pretty much a custom approach though), but that's it.
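That pseudocode matches the contract of `std::atomic`'s `compare_exchange_strong`, which can be exercised directly (a single-threaded sketch for illustration only):

```cpp
#include <atomic>

// Demonstrates the semantics in the pseudocode above: the exchange happens
// only when the current value equals the expected one, and the caller always
// learns what was actually there.
bool TryExchange(std::atomic<int>& destination, int newVal, int& prevVal)
{
    // On failure, compare_exchange_strong writes the current value to prevVal.
    return destination.compare_exchange_strong(prevVal, newVal);
}
```

When the value is unchanged (multiply by 1, or no contention), the compare succeeds on the first attempt, which is why the loops above terminate.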
@Robert: Yes, livelock is *possible*, but it is unlikely to be stable in the long term, so one processor almost certainly will win. With interrupts, caching and other hardware activity, modern processors are quite nondeterministic at the clock cycle level, which makes InterlockedCompareExchange loops actually work well enough for uses such as this. Windows RT uses InterlockedCompareExchange loops like this, yet isn't seen to randomly freeze, so clearly this strategy is OK in practice on at least some load-linked store-conditional architectures, too.
@voo: ABA is only a problem if there is an external meaning to the exchanged values such that it matters which of the two threads acts upon the new value. For example, when they represent a pointer to a memory area that needs to be freed, as in SLIST_ENTRYs. But for something like simple numeric computations or reference counts, there's no reason to care if the ABA situation occurs – you get the result you wanted anyway.
@skSdnW: Visual Studio will generate "lock inc", "lock dec" or "lock add" if you do _InterlockedIncrement or _InterlockedDecrement in such a way that either doesn't care about the new value or only cares about the new value in a way that only depends on the flags. For example, this resulted in Visual Studio 2013 generating "lock add [rcx], -1" followed by "setz al" instead of a "lock xadd"-based implementation:
```cpp
bool DecrementAndAmILast(volatile long *var) { return _InterlockedDecrement(var) == 0; }
```
Joshua: it would help if you kept the "I'm a programmer, how dare you disagree with me?" attitude out of the comments. We're all familiar enough with it.
Would there be an observable difference in behaviour if you compared newValue to initialValue before bothering with the compare exchange?
@Neil: CompareExchange gives ordering guarantees (full memory barrier actually), so on most architectures a normal load wouldn't be equivalent.
@Neil:
Using an additional compare will not make it any more effective, and if anything, will make a collision more likely.
In x86/x64 processors, interlocked operations on RAM are as fast as loading a value from L1 cache and storing it back to L1 cache. There is no global bus lock involved. Thus, it doesn't make sense to try to optimize it out.
@alegr1: It would be interesting to know how they could be that fast on an actual dual-processor machine. I would think with modern processor speeds the speed of light comes into play.
Raymond: "On most processors, InterlockedIncrement is written in terms of InterlockedCompareExchange".
I always wondered why processors don't (or can't) implement an InterlockedIncrement (or decrement) in hardware, with setting the result flags and such at the same time. But the more I think about it, the more it seems that doing an InterlockedCompareExchange is about the only way, even at the hardware level…
@DWalker: I always thought it was about having as few multi-bus access instructions as possible. These are special on RISC architectures.
Hmm odd grammar. Would be multi-(bus access) if English has precedence order parenthesis.
@alegr1, but even that implies invalidating caches in other cores when they intersect the same address, so it still has a noticeable overhead when there's even mild contention. Otherwise, spinning locks would all be pink flying unicorns suitable for all purposes.
John Doe: "Otherwise, spinning locks would all be pink flying unicorns suitable for all purposes."
Best comment ever :)
@Joshua: Two hyphens would do the job: multi-bus-access.