Date: | May 16, 2014 / year-entry #123 |
Tags: | code |
Orig Link: | https://blogs.msdn.microsoft.com/oldnewthing/20140516-00/?p=973 |
Comments: | 21 |
Summary: | Some time ago, I noted that you can build other interlocked operations out of InterlockedCompareExchange. Here's an example: using System.Threading; public static int InterlockedMax(ref int location, int value) { int initialValue, newValue; do { initialValue = location; newValue = Math.Max(initialValue, value); } while (Interlocked.CompareExchange(ref location, newValue, initialValue) != initialValue); return initialValue; } (There's a corresponding... |
Some time ago, I noted that
you can build other interlocked operations out of
InterlockedCompareExchange. Here's an example:

```csharp
using System;
using System.Threading;

public static int InterlockedMax(ref int location, int value)
{
    int initialValue, newValue;
    do
    {
        initialValue = location;
        newValue = Math.Max(initialValue, value);
    } while (Interlocked.CompareExchange(ref location, newValue,
                                         initialValue) != initialValue);
    return initialValue;
}
```

(There's a corresponding C++ version, which I leave as an exercise.) This function atomically updates a "highest value seen so far" variable. It follows the usual pattern:

1. Capture the starting value.
2. Do a computation based on the starting value.
3. Atomically update the value, provided it has not changed in the meantime.
4. If the value did change in the meantime, then start over.
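One possible answer to that exercise (a sketch of mine, not Raymond's version, using `std::atomic` rather than the Win32 `InterlockedCompareExchange` API) looks like this:

```cpp
#include <algorithm>
#include <atomic>

// Sketch of the C++ counterpart: same capture / compute / compare-exchange
// loop, expressed with std::atomic.
int InterlockedMax(std::atomic<int>& location, int value)
{
    int initialValue = location.load();
    int newValue;
    do {
        newValue = std::max(initialValue, value);
        // On failure, compare_exchange_weak reloads the current value into
        // initialValue, so the next iteration recomputes the max from fresh data.
    } while (!location.compare_exchange_weak(initialValue, newValue));
    return initialValue; // previous value, matching the C# version
}
```

Note that `compare_exchange_weak` refreshing its expected argument on failure is exactly the "go back and try again" step of the pattern.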
(For bonus points, add an early-out if the operation should be abandoned.) You can make this function extensible by use of lambdas, so that you can update the old value with any computation you like.

```csharp
using System;
using System.Threading;

public static int InterlockedCombine(ref int location, Func<int, int> update)
{
    int initialValue, newValue;
    do
    {
        initialValue = location;
        newValue = update(initialValue);
    } while (Interlocked.CompareExchange(ref location, newValue,
                                         initialValue) != initialValue);
    return initialValue;
}

public static int InterlockedMax(ref int location, int value)
{
    return InterlockedCombine(ref location, v => Math.Max(v, value));
}

public static int InterlockedMultiply(ref int location, int value)
{
    return InterlockedCombine(ref location, v => v * value);
}

public static int InterlockedIncrementWithSaturation(
    ref int location, int maximum)
{
    return InterlockedCombine(ref location,
                              v => v < maximum ? v + 1 : maximum);
}

public static int InterlockedCompareExchangeIfNotEqual(
    ref int location, int newValue, int avoidValue)
{
    return InterlockedCombine(ref location,
                              v => v != avoidValue ? newValue : avoidValue);
}
```
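As for the bonus-point early-out, here is one hedged sketch in C++ (the name and shape are mine, not from the article): for a running maximum, a candidate no larger than the stored value can never change it, so the operation can be abandoned without ever issuing the compare-exchange.

```cpp
#include <atomic>

// Hypothetical early-out variant: skip the compare-exchange entirely when
// the update could not change the stored value anyway.
int InterlockedMaxWithEarlyOut(std::atomic<int>& location, int value)
{
    int initialValue = location.load();
    do {
        if (value <= initialValue) {
            return initialValue; // nothing to do; abandon the update
        }
        // On failure, initialValue is refreshed and the early-out rechecked.
    } while (!location.compare_exchange_weak(initialValue, value));
    return initialValue; // previous value
}
```

The recheck inside the loop matters: another thread may have raised the stored value past our candidate between attempts, at which point retrying would be wasted work.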
Comments (21)
Comments are closed.
A possible problem I see is the following: Two threads running on different processors can get stuck in an infinite loop if they attempt, for example, InterlockedMultiply(location, -1) at the same time. But making the latter possible is probably the reason for using interlocked functions in the first place.
@Robert: No they wouldn't, but a single thread running InterlockedMultiply(location, 1) would.
@Joshua
O.K., I got it now. I missed the "compare" part of CompareExchange. Though I don't see the problem with InterlockedMultiply(location, 1).
@Robert: what value do you get if you multiply it by one? How will the loop terminate in that case?
[The return value of CompareExchange is the previous value, not the updated value. -Raymond]
When multiplying by 1, previous = updated. I can't believe I had to write this.
If multiplying by 1, initialValue = location = newValue, CompareExchange() will return the initialValue, and the function will exit.
@Robert:
There will always be one contender who succeeds with the update.
"InterlockedIncrement has the same problem" Is this actually true? On Ix86 (and probably AMD64) the CPU is going to do whatever bus locking is required, increment the value, and then invalidate the cache on the other cores in one operation. There is no loop to get stuck in.
On Win95 (and Win98&ME?) InterlockedIncrement (Ix86) is just LOCK INC, while later versions of Windows seem to use LOCK XADD, which gets you the correct previous value and not just an "isZero" bool. Sadly I don't think the VC intrinsic will ever generate the LOCK INC version :(
[So what's the problem? The previous value matches, so the loop terminates. -Raymond]
Ok I screwed that one up.
I think we have to clarify how InterlockedCompareExchange actually works. Basically it does the following atomically:

```cpp
InterlockedCompareExchange(destination, newVal, prevVal) {
    curVal = *destination;
    if (curVal == prevVal) {
        *destination = newVal;
    }
    return curVal;
}
```
So neither a single thread incrementing, nor two threads multiplying by -1, nor one thread multiplying by 1 will be a problem. Well, there's the age-old ABA problem (LL/SC instead of CAS solves that one nicely, but we can fix it for CAS too – that needs pretty much a custom approach though), but that's it.
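That pseudocode matches the contract of `std::atomic`'s `compare_exchange_strong`, which can be exercised directly (a single-threaded sketch for illustration only):

```cpp
#include <atomic>

// Demonstrates the semantics in the pseudocode above: the exchange happens
// only when the current value equals the expected one, and the caller always
// learns what was actually there.
bool TryExchange(std::atomic<int>& destination, int newVal, int& prevVal)
{
    // On failure, compare_exchange_strong writes the current value to prevVal.
    return destination.compare_exchange_strong(prevVal, newVal);
}
```

When the value is unchanged (multiply by 1, or no contention), the compare succeeds on the first attempt, which is why the loops above terminate.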
@Robert: Yes, livelock is *possible*, but it is unlikely to be stable in the long term, so one processor almost certainly will win. With interrupts, caching and other hardware activity, modern processors are quite nondeterministic at the clock cycle level, which makes InterlockedCompareExchange loops actually work well enough for uses such as this. Windows RT uses InterlockedCompareExchange loops like this, yet isn't seen to randomly freeze, so clearly this strategy is OK in practice on at least some load-linked store-conditional architectures, too.
@voo: ABA is only a problem if there is an external meaning to the exchanged values such that it matters which of the two threads acts upon the new value. For example, when they represent a pointer to a memory area that needs to be freed, as in SLIST_ENTRYs. But for something like simple numeric computations or reference counts, there's no reason to care if the ABA situation occurs – you get the result you wanted anyway.
@skSdnW: Visual Studio will generate "lock inc", "lock dec" or "lock add" if you do _InterlockedIncrement or _InterlockedDecrement in such a way that either doesn't care about the new value or only cares about the new value in a way that only depends on the flags. For example, this resulted in Visual Studio 2013 generating "lock add [rcx], -1" followed by "setz al" instead of a "lock xadd"-based implementation:
```cpp
bool DecrementAndAmILast(volatile long *var) { return _InterlockedDecrement(var) == 0; }
```
Joshua: it would help if you kept the "I'm a programmer, how dare you disagree with me?" attitude out of the comments. We're all familiar enough with it.
Would there be an observable difference in behaviour if you compared newValue to initialValue before bothering with the compare exchange?
@Neil: CompareExchange gives ordering guarantees (full memory barrier actually), so on most architectures a normal load wouldn't be equivalent.
@Neil:
Using an additional compare will not make it any more effective, and if anything, will make a collision more likely.
In x86/x64 processors, interlocked operations on RAM are as fast as loading a value from L1 cache and storing it back to L1 cache. There is no global bus lock involved. Thus, it doesn't make sense to try to optimize it out.
@alegr1: It would be interesting to know how they could be that fast on an actual dual-processor machine. I would think with modern processor speeds the speed of light comes into play.
Raymond: "On most processors, InterlockedIncrement is written in terms of InterlockedCompareExchange".
I always wondered why processors don't (or can't) implement an InterlockedIncrement (or decrement) in hardware, with setting the result flags and such at the same time. But the more I think about it, the more it seems that doing an InterlockedCompareExchange is about the only way, even at the hardware level…
@DWalker: I always thought it was about having as few multi-bus access instructions as possible. These are special on RISC architectures.
Hmm odd grammar. Would be multi-(bus access) if English has precedence order parenthesis.
@alegr1, but even that implies invalidating caches in other cores when they intersect the same address, so it still has a noticeable overhead when there's even mild contention. Otherwise, spinning locks would all be pink flying unicorns suitable for all purposes.
John Doe: "Otherwise, spinning locks would all be pink flying unicorns suitable for all purposes."
Best comment ever :)
@Joshua: Two hyphens would do the job: multi-bus-access.