Multithreading: introducing memory fences
A few posts back, we introduced the concept of load and store reordering. As we saw, reordering exists as a way of improving performance and can be introduced at several levels, starting at compile time and ending at run time, when the processor executes the instructions. We also saw that even though things can get chaotic quickly, there are some guarantees we can hold on to when writing multithreaded code. One of those guarantees is that every platform respects a specific memory model, and that’s what we’ll be talking about in this post.
A memory model defines which kinds of moves may occur (ie, which loads and stores can be reordered). With a weak memory model, you get plenty of allowable moves, and this leads to better performance. However, you’ll also need to pay close attention to the code you write. Allowing fewer moves will, of course, lead to less complexity, but it won’t give you (hum…I mean, the compiler and processor) as many chances to optimize your code.
Since we’re talking about .NET here, we’ll focus the rest of the post on the valid assumptions for the CLR. The CLR has a strong memory model. In practice, this means that several compiler optimizations are forbidden and that it should be fairly easy (sort of…) to write code that is portable across the several architectures where it might run. Before going on, it’s important to note that the CLR memory model is tighter than the one you get in the ECMA spec.
In the CLR, you can get load/load, load/store and store/load reordering. The only one that isn’t permitted is store/store reordering (meaning that stores are never reordered with respect to each other: they become visible in program order). Volatile loads and stores are different and only allow store/load reordering (we’ll be talking about volatiles in future posts). Btw, the ECMA specification allows all four kinds of moves.
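To make the store/load case concrete, here’s a sketch (my own example, with illustrative names) of the classic two-thread test where store/load reordering can produce a surprising result:

```csharp
using System;
using System.Threading;

class StoreLoadExample
{
    static int x, y;
    static int r1, r2;

    static void Thread1()
    {
        x = 1;    // store
        r1 = y;   // load: store/load reordering may move this load
                  // before the store to x above
    }

    static void Thread2()
    {
        y = 1;    // store
        r2 = x;   // load: same situation here
    }

    static void Main()
    {
        var t1 = new Thread(Thread1);
        var t2 = new Thread(Thread2);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
        // Intuitively, at least one of r1/r2 should end up as 1,
        // but if both loads move before both stores, you can
        // (rarely) observe r1 == 0 && r2 == 0.
        Console.WriteLine($"r1={r1}, r2={r2}");
    }
}
```

The counter-intuitive `r1 == 0 && r2 == 0` outcome is rare in practice, but a single run can’t prove it impossible: that guarantee has to come from fences.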
Ok, so if those store and load reorderings are allowed, how can we stop them from happening? Ah, glad you asked! We can use fences (or barriers) to ensure that they don’t occur at specific points.
A fence (aka barrier) prevents memory load and store reordering. There are several types of fences. The full fence is probably the best known and most used type. A full fence ensures that no load or store moves across it (ie, no load or store placed before the fence can move after it, nor can any load or store placed after the fence move before it).
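We’ll look at the .NET APIs properly in the next post, but as a quick preview, `Thread.MemoryBarrier` is the full fence the framework exposes. The producer/consumer sketch below (the names and the flag protocol are mine, not from the post) shows where full fences go:

```csharp
using System;
using System.Threading;

class FullFenceExample
{
    static int data;
    static int ready;

    static void Producer()
    {
        data = 42;              // store
        Thread.MemoryBarrier(); // full fence: the store above can't move after it,
                                // and the store below can't move before it
        ready = 1;              // store
    }

    static void Consumer()
    {
        // The fence inside the loop keeps the JIT from hoisting the load of
        // ready out of the loop, and also stops the load of data below from
        // moving before we've seen ready == 1.
        while (ready == 0) Thread.MemoryBarrier();
        Console.WriteLine(data); // prints 42
    }

    static void Main()
    {
        var consumer = new Thread(Consumer);
        consumer.Start();
        new Thread(Producer).Start();
        consumer.Join();
    }
}
```

With both fences in place, the consumer is guaranteed to see `data == 42` once it observes `ready == 1`; remove either fence and that guarantee is gone.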
Besides the ubiquitous full fence (which is available everywhere), there are other variations. With store fences, no store can move across the fence (reordering of loads is still ok). Load fences are similar, but in this case only loads are “fixed”.
Finally, there’s also a couple of “one way” fences: acquire and release fences. An acquire fence ensures that no memory operation placed after the fence can be moved before it. Release fences work the other way around: instructions placed after the fence may move before it, but no “pre-fence” instruction may move after it.
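As an illustration of acquire/release semantics (assuming .NET 4.5+, where the `System.Threading.Volatile` class is available), `Volatile.Write` behaves as a release and `Volatile.Read` as an acquire. The sketch below pairs them to pass a value between threads; the field names are just illustrative:

```csharp
using System;
using System.Threading;

class AcquireReleaseExample
{
    static int data;
    static int ready;

    static void Producer()
    {
        data = 42;
        // Release semantics: the store to data above cannot
        // move after this store to ready.
        Volatile.Write(ref ready, 1);
    }

    static void Consumer()
    {
        // Acquire semantics: the load of data below cannot
        // move before this load of ready.
        while (Volatile.Read(ref ready) == 0) { /* spin */ }
        Console.WriteLine(data); // prints 42: the release/acquire pair
                                 // orders the accesses across threads
    }

    static void Main()
    {
        var consumer = new Thread(Consumer);
        consumer.Start();
        new Thread(Producer).Start();
        consumer.Join();
    }
}
```

Notice that neither side needs a full fence here: release on the writer plus acquire on the reader is enough for this one-way handoff, which is exactly why these cheaper fences exist.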
As you might have guessed by now, fences lead to a more sequential execution model, which will, without any doubt, degrade your application’s performance. This means that you should apply them carefully. Yes, we do need fences, but do keep in mind that using them reduces the compiler’s and processor’s ability to reorder and optimize the code you write.
By now, I guess we’ve covered most of the theory around fences. You might be asking: how do I use fences in my .NET code? Good question, but we’ll leave the answer for the next post. Stay tuned for more on multithreading.