Multithreading: hardware atomicity
In the previous post, we started looking at the reordering of memory loads and stores. In this post, we’ll continue that study and introduce atomicity. Atomicity is a really important concept that we’ve already met at a higher level: do you recall our discussion on critical regions (and how they’re implemented through critical sections)?
With critical regions we were able to get atomicity at a high (logical) level: recall that with a critical region you can have a “logical” instruction which is really a group of low level instructions. In this post, we’re trying to understand memory loads and stores, and that’s why we’re interested in low level atomicity. At this level, an atomic operation is performed as a single instruction between the processor and memory, which ensures that a thread never sees a corrupt (partially written) value.
Regarding memory access, all processors ensure atomic loads and stores of aligned word sized values (note that I’m talking about the current processors on which Windows can run).
Since I’m also a beginner, I think that explaining these concepts a little bit more might be a good idea. Let’s start with the concept of word sized values. Word sized values, aka pointer sized values, represent the maximum amount of memory a processor can handle in a single operation. For instance, on a 32 bit processor, the word size is 32 bits, i.e., 4 bytes. On a 64 bit processor, you get an 8 byte word size. Notice that we generally use bytes instead of bits for memory sizes. I guess that you’ve got the general idea, right?
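If you want to see which word size you’re running with, here’s a minimal sketch. It’s written in Python purely for illustration (my choice for these side examples, not something tied to the CLR), and the helper name is made up. Keep in mind that it reports the pointer size of the running process, which usually, but not always, matches the processor’s word size (a 32 bit process can run on a 64 bit processor).

```python
import struct


def pointer_size_in_bytes() -> int:
    """Size of a native pointer (the "P" format) in the current process."""
    return struct.calcsize("P")


if __name__ == "__main__":
    size = pointer_size_in_bytes()
    print(f"pointer size: {size} bytes ({size * 8} bits)")
```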
Now, the second part, which is also important: aligned. We say that a value is aligned if its address is evenly divisible by a certain memory unit size. For instance, a value is said to be 4 byte aligned if it starts at an address which is evenly divisible by 4. Here’s a practical example: if you’re loading a word sized value which “starts” at 0x28, then you can be sure that you’re accessing a value that is both 4 and 8 byte aligned (notice that 0x28 = 40 in decimal, which is evenly divisible by both 4 and 8).
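The alignment rule is just a divisibility test, so it can be sketched in a couple of lines. Again, this is Python used only to do the arithmetic, and is_aligned is a hypothetical helper name, not an API from any library.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if address is evenly divisible by alignment (in bytes)."""
    if alignment <= 0:
        raise ValueError("alignment must be a positive number of bytes")
    return address % alignment == 0


if __name__ == "__main__":
    # 0x28 is the address used in the example above; 0x2A is a made-up
    # address that is not 4 byte aligned.
    for address in (0x28, 0x2A):
        for alignment in (4, 8):
            print(hex(address), alignment, is_aligned(address, alignment))
```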
Since this is really important, I guess I’ll repeat myself: you’ll *only* get atomicity when you load or store an aligned word sized value. If you’re loading or storing a value which is smaller than the processor’s word size, you’ll still need to respect the alignment. For instance, if you’re on a 32 bit processor, where words are 4 bytes long, then a 1 byte value should be placed at an address which is divisible by 4 (note that you’ll probably need to “fill” the other 3 bytes with padding so that the next value is also aligned, which is what ensures proper atomicity - relax, this is generally done by the compiler :)).
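To make the padding idea concrete, here’s a rough model of it. It assumes the simplified rule described above (every field starts on a multiple of the word size); real compilers and runtimes use more refined rules, and the function and field names are invented for the example.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def lay_out(fields, alignment):
    """Place each (name, size_in_bytes) field on a multiple of alignment.

    Returns a list of (name, offset, padding_before) tuples.
    """
    placed = []
    offset = 0
    for name, size in fields:
        start = align_up(offset, alignment)
        placed.append((name, start, start - offset))
        offset = start + size
    return placed


if __name__ == "__main__":
    # A 1 byte value followed by a 4 byte value on a 32 bit (4 byte word) processor.
    for name, offset, padding in lay_out([("flag", 1), ("counter", 4)], 4):
        print(f"{name}: offset={offset}, padding before={padding}")
```

With these inputs, the second field lands at offset 4, after 3 bytes of padding, which is the “fill the other 3 bytes” case from the text.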
On the other hand, if you need more space than the current processor’s word size, then you will not get hardware atomicity for that load or store (and this happens even if the value is aligned). In these cases, you’ll need to watch out because you can’t simply load and store a value without any further consideration. If you do, then you can end up with a thread loading the value before another thread has finished storing it! (btw, this is known as a torn read)
(It’s important to understand that the same behavior can be observed for chunks of memory smaller than or equal to the processor’s word size if the value isn’t aligned.)
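Here’s a small simulation of a torn read. It doesn’t use real threads: it simply replays, in a single thread, one unlucky interleaving of a writer and a reader. The 64 bit value is modeled as two 32 bit halves, which is an assumption about how a 32 bit processor would split the store; the class and the values are made up for the example.

```python
MASK32 = 0xFFFFFFFF  # one 32 bit word


class TwoWordCell:
    """A 64 bit value kept as two 32 bit halves."""

    def __init__(self, value: int) -> None:
        self.low = value & MASK32
        self.high = (value >> 32) & MASK32

    def store_low(self, value: int) -> None:
        self.low = value & MASK32

    def store_high(self, value: int) -> None:
        self.high = (value >> 32) & MASK32

    def load(self) -> int:
        return (self.high << 32) | self.low


def torn_read_demo():
    old = 0x00000000FFFFFFFF
    new = 0x0000000100000000
    cell = TwoWordCell(old)
    cell.store_low(new)      # writer: first half of the store
    observed = cell.load()   # reader: runs before the writer has finished
    cell.store_high(new)     # writer: second half of the store
    return old, new, observed, cell.load()


if __name__ == "__main__":
    old, new, observed, final = torn_read_demo()
    print(f"old={old:#018x} new={new:#018x}")
    print(f"observed={observed:#018x} final={final:#018x}")
```

The reader ends up with a value that is neither the old one nor the new one: it’s a mix of the new low half and the old high half.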
For instance, this means that if you’re writing multithreaded code that uses long (64 bit) variables and you’re running that code on 32 bit processors, then you shouldn’t forget to protect those write and read operations! (In future posts, we’ll talk about interlocked operations; if you’re not using them, then you’ll need at least a lock. But don’t try to be too smart by using the lock for “writing only”, because that will not work in all scenarios: the reads need the lock too.)
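A minimal sketch of the lock approach, under the same assumption as before (a 64 bit value modeled as two 32 bit halves). GuardedCell and run_check are hypothetical names. The point to notice is that both store and load take the lock; locking only the writer would still let a reader run between the two half stores.

```python
import threading

MASK32 = 0xFFFFFFFF  # one 32 bit word


class GuardedCell:
    """A two-word value whose loads AND stores are protected by one lock."""

    def __init__(self, value: int = 0) -> None:
        self._lock = threading.Lock()
        self._low = value & MASK32
        self._high = (value >> 32) & MASK32

    def store(self, value: int) -> None:
        with self._lock:
            self._low = value & MASK32
            self._high = (value >> 32) & MASK32

    def load(self) -> int:
        with self._lock:
            return (self._high << 32) | self._low


def run_check(iterations: int = 2000) -> set:
    """Flip the cell between two values while a reader records what it sees."""
    a, b = 0x0000000000000000, 0xFFFFFFFFFFFFFFFF
    cell = GuardedCell(a)
    observed = set()

    def writer() -> None:
        for i in range(iterations):
            cell.store(b if i % 2 == 0 else a)

    def reader() -> None:
        for _ in range(iterations):
            observed.add(cell.load())

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return observed


if __name__ == "__main__":
    # Every observed value should be one of the two complete values.
    print(sorted(hex(value) for value in run_check()))
```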
Now that we’re clear on alignment and processor word sizes, it’s time to take a quick look at how things work in the CLR. The good news is that the C# compiler and the JIT ensure proper alignment in all cases. In practice, values bigger than 4 bytes on 32 bit processors and values bigger than 8 bytes on 64 bit processors always start on 4 or 8 byte aligned boundaries, respectively. When we’re working with smaller values, the CLR will also ensure proper placement, filling the remaining space with padding.
If you want, you can have more control over the way fields are aligned. If you’ve done interop programming, then you’ve surely met the StructLayoutAttribute class. This class allows you to control the way fields are laid out in a specific type. If you’re thinking about using this feature (for instance, to control the amount of wasted memory), then proceed with care (in fact, think thrice - yes, I learned this word a few days ago and I could hardly wait to use it in a post :) - before going down this path!). The problem is that you can easily end up losing the CLR’s type safety, and that means you’ll probably end up getting exceptions from your code at runtime.
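To get a feeling for what taking control of the layout can do to alignment, here’s an analogy using Python’s struct module (this is not the CLR and not StructLayoutAttribute; it only illustrates the general effect of removing padding). The "@" prefix asks for the platform’s native alignment, while "=" packs the fields with no padding at all. The helper name is made up.

```python
import struct


def second_field_offset(prefix: str) -> int:
    """Offset of a 4 byte field ("I") that follows a 1 byte field ("B")."""
    # Total size minus the size of the last field gives where that field starts.
    return struct.calcsize(prefix + "BI") - struct.calcsize(prefix + "I")


if __name__ == "__main__":
    for label, prefix in (("native", "@"), ("packed", "=")):
        offset = second_field_offset(prefix)
        size = struct.calcsize(prefix + "BI")
        print(f"{label}: offset={offset}, total size={size}, "
              f"4 byte aligned={offset % 4 == 0}")
```

The packed layout saves the padding bytes, but the 4 byte field no longer starts on a 4 byte boundary, which is exactly the situation where you lose the atomicity guarantee.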
It’s important to understand that whenever you work with values that fall outside the aligned word sized case (a non-aligned value or a value bigger than the current word size), the compiler will end up generating multiple instructions. As we’ve seen, these might lead to torn reads if you don’t take the necessary precautions.
Notice that even though stores and loads of aligned word sized values are atomic, they don’t really let us do much. Why? Simply because there are several scenarios where we need to check a value before updating it, and this means that we end up with a load followed by a store. In these cases, and in order to ensure atomicity, we’re back to locks (interesting: have you ever thought about how locks are implemented?)…or maybe not! The truth is that we’ve got a couple of interlocked operations which ensure atomicity and that are perfect for these scenarios. We’ll talk about them in the next posts. Don’t you think that things are getting rather interesting? Stay tuned!
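To see why a load followed by a store isn’t enough, here’s one last sketch. As before, it’s a single-threaded replay of one possible interleaving of two threads (A and B), with each individual load and store treated as atomic; the names are invented for the example.

```python
def interleaved_increments() -> int:
    """Replay two 'threads' that each do: load, add 1, store."""
    shared = {"value": 0}

    a = shared["value"]        # thread A loads 0
    b = shared["value"]        # thread B loads 0, before A has stored
    shared["value"] = a + 1    # thread A stores 1
    shared["value"] = b + 1    # thread B stores 1, overwriting A's update

    return shared["value"]


if __name__ == "__main__":
    # Two increments were performed, but only one of them survives.
    print(interleaved_increments())
```

Each load and each store is atomic on its own, yet the pair isn’t, and one of the two increments is lost.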