Atomic Basics

Last week I encountered some confusion about the way atomics work in C++. Some developers believe that wrapping any data structure in std::atomic will, by some magic, make it thread safe. They are then perplexed when the source does not compile or, even worse, compiles but does not behave as expected.

Reading the fine print in the C++ standard [ISO/IEC 14882:2014(E) Ch. 29.5] reveals that the T argument of the atomic template is required to be trivially copyable. What does that mean? Roughly, a type is trivially copyable if copying it requires nothing more than copying the bytes it occupies: it has no user-provided copy or move operations, no virtual functions, and a trivial destructor, much like a plain C struct. The standard is of course more specific; you can find the full list of requirements in [ISO/IEC 14882:2014(E) Ch. 9].

So all primitive types such as int and char, and arrays of them, are trivially copyable, since copying them requires no more than copying their bytes. These types can be copied safely using memcpy. The standard library provides a handy type trait called std::is_trivially_copyable to determine whether a type is trivially copyable:

#include <array>
#include <iostream>
#include <string>
#include <type_traits>

int main()
{
    // Each line prints 1 if the type is trivially copyable and 0 otherwise.
    std::cout << "int trivially copyable ? " << std::is_trivially_copyable<int>::value << "\n";
    std::cout << "std::array trivially copyable ? " << std::is_trivially_copyable<std::array<char, 16> >::value << "\n";
    std::cout << "std::string trivially copyable ? " << std::is_trivially_copyable<std::string>::value << "\n";

    return 0;
}

int trivially copyable ? 1
std::array trivially copyable ? 1
std::string trivially copyable ? 0
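
The same check applies to user-defined types. The sketch below uses two hypothetical structs, Point and Named (neither comes from the text above), to illustrate which kind of type std::atomic accepts. It assumes a platform where an 8-byte struct can be handled without extra library support.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical type: plain data members only, so it is trivially copyable.
struct Point {
    int x;
    int y;
};

// Hypothetical type: the std::string member gives it a non-trivial copy
// constructor and destructor, so it is not trivially copyable.
struct Named {
    std::string name;
    int id;
};

static_assert(std::is_trivially_copyable<Point>::value,
              "Point should be trivially copyable");
static_assert(!std::is_trivially_copyable<Named>::value,
              "Named should not be trivially copyable");

// Stores a Point into an atomic and reads it back as one indivisible value.
inline Point round_trip(Point p)
{
    std::atomic<Point> shared{Point{0, 0}};
    shared.store(p);
    return shared.load();
}

// std::atomic<Named> broken;  // typically rejected at compile time
```

Calling round_trip(Point{3, 4}) from main returns a Point with the same two values. Uncommenting the last line is a quick way to see the kind of compile error that surprises people.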

Copying memory in a thread-safe way is what lies at the core of atomic thread safety. This explains the requirement for trivially copyable types. In most implementations, atomics containing primitive types maintain thread safety without using a synchronization object (they are lock free). However, with larger non-primitive types, the implementation typically employs a lock to ensure that the modifying thread has exclusive access to the memory used by the atomic variable. The member function std::atomic::is_lock_free reports which strategy the implementation selected for a type.
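
A minimal sketch of querying the strategy at run time follows. The helper names lock_free and report_lock_freedom and the struct Pair are made up for this illustration, and the printed answers depend on the platform and compiler, so no expected output is shown.

```cpp
#include <atomic>
#include <cassert>
#include <iostream>

// Hypothetical 8-byte struct, used only for illustration.
struct Pair {
    int first;
    int second;
};

// Returns true when this implementation handles std::atomic<T> without a lock.
template <typename T>
bool lock_free()
{
    std::atomic<T> probe{T()};
    return probe.is_lock_free();
}

// Prints the strategy chosen for a few types; answers are platform dependent.
inline void report_lock_freedom()
{
    std::cout << std::boolalpha
              << "atomic<int> lock free ? " << lock_free<int>() << "\n"
              << "atomic<long long> lock free ? " << lock_free<long long>() << "\n"
              << "atomic<Pair> lock free ? " << lock_free<Pair>() << "\n";
}
```

Calling report_lock_freedom() from main prints one line per type. Trying the same with a much larger struct is instructive, though on some toolchains (GCC, for example) that may require linking an additional support library such as libatomic.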

Atomics also control the order in which threads can “observe” changes to memory. The different options are enumerated by std::memory_order. This is an advanced topic. In most cases, the default used by the standard library, std::memory_order_seq_cst, is the safest way to go.
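
To make the default concrete, here is a small sketch of a shared counter that never names a memory order and therefore relies on std::memory_order_seq_cst throughout. The function name count_concurrently and its parameters are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Starts `threads` workers that each bump a shared counter `per_thread` times,
// then returns the final count.
inline int count_concurrently(int threads, int per_thread)
{
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                // Same as counter.fetch_add(1, std::memory_order_seq_cst).
                counter.fetch_add(1);
            }
        });
    }
    for (std::thread& w : workers) {
        w.join();
    }
    return counter.load();  // default order again: std::memory_order_seq_cst
}
```

Because every increment is a single atomic read-modify-write, count_concurrently(4, 10000) returns 40000. With a plain int in place of the atomic, the same code would contain a data race.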

Let’s talk about  what std::memory_order_seq_cst   means:

In modern PCs, there is often more than one CPU core, and each core has its own memory cache. Values taken from memory can be stored temporarily in the cache for quick retrieval. So different threads running on different cores might not observe changes to memory at the same time. It could be that a memory location has changed but a thread still has an older cached value. It is, therefore, not guaranteed that all threads have the same view of memory. This is where std::memory_order_seq_cst steps in. Atomic load/store operations executed with this option take part in a single total order that all threads agree on, across all atomics using this option. So if one such operation happened before another from the point of view of one thread, it also happened first from the point of view of every other thread.
As always, nothing comes for free; in this case, the cost is lost optimization opportunities for the compiler and the CPU.

Consider this example (taken from here)

	#include <thread>
	#include <atomic>
	#include <cassert>

	std::atomic<bool> x = { false };
	std::atomic<bool> y = { false };
	std::atomic<int> z = { 0 };

	void write_x()
	{
		x.store(true, std::memory_order_seq_cst);
	}

	void write_y()
	{
		y.store(true, std::memory_order_seq_cst);
	}

	void read_x_then_y()
	{
		while (!x.load(std::memory_order_seq_cst))
			;
		if (y.load(std::memory_order_seq_cst)) {
			++z;
		}
	}

	void read_y_then_x()
	{
		while (!y.load(std::memory_order_seq_cst))
			;
		if (x.load(std::memory_order_seq_cst)) {
			++z;
		}
	}

	int main()
	{
		std::thread a(write_x);
		std::thread b(write_y);
		std::thread c(read_x_then_y);
		std::thread d(read_y_then_x);
		a.join(); b.join(); c.join(); d.join();
		assert(z.load() != 0);  // never fails: z ends up as 1 or 2
	}

The claim is that the assert at the end of main will always pass. Let’s follow this through:
Any of the four threads can execute first. The while loops in read_x_then_y() and read_y_then_x() make the threads running these functions wait until x or y, respectively, is set to true. Let’s assume x is updated first:

The read_x_then_y thread exits its while loop and moves on to check the value of y.
The read_y_then_x thread is still looping, waiting for y to be updated. If y has already been set by this point, read_x_then_y sees true and increments z, and the assert is satisfied. Otherwise, the load of y in read_x_then_y evaluates to false, z is not incremented, and the thread is done. In that case the order imposed by std::memory_order_seq_cst places the store to x before the store to y, and every thread agrees on that order.

Eventually, y is updated by the write_y thread.
The read_y_then_x thread exits its while loop and moves to the next statement, where it examines the value of x. As before, the total (across threads) order guaranteed by std::memory_order_seq_cst means that read_y_then_x has the same view of memory as read_x_then_y. Since that thread has already seen x updated before y, read_y_then_x will also see x with a true value. So its if statement evaluates to true and z is incremented, satisfying the assert at the end of main.

The same chain of arguments works if you swap x and y. So in all cases, the assert will not trigger an error.
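
The argument can also be exercised empirically. The sketch below repackages the example so that it can be repeated with fresh variables; the names run_once and count_zero_results are invented for this illustration. A test like this cannot prove the guarantee, it can only fail to find a counterexample.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// One run of the four-thread experiment with fresh flags; returns the final z.
inline int run_once()
{
    std::atomic<bool> x{false};
    std::atomic<bool> y{false};
    std::atomic<int> z{0};

    std::thread a([&] { x.store(true, std::memory_order_seq_cst); });
    std::thread b([&] { y.store(true, std::memory_order_seq_cst); });
    std::thread c([&] {
        while (!x.load(std::memory_order_seq_cst)) {}  // wait for x
        if (y.load(std::memory_order_seq_cst)) { ++z; }
    });
    std::thread d([&] {
        while (!y.load(std::memory_order_seq_cst)) {}  // wait for y
        if (x.load(std::memory_order_seq_cst)) { ++z; }
    });
    a.join(); b.join(); c.join(); d.join();
    return z.load();
}

// Repeats the experiment `runs` times and counts how often z ended up as 0.
inline int count_zero_results(int runs)
{
    int zeros = 0;
    for (int i = 0; i < runs; ++i) {
        if (run_once() == 0) { ++zeros; }
    }
    return zeros;
}
```

With sequentially consistent operations, count_zero_results(200) returns 0, and each individual run yields either 1 or 2.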

 

Note: We could drop the std::memory_order_seq_cst argument in the load and store calls, as these member functions are declared as:

T load( std::memory_order order = std::memory_order_seq_cst ) const noexcept;
void store( T desired, std::memory_order order = std::memory_order_seq_cst ) noexcept;
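
As a small illustration of these defaults, the sketch below writes and reads a flag in three equivalent spellings. The function names are hypothetical and exist only for this example.

```cpp
#include <atomic>
#include <cassert>

// Writes a flag with an explicit order and reads it back with the default.
inline bool explicit_then_default()
{
    std::atomic<bool> flag{false};
    flag.store(true, std::memory_order_seq_cst);  // order spelled out
    return flag.load();                           // order defaults to seq_cst
}

// Uses the operator forms, which are also sequentially consistent.
inline bool operator_forms()
{
    std::atomic<bool> flag{false};
    flag = true;       // equivalent to flag.store(true)
    bool seen = flag;  // equivalent to flag.load()
    return seen;
}
```

Both functions return true. Spelling the order out is purely a readability choice when the order is std::memory_order_seq_cst.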

Conclusion

The fundamental principles of atomics are not that different from using synchronization objects to manage shared memory access from multiple threads. Having said that, using atomics lets you take advantage of the highly optimized, expertly crafted code written for the standard library. In addition, it makes the code more readable by hiding most of the thread-safety code inside the atomic object.

If you’d like to know more about the other memory order models, check these pages:

GCC Wiki – Memory model synchronization modes: a well-written, friendly explanation with simple examples.
cppreference.com – std::memory_order: an in-depth explanation of the different memory access options with full examples.