The Initialization List’s Illusionist Show

Consider the following  source:

 #include "b.h"
    class A
    {
    public:
        A();
        int get_i();

    private:
        B b;
        int i;
    };
a.h

 

#include "a.h" 
#include < iostream >

 A::A() : i(100), b(this){} 
 int A::get_i(){ return i; }
a.cpp
class A;
class B
{
public:
    B(A* _a);

private:
    A* a;
};
b.h
#include "a.h"
#include "b.h"
#include <iostream>

B::B( A* _a) : a(_a) 
{
  std::cout << "running B constructor "  << '\n';
  std::cout << "A::i value : " << a->get_i() << '\n';
} 
b.cpp
#include "a.h"
int main(void)
{
    A a;
    return 0;
}
main.cpp

Looking at the A class constructor’s  code you might expect the output to be:


Running B constructor
A::i value : 100

However,  after running the code on my  machine I got:

Running B constructor
A::i value : 1866555446

b  is the last member in the initialization list.  More to the point, it is located after the i member variables.  Yet, It appears as if i contains junk value when the B::B() constructor retrieves its value through the a this pointer.

So, what’s going on?

As a general principal, Having this in an initialization list should be done with some care.   However,  in this specific example,  the root cause is C++’s take on an illusionist act:  While the developer is busy carefully positioning variables in the initialization list, he is unaware the initialization order was already set without him noticing!

Variables initialization order is set by the order the variables are declared in the class,  not by the order they appear in the initialization list.  b is declared in  A  before i.  So,  when an A  object is constructed, The b member (declared first A class deceleration) is initialized first with the value supplied by the initialization list. The supplied value happen’s to be a pointer to a  half-baked A object with an uninitialised i member producing the junk value.

Conclusion

Declaring class variables doesn’t seem significate beyond the simple act of specifying data members needed to define the object’s state.  In effect, the class member declaration is important in subtle ways,  from object initialization to the way it is layout in memory.

Ordered Vs. Unordered Sets

In mathematics, a set has no order. The sets  {1,2,3}  and {3,2,1}   are equivalent.  STL’s std::set diverges from the mathematical definition and imposes an order on its elements. Typically the set is implemented internally as a red-black tree.  a compare predicate (by default the std::less <T>  operator) determines the position of a new element in the tree.   It allows for efficient iteration over the elements for a price:   insertion, removal and search of an element cost an average of  O(log n).  This is not too bad for small sets, however, it can become an issue for larger sets or sets accessed from performance critical code. 

If all you need is a container that will keep track weather and object is contained in a set or not.   You are better off using  unordered_set  (introduced in C++ 11) its functionality is similar to the pre-C++ 11  STL set,  Only it is implemented as a hash table with an average, insert, delete and search  times of O(1)    with large sets the difference is noticeable:

#include <iostream>
#include <chrono>
#include <unordered_set>
#include <set>
#include <string>

template <typename TimeT = std::chrono::milliseconds> struct measure {
    template <typename F, typename... Args> static typename TimeT::rep execution(F func, Args&&... args)
    {
        auto start = std::chrono::system_clock::now();
        // Now call the function with all the parameters you need.
        func(std::forward<Args>(args)...);
        auto duration = std::chrono::duration_cast<TimeT>(std::chrono::system_clock::now() - start);
        return duration.count();
    }
};

std::string baseString = "abcdefghi";

class SetContainer
{
public:
    void populate_unordered_set()
    {
        std::string s = baseString;
        while(std::next_permutation(begin(s), end(s))) {
            unorderedSet.insert(s);
        }
    }

    void populate_ordered_set()
    {
        std::string s = baseString;
        while(std::next_permutation(begin(s), end(s))) {
            orderedSet.insert(s);
        }
    }

    bool ordered_set_has(std::string val)
    {
        return (orderedSet.find(val) != orderedSet.end());
    }

    bool unordered_set_has(std::string val)
    {
        return (unorderedSet.find(val) != unorderedSet.end());
    }

    long iterate_ordered_set()
    {
        long count = 0;

        for(auto& s : orderedSet) {
            count++;
        }
        return count;
    }

    bool iterate_unordered_set()
    {
        long count = 0;
        for(auto& s : unorderedSet) {
            count++;
        }
        return count;
    }

private:
    std::unordered_set<std::string> unorderedSet;
    std::set<std::string> orderedSet;
};

using nsMeasure = measure<std::chrono::nanoseconds>;

int main(int argc, char** argv)
{
    SetContainer setContainer;
    std::cout << "Unordered set population time (ns) :"
              << nsMeasure::execution(std::bind(&SetContainer::populate_unordered_set, &setContainer)) << '\n';
    std::cout << "Ordered set population time (ns) :"
              << nsMeasure::execution(std::bind(&SetContainer::populate_ordered_set, &setContainer)) << "\n\n";

    std::string searchVal = "abcdefgih";
    std::cout << "searching for " << searchVal << " :\n";

    std::cout << "Unordered Set (ns) :"
              << nsMeasure::execution(std::bind(&SetContainer::unordered_set_has, &setContainer, searchVal)) << '\n';
    std::cout << "Ordered Set (ns) :"
              << nsMeasure::execution(std::bind(&SetContainer::ordered_set_has, &setContainer, searchVal)) << "\n\n";

    std::cout << "Unordered set's elements iteration (ns) :"
              << nsMeasure::execution(std::bind(&SetContainer::iterate_unordered_set, &setContainer)) << '\n';
    std::cout << "Ordered set's elements iteration (ns) :"
              << nsMeasure::execution(std::bind(&SetContainer::iterate_ordered_set, &setContainer)) << "\n\n"   ;

    return 0;
}


Unordered set population time (ns) :225967000
Ordered set population time (ns) :277887000

searching for abcdefgih :

Unordered Set (ns) :2000
Ordered Set (ns) :6000

Unordered set’s elements iteration (ns) :29006000
Ordered set’s elements iteration (ns) :17211000

(The handy measure template is from Nikos Athanasiou’s answer on Stackoverflow.com)

The program above was compiled with clang in release mode and -O2 flag.

As you can see from the output,  inserting element in the unordered  set, took nearly half of the time it took to insert an element in the ordered  set.  The unordered container search time  is third of the ordered set’s  search time.  On the other hand iterating over the unordered set  takes twice as much as the unordered set .

Conclusion

Like a cable subscription: there is really no reason to pay for the all inclusive plan when you are only going to watch the sci-fi channel.  If all you want is to keep track of a bunch of objects and you rarely need to iterate over the elements in the set, consider using the unordered set for better performance. 

Atomic Basics

Last week I’ve  encountered  some confusion around the way atomic work in C++.  Some  developers believe  wrapping any data structure with atomic will  make it,  by some magic,  thread safe.  They are then perplexed by the source not compiling or even worse compiling but not behaving as expected.

Reading the fine print in the C++ standard  [ISO/IEC 14882:2014(E)  Ch. 25.5 ]   reveals that the  T argument in the atomic template is required to be trivially copyable .    What does it mean?   a  type is trivially copyable  if its creation  or destruction doesn’t involve more than allocation or releasing memory –  similar to C’s  old style malloc.  The standard of course is more specific  you can find the full list of requirements in  [ISO/IEC 14882:2014(E)  Ch. 9 ]

So all primitives such as intchar  array  types are trivially copyable.  Since their creation requires no  more than allocation space for them in memory.  These types can be copied  safely using memcpy  . The standard library provides a handy method  called  std::is_trivially_copyable  to determine if a type is  trivially copyable:

int main()
{

    std::cout << "int  trivially copyable ? " << std::is_trivially_copyable<int>::value << " \n";
    std::cout << "std::array  trivially copyable ? " << std::is_trivially_copyable<std::array<char, 16> >::value << " \n";
    std::cout << "std::string  trivially copyable ? " << std::is_trivially_copyable<std::string>::value << " \n";

    return 0;
}

int trivially copyable ? 1
std::array trivially copyable ? 1
std::string trivially copyable ? 0

Coping memory in a thread safe way is what lays at the core of atomic thread safety.  This explains the  requirement for  trivially copyable types.  In most implementations,  atomic containing  primitive types maintain thread safety  without using a synchronization object (they are lock free).  However, with none-primitive types,  the compiler  will employ  a  lock to ensure the  modifying thread has exclusive access the memory used by the atomic variable. The method std::atomic::is_lock_free can determine which strategy the compiler selected for a type.

Atomic also control the type of order  threads can “observe”  changes to memory.  The different  options are enumerated by std::memory_order.  This an advanced  topic. In most cases, the default used the by the standard   library  std::memory_order_seq_cst  is the safest way to go.

Let’s talk about  what std::memory_order_seq_cst   means:

In modern PCs, there are often more than one CPU (cores)  each one of the cores had its own  memory cache.  Values taken from memory can be  stored temporarily  in the cache memory for quick retrieval.   So different  threads running on different cores  might not observe changes to memory at the same time. It could be that a memory location had changed but  a thread still  has an older cached value. It is, therefore, not guaranteed that all threads  have the same view of memory.  This is where  std::memory_order_seq_cst  steps in.  Atomic load/store  operation executing with this option ensures that all threads will have the same view of memory, always and for all atomic using this option. So if a store/load operation happened first for one thread, this operation will also happen first from the point of view of other threads.
As always nothing comes for  free, in this case, the cost are lost optimization opportunities for the compiler and CPU.

Consider this example (taken from here)

	#include <thread>
	#include <atomic>
	#include <cassert>

	std::atomic<bool> x = { false };
	std::atomic<bool> y = { false };
	std::atomic<int> z = { 0 };

	void write_x()
	{
		x.store(true, std::memory_order_seq_cst);
	}

	void write_y()
	{
		y.store(true, std::memory_order_seq_cst);
	}

	void read_x_then_y()
	{
		while (!x.load(std::memory_order_seq_cst))
			;
		if (y.load(std::memory_order_seq_cst)) {
			++z;
		}
	}

	void read_y_then_x()
	{
		while (!y.load(std::memory_order_seq_cst))
			;
		if (x.load(std::memory_order_seq_cst)) {
			++z;
		}
	}

	int main()
	{
		std::thread a(write_x);
		std::thread b(write_y);
		std::thread c(read_x_then_y);
		std::thread d(read_y_then_x);
		a.join(); b.join(); c.join(); d.join();
		assert(z.load() != 0);  // will never happen
	}

The claim is that the assert in line 45 will always  pass. Let’s follow this through :
Either of the 4 threads can execute first.  The while loops at lines 22,31   will cause the threads running the  read_x_then_y()  and  read_y_then_x()  methods to wait until  either  x or y  are assigned a value.   Let’s assume x is updated first:

the read_x_then_y  thread  exits the while loop and  moves to check the value of y.
the read_y_then_x thread is still looping,  waiting for y to get updated.   By the order imposed by std::memory_order_seq_cst, this thread has the same view of memory as the other threads. So if the read_x_then_y thread does not detect a change in y   neither will read_y_then_x .  So,  y value in line 24  evaluate to false , the z variable is not increased and the thread is done.

Eventually,  y  get updated by the write_y  thread.
read_y_then_x thread exits the loop at line 31 and moves to the next statement where it examines the  value of x.  As before  the std::memory_order_seq_cst total (across threads) order guarantees that the read_y_then_x  thread  will have the same view of memory as read_x_then_y,   Since that thread already seen an updated x.   read_y_then_x  will also see an x  with a true value. So the if statement on line 33 evaluate to true and z  value increases  satisfying the assert on line 45.

The same chain of arguments works the same if you swap x and y.  So in all cases, the assert will not trigger an error.

 

Note: We could drop the  std::memory_order_seq_cst argument in the load and store methods as  they are defined as:

T load( std::memory_order order = std::memory_order_seq_cst )
void store( T desired, std::memory_order order = std::memory_order_seq_cst );
Conclusion

The fundamental principals of atomics  are not that different from using synchronization objects to manage shared memory access from multiple threads.    Having said that, using atomics let you take advantage of the highly optimized  expertly crafted code written for the standard library.  In addition, it makes the code more readable by hiding most of the thread safe code inside the atomic object.

If you  like to know about the other memory order models check these pages:

GCC Wiki – Memory model synchronization modes:  a well written friendly explanation with simple examples.
cppreference.com – std::memory_order : an in-depth explanation of the different memory access options with full examples.