Using User Defined Types With STL Associative Containers

Consider the struct:

struct ProductKey
{
	std::string brand;
	std::string model;
	int type;
	int version;
};

Using  it in a  std::set will generate a compile time error.

#include  <set>

int main()
{
	std::set<ProductKey> productSet{
		{ "Best Prod", "M1" , 1, 3 },
		{ "Best Prod", "M2" , 32, 3 },
		{ "Value Prod", "P2" , 32, 3 }
	};
}

binary ‘<‘: no operator found which takes a left-hand operand of type ‘const ProductKey’

std::set internal structure calls for a less functor that given, two ProductKey objects, will determine which should be placed in front of the other in an ordered list.

by default std::set  uses std::less,  as we can see from std::set full declaration (line 4):

   		template<

  		class Key,
  		class Compare = std::less<Key>,
  		class Allocator = std::allocator<Key>

  		> class set;

The default less operator employs the ‘<’ operator to compare the  two keys:

template <class T> struct less {
  bool operator() (const T& x, const T& y) const {return x<y;}
  };

As our ProductKey  do not have  a  ‘<’  operator   defined for it, compilation fails.

One possible solution is  to overload the ‘< ‘  operator for the ProductKey   user type. Once it is defined, the default  std::less implementation  can  use this operator  during compilation.  However,  this will actually be a case of doing  too much.   The only consumer of this new ‘<’  operator will be the associative container. There is no need to expose the new operator to any other component which includes  ProductKey   (if nobody needs it don’t implement it!).  It will be more prudent to do the  minimum needed and simply overload the less functor instead.

defining it in the std namespace will enable us to plug our new less functor into the default std::set declaration:

template<> struct std::less<ProductKey>
	{
		bool operator() (const ProductKey& prod1, const ProductKey& prod2) const
		{
			if (prod1.brand != prod2.brand)
			{
				return prod1.brand < prod2.brand;
			}

			if (prod1.model != prod2.model)
			{
				return prod1.model < prod2.model;
			}

			if (prod1.type != prod2.type)
			{
				return prod1.type < prod2.type;
			}

			return prod1.version < prod2.version;
		}
	};

the empty <>  after the template keyword  in line 1 indicate that this is a specialized template.  That is , when the compiler tries to instantiate the less<T> functor  where T is  ProductKey  it will consider our template definition instead of the more  general  std::less functor defined in the standard library.

We can improve the less  functor’s implementation  by  using  std::tie. This function, included in <tuple>,   creates a tuple out  of  the values supplied to it ,  Since ‘< ‘  is defined for the tuple type  by the standard library ,  our std::less functor can be rewritten as

template<> struct std::less<ProductKey>
{
	bool operator() (const ProductKey& prod1, const ProductKey& prod2) const
	{
		return (std::tie(prod1.brand, prod1.model, prod1.type, prod1.version) <
		std::tie(prod2.brand, prod2.model, prod2.type, prod2.version));
	}
};

which is no only more readable,  It’s  also less error prone and not as boring to write!

So the complete example is:


#include <string>
#include <set>
#include <tuple>


struct ProductKey
{
	std::string brand;
	std::string model;
	int type;
	int version;
};

template<> struct std::less<ProductKey>
{
	bool operator() (const ProductKey& prod1, const ProductKey& prod2) const
	{
		return (std::tie(prod1.brand, prod1.model, prod1.type, prod1.version) < 
			std::tie(prod2.brand, prod2.model, prod2.type, prod2.version));
	}
};

int main()
{
	std::set<ProductKey> productSet{
		{ "Best Prod", "M1" , 1, 3 },
		{ "Best Prod", "M2" , 32, 3 },
		{ "Value Prod", "P2" , 32, 3 }
	};
}
Conclusion

Plugging your code into STL’s containers and algorithms,  Enables  you to  make  your code more robust and maintainable with less effort.  Most important it can make it more fun to write.

Finding extreme values with STL algorimths

Recently  while doing a code review I was surprised to notice that the developer rolled out his own method to find the maximal  element in a  container.  With the STL offering several ways to do just that,  it’s difficult to imagine why  reinventing this particular  wheel justify the effort.

in this post,  I’ll  review 4 ways if finding the maximum and minimal elements in a container.  Choosing the algorithm best suited for your situation depends on the constraints  and the problem you are trying to solve.

max_element / min_element

this is the most trivial algorithm.  It  will scan the entire container searching for the min/max element,  returning an iterator  pointing to it.
If your  compiler supports C++ 11  and up, you can use the minmax_element  method to find  both minimal and maximal elements in the same run.

	vector<int> numbers = {3,7,8,93,11,4,-1,0};
	printVector(numbers);
	auto max = max_element(begin(numbers), end(numbers));
	printMax(*max);

3,7,8,93,11,4,-1,0
max val: 93

Cost:   O(n) as the entire  n elements of the container must be scanned.

Priority Queue

A priority queue is an ordered sequence starting with a zero sized container, each  new element is  inserted into a position that maintains the order of the queue:

	cout << "\n Priority queue - max value\n" << endl;
		vector<int> numbers = { 3,7,8,93,11,4,-1,0 };
		priority_queue<int> max_queue;
		cout << "<empty>" << endl;
		for ( auto n : { 7, 3,8,-1,0 } )
		{
			cout << " insert " << n << endl;
			max_queue.push(n);
			print_queue(max_queue);
		}
		auto  max1 = max_queue.top();
		printMax(max1);
		cout << "removing max value" << endl;
		max_queue.pop();
		print_queue(max_queue);

Priority queue – max value
<empty>
insert 7
7
insert 3
7,3
insert 8
8,7,3
insert -1
8,7,3,-1
insert 0
8,7,3,0,-1
8,7,3,0,-1
max val: 8
removing max value
7,3,0,-1

It worth pointing out that a queue structure is built over another container overriding  push and pop operations of  the original container ensure the priority structure maintains its order after an element is inserted or removed from the queue.

By default the order of the elements is determined by std::less :  suppose there are two elements x,y than – x will be placed before y if x < y  (max queue).
Your can create a min queue by using std::greater instead of std::less :

			cout << "Minimum max_queue" << endl;
			priority_queue<int, vector<int>, greater<int> > min_queue;
			cout << "<empty>" << endl;
			for (auto n : { 7, 3,8,-1,0 })
			{
				cout << " insert " << n << endl;
				min_queue.push(n);
				print_queue(min_queue);
	
			cout << "Minimum max_queue" << endl;
			priority_queue<int, vector<int>, greater<int> > min_queue;
			cout << "<empty>" << endl;
			for (auto n : { 7, 3,8,-1,0 })
			{
				cout << " insert " << n << endl;
				min_queue.push(n);
				print_queue(min_queue);
	
Partial Sort

as its suggests –  partial sort will only sort part of the container so that the first m elements  are sorted.  It’s a good strategy to employ if you need a set of the first extreme values.

	cout << "\n Partial Sort \n" << endl;
	vector<int> numbers = { 13,7,8,93,11,4,-1,0 };
	printVector(numbers);
	cout << "Sorting 3 frist elements" << endl;
	partial_sort(numbers.begin(), numbers.begin() + 3, numbers.end());
	printVector(numbers);

13,7,8,93,11,4,-1,0
Sorting first 3 elements
-1,0,4,93,13,11,8,77

As you can see the first 3 values in the list are sorted in descending order. Values past the 3’d elements has no predictable order.

Accessing the max element can be done by simply iterating over the first m elements of the sorted container or by using partial_sort_copy that will sort the container and copies the result to a different container

Cost: sorting the first m elements out of a – n size container will cost O(n lg (m))  which is nearly linear for small m in the. The cost of Inserting elements is  container dependent (e.g. it will cost O(1) for a list).

Heap

You can use an existing container is a heap by calling make_heap. The containers elements order will change so that the extreme value will be on  top of the heap.


		cout << "\n Heap \n" << endl;
		vector<int> numbers = { 3,7,8,93,11,4,-1,0 };
		printVector(numbers);
		make_heap(numbers.begin(), numbers.end());
		cout << "\nafter make_heap: " << endl;
		printVector(numbers);

Getting the extreme element and removing it are three distinct operations:

  1.  Getting the max/min element is done with the containers front method.
  2.  calling the heap pop_heap to move the current extreme element to the end of the container.
  3. Calling the container’s pop_back to remove the value from the container.
int max1 = numbers.front();
		cout << "\nmax value : " << max1 << endl;
		pop_heap(numbers.begin(), numbers.end());  
		cout << "\nafter pop_heap, max value is now last:" << endl;
		printVector(numbers);
		cout << "\nremove  max  element " << endl;
		numbers.pop_back();    // vector / list no re-allocation triggered
		printVector(numbers);

Note:  I almost included the heap method as an honorable mention, while is  possible to use a heap. It’s very easy to mess it up by calling the underlying container methods. Honestly, I find it difficult to imagine a scenario where a bare heap will be useful  for this purpose, but  If you do need to use this method, you’ll probably want to hide the container in another class, exposing specialized insert/retrieve method that will make sure the heap structure is maintained.

Cost: the initial cost of creating the heap is O(3n), once the heap is created it will cost O(2lg n) to retrieve the minimum/maximum value and O(lg n) to push a new value into the container with push_heap.

Conclusion

When it comes to finding a max element, the STL offers something to everyone. There have to be convincing arguments to justify implementing your own algorithm instead of using the fully field tested and super optimized implementation of the standard library.

I am sure there are more ways to find the min/max values,   leave a comment if you have other ways of extracting the min/max using the standard library. 

Raw pointers are also Iterators!

Using raw pointers (a.k.a. naked pointers) is never a good first choice. Being high maintenance and error prone creatures, you’re far better off using one of the alternatives offered by the standard library.
Having said that, sometimes you simply do not have a choice. You might be working with a block of memory passed on by a hardware component, or you might be stuck with some critical legacy library that “just works” and nobody dares touch today.
In Any case, using  naked pointer doesn’t mean you have to go full commando yourself. Many of the STL algorithms would accept, with little or no complaints,  raw pointers. an iterator is, after all , a  generalized  pointer:

#include <iostream>
#include <algorithm>
#include <array>
using namespace std;


void PrintArray( int* pBegin, int* pEnd)
{
	cout << *pBegin++;
	for_each(pBegin, pEnd,[](int i) { cout << ',' << i; } );
	cout << endl << endl;
}

int main()
{
	// an integer array
	const auto integer_list =  { 19,14,3,5,20,11,10,18,1,17,9,6,7,2,13,12,4,16,8,15 };
	
	const size_t block_size = integer_list.size();
	int* pMemBlock = new int[block_size];
	int* pBegin = pMemBlock;
	int* pEnd = (pBegin + block_size);

	//copy the content of the initializer list to the memory block 
	std::copy(integer_list.begin(), integer_list.end(), pBegin);
	cout << "The list of numbers in the memory block:" << endl;
	PrintArray(pBegin, pEnd);

	int count = count_if(pBegin, pEnd, [](int n) {return  n < 5; });
	cout << count  <<  " numbers are less than 5\n" <<  endl;
	
	std::sort(pBegin, pEnd);
	cout << "sorted list:" << endl;
	PrintArray(pBegin, pEnd);
	cout << endl;
	
	random_shuffle(pBegin, pEnd);

	cout << "shuffled  list:" << endl;
	PrintArray(pBegin, pEnd);

	delete[] pMemBlock;
}

The list of numbers in the memory block:
19,14,3,5,20,11,10,18,1,17,9,6,7,2,13,12,4,16,8,15

4 numbers are less than 5

sorted list:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20

shuffled list:
5,11,12,16,15,17,18,2,7,10,4,8,20,3,1,13,6,19,14,9

Since the array size is not known at compile time, it’s up to you to accurately calculate the begin and end  pointers (lines 22,23) ,  once these pointers contain the proper addresses,  they can be passed around  just like any other iterator.

note:  if it was a static array, it would have been possible to apply the begin() and end() functions  (overloaded for static arrays by the standard library):

	int arr[]{ 28,4,26,21,20,22,11,12,14,8,17,30,13,15,9,7,19,1,16 };
	cout << "The list of numbers in the static array :" << endl;
	PrintArray(begin(arr), end(arr));