File hunting: the express way (windows)

Searching for files is a recurring part in our work as developers.  It might be messing header file,  a file containing a function  name, a configuration file with a specific string etc.

Everything Search

a recurring scenario for me is cloning a repository from Git and getting this message while trying to compile it: “Cannot open include file: ‘someheader.h’: No such file or directory” 

I start to wonder – do I have this file at all?  the obvious solution is running a search on the file system. It’s not  a terribly long process but still it gets me out of the zone.

Enter  Everything Search – one of the pieces of software I install as soon as possible in a new  Windows development machine.  This little free utility reads the NTFS master file table directly as its starts up. Using the NTFS ready-made index, allows it to be ready for action in an incredibly short time  even when there are hundreds of  thousands of files.  Once the index is read it is updated in real time by getting file system change  notification from  the OS.

a search such as the one  described above, takes as  much time as it takes to enter the file name into the search box. The search results just snap into place covering your entire drive(s).  You can fix the build without breaking  your stride.

this utility is also invaluable when searching for none source files like configuration files and other resource files

image

Files can be matched by partial name ,  by regex  and other useful criteria, found in the search menu.  The  list of files can be sorted by clicking on the column header.

Docfetcher

While search everything is fast, it will not be able to provide you any information about the content your files.
suppose you are working on a team implementing server/client  protocol.

You, on the server side, want to remove a  string value declared in a protocol used  by both client and server. You are pretty sure nobody is using this value in the client sources, so you can safely remove it.  However,  being a responsible developer you do a  text search over  the entire client code – a no brainer albeit discarding task.  Now,   If you could only Google for the string like you do everything else….

as a matter of fact, you can!

Docfetcher is an open source desktop file indexing  utility.  It will  turn a folder and its subfolders  into  a searchable index,  as files  added/removed/modified in  the tracked folders  the index gets updated automatically.  Your text search is completed  in a fraction of the  time it will take you to do a full-text search on your files.

image

creating the index is as easy as adding your source root directory to the search scope, by right-clicking on it:

image

Once you select the directory, you can  add more files types to the plain text file list:

image

Pressing on the (…) button next to the field lets you add additional files extensions  available in the folder tree:

image

You can repeat the process, selecting additional directories – for example any other location where configuration files are stored.

DocFetcher is open source,  so yo can assure yourself by examining the utility source code , that  sensitive files  (like configuration files with database user names and passwords)  are not shared behind your back,    Like the ill-fated Google Desktop from a while ago

In addition to the rapid plain text search, which I find the most useful , DocFetcher has a search query syntax enabling you to easily query your files content. For example  “int number”~5   will find any file where number is up to 5 words apart from int ;    so this search will find both of these lines :

int number;
int index, number, value;
conclusions

While you can easily do both types of  searches with a  plain vanilla  grepdir or find  commands,  I  often find it more productive to use these tools instead, and use my time for actual development.

C++ Variadic templates from the ground up

Its  the new projec’s kickoff meeting. You’ve  been assigned one of the most important tasks: writing a sum function  (yes I know,  work with me here..)

the function needs to accept two values and return their sum. You implant the function and try it out:

double sumf_v1(double a, double  b) 
{
	return a + b;
}

 

int main()
{
	double a = 1;
	double b = 2;

	double  theSum = sumf_v1(a,b);
	cout << "the sum is: " << theSum << endl;
	return 0;
}

the sum is: 3

After completing the implementation you notice that every argument is cast to double.  You decide to rewrite sumf  as a  template function:

template <typename T>
double sumf_v2(T a, T  b)
{
	return a + b;
}

In this version, if the user calls sunf with two integers  the addition operation will be done on two integers leaving only the result to be cast to double,  which is cleaner.

However,  your unit tests reveal a problem:  This test program do not compile,  with several errors. One of which is  ‘double sumf_v2(T,T)’: could not deduce template argument for ‘T’ from ‘int’:

int main()
{
	double  theSum = sumf_v2(1.5,2);
	cout << "the sum is: " << theSum << endl;
	return 0;
}

The compiler has found the first argument in the sumf call   sumf_v2(1.5,2)  and deduced its type as  float (so the template type is set to a float type)
moving over to the next argument, sumf_v2(1.5,2)  the compiler decides  it’s of type int ,  so now the compiler tries to set T to int,  but T is already a float, so compilation stops and a complaint is issued to you.

Fortunately, this is easily fixable by adding another template type:

template <typename T1, typename T2>
double sumf_v3(T1 a, T2  b)
{
	return a + b;
}

Now each value has its own independent  type.  Any  combination of numerical types will compile. All the unit tests compile and pass!  you commit your code and move on.

a couple of days later  you get an email from the project’s lead , your new  function  works great, it’s just that the requirements have changed and now sumf needs to sum-up either 2 or 3 numerical.

Leveraging on your earlier efforts you create and overload of  sumf_v2.
Now have these two functions:

template <typename T1, typename T2>
double sumf_v2(T1 a, T2  b)
{
	return a + b;
}

template <typename T1, typename T2, typename T3>
double sumf_v2(T1 a, T2  b, T3 c)
{
	return a + sumf_v2(b,c);
}

You can now compile and run this program:

int main()
{
	double  theSumOfTwo   = sumf_v2(1.5,2);
	double  theSumOfThree = sumf_v2(1.5,2,3);
	cout << "the sum of two is: " 
	  << theSumOfTwo
	  << " The sum of three is: "
	  << theSumOfThree 
	  << endl;
	return 0;
}

Before you get a chance to  commit your code,  another email arrives informing you requirements have changed again:  in some cases, your function will need to sum 4 arguments!

You wearily  create yet another overload for sumf_v2 

template <typename T1, typename T2, typename T3, typename T4>
double sumf_v2(T1 a, T2  b, T3 c, T4 d)
{
	return a + sumf_v2(b, c, d);
}

so now your sum function  can  sum 2,3 or 4 arguments. But, that not enough for the greedy project lead as he drops by and asks you to also support 5 arguments.

Ok, you’ve  had it! there simply  must  be a better way! besides rewriting the code and adding overloads.  Well, you can have the compiler write all these functions for you when they are needed!

If the long forward didn’t make you forget what this post is all about, you probably  guessed the solution  will involve variadic templates.

let’s define another version for sumf at this time we’ll  define only the prototype as the implementation requires one more step

template <typename... AllT>
double sumf_v3(AllT.. ts)

the  funny typename  keyword with the three dots  in the first line  is called a parameter pack.  It’s our way to tell the compiler that this function template can take any number of arguments (including  0 –   that is no arguments at all)

It’s similar to what we’ve manually done with sumf_v2, by creating overloads  :

template <typename T1, typename T2>
double sumf_v2(T1 a, T2  b);
  
template <typename T1, typename T2, typename T3>
double sumf_v2(T1 a, T2  b, T3 c);

template <typename T1, typename T2, typename T3, typename T4>
double sumf_v2(T1 a, T2  b, T3 c, T4 d);

we told the compiler our function might have 2,3 or 4 arguments.  With the  parameter pack declaration,  we are telling the compiler to create templates with these types:

template <>
template <typename T1>
template <typename T1, typename T2>
template <typename T1, typename T2, typename T3>
template <typename T1, typename T2, typename T3, typename T4>
.
.
.

But which of these overloads   the compiler should create?  this list can be very long.  The answer – only the functions we need – that is the ones that actually get called in the code.  So if there is a function call trying to sum-up  5 arguments the compiler will create :

template <typename T1, typename T2, typename T3, typename T4, typename T5>
double sumf_v3(T1 a, T2  b, T3 c, T4 d, T5 e);

We can employ the strategy used while expanding  sumf_v2 to support more than two values:  calculate sum of arguments by adding the first argument to the  sum of the other n-1  arguments. It’s a recursive pattern that pops up a lot when dealing with variadic templates.   Look again at the implementation of  sumf_v2(T1 a, T2  b, T3 c)  and try to identify this pattern.

To get the  system,  we developed for sumf_v2 ,  to work,  we need a way to isolate the value of the first argument and add it to the sum of the rest argument.  Just we did in the sumf_v2 overloads.

The way to do it in the template function is to define it in a way that  gives the first argument a special name:

template <typename T, typename... RestOfTs>
double sumf_v3(T a , RestOfTs... ts)
{
	return a + sumf_v3(ts...);
}

In this implementation, the first argument gets a type T and a special name a.   The three dots after the ts argument  are called  parameter unpacking it tells the compiler to  pass the rest of the arguments  to the next function.  So in the function call sumf_v3(1, 4, 5.5, 7, 89).  The variable a will be  assigned the value 1 and the generated function implementation will look something like
return  a + sumf_v3(4, 5.5, 7, 89) ;
This is exactly what we wanted!

There is one more small roadblock between us and sumf bliss.  This code will not compile, just yet, giving an error  sumf_v3′: no matching overloaded function found
what?  didn’t we just define a function that will match any possilbe number of  argument ?  its true that    typename…  define  0 or more arguments,  but   our declaration is:

template <typename T, typename... RestOfTs>
double sumf_v3(T a , RestOfTs... ts);

That is one argument followed by a list of 0 or more other arguments. So there has to be at least one argument. Consider the implementation  generated  for sumf_v3(1,2):

return 2 + sumf_v3(3, )

You should read the weird function call  sumf_v3(3, )  as  sumf_v3(3).  I wanted to emphasize the empty list as the second argument . The   implementation for this function call  is :

return 3 + sumf_v3();

which not defined since there has to be at leat the first argument (T a),   The way to resolve this is to give the compiler a better choice when sumf gets called with only one argument :


template <typename T>
double sumf_v3(T a)
{
	return a;
}

When there is  only  one argument,  the compiler will  not try to recursively call a function with an empty list.   Instead , it will just return the received value.  That’s the recursion termination condition:

sumf_v3(1,2)   expands to
return 1 + sumf_v3(2)   expands to
return 1 + 2

so now we have a  pair  of templates

template <typename T>
double sumf_v3(T a)
{
	return a;
}

template <typename T, typename... RestOfTs>
double sumf_v3(T a , RestOfTs... ts)
{
	return a + sumf_v3(ts...);
}

That define a way to generate, on demand, a sum function accepting any number of arguments of any numerical type.  No additional coding needed!

Final notes

Like regular templates, variadic templates can also be used when defining classes and structures.  There are  several methods  in C++ your use to control  the generated code. For example, the template can examine  whether or not a type is a POD (Plain Old Data) and generate different code   for each case. this is called meta programming and will be the subject of a future post.

It’s important to understand that like any other  code generated from  templates, variadic or not,   any  decision  made  is  done in compile time.  By the time  the compiler finish it job all functions and types have defined static attributes in terms of  arguments number and types.

 

Online C++ Compilers

I must confess, at first I was skeptical  the usefulness of an online C++ compiler.  While I’m not quite ready to develop an entire project using an online  environment, I do find that an online C++ compiler is a handy tool to have around. I use it as a scratchpad  when I am considering coding strategies, It is also helpful when you’re  collaborating with a colleague  or want to   illustrate a point  without  going to the trouble of creating compiling and linking a small C++ program.

here are few sites I found useful:

cpp.sh:

With its short and easily  remembered URL this online compiler is my go-to tool for quick jotting.
The behind the scenes compiler is g++ (GNU C++ compiler),  the website offers  limited and uncomplicated control over the  compiler switches.

webcompiler.cloudapp.net

This  website is maintained by the visual C++  team at Microsoft,  as  you can expect it hosts the latest version of  Visual C++ (currently 2015).
The web interface is very basic , it lets you  pass  command line arguments to the compiler with some limitation.    However if VC is your primary work tool and you want  to try out  a code snippet, it’s a good tool to try.

tutorialspoint.com:

an online C++ compiler is just a small part of what this website has to offer,  it is an amazing playground for  nearly 100 online languages   IDE’s   no registration needed!
As the site name suggests  there it offers a large selection of online tutorials covering nearly every aspect of software development.
Among  the available IDE’s  there are three   C++ IDEs,  which are in fact the same interface   pre-configured to comply with  three different C++  standard versions.  As you get full access to the  g++ command line you can easily swap between  supported standards at any given “IDE”  simply by changing the  command line’s   -std   switch.  For instance
  g++ -std=c++17 -o main *.cpp  will compile your program with C++ 17 compiler although it is no listed  as an available  IDE.
clicking the compiler button on the top of the editor area, generate a command line in the terminal pane ,  you can modify.

 

ideone.com

This is another site offering a selection of over 60+ online compilers for different  languages. However what makes this website stand out is how easy it is to privately share the source with the entire build environment.   each virtual IDE has its own URL (click run to generate  the unique  address).  There is no need to register or login to share your  code.  Once you share the  URL the receiver can view your code or fork it to a new IDE (with new ULR) to  run or modify it.
the compiler powering the online IDE is  g++  or clang

Rextester

This site takes code sharing to the next level and allows multiple developers to modify  the same code at the same time (think google docs for developers). It offers a simple solution if want to discuss a code  with a  remote colleague without the hassle of sharing your desktop.  there is a  small selection of different languages  online environments. For C++  you have  a choice between g++ , clang  or VC++ 2015 compilers  (at this time).

Onlinegdb

This is a C++ compiler / Debugger combo that will actually let you debug your code online!   the compiler is g++ supporting  C++ 98 and C++ 14.
clicking debug opens a gdb terminal windows where you can some do some online debugging of your program:

Many thanks to Rajan for letting me know about this gem!

final note

Online C++ compilers are a handy tool to have in your virtual toolbox ,  however,  it worth pointing out that the code you are submitting is saved and compiled on an unknown remote machine, you should be wary of submitting sensitive code,  you wouldn’t feel safe posting on an  online C++ discussion forum.

 

Finding extreme values with STL algorimths

Recently  while doing a code review I was surprised to notice that the developer rolled out his own method to find the maximal  element in a  container.  With the STL offering several ways to do just that,  it’s difficult to imagine why  reinventing this particular  wheel justify the effort.

in this post,  I’ll  review 4 ways if finding the maximum and minimal elements in a container.  Choosing the algorithm best suited for your situation depends on the constraints  and the problem you are trying to solve.

max_element / min_element

this is the most trivial algorithm.  It  will scan the entire container searching for the min/max element,  returning an iterator  pointing to it.
If your  compiler supports C++ 11  and up, you can use the minmax_element  method to find  both minimal and maximal elements in the same run.

	vector<int> numbers = {3,7,8,93,11,4,-1,0};
	printVector(numbers);
	auto max = max_element(begin(numbers), end(numbers));
	printMax(*max);

3,7,8,93,11,4,-1,0
max val: 93

Cost:   O(n) as the entire  n elements of the container must be scanned.

Priority Queue

A priority queue is an ordered sequence starting with a zero sized container, each  new element is  inserted into a position that maintains the order of the queue:

	cout << "\n Priority queue - max value\n" << endl;
		vector<int> numbers = { 3,7,8,93,11,4,-1,0 };
		priority_queue<int> max_queue;
		cout << "<empty>" << endl;
		for ( auto n : { 7, 3,8,-1,0 } )
		{
			cout << " insert " << n << endl;
			max_queue.push(n);
			print_queue(max_queue);
		}
		auto  max1 = max_queue.top();
		printMax(max1);
		cout << "removing max value" << endl;
		max_queue.pop();
		print_queue(max_queue);

Priority queue – max value
<empty>
insert 7
7
insert 3
7,3
insert 8
8,7,3
insert -1
8,7,3,-1
insert 0
8,7,3,0,-1
8,7,3,0,-1
max val: 8
removing max value
7,3,0,-1

It worth pointing out that a queue structure is built over another container overriding  push and pop operations of  the original container ensure the priority structure maintains its order after an element is inserted or removed from the queue.

By default the order of the elements is determined by std::less :  suppose there are two elements x,y than – x will be placed before y if x < y  (max queue).
Your can create a min queue by using std::greater instead of std::less :

			cout << "Minimum max_queue" << endl;
			priority_queue<int, vector<int>, greater<int> > min_queue;
			cout << "<empty>" << endl;
			for (auto n : { 7, 3,8,-1,0 })
			{
				cout << " insert " << n << endl;
				min_queue.push(n);
				print_queue(min_queue);
	
			cout << "Minimum max_queue" << endl;
			priority_queue<int, vector<int>, greater<int> > min_queue;
			cout << "<empty>" << endl;
			for (auto n : { 7, 3,8,-1,0 })
			{
				cout << " insert " << n << endl;
				min_queue.push(n);
				print_queue(min_queue);
	
Partial Sort

as its suggests –  partial sort will only sort part of the container so that the first m elements  are sorted.  It’s a good strategy to employ if you need a set of the first extreme values.

	cout << "\n Partial Sort \n" << endl;
	vector<int> numbers = { 13,7,8,93,11,4,-1,0 };
	printVector(numbers);
	cout << "Sorting 3 frist elements" << endl;
	partial_sort(numbers.begin(), numbers.begin() + 3, numbers.end());
	printVector(numbers);

13,7,8,93,11,4,-1,0
Sorting first 3 elements
-1,0,4,93,13,11,8,77

As you can see the first 3 values in the list are sorted in descending order. Values past the 3’d elements has no predictable order.

Accessing the max element can be done by simply iterating over the first m elements of the sorted container or by using partial_sort_copy that will sort the container and copies the result to a different container

Cost: sorting the first m elements out of a – n size container will cost O(n lg (m))  which is nearly linear for small m in the. The cost of Inserting elements is  container dependent (e.g. it will cost O(1) for a list).

Heap

You can use an existing container is a heap by calling make_heap. The containers elements order will change so that the extreme value will be on  top of the heap.


		cout << "\n Heap \n" << endl;
		vector<int> numbers = { 3,7,8,93,11,4,-1,0 };
		printVector(numbers);
		make_heap(numbers.begin(), numbers.end());
		cout << "\nafter make_heap: " << endl;
		printVector(numbers);

Getting the extreme element and removing it are three distinct operations:

  1.  Getting the max/min element is done with the containers front method.
  2.  calling the heap pop_heap to move the current extreme element to the end of the container.
  3. Calling the container’s pop_back to remove the value from the container.
int max1 = numbers.front();
		cout << "\nmax value : " << max1 << endl;
		pop_heap(numbers.begin(), numbers.end());  
		cout << "\nafter pop_heap, max value is now last:" << endl;
		printVector(numbers);
		cout << "\nremove  max  element " << endl;
		numbers.pop_back();    // vector / list no re-allocation triggered
		printVector(numbers);

Note:  I almost included the heap method as an honorable mention, while is  possible to use a heap. It’s very easy to mess it up by calling the underlying container methods. Honestly, I find it difficult to imagine a scenario where a bare heap will be useful  for this purpose, but  If you do need to use this method, you’ll probably want to hide the container in another class, exposing specialized insert/retrieve method that will make sure the heap structure is maintained.

Cost: the initial cost of creating the heap is O(3n), once the heap is created it will cost O(2lg n) to retrieve the minimum/maximum value and O(lg n) to push a new value into the container with push_heap.

Conclusion

When it comes to finding a max element, the STL offers something to everyone. There have to be convincing arguments to justify implementing your own algorithm instead of using the fully field tested and super optimized implementation of the standard library.

I am sure there are more ways to find the min/max values,   leave a comment if you have other ways of extracting the min/max using the standard library.