Auto Escaping C++ Strings

Just a few weeks after celebrating C++’s raw strings, I found myself writing regular expressions in Visual Studio 2012, and I was disappointed to find that VS 2012 does not support raw strings. It’s striking just how half-baked VS 2012’s C++11 support is. I think it is safe to assume that Microsoft’s marketing team insisted on sticking to a two-year product release cycle while the technical team protested that they needed more time to implement the then-new C++11 standard. The compromise was a product with only some of the exciting new C++ features in place; the rest had to wait for VS 2013/2015. As it turned out, one of the casualties was raw strings.

Regexes are difficult to read on their own; writing and reading long escaped regexes is a form of torture. Fortunately, escaping and unescaping follow simple rules and can be done automatically.

tomeko.net  provides tools to escape and unescape a C++ string:

image

The example above escapes a regex matching a positive float enclosed by double quotes:

"\d*\.?\d+"  ->  \"\\d*\\.?\\d+\"
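
The escaping rules are simple enough to sketch in a few lines. The helper below is a minimal illustration and not the tomeko.net implementation; the name escapeForCppLiteral is hypothetical, and it only handles backslashes, double quotes, newlines and tabs.

```cpp
#include <string>

// Hypothetical helper: returns the text that has to be typed between the
// quotes of an ordinary (non-raw) C++ string literal to reproduce `raw`.
// Only backslash, double quote, newline and tab are handled in this sketch.
std::string escapeForCppLiteral(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size() * 2);
    for (char c : raw)
    {
        switch (c)
        {
        case '\\': out += "\\\\"; break; // one backslash becomes two
        case '"':  out += "\\\""; break; // quote becomes backslash + quote
        case '\n': out += "\\n";  break;
        case '\t': out += "\\t";  break;
        default:   out += c;      break;
        }
    }
    return out;
}
```

Feeding it the quoted float regex from the example yields the escaped form shown above.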

Conclusion

When you are forced to escape C++ strings, it’s best to leave this daunting task to a computer program, which will do a faster, better job with fewer mistakes.

Or, be more vocal about upgrading to a newer compiler.
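
For comparison, here is a minimal sketch of the same regex on a compiler that does support raw strings. The function name isQuotedPositiveFloat is hypothetical; the pattern is the one from the example above, used with std::regex's default ECMAScript grammar.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `text` is a positive float enclosed in
// double quotes. The raw string keeps the regex exactly as it was written.
bool isQuotedPositiveFloat(const std::string& text)
{
    static const std::regex pattern(R"("\d*\.?\d+")");
    return std::regex_match(text, pattern);
}
```

No escaping step is needed, so the pattern in the source file is the pattern the regex engine sees.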

Using User Defined Types With STL Associative Containers

Consider the struct:

struct ProductKey
{
	std::string brand;
	std::string model;
	int type;
	int version;
};

Using it as the key of a std::set generates a compile-time error:

#include <set>

int main()
{
	std::set<ProductKey> productSet{
		{ "Best Prod", "M1" , 1, 3 },
		{ "Best Prod", "M2" , 32, 3 },
		{ "Value Prod", "P2" , 32, 3 }
	};
}

binary '<': no operator found which takes a left-hand operand of type 'const ProductKey'

std::set’s internal structure calls for a less functor that, given two ProductKey objects, determines which should be placed in front of the other in an ordered sequence.

By default std::set uses std::less, as we can see from the second template parameter in std::set’s full declaration:

template<
	class Key,
	class Compare = std::less<Key>,
	class Allocator = std::allocator<Key>
> class set;

The default less functor employs the ‘<’ operator to compare the two keys (simplified):

template <class T> struct less {
	bool operator() (const T& x, const T& y) const { return x < y; }
};

As our ProductKey does not have a ‘<’ operator defined for it, compilation fails.

One possible solution is to overload the ‘<’ operator for the ProductKey type. Once it is defined, the default std::less implementation can use it. However, this would be a case of doing too much: the only consumer of the new ‘<’ operator would be the associative container, and there is no need to expose it to every other component that includes ProductKey (if nobody needs it, don’t implement it!). It is more prudent to do the minimum needed and simply specialize the less functor for ProductKey instead.

Specializing it in the std namespace lets us plug our new less functor into the default std::set declaration (the standard permits specializing std templates for user-defined types):

template<> struct std::less<ProductKey>
{
	bool operator() (const ProductKey& prod1, const ProductKey& prod2) const
	{
		if (prod1.brand != prod2.brand)
		{
			return prod1.brand < prod2.brand;
		}

		if (prod1.model != prod2.model)
		{
			return prod1.model < prod2.model;
		}

		if (prod1.type != prod2.type)
		{
			return prod1.type < prod2.type;
		}

		return prod1.version < prod2.version;
	}
};

The empty <> after the template keyword in line 1 indicates that this is an explicit template specialization. That is, when the compiler instantiates the less<T> functor with T being ProductKey, it uses our definition instead of the more general std::less functor defined in the standard library.
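
To see the selection rule in isolation, here is a minimal sketch that has nothing to do with std::less. The TypeLabel template and its name() function are hypothetical names invented for this illustration.

```cpp
#include <string>

// Hypothetical primary template: used for every type without a specialization.
template <class T>
struct TypeLabel
{
    static std::string name() { return "generic"; }
};

// Explicit specialization: the empty <> marks it, and the compiler picks it
// whenever TypeLabel is instantiated with T = int.
template <>
struct TypeLabel<int>
{
    static std::string name() { return "int specialization"; }
};
```

TypeLabel<double>::name() goes through the primary template, while TypeLabel<int>::name() goes through the specialization, the same way std::set<ProductKey> ends up using our std::less<ProductKey>.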

We can improve the less functor’s implementation by using std::tie. This function, declared in <tuple>, creates a tuple of references to the values supplied to it. Since ‘<’ is defined for tuples by the standard library and compares them element by element from left to right, our std::less functor can be rewritten as

template<> struct std::less<ProductKey>
{
	bool operator() (const ProductKey& prod1, const ProductKey& prod2) const
	{
		return (std::tie(prod1.brand, prod1.model, prod1.type, prod1.version) <
		std::tie(prod2.brand, prod2.model, prod2.type, prod2.version));
	}
};

which is not only more readable, it’s also less error prone and not as boring to write!

So the complete example is:


#include <functional> // std::less
#include <string>
#include <set>
#include <tuple>


struct ProductKey
{
	std::string brand;
	std::string model;
	int type;
	int version;
};

template<> struct std::less<ProductKey>
{
	bool operator() (const ProductKey& prod1, const ProductKey& prod2) const
	{
		return (std::tie(prod1.brand, prod1.model, prod1.type, prod1.version) < 
			std::tie(prod2.brand, prod2.model, prod2.type, prod2.version));
	}
};

int main()
{
	std::set<ProductKey> productSet{
		{ "Best Prod", "M1" , 1, 3 },
		{ "Best Prod", "M2" , 32, 3 },
		{ "Value Prod", "P2" , 32, 3 }
	};
}
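
If you would rather not touch the std namespace at all, the Compare template parameter shown earlier can be supplied explicitly. This is an alternative to the specialization above rather than the approach this post takes; ProductKeyLess, ProductSet and makeSampleSet are hypothetical names.

```cpp
#include <set>
#include <string>
#include <tuple>

struct ProductKey
{
    std::string brand;
    std::string model;
    int type;
    int version;
};

// Hypothetical comparator type, passed to std::set as its Compare argument
// instead of specializing std::less.
struct ProductKeyLess
{
    bool operator()(const ProductKey& a, const ProductKey& b) const
    {
        return std::tie(a.brand, a.model, a.type, a.version) <
               std::tie(b.brand, b.model, b.type, b.version);
    }
};

using ProductSet = std::set<ProductKey, ProductKeyLess>;

// Builds a small set; the duplicate key is dropped and the rest are
// ordered by brand, then model, then type, then version.
ProductSet makeSampleSet()
{
    return ProductSet{
        { "Value Prod", "P2", 32, 3 },
        { "Best Prod", "M2", 32, 3 },
        { "Best Prod", "M1", 1, 3 },
        { "Best Prod", "M1", 1, 3 }
    };
}
```

The trade-off is that the comparator becomes part of the set's type, so every declaration has to name it (the alias keeps that manageable).
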

Conclusion

Plugging your code into the STL’s containers and algorithms enables you to make your code more robust and maintainable with less effort. Most importantly, it can make it more fun to write.

Stop Logging Debug Values, Use Visual Studio’s Trace Points instead!

Nearly every company I’ve worked for had a “let’s clean our logs” project. The trigger for this activity was often frustrated log users, such as tech support, customer service staff and other developers (usually the new ones), overwhelmed by the clutter and junk included in the logs.
Some of the log trash is generated during development; it is not uncommon to see something like

[26/04/2016:14:23:48]  INFO: In nameVerify method (remove after Q4-2014  release)

Or even worse:
[26/04/2016:14:23:48]  WARNING:  secstr = DvVP1$XFJ2T
where secstr stands for secret or security, and warning-level logging was used because it stands out in the logs.

Often these log entries are created during the infant days of the software component. While some of them are an indication that there should be more unit test cases, others are used to extract and examine internal data created dynamically by the component on each run.

For this latter use, you can use trace points.
Trace points are a kind of breakpoint, except that the debugger can be instructed not to halt the program. Instead, an expression is evaluated and the result is printed to the Output window.

Consider this simple TraceMe class:

#include  <vector>
#include  <string>

class TraceMe
{
	public: 
	TraceMe() 
	{
		val = 5;
	}
	TraceMe(std::string str) 
	{
		secStruct.secStringVec.push_back(str);
		val = 10;
	}
private:
	int  val;
	struct  InternalStruct
	{
		std::vector<std::string>  secStringVec;
	};
	InternalStruct  secStruct;
};

int main(int argc, char* argv[])
{
	TraceMe  trace;
	return 0;
}

 

Suppose we need to examine the content of the val member after it has been initialized. The first step is writing an expression that evaluates to its content. This can be done without typing a single character:

I placed a breakpoint (the plain vanilla type) at line 28 (the return 0; statement). After I start debugging and the breakpoint is hit, I right-click the trace variable and select Quick Watch:

image

After expanding the structure, select the variable. The content of the expression edit box is what the QuickWatch tool uses to evaluate val:

image

You can remove the breakpoint; we won’t need it anymore. Go to the next line (line 29) and create another breakpoint:

image

Right-click it and select When Hit:

image

In the dialog, make sure the Print Message and Continue execution checkboxes are both ticked.

Now we can use the expression we extracted earlier: paste it into the edit box and enclose it in curly braces: {(trace).val}. The curly braces tell the debugger to evaluate the expression and output the result.

Write some text that gives the value some context, for example: C++ Island -Debug the value of val is {(trace).val}.

image

Now run the program and search the Output window for your message:

image

This method works the same way even when the value you are interested in is buried deep inside other data structures. Getting the secString from the vector inside the internal structure is just as easy:

image

The expression turns out to be

((((trace).secStruct).secStringVec)._Myfirst)[0]

Conclusion

When printing a variable’s value during development, consider trace points. However, if the variable’s value serves as an indication that the program is running as it should (a virtual green LED), it might be a hint that an assertion or a test case is a better tool for the job.
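
As a rough illustration of the assertion route, here is a minimal sketch. It uses a trimmed variant of the TraceMe class with a value() accessor that the original class does not have; both the accessor and the checkDefaultState function are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Trimmed variant of TraceMe with a hypothetical read-only accessor.
class TraceMe
{
public:
    TraceMe() : val(5) {}
    explicit TraceMe(const std::string& str) : val(10)
    {
        secStringVec.push_back(str);
    }
    int value() const { return val; } // hypothetical accessor, not in the original
private:
    int val;
    std::vector<std::string> secStringVec;
};

// Hypothetical check: instead of logging val and eyeballing the log,
// state the expectation once and let debug builds enforce it.
void checkDefaultState(const TraceMe& trace)
{
    assert(trace.value() == 5 && "default-constructed TraceMe should hold 5");
}
```

Unlike a log line, the assertion stays silent while things are fine and points at the exact expectation that broke when they are not.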

Raw Strings in C++

 

Sometimes strings in C++ must be pampered and gently persuaded to display correctly. Consider a string representing a small HTML block:

<h1 style="color: #5e9ca0;"><strong>welcome!</strong></h1>
<h4 style="color: #5e9ca0;">these are the available options:</h4>
<ol>
	<li>Reset Sensor</li>
	<li>Disengage power coupling</li>
</ol>

which renders as:

welcome!

these are the available options:

  1. Reset Sensor
  2. Disengage power coupling

If I want to keep the HTML block in a string literal, I can’t just copy and paste it from a text editor (where I presumably created and tested it). It has to be escaped first, and then either split into adjoining string literals or squeezed into one very long single-line string:

std::string htmlMessage =  "<h1 style=\"color: #5e9ca0;\"><strong>welcome!</strong></h1>\n"
"<h4 style=\"color: #5e9ca0;\">these are the available options:</h4>\n"
"<ol>\n"
"<li>Reset Sensor</li>\n"
"<li>Disengage power coupling</li>\n"
"</ol>";

Fortunately, with C++11 and up, this kind of manual pre-processing can be a thing of the past:

 std::string htmlMessage = R"(<h1 style="color: #5e9ca0;"><strong>welcome!</strong></h1>
  <h4 style="color: #5e9ca0;">these are the available options:</h4>
  <ol>
	  <li>Reset Sensor</li>
	  <li>Disengage power coupling</li>
  </ol>)";

The unusual form R"(<actual string>)" is the raw string literal syntax. When encountering a raw string, the compiler treats all characters in the string literal as plain characters, including sequences that would normally be escapes. For example, it will not attempt to translate ‘\t’ into a tab character; instead, the string will contain the actual two-character sequence “\t”.

Note how the left and right parentheses inside the quotes are part of the raw string literal’s syntax, not of the string’s content.

By having this slightly more complex definition, the raw string can include the ‘"’ character itself without confusing the compiler into believing the string has terminated.
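
A minimal sketch of both points follows; the function names are hypothetical and exist only so the literals can be inspected.

```cpp
#include <string>

// Raw literal: backslash and 't' stay as two separate characters.
std::string rawTab() { return R"(\t)"; }

// Ordinary literal: the compiler translates \t into a single tab character.
std::string escapedTab() { return "\t"; }

// Double quotes inside a raw literal need no escaping; the enclosing
// parentheses are syntax and do not end up in the string.
std::string quotedWord() { return R"(say "hi")"; }
```

rawTab() holds two characters, escapedTab() holds one, and quotedWord() holds the text say "hi" with its quotes intact.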

I am sure at least one of you is asking: what if I want to include the literal )" in my string? The standard has you covered with an optional delimiter string at both ends of the literal.

So you can define the string as:

std::string rawString = R"C++Island(look ma, "( and )" in a raw string!)C++Island";

Here the string C++Island comes between the ‘"’ and the ‘(’ at the start, and between the ‘)’ and the ‘"’ at the end, telling the compiler to be on the lookout for this specific sequence, )C++Island", signaling the end of the literal.

The delimiter string can be anything you want as long as it is at most 16 characters long and contains no spaces, parentheses or backslashes.
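
Here is a minimal sketch of a delimited raw string whose content contains the troublesome )" sequence. The C++Island delimiter is the one used above; the function name delimitedRaw is hypothetical.

```cpp
#include <string>

// The literal only ends at )C++Island", so the )" in the middle
// is ordinary content.
std::string delimitedRaw()
{
    return R"C++Island(look ma, "( and )" in a raw string!)C++Island";
}
```

The delimiter itself never becomes part of the string; only the text between the parentheses does.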

The raw string’s power becomes its weakness when trying to express special characters such as newline ‘\n’ or tab ‘\t’. However, you can always fall back on the old familiar non-raw strings. You can also concatenate the two literal types together:

std::string str = R"(The '\n' character can easily be shown)" "\n"  "and used";
std::cout << str;


The '\n' character can easily be shown
and used

Conclusions:

While not a game changer in C++, it’s nice having the language do the boring technical stuff for me instead of the other way around. I mean, isn’t that what software development is all about? Plus, all the other kids’ languages have this feature; it’s about time we get to play with it as well.