C++ programming on Cloud 9

A weblog dedicated to Visual C++, interoperability and other stuff.

October 2007 - Posts

Fun with templates: Implementing a tri-state value

A couple of weeks ago I was working on a text file parser and I needed to know when certain variables had been assigned a value so that my algorithm could skip the remainder of that file section.

Since the ordering of data is not guaranteed explicitly (and I wanted to keep my algorithms reusable for other parts of the program) I had to find a way to keep track of which variables had been assigned to and which not.

The easiest way to do this was to keep a Boolean flag for each variable that I had to track, but this is error prone and cumbersome to maintain.

The better solution would be to use a template class that could convert to and from the required datatype, and with the appropriate functionality for keeping a state flag. That way each of the variables had all the tracking functionality built in while still allowing me to assign and read from them like I would with a normal string or int or whatever.

So I slapped together a class that did just what I wanted and continued programming. But I thought it would be neat to fully work out the details and decided to spend some of my precious free time. The main requirement I placed on this class was that I wanted it to be used as transparently as possible, allowing me to use TriStateValue instances instead of their contained type without modifying any of the code if at all possible.

Base stuff

The basic features of this class are very simple.

 

template <class _ValType> struct TriStateVal
  {
  private:
    bool Sentinel;
    _ValType Value;

    void VerifyState(void) const
    {
      if(!Sentinel)
        throw ExTriStateVal();
    }

  public:
    typedef _ValType value_type;

    TriStateVal() :
      Sentinel(false), Value(_ValType()) {}

    explicit TriStateVal(_ValType const &NewVal) :
      Sentinel(true), Value(NewVal) {}

    TriStateVal(TriStateVal<_ValType> const &original) :
      Sentinel(original.Sentinel), Value(original.Value) {}
  };

The data is kept private to keep others from looking at it, because that would defeat the whole purpose of this class.

The VerifyState method is also for internal use only. Its purpose is pretty obvious, and it will be called anytime an operation is performed that relies on Value having been assigned a valid value.

The 3 constructors do nothing more than initialize the internal variables. The conversion constructor has been marked explicit for the reasons detailed in an earlier blog post.

Assigning and casting

The 2 major features of our class are that we can assign a value of type _ValType to it, and that we can assign it to a variable of type _ValType. This is handled by the following operators.

TriStateVal<_ValType>& operator =(_ValType const &NewVal)
{
  Value = NewVal;
  Sentinel = true;
  return *this;
}

operator _ValType() const
{
  VerifyState();
  return Value;
}

Assigning a value automatically sets the sentinel to true during the assignment. The type cast throws an exception if the sentinel is not set. It’s pretty boring stuff actually.
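To make the behavior concrete, here is a minimal sketch of just these two operators in a reduced class. It is not the downloadable code: the template parameter is called T here, the exception type ExTriStateVal is a hypothetical stand-in (the post does not show its definition), and the two helper functions exist only for illustration.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the ExTriStateVal exception used in the post.
struct ExTriStateVal : std::logic_error
{
  ExTriStateVal() : std::logic_error("TriStateVal has no value") {}
};

// Reduced sketch of the class: only assignment and conversion.
template <class T> struct TriStateVal
{
private:
  bool Sentinel;
  T Value;

public:
  TriStateVal() : Sentinel(false), Value(T()) {}

  // Assigning a T stores the value and marks it as set.
  TriStateVal& operator =(T const &NewVal)
  {
    Value = NewVal;
    Sentinel = true;
    return *this;
  }

  // Converting to T is only allowed once a value has been assigned.
  operator T() const
  {
    if(!Sentinel)
      throw ExTriStateVal();
    return Value;
  }
};

// Illustration only: returns true if reading an unassigned value throws.
bool ReadingUnassignedThrows()
{
  TriStateVal<int> v;
  try
  {
    int i = v;   // the conversion operator checks the sentinel
    (void)i;
  }
  catch(ExTriStateVal const &)
  {
    return true;
  }
  return false;
}

// Illustration only: assigns through operator= and reads the value back.
int AssignThenRead()
{
  TriStateVal<int> v;
  v = 42;
  int i = v;
  return i;
}
```

In this sketch the variable is used exactly like an int on both sides of an assignment, which is the transparency the class is aiming for.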

Then there are a couple of methods included for convenience:

void Clear()
{
  Value = _ValType();
  Sentinel = false;
}

bool HasValue() const
{
  return Sentinel;
}

_ValType GetValue() const
{
  VerifyState();
  return Value;
}

Implementing the standard operators

With all that stuff out of the way, I decided it would be nice to implement all the standard operators that would allow you to e.g. add and subtract 2 TriStateValues as if they were real value types.

The nice thing about templates is that I can implement them all and not worry about whether _ValType actually supports them. There will only be a compiler error if you use an operator that _ValType does not implement.
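A small sketch of why this works: with implicit instantiation, the body of a member function of a class template is only compiled when that member is actually used. The Wrapper class and the Concatenate function below are hypothetical names made up for this illustration.

```cpp
#include <string>

// Hypothetical wrapper, used only to illustrate the point.
template <class T> struct Wrapper
{
  T Value;
  explicit Wrapper(T const &v) : Value(v) {}

  // Fine for any T that supports +.
  Wrapper operator +(Wrapper const &rhs) const
  {
    return Wrapper(Value + rhs.Value);
  }

  // std::string has no % operator, but that only matters
  // if someone actually calls this member for std::string.
  Wrapper operator %(Wrapper const &rhs) const
  {
    return Wrapper(Value % rhs.Value);
  }
};

// Uses only operator+, so operator% is never instantiated for std::string.
std::string Concatenate()
{
  Wrapper<std::string> a(std::string("tri"));
  Wrapper<std::string> b(std::string("state"));
  return (a + b).Value;
}
```

Note that this relies on implicit instantiation. An explicit instantiation of the whole class for std::string would instantiate every member and fail on operator%.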

The implementation is very simple. A lot of methods are identical, save for the actual operator. For example, the implementations of the + and the - operator are exactly the same except for the operation that is performed.

This is where I decided to take the easy road and use macros.

I defined a macro for each type of operation, and then used that macro to implement all operations that have the same structure.

I am not going to list all the individual operators and implementations here. Instead I will list one operator macro and show how it is used in the code.

Standard arithmetic operations

These operators perform simple arithmetic operations on the 2 operands, and then return a new TriStateVal instance with the result.

#define TSV_OP(OP)\
    TriStateVal<_ValType>\
      operator OP (TriStateVal<_ValType> const &rhs)\
    {\
      VerifyState();\
      rhs.VerifyState();\
      return TriStateVal<_ValType>(Value OP rhs.Value);\
    }

Making this a macro gives me the advantage that I have to implement it only once. This prevents copy / paste errors and it also increases readability.

The function itself will verify that the right hand side and left hand side arguments have a valid value, and then perform an operation on both values and return the result. This macro implements the following operations in the class body:

    TSV_OP(+);
    TSV_OP(-);
    TSV_OP(*);
    TSV_OP(/);
    TSV_OP(%);
    TSV_OP(^);
    TSV_OP(&);
    TSV_OP(|);
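As a rough, self-contained sketch of the same technique, the reduced class below generates three operators from one macro. The names MiniTsv, MINI_TSV_OP, AddSetValues and SubtractingUnsetThrows are hypothetical, and std::logic_error is used here in place of the post's own exception type.

```cpp
#include <stdexcept>

// Hypothetical reduced version of the class, for illustration only.
template <class T> struct MiniTsv
{
private:
  bool Sentinel;
  T Value;

  void VerifyState() const
  {
    if(!Sentinel)
      throw std::logic_error("no value assigned");
  }

public:
  MiniTsv() : Sentinel(false), Value(T()) {}
  explicit MiniTsv(T const &v) : Sentinel(true), Value(v) {}

  T GetValue() const
  {
    VerifyState();
    return Value;
  }

// One macro body, expanded once per binary operator.
#define MINI_TSV_OP(OP)\
  MiniTsv<T> operator OP (MiniTsv<T> const &rhs) const\
  {\
    VerifyState();\
    rhs.VerifyState();\
    return MiniTsv<T>(Value OP rhs.Value);\
  }

  MINI_TSV_OP(+)
  MINI_TSV_OP(-)
  MINI_TSV_OP(*)
#undef MINI_TSV_OP
};

// Both operands have a value, so the generated operator+ succeeds.
int AddSetValues()
{
  MiniTsv<int> a(2), b(3);
  return (a + b).GetValue();
}

// The right hand side was never assigned, so the generated operator- throws.
bool SubtractingUnsetThrows()
{
  MiniTsv<int> a(2), unset;
  try
  {
    (a - unset).GetValue();
  }
  catch(std::logic_error const &)
  {
    return true;
  }
  return false;
}
```

The #undef right after the last use keeps the helper macro from leaking into the rest of the translation unit, which is a design choice of this sketch rather than something the post prescribes.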

Other considerations

I chose not to implement the unary * (dereference) and & (address-of) operators because that would lead to the possibility of Value being out of sync with Sentinel. Ditto for the -> and ->* operators.

I didn’t see any use for a ‘,’ overload, a function call operator or an array subscript operator so I left those out as well.

The end result is IMO a good balance between transparency and simplicity. The code is available for download as usual. If you experience problems or have suggestions you are welcome to drop me a message.

Posted: Thu, Oct 11 2007 6:48 by vanDooren | with 1 comment(s)
C++ keyword of the day: explicit

Suppose we have the following class:

struct Foo
{
  int m_i;
  Foo(int const &i)
  {
    m_i = i;
  }
  operator int()
  {
    return m_i;
  }
  bool operator==(Foo const &foo)
  {
    return foo.m_i == m_i;
  }
};

This class is a wrapper for ints. It allows you to transparently substitute Foos with ints in normal operations. This way you can use Foos like ints with additional meta data attached.

We also implement the normal operators (== in this example) so that we can operate on 2 Foos as if we were using 2 ints.

The constructor allows the compiler to convert an int to a Foo, and the typecast allows it to convert a Foo to an int. This can be a problem. Consider the following use case:

  if(foo == 1)
    cout << "foo == 1" << endl;

Which will cause the following ambiguity error:

error C2666: 'Foo::operator ==' : 2 overloads have similar conversions
could be 'bool Foo::operator ==(const Foo &)'
or 'built-in C++ operator==(int, int)'
while trying to match the argument list '(Foo, int)'

To get out of this situation without removing either the constructor or the type cast, we can use the keyword explicit like this:

struct Foo
{
  explicit Foo(int const &i){ } //details omitted
};

Now there is only one conversion path left: Foo to int.

It also makes it impossible to use implicit conversion during an assignment, but you can implement an assignment operator that takes an int to get that back.

You lose the ability to implicitly convert an int to a Foo at construction time. You also lose the convenience of having ints silently being converted to Foos, but perhaps that is not so bad from a design point of view because that might introduce subtle bugs.
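Putting the pieces together, a minimal sketch of the complete class could look like this. The assignment operator and the two helper functions CompareWithInt and AssignInt are my additions for illustration; they are not part of the original example.

```cpp
struct Foo
{
  int m_i;

  // explicit: the compiler may no longer silently turn an int into a Foo.
  explicit Foo(int const &i) : m_i(i) {}

  // Assignment from int, so "foo = 5" still works despite the explicit constructor.
  Foo& operator =(int const &i)
  {
    m_i = i;
    return *this;
  }

  operator int() const
  {
    return m_i;
  }

  bool operator ==(Foo const &foo) const
  {
    return foo.m_i == m_i;
  }
};

// Compiles without ambiguity: the only viable path is Foo -> int,
// followed by the built-in comparison of two ints.
bool CompareWithInt()
{
  Foo foo(1);
  return foo == 1;
}

// Assignment uses operator=(int const &); the return uses Foo -> int.
int AssignInt()
{
  Foo foo(1);
  foo = 5;
  return foo;
}
```

With the explicit keyword removed from the constructor, the comparison in CompareWithInt would be ambiguous again, as described above.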

Posted: Mon, Oct 8 2007 2:52 by vanDooren | with 3 comment(s)
Integral promotion: It doesn't have to be logical. It's the law.

Integral promotion is one of those things that everybody takes for granted, but most people don’t know exactly how or when, or what it does exactly.

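Simply said, integral promotion means that a narrow integer type can automatically be widened to a wider type. For example, in the following statements, the value of b is automatically widened to a short when it is assigned to a. (Strictly speaking, the standard calls this particular case an integral conversion; integral promotion proper widens types that are narrower than int to int.)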

  short a;

  char b = 2;
  a = b;

Everybody assumes this will happen because it is the logical thing to do.

What most people don’t realize is that integral promotion is not something that happens when it is logical. It happens when the C or C++ standard says it should happen.

Every now and again, someone posts a message in the public VC++ forums or newsgroups about how he found a bug in the VC++ compiler for the following code (it’s nearly always the same code).

  short a = 1;

  short b = 2;
  a += b;

This example will cause a warning at warning level /W4: warning C4244: '+=' : conversion from 'int' to 'short', possible loss of data.

There is even a bug logged at connect about this behavior, and it’s gotten good feedback because a lot of people really think this is a bug, but it’s not.

The C++ standard says that a += b behaves like a = a + b, and that the operands of + are subject to the integral promotion rules, which promote narrow integer values to proper ints. So even though a and b are both of type short, the result of the addition is of type int. When that is assigned to a, the compiler recognizes that you are assigning an int to a short and generates the appropriate warning.
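You can let the compiler confirm this for you. The sketch below assumes a C++11 (or later) compiler, because it uses decltype and static_assert, which did not exist yet when this post was written. The function name AddShorts is made up for the example.

```cpp
#include <type_traits>

short AddShorts()
{
  short a = 1;
  short b = 2;

  // Both operands are promoted to int before the addition, so the sum is an int.
  static_assert(std::is_same<decltype(a + b), int>::value,
                "short + short yields int");

  // The compound assignment expression itself is an lvalue of type short.
  static_assert(std::is_same<decltype(a += b), short&>::value,
                "a += b has the type of a");

  a += b;   // the int result of a + b is converted back to short here
  return a;
}
```

The conversion back to short in the last statement is the one that C4244 warns about.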

Integral promotion happens for the result of several operators. The one that triggered this blog post is the unary + operator.

For those who don’t know what it does: it returns the value of whatever is on the right hand side. You can use it like in the following contrived example.

  short a = 1;

  short b;
  b = +a;

The funny thing is that the result of the unary plus is also subject to integral promotion. The result of +a is of type int, so it should also generate warning C4244. Actually, that is not entirely correct. According to the language lawyers in comp.lang.c++, the compiler is not required to generate a warning. However, if it generates a warning for 1 case (+=) it should logically also generate a warning for the other cases.
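The same kind of check works for the unary plus. Again this is only a sketch that assumes a C++11 compiler; UnaryPlusPromotes is a hypothetical helper name.

```cpp
#include <type_traits>

bool UnaryPlusPromotes()
{
  short a = 1;
  short b;

  // The operand of unary + is promoted, so the expression has type int.
  static_assert(std::is_same<decltype(+a), int>::value,
                "+a has type int");

  b = +a;   // int converted back to short, the same situation as with +=
  return b == 1 && sizeof(+a) == sizeof(int);
}
```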

So I logged a bug on connect. I am not sure that it’ll get much attention, and I even hesitated to put it in. But perhaps this is a very simple fix, and if more C4244 warnings are given, then perhaps more people will learn and understand the integral promotion rules.

Some links to documentation for promotion rules:

Standard C++ conversions explained in MSDN

C++ arithmetic conversions in MSDN

C type conversions in MSDN

Usual arithmetic conversions (C) in MSDN

 

Btw, you might think that the unary + is a completely useless language feature, but I got a couple of use cases on comp.lang.c++ that convinced me otherwise:

  1. Kai-Uwe Bux: One thing that comes to mind is that it renders expression like  + some_var + some_other_var + yet_another_var legal. This can come in handy when you want to generate code mechanically
  2. James Kanze: I suspect that there's an orthogonality issue involved.  It would seem rather odd to have unary minus, but not unary plus.
  3. James Kanze: In another thread, Alf used +0 to force a char to be treated as an int---unary plus would work as well (and be even more obfuscation), e.g.:
    char c = 'a' ;
    std::cout << +c << std::endl ;
    will display "97" on my machine, whereas without the +, it displays "a" (but I still prefer "(int)( c )").
  4. Ben Pfaff: In C, sometimes the unary plus operator is used to keep an expression from being an lvalue.  For example:
    struct opaque { int foo; };
    #define foo_accessor(OPAQUE) (+(OPAQUE)->foo)
    Without the +, the foo_accessor macro could be used to set the value of `foo', as in foo_accessor(x) = 0;
    With the +, the above will not compile
Posted: Fri, Oct 5 2007 2:28 by vanDooren | with no comments
Fun with templates: Partial specialization

A couple of weeks ago I was writing the code for my previous article, and I needed to implement partial template specialization for a certain class. I had some problems, did some research, consulted other MVPs and finally got a good solution and a proper understanding of the subject at hand.

Before I start the topic of this article, I will quickly explain some of the terminology as it is used in the official C++ standard. Templates are complex enough as it is without adding terminology ambiguity.

In this article I will assume that you already have a basic understanding of C++ templates so that this article does not need to be 30 pages.

As written down in The Book of Words

A template class always has one primary template declaration:

template <template-parameter-list> class template-name {...};

A basic example of a primary template declaration is this:

template < class T> class foo {...};

A template class can also have 0 or more specializations. A specialization is a definition of the template class for which one or more template arguments are made specific.

template <template-parameter-list>
class template-name< template-argument-list > {...};

An example of a specialization for foo is this:

template <> class foo<std::string> {...};

Now, in order to avoid confusion later on, it is time to explain some of the terminology that is used in the C++ standard.
  • Template parameter list: this is the list of template parameters right after the ‘template’ keyword (14.1). I.e. for our example foo, this is < class T>.
  • Template argument list: this is the list of arguments after the class name (14.2, 14.5.1). For primary templates, this list is omitted in the declaration because it is implied by the template parameters (14.5.4.5). For specializations it has to be declared explicitly because the compiler needs this information to know for which combination of parameters the primary template is specialized. For our primary declaration of foo the argument list is <T> but it is omitted from the declaration. For our specialization, the argument list is <std::string>.
  • Template ID: A template ID of a class is the template name, followed by the list of template arguments (14.2, 14.5.1). For our example the template ID of the primary template declaration is foo<T>. For the specialization it is foo<std::string>.
  • Partial specialization: a specialization of a template that has one or more arguments that are made specific, but at least one argument that is still determined by its parameter list.
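To tie the terms to actual declarations, here is a small annotated sketch. The Kind() member and the pointer specialization are additions made up for this illustration; struct is used instead of class only to keep the members public.

```cpp
#include <string>

// Primary template.
// Template parameter list: <class T>. The implied argument list <T> is not written.
template <class T> struct foo
{
  static char const *Kind() { return "primary"; }
};

// Specialization with every argument made specific.
// Template parameter list: <> (empty).
// Template argument list: <std::string>; template ID: foo<std::string>.
template <> struct foo<std::string>
{
  static char const *Kind() { return "specialization for std::string"; }
};

// Partial specialization.
// Template parameter list: <class U>.
// Template argument list: <U*>; template ID: foo<U*>.
template <class U> struct foo<U*>
{
  static char const *Kind() { return "partial specialization for pointers"; }
};

// Small helpers that show which declaration the compiler picks.
std::string KindOfInt()     { return foo<int>::Kind(); }
std::string KindOfString()  { return foo<std::string>::Kind(); }
std::string KindOfPointer() { return foo<int*>::Kind(); }
```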

Partial configuration confusion

Now, the C++ standard also gives some examples of partial specializations in section 14.5.4.4:

template<class T1, class T2, int I> class A { }; // #1
template<class T, int I> class A<T, T*, I> { };  // #2
template<class T1, class T2, int I> class A<T1*, T2, I> { }; // #3
template<class T> class A<int, T*, 5> { }; // #4
template<class T1, class T2, int I> class A<T1, T2*, I> { }; // #5

This, together with the surrounding definitions in section 14.5, seems to suggest that the template parameter list of a partial specialization always has fewer elements than the parameter list of the primary template.

Suppose we have the following template class and specialization:

template <class T> struct Foo {Foo();};
template <> struct Foo< int> {Foo();};
Foo<int> foo1;  //specialization
Foo<float> foo2;//general class

In my original understanding, I assumed that because Foo has one template parameter T, any specialization could have only 0 template parameters.

This is not the case. The specialization can have any number of elements in its template parameter list, as long as the template ID itself has the correct number of arguments, which can be made up from those template parameters.

The following is a perfectly valid specialization for Foo as well:

template <class _Key, class _Value>
struct Foo<map<_Key, _Value> > {Foo();};

This allows you to specialize Foo for maps with any type of _Key and _Value.

As you can see, this is much more powerful because it allows you to specialize any template for any type of argument, while still giving you a way to use the composites inside (i.e. _Key and _Value in this case).

This way it is trivial to make a specialization of Foo for maps, for vectors etc, despite the fact that vector and map themselves require different numbers of template arguments.
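A minimal compilable sketch of this idea is shown below. The Kind() member, the key_type typedef and the helper functions are made up for the illustration, and the parameter names avoid the leading underscore. Note that, written this way, the specializations match maps and vectors that use the default comparator and allocator.

```cpp
#include <map>
#include <string>
#include <vector>

template <class T> struct Foo
{
  static char const *Kind() { return "general"; }
};

template <> struct Foo<int>
{
  static char const *Kind() { return "int"; }
};

// Two template parameters, but the template ID still has exactly one argument.
template <class Key, class Value>
struct Foo<std::map<Key, Value> >
{
  typedef Key key_type;   // the composite types remain usable inside
  static char const *Kind() { return "map"; }
};

// One template parameter, again with exactly one argument in the template ID.
template <class Item>
struct Foo<std::vector<Item> >
{
  static char const *Kind() { return "vector"; }
};

// Small helpers that show which declaration the compiler picks.
std::string KindOfFloat()  { return Foo<float>::Kind(); }
std::string KindOfInt()    { return Foo<int>::Kind(); }
std::string KindOfMap()    { return Foo<std::map<std::string, int> >::Kind(); }
std::string KindOfVector() { return Foo<std::vector<double> >::Kind(); }
```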

Round hole, meet square peg and Mr. Hammer

Now, suppose that this was not possible, like I thought originally. If your original template has only 1 parameter, how would you be able to make a specialization for vectors if you have no way to know or specify the type of the items in the vector?

Well, you wouldn’t. Not directly anyway, but I found a weird way to actually make it possible within my imagined constraints.

Suppose we make the primary template like this:

template<class T, class Q= void> struct Bar {Bar();};

Bar is a template class with 2 parameters. The second is optional and void by default.

We can then specialize Bar like this:

template<class T>
struct Bar<T, vector<typename T::value_type> >{Bar();};

Bar is specialized by declaring that the first argument is T (which can still be anything), and by specifying that the second argument is a vector of which the elements are of the same type as the elements of type T, assuming that T actually has a typedef for value_type. Lucky for me, the STL containers all make internal typedefs for their template arguments.

Note that I stay within the constraints I assumed were there: the number of elements in the template parameter list is less than the number of elements in that of the primary template declaration.

For example for a map specialization the declaration would be like this:

template<class T>
struct Bar<T, map<typename T::key_type,
                  typename T::mapped_type> >{Bar();};

So even though map needs at least 2 type arguments, we can get away with specifying only 1 because we extract everything from T in the assumption that T actually has those typedefs inside. If T doesn’t, then the specialization will not match.
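For reference, a compilable sketch of this construction follows. This is not the downloadable demo code: the Kind() member, the typedefs and the helper functions are hypothetical, and the way Bar is instantiated here (with the second argument spelled out) is my assumption about how such specializations get selected.

```cpp
#include <map>
#include <string>
#include <vector>

template <class T, class Q = void> struct Bar
{
  static char const *Kind() { return "general"; }
};

// Only considered when T has a value_type typedef.
template <class T>
struct Bar<T, std::vector<typename T::value_type> >
{
  static char const *Kind() { return "vector"; }
};

// Only considered when T has key_type and mapped_type typedefs.
template <class T>
struct Bar<T, std::map<typename T::key_type, typename T::mapped_type> >
{
  static char const *Kind() { return "map"; }
};

typedef std::vector<int> IntVector;
typedef std::map<std::string, int> StringIntMap;

// Second argument defaults to void, which matches neither specialization.
std::string KindOfDefault() { return Bar<IntVector>::Kind(); }

// Second argument is vector<IntVector::value_type>, so the vector specialization matches.
std::string KindOfVector() { return Bar<IntVector, IntVector>::Kind(); }

// Second argument is map<key_type, mapped_type>, so the map specialization matches.
std::string KindOfMap() { return Bar<StringIntMap, StringIntMap>::Kind(); }

// int has no value_type, so the specializations drop out silently.
std::string KindOfInt() { return Bar<int, IntVector>::Kind(); }
```

In this sketch a specialization is only selected when the second argument is given explicitly. With the default void the general template is used.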

Afterthoughts

My original solution is kludgy, but it works. At least, it would have worked if not for a compiler bug in the Visual C++ compiler. With gcc 3.2.2 and gcc 4.1.1 it works as intended, but the VC++ compiler fails to recognize the specialization, and instantiates the general template instead.

I have filed a bug report on connect, which has been validated by Microsoft on Orcas b2. It was too late to fix it, but it has been added to the internal bug database for fixing post Orcas.

The demo code can be downloaded under the MIT license as usual.

When I learned that what I wanted to do had an easy solution, I was a bit peeved that I didn’t know this. After all, I have been using templates for years now. And I was a bit embarrassed as well because I am an MVP after all, so it is kind of expected that I know C++ fairly well.

But then I reread Stroustrup’s book (well, the chapter on templates) and it was explained there like I understood it, so no blame there. It didn’t forbid longer template parameter lists, but it didn’t mention that they were allowed either, and all of the examples had shorter parameter lists for the specialization.

Then I decided to check the C++ ISO specification. After all, that is the holy text all compilers should obey. However, I found that it had the same shortcoming. I have consulted with my fellow C++ MVPs, and got the following reply from John Carson:

<quote>

…I agree that the possibility that the template parameter list of the
partially specialized class could have more parameters than the template
parameter list of the primary class is rather unexpected, but it seems to be
a fact. An example is given in Josuttis and Vandevoorde, C++ Templates,
p.351 (who remark that "it may come as a surprise that the partial
specializations involve more template parameters than the primary
template").

</quote>

So I am far from the only person to be surprised by this fact.

Btw, anyone who really is into programming C++ should have a copy of C++ ISO standard 14882 lying around. You can easily find it as a pdf with google. That and Stroustrup’s book are the 2 things I still use today whenever I need to know specific things.

Posted: Wed, Oct 3 2007 2:29 by vanDooren | with 1 comment(s)