Cluebat-man to the rescue

A weblog dedicated to Visual C++, interoperability and other stuff.

September 2008 - Posts

AdSync validated for the plant network

Last week my AdSync application was validated for running on the plant network.

The plant network exists for running the DCS application that controls everything and the kitchen sink. The application runs on a separate desktop as a sort of kiosk application, which means that the users have no access to the Windows desktop.

The user management of this network is completely implemented within the DCS software. This means that all groups and privileges are application defined. I can create new groups and modify permissions, but all of that stays within the application. Of course the user accounts themselves exist in Active Directory, but everything that is DCS related (groups and permissions) is configured in an application database.

The main reason that this is done (I think) is that the software can also run in a workgroup environment where no Active Directory exists. Imo they should deprecate this and switch to AD only as soon as possible. It's not like there's any reason not to run a domain, given the size of the networks for which this software is intended.

Anyway, this is a bit of a problem because people get assigned to groups, based on the department they are working for, and the things they need to do. E.g. people are in the QA group, or Automation, or Technology…

So far so good, but unfortunately, the department someone is in also has implications for Active Directory. Automation people need to be able to reboot certain servers, including the domain controller. The technology group needs a protected folder on the fileserver for storing the reports they generate for the FDA filing of our product…

Before now, there was no way to handle all these things, other than manually creating groups and assigning people to them on an as-needed basis. This is tedious for one thing, but if people change groups, their other permissions would not be changed accordingly.

To solve this problem in a scalable way, I've programmed an application called 'AdSync'.

AdSync will be executed every night after the daily backup scripts have taken an export of the DCS configuration database. Another of my applications will parse this database file and generate an XML file containing just the user account configuration, including users, groups and permissions.

AdSync will first check the groups that exist in the XML file, and verify if they exist in AD. If they don't exist, they will be created, but with a 'DCS ' prefix. This way there is a clear distinction between the existing groups, and those that are managed by my application.

When that is done, the group membership of the user in the XML file is compared with the Active Directory group membership. If they are different, the necessary changes will be made to the AD group. This way, the AD membership is always a mirror of the DCS group membership.
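The comparison step boils down to a pair of set differences. The sketch below only illustrates that idea in C++; it is not the actual AdSync code (which was written in C# against System.DirectoryServices), and the names MembershipDiff and diffMembership are made up for the example.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>

// Hypothetical result type: which accounts must be added to or removed
// from the AD group to make it mirror the DCS group.
struct MembershipDiff
{
    std::set<std::string> toAdd;
    std::set<std::string> toRemove;
};

// dcsMembers: members according to the exported XML file.
// adMembers:  members currently in the matching 'DCS ' group in AD.
MembershipDiff diffMembership(const std::set<std::string>& dcsMembers,
                              const std::set<std::string>& adMembers)
{
    MembershipDiff diff;
    // In DCS but not in AD: these accounts have to be added.
    std::set_difference(dcsMembers.begin(), dcsMembers.end(),
                        adMembers.begin(), adMembers.end(),
                        std::inserter(diff.toAdd, diff.toAdd.end()));
    // In AD but not in DCS: these accounts have to be removed.
    std::set_difference(adMembers.begin(), adMembers.end(),
                        dcsMembers.begin(), dcsMembers.end(),
                        std::inserter(diff.toRemove, diff.toRemove.end()));
    return diff;
}
```

If both sets are equal, both result sets are empty and nothing has to be written to AD, which is the common case on most nights.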

The synchronization is done only once per day, because it is based on the nightly backup of the configuration database. Exporting that database takes about an hour, and renders the system unusable for engineering work. This means we can only do it at night.

Every change that is made by my application, as well as the execution timestamp and any errors that may occur are logged in a custom Windows Event Log for auditing.

Since this is a production environment of pharmaceutical products, installing this on the system takes a good amount of paperwork. The change has to be approved by QA and other relevant departments; I have to do an impact assessment; I have to write an installation manual; I have to set up a validation protocol, and execute it with a witness from validation to prove that the application works as intended; …

So all in all, it takes quite a lot of work, but now everything is ready. Yaay me. :)

My application was written with VC# 2008, but built to run on the .NET 2.0 framework, which is qualified by the DCS vendor for use alongside the DCS software. The Active Directory interface was made using the System.DirectoryServices namespace. I used the .NET Developer's Guide to Directory Services as a reference, because the System.DirectoryServices namespace is horribly underdocumented in MSDN.

I have to admit it is nice to keep on developing software. Officially I am a system administrator, but my boss agrees with me that making custom tools like these is something of real value to the company, and something that cannot be outsourced without spending a significant amount of money.

Planning the plant shutdown, part 2

The preparation of the plant shutdown continues unabated. More and more I can understand why this needs months of intense planning.

There are lots of dependencies between everything that needs doing. For example: the floors of the cold room need new coating. For this the entire room needs to be brought to near environmental temperature, but this cannot happen too suddenly or the floor will crack.

Sanding away the old coating can be done while the temperature is still low. And during the first day, another team has access to the ceiling panels, but not during the final sanding or coating stages. Other work needs to wait until the coatings are dry. The other teams can then work in the cold room as the temperature is dropping again.

But as the temperature is re-established, it is important that the controller loops can run uninterrupted. This means from that time onwards I cannot disrupt the DCS network anymore because that would cause problems for the people who rely for their work on the correct functioning of the DCS network.

After a lot of discussion back and forth, I've gotten 1 weekend (from Friday evening to Sunday evening) where I can do with the system as I please. During the same weekend, the utilities will all be down, as well as mains electrical distribution.

During the shutdown I need to do quite some things. But as luck has it, we have bought additional server racks and servers. Several servers need replacing anyway, so this is a rough sequence of how we will do things:

  1. Create an additional Domain Controller, put it in the new rack, disconnect it from the network, and connect it to a new switch. That switch will be the backbone of a 'clone' of the DCS domain.
  2. Rename it to the same name as the old master server which has to be replaced.
  3. Issue a development freeze so that we can restore the databases to the new server without risking the loss of work.
  4. Restore the master databases and perform the software upgrade.
  5. Add new servers in the rack to replace the other machines that needed replacing. All new servers will be connected to the cloned domain.
  6. Move the engineering workstation to the cloned domain.
  7. Wait for official signoff of the DCS network. At this moment we have the major infrastructure running on a cloned domain.
  8. As soon as we have ownership of the network, we shut down the live domain controller and all other servers that have a replacement server running in the cloned domain.
  9. As soon as the domain controllers are down, the backbone switch of the cloned domain is connected to the live network so that the new domain controller can take over.
  10. The automation engineers can start their work as soon as that happens, because the engineering workstation is already back online.
  11. While the automation engineers do their thing, we can move the remaining servers into their new slots in the new server racks and reconfigure them.
  12. The old servers which were replaced can be taken out of the existing racks and put aside for repurposing.
  13. The additional servers which were bought for new functionality can be moved into the existing racks and configured.
  14. While all this is going on, the operator terminals can all be reconfigured 1 by 1 after their software upgrade.

This is a rough first draft of what I will do during the shutdown. I'll need to make a more detailed plan and see what can be done in parallel. The brunt of the work needs to be done beforehand so that the cloned domain can be prepared well before the shutdown. We only have 2 days of shutdown to do what we need, so everything needs to be done as efficiently as possible.

Posted: Sep 29 2008, 12:25 AM by vanDooren | with no comments
My latest article in NTInsider

A bit late perhaps, but my latest article on API development got published in the NTInsider (owned by OSR Online).

The online version can be found here. It requires free registration.

I know that at least 2 people have read it completely, because that is the number of mails I got to tell me I missed something.

  1. In the code example at the bottom, I accidentally switched the names of fooA and fooB in the declaration. Doh!
  2. That same code, compiled for a 64 bit platform, gives a linker warning. I don't know how I missed that. This code was made using VS2005, so one of two things could have happened: VC2005 didn't report the warning (unlikely), or I only checked the compilation in detail, and then only checked if the function was exported without checking if there were linker warnings. It's probably the latter.

I'll see if I can get these errata published. But at least now I know that people actually read my articles and use the code. Well, at least one person is using the code :) I'll have to make an extra effort next time to verify that a) I didn't make a stupid typo, and b) there are no warnings anywhere.

Configuring an application through a batch file: what domain am I in?

I recently had to make an install script for one of my applications which manipulates the Active Directory. The app itself is executed via a scheduled batch file, and it expects the domain name as a command line parameter. I wanted the batch file to figure out on its own what the domain was of the computer on which it is executed. That turned out to be harder than I thought, but in the end I found an elegant, if rather hairy solution.

FOR /F "tokens=1* delims=REG_SZ " %%A IN ('REG QUERY HKLM\System\CurrentControlSet\Services\Tcpip\Parameters /v Domain') DO (
SET CURR_DOMAIN=%%B
)

This code snippet will query the registry to get the domain the computer belongs to. Of course, the REG QUERY command returns not just the name, but a tab separated table with the key name, the key type and the key value. That table is then munged by the 'FOR' loop, which treats every character in 'REG_SZ ' (the whitespace is a tab character) as a delimiter. The actual domain name itself is then stored in the variable 'CURR_DOMAIN'.
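To make the tokenizing easier to follow, here is a rough C++ emulation of what "tokens=1* delims=..." does to a single line. It is only a sketch of my understanding of FOR /F, not something taken from the install script; the function name restAfterFirstToken is made up, and a sample domain name such as example.local is hypothetical.

```cpp
#include <string>

// Rough emulation of FOR /F "tokens=1* delims=..." for one line of input:
// skip leading delimiter characters, skip the first token (%%A), skip the
// delimiter characters that follow it, and return the rest of the line (%%B).
// Note that 'delims' is a set of characters, not a separator string.
std::string restAfterFirstToken(const std::string& line,
                                const std::string& delims)
{
    std::string::size_type tokenStart = line.find_first_not_of(delims);
    if (tokenStart == std::string::npos)
        return "";                       // the line holds only delimiters
    std::string::size_type tokenEnd = line.find_first_of(delims, tokenStart);
    if (tokenEnd == std::string::npos)
        return "";                       // there is only one token
    std::string::size_type restStart = line.find_first_not_of(delims, tokenEnd);
    if (restStart == std::string::npos)
        return "";                       // nothing follows the first token
    return line.substr(restStart);
}
```

Fed a line of the form "Domain", tab, "REG_SZ", tab, "example.local" with "REG_SZ" plus a tab as the delimiter set, the type column is swallowed entirely because it consists of nothing but delimiter characters, and what remains is the domain name.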

It's crude, but it works rather well. This trick saves me the necessity of having someone configure this by hand when the application is installed. Apart from manual configuration being error prone, it would make the installation procedure more verbose.

I haven't tried this on a computer that is not part of a domain. Luckily, that is not a situation that will ever occur on the network that I have to maintain.

Posted: Sep 25 2008, 01:00 AM by vanDooren | with no comments
C++ keyword of the day: export

The export keyword is a bit like the Higgs boson of C++. Theoretically it exists, it is described by the standard, and no one has seen it in the wild. :)

Before I get flamed to hell and back: that last part is not entirely true. There is 1 C++ compiler front-end in the world which actually supports it: the one made by EDG (Edison Design Group). It is used by Comeau, which claims to have the only compiler that is 100% standards compliant, and by Borland (for which the support of export is a bit vague). I heard that the Intel compiler shipped with it, but I don’t know if this is true or not.

What’s all the fuss about?

It seems a bit strange that a language should contain a keyword that all compiler makers refuse to implement. Comeau has some interesting content here as does Wikipedia.

The reason that I believe 'export' has never really taken off is that it seems to take a significant effort to implement, combined with the fact that it doesn't achieve all that much. The export keyword allows a C++ programmer to declare a template class in a header, and then provide the implementation separately in a cpp file. This is the normal C++ way of doing things. Unfortunately, there's a snag.

Template class implementations cannot be compiled on their own. Template code itself is meaningless without specification of template arguments. Thus it is that a compiler doesn't know what to do with an implementation until the template class is used somewhere. At that point the template arguments will be known, and the compiler can compile the template class.

Since the template is only compiled then, the easiest thing is to do what most compilers do now: demand that the implementation is in scope when it is used somewhere. I.e. the implementation has to be put in a header file, and that header file has to be included by the source file that is being compiled. The reason for this is that the compiler would simply not know where to look for it if the implementation was in a cpp file somewhere.

The export keyword would solve this by telling the compiler 'Look somewhere else for the implementation'. The compiler would then have to compile the implementation for that class, and save the object code somewhere for that combination of template arguments.

The export keyword sounds like a great thing, but it doesn't solve that many problems.

Separation of interface and implementation

The separation of interface and implementation is one of the cornerstones of C++ philosophy. The traditional way is to have the declaration in a header file, and then put the implementation in a cpp file.

Since a template implementation cannot be compiled anyway, we could just as well put it in a header file.

And then we could also make 1 header file with just the declarations, and underneath that include the header file(s) containing the implementation. The nice thing is that you can then put specializations in different headers to keep the code comprehensible.

This way we stay true to the principle of separation without needing the ‘export’ keyword.
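A minimal sketch of that layout, squeezed into one listing. The file names stack.h and stack_impl.h and the Stack class are invented for the example; in a real project each marked section would be its own file.

```cpp
#include <cstddef>
#include <vector>

// ---- stack.h: declarations only ----
template <typename T>
class Stack
{
public:
    void push(const T& value);
    T pop();
    std::size_t size() const;

private:
    std::vector<T> items_;
};

// In a real project, stack.h would end here with:
//     #include "stack_impl.h"

// ---- stack_impl.h: the implementation, still a header ----
template <typename T>
void Stack<T>::push(const T& value)
{
    items_.push_back(value);
}

template <typename T>
T Stack<T>::pop()
{
    T top = items_.back();   // the caller must ensure the stack is not empty
    items_.pop_back();
    return top;
}

template <typename T>
std::size_t Stack<T>::size() const
{
    return items_.size();
}
```

Client code only ever includes stack.h. Whoever reads that file sees the interface first, and the implementation is pulled in at the bottom, so the compiler still has everything in scope when the template is instantiated.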

Distribution

You can’t distribute template classes in a binary form, no matter whether export is supported or not.

Because templates cannot be compiled on their own, you cannot link them into a static or dynamic library. The only thing you can do is to distribute the source code.

Templates are distributed as source code. The export keyword cannot do anything about that. And if you want to distribute templates, what are you going to do? You can develop your classes using ‘export’, but then they will be usable only by a very limited subset of compilers.

If you develop your classes as if ‘export’ doesn’t exist, then they can be used by all compilers.

Of course you could do both, and use macro magic to construct your code in such a way that the choice is left to the client programmer. Some people do this. But that is kind of ugly and time consuming. Furthermore, it’s not like it gets you any more clients, because they will be able to use your code in any case.

Compiler performance

It is true that our scheme of using header files is less efficient than using the export keyword. Without ‘export’, a template is compiled whenever it is used. You can mitigate this somewhat by including commonly used templates in a precompiled header file. But the impact will only be somewhat lessened.

And whenever an implementation detail changes, every source file which includes the template has to be recompiled. This can be a real pain in large projects.

Conclusion

‘export’ is facing the chicken or the egg dilemma.

‘export’ is neglected by most compiler vendors because it is not an important language feature -> You cannot use it in code that has to be portable (which is ironic, since it is a standard keyword) -> it is not used in many scenarios -> there is no large codebase depending on support for ‘export’ -> there is little incentive for compiler vendors to start supporting it.

Make no mistake: I would appreciate it if I could use ‘export’, if only because it allows me to organize my template implementations in a standard manner, but I don’t think it’ll happen soon.

Smoke and mirrors in the world of finance

The recent nosedive of Lehman Brothers brought back some memories of a company that I once worked for as a contractor.

That company was a NASDAQ darling for a long time, but the financial situation was not great when the stock market for high tech stuff collapsed. Our site had to work hard to save enough money to be spared the axe. Unfortunately, in the year leading up to the stock collapse, a lot of equipment had been ordered for the purpose of increasing our output by 100%. They had to do this, because a lot of equipment had lead times up to a year.

Thus it was that while everyone started to wonder about the reality of getting virtually no orders at all, new equipment was pouring in that we already didn't need anymore. And in the semiconductor high speed optics world, there is no such thing as cheap equipment. The cheapest thing was probably a power meter at 30 KEuro. The most expensive thing was a 500 KEuro 10 GHz bit error rate tester.

I should note that the CFO of the site was very good at what he did, which allowed the site to exist for as long as it did. In order to make the papers look better, he invoked a special rule over the new equipment: no one could use it.

Apparently, if you don't start using equipment, it is not subject to the write off procedures, and it hasn't actually cost any money in the books. The hardware still represents the original sum of money and as a result, doesn't show up as a cost. Financially, it was just very bulky money. Note that this is completely legal.

Unfortunately it is also as realistic as the tooth fairy, because in reality, the money is already gone. So you get situations like I had, where you have a power meter shaped hole in a measurement system and a brand new power meter waiting to be used. I wasn't allowed to put 1 and 1 together because that would cost money. I argued that the money was already gone, and we might as well do something useful with the equipment. And while the director agreed with me, he also said that it didn't work that way on paper.

It was then that I realized the world of finances is all smoke and mirrors, and it only works as long as the people believe the dream. As soon as everybody discovers that the gold is just gilt, you get what happened to Lehman Brothers. Financial institutions are like hot air balloons. If the hot air gets out, all that is left is the skin and the dead weight, and it will plummet to its rocky end.

Posted: Sep 18 2008, 12:20 AM by vanDooren | with no comments
References in C++ are not necessarily safe

People who are new to C++ sometimes have the mistaken idea that using references instead of pointers makes your code safe. People who have been programming a bit longer know this is anything but the case. References are just pointers with a coating of syntactic sugar.

I'll explain in more detail with a couple of examples. They use this pretty simple class A.

#include <iostream>
using namespace std;

struct A
{
    int &I;   // reference member: A does not own the integer

    A(int &i) : I(i)
    {
    }

    void print(void)
    {
        cout << I << endl;
    }
};

When A is instantiated, the constructor takes a reference to an integer and uses it to initialize an internal integer reference. The print method simply prints the value of that integer.

Case 1

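int * i= new int(32);
A a(*i);      // a.I now refers to the integer on the heap
*i = 43;
a.print();    // prints 43
delete i;     // the integer is gone, but a.I still refers to it
a.print();    // undefined behavior: read through a dangling reference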

I create a new integer value on the heap and I initialize a pointer with its address. I then create a new instance of A, and pass the newly created integer by reference. a initializes its own internal reference with the address of that integer. And indeed, if I assign a value to that integer and then print a's internal reference, the values match.

But what happens if that integer is deleted from the heap? It depends. The behavior is undefined. The next read from a.I could result in a bad value or an application crash. But in any case, it is not safe.

You might argue that I am purposely using pointers to cause problems, but in a large application, you won't know how and where your objects will be created. This could happen.

Case 2

A second case is much more innocuous. It uses no visible pointers.

vector<int> v;
v.push_back(42);
A b(v[0]);
v[0] = 43;
b.print();
v.pop_back(); //remove that element from the container
b.print();
v.push_back(2); //put something else in the same location
b.print();

Instead of dynamically allocating memory myself, I put an integer in a vector and then feed a reference to that element into an instance of A. This new instance (b) is now using the location of an element in a vector as an internal reference.

The thing with vectors is that they normally have more capacity than elements. This is because you don't want the vector to allocate more space for every element that gets inserted. Likewise, you are not going to do a deallocation with each element that gets removed.

Because of this, when that element is removed, the physical space is still there. A read from that location is not immediately going to trigger a crash. Instead, the program will keep on running, but with bad data. Formally, though, the behavior is undefined. With bugs like this, you never know what will happen.
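The spare capacity is easy to observe without touching a dangling reference. A small sketch (the function name capacityUnchangedByPop is made up for the example):

```cpp
#include <cstddef>
#include <vector>

// Returns true if removing the last element left the vector's
// allocated capacity unchanged.
bool capacityUnchangedByPop()
{
    std::vector<int> v;
    v.push_back(42);
    const std::size_t before = v.capacity();
    v.pop_back();                                 // size drops to 0 ...
    return v.empty() && v.capacity() == before;   // ... the storage stays
}
```

The vector is empty afterwards, but the block of memory that held the element is still allocated. That is exactly why a stale reference into it tends to keep "working" instead of crashing.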

Case 3

There is an even more interesting option. I am not going to write a repro case for it, but consider the following: I create multiple instances of A, giving them each a reference to an integer that is located on the stack, and I pass those instances of A as pointers to newly created threads, to be used there.

Meanwhile, the stack on which the integers are located is unwound. What happens next?

Again, it depends. Probably the stack gets stomped. Or perhaps only the data will be corrupted. Again, what happens is unknown. And I don't say that this design pattern is a good one. It isn't. But given certain constraints, it might just be an approach for solving a specific problem (though perhaps not the best one).

Conclusion

By now, anyone should be convinced that using references does not protect your application against design mistakes.

The only time your references are really safe is if:
a) your application is single threaded, and there is no asynchronous code executed anywhere, and
b) your classes don't store references anywhere.

Outside of those constraints, references don't guarantee anything more than pointers would. The only guarantee you have with references is that at some point in time, there was an object of the correct type at the specified location, and it was not NULL. Probably.

If someone is actively trying to subvert your code, you can't even be sure of that. But since there is nothing you can do about that, you might as well use references because they make your life easier, and are better than raw pointers when it comes to preventing honest mistakes.

Posted: Sep 17 2008, 07:46 AM by vanDooren | with 1 comment(s)
Please use a 'real' email address

Yesterday I tried to register on the opcfoundation forum so that I could find out why the opc SDK comes without documentation or code samples.

I know that corporate members (i.e. the ones who pay $$$$) can download code samples and specifications, but a publicly available SDK is not going to do anyone much good without access to at least the documentation.

Anyway, the message board required me to specify my email address so I used my gmail address. I also made a mistake while typing in the captcha. After looking again after the first 2 failures, I noticed that it had to be typed in from right to left, but by then my email address had been banned.

I sent a mail to the forum admin, asking for the ban to be undone. This is the complete message I got back.

All gmail accounts are banned.  Please use a "real" email address.

Apart from the total lack of style which I would not expect from an admin of a forum for professionals (notice the lack of addressing), it amazes me that people still think that webmail addresses are unclean (notice the sarcasm). Most people I know on most forums (incl the forum for which I am a moderator) use webmail. And this specific forum was very low traffic so it's not like they are battling spammers by the hundreds.

My gut tells me that the problem is that webmail addresses are primarily used by non-paying members. Foundations like these are interested in big bucks. If you can't even drop $1000 for the membership, then you are obviously too poor to count for anything, and you can subsequently be dismissed without much thought.

Posted: Sep 16 2008, 12:57 AM by vanDooren | with no comments
How DLL exports really work

I found this list of articles on Raymond's blog. Raymond's blog is one of the more interesting ones for programmers who use native APIs, because he often touches on things that are not documented but are interesting to know if you care about how things really work under the hood.

These links all point to information that goes into DLL importing and exporting, and how it is implemented under the hood. Very interesting stuff.

  • How were DLL functions exported in 16-bit Windows?
  • How were DLL functions imported in 16-bit Windows?
  • How are DLL functions exported in 32-bit Windows?
  • Exported functions that are really forwarders
  • Rethinking the way DLL exports are resolved for 32-bit Windows
  • Calling an imported function, the naive way
  • How a less naive compiler calls an imported function
  • Issues related to forcing a stub to be created for an imported function
  • What happens when you get dllimport wrong?
  • Names in the import library are decorated for a reason
  • The dangers of dllexport

Also useful, if you haven't read them yet, are Michael Grier's articles on the NT DLL loader over here. I've blogged about those articles before. Michael is one of the project leaders who maintained the NT DLL loader for some time.

     

    Stuck on stupid

    It happens to every programmer. It doesn't happen often, but it does happen to everyone. You are looking at a piece of code or some debugging output, scratching your head and thinking 'This is impossible'. It's probably only 10 or 20 lines of code or text, and you've looked carefully at every line at least a dozen times over the last hour.

    Every line makes sense and seems to mean exactly what you think it means, every word is in place, every semicolon is accounted for, and yet the sum of those lines is something completely unlike what you think it should be.

    What makes matters even worse is that you just know that you are missing something obvious; something that anyone else would see after looking at the issue for less than 5 seconds. Maybe someone is even explaining the issue to you and you still don't get it.

    Well, I had such a moment just this weekend.

    I was working on some C++ code that involved friend functions and streams. I've never done much with streams in C++, and never done anything with friends. Basically, friend declarations allow e.g. class A to specify that class B is allowed to access its private variables. Hence the joke: in C++, your friends can see your privates. I've never used it in a design, because if other classes need access to private data, the design is probably wrong.

    There are some exceptions, where it can be extremely useful. One of these cases is if you have to create a serialization function that can serialize and deserialize an object. That function would need access to the private data to efficiently reconstruct the object, but you don't want any other functions or classes to have the same privilege.

    In my case, I had a template class that I wanted to reconstruct from a generic iostream. I implemented a global function that performed the reconstruction, but despite my friend declaration that gave istream access, the compiler didn't agree. This is what my code boiled down to:

    class A
    {
        friend istream;
        int i;
    };

    istream& operator >>(istream& is, A &a)
    {
        is >> a.i; //error C2248: 'A::i' :
                   //cannot access private member declared in class 'A'
        return is;
    }

     

    I looked at it for a long time, read the 'friend' documentation in MSDN, consulted my copy of 'The C++ programming language, 3ed', asked a fellow MVP to explain it to me, and I still didn't get it. I reasoned: 'I just gave istream access to A::i, so WTF is the problem'.

    In this case, the solution was really a 'duh' moment. Yes, istreams have access to A. But – and this is the whole issue – it is not the istream doing the accessing. The istream accesses an integer by reference; an integer that was retrieved by the global operator >> function. My code was functionally equivalent to:

    istream& operator >>(istream& is, A &a)
    {
        int &temp = a.i; //error C2248
        is >> temp;
        return is;
    }

    And that is pretty obviously wrong. By writing is >> a.i, I lulled myself into believing it was 'is' who did the access, while in reality it was someone else.
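One way to make the snippet compile is to befriend the function that actually does the accessing, i.e. the global operator >> itself. This is a sketch of that approach rather than the exact code I ended up with; the constructor and the value() accessor are added only so the example can be tried out.

```cpp
#include <istream>
#include <sstream>

class A
{
    // Grant access to the function that actually touches A::i.
    friend std::istream& operator>>(std::istream& is, A& a);
    int i;

public:
    A() : i(0) {}
    int value() const { return i; }   // accessor added for the example
};

std::istream& operator>>(std::istream& is, A& a)
{
    is >> a.i;   // fine now: this operator is a friend of A
    return is;
}
```

With this in place, reading an A from a std::istringstream (or any other istream) fills the private member, and no other function or class gets the same privilege.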

    As I said already: this was one of those 'Aaargh' moments where I just seemed to be stuck on 'stupid'. Ah well. Live, learn, and have more coffee next time :) Glad this doesn't happen every day, week, or month.

    Posted: Sep 10 2008, 03:31 PM by vanDooren | with no comments
    The power of a workstation and the convenience of a laptop

    Over a year ago, I built myself a new workstation, with a Core2Duo, SATAII hard disks in RAID1 configuration, 4GB memory, MSI nv7950 video card, ... The whole machine was everything a machine could want to be at the time.

    Unfortunately, I learned that having a powerful workstation is not all it's cracked up to be. The big problem is that you have to put it somewhere. It's big, it needs power, and it needs to be someplace where the kids will not try to feed it sandwiches or cereals through the DVD. On a table in the living room didn't get approval from She-who-must-be-obeyed, so I was forced to put it in our bedroom.

    Since that was not a good solution either (for practical reasons), I moved it to the basement where it sat unused for some time. But then I pulled a network cable from my wifi router to the basement, and life was good again. I installed that machine with Vista64, installed VMware and Visual Studio, installed Office and all the other stuff I need, and then I enabled remote access.

    Now I can sit anywhere in the house with a laptop and connect via remote desktop to my workstation. I can do all my nerdy stuff while sitting in the living room with my wife while she is watching something on tv, and if I need to stop I simply shut down or close the laptop. When I want to continue I simply connect to the remote session and proceed where I had left my work.

    And the good thing is that I can do all that work on a 6 year old laptop which is more than fast enough to run XP and RDP. And if I use my wife's laptop from work or mine, I can still have all my tools at my finger tips. Gone are the days of having to move stuff around and installing tools on multiple computers. Now I can put all that time into sufficiently geeky programming activities.

    Posted: Sep 09 2008, 05:24 AM by vanDooren | with no comments
    Planning for the plant shutdown

    Come November or December, there will be a 3 week shutdown of our plant to perform maintenance and repairs. A lot of systems cannot be shut down normally, or taken apart. A lot of the medical systems are never opened, or even turned off. This means that there are a lot of things that cannot be done while the plant is operational.

    Since I am the system admin in charge of the production network and servers, I am part of the group that plans this event.

    My part is both easy and hard at the same time. Easy because I only have to replace some servers and do a software upgrade, and I need no other departments for either action. But hard because everyone needs a live system to do their jobs.

    For example, if the people from maintenance are working on the cooling system, they need the plant software in order to open and close valves. And once the cooling system is up and running, the plant software should be up and running because they need working controllers. And system cleaning has to be done before the system is live again, but the clean steam utilities need working batch execution software...

    But there are many more dependencies, and some of them are difficult to manage. For example, for some of the maintenance, the ceilings in manufacturing have to be opened up. This also destroys the cleanroom status of that area, so it needs to be cleaned afterwards. Unfortunately, the cleaning cannot begin before the second week of shutdown, but other activities that have to wait for the end of the second week also need ceiling access. And yet another activity that has to start in the second week for another reason needs a clean environment...

    So all in all a shutdown is a very complicated issue.

    For now it looks like I am going to do most of the work in the first weekend, during which everything is shut down completely, including the mains power. I can get power to the server room via the emergency generator, and the second I get the go-ahead signal on Friday evening, I can start my work. 2 full days should be enough to cover the critical work, though it will be a very busy weekend.

    Posted: Sep 08 2008, 05:15 AM by vanDooren | with no comments
    Regex bug in VC2008 SP1 TR1 library

    Yesterday I tracked down a bug in the TR1 regex library that is shipped with VS2008 SP1.

    The regex a| (an alternation with an empty right-hand side) causes an exception to be thrown because of supposedly illegal syntax. For example, you could use (a|) in a regular expression to indicate that a string contains either a or nothing at a certain point.

    There are better ways to do this of course, and I've changed the problematic regex already, but it should have worked.

    I checked the .NET regexes, and they accept this syntax. .NET regex follows the ECMAScript syntax, which specifies pretty unambiguously in section 15.10.1 (Thanks Igor for helping me identify the correct section) that the alternative operator should accept empty sections.
    The TR1 proposal also specifies that ECMAScript is used, so it should accept this syntax.
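    A quick way to check how a given library treats this pattern is to try to construct the regex and see whether it throws. The sketch below is only an illustration: it uses std::regex from the later C++ standard (with VS2008 SP1 the same names live in namespace std::tr1), and the helper name pattern_compiles is made up for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if the library accepts the pattern
// under the ECMAScript grammar, false if it throws std::regex_error.
bool pattern_compiles(const std::string& pattern)
{
    try {
        std::regex re(pattern, std::regex::ECMAScript);
        (void)re;  // only construction matters here
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```

    Calling pattern_compiles("(a|)") should return true on a library that follows the ECMAScript grammar. A false result points to the behaviour described in this post.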

    If you use TR1 regexes, take a minute of your time to vote for this bug report on Connect so that they fix it in the next SP / release.

    Posted: Sep 04 2008, 04:46 AM by vanDooren | with no comments
    Extending a native C++ project with managed code

    One question that comes up from time to time in the newsgroups is ‘I have a native C++ project and I want to extend it with Managed code (e.g. Windows Forms). What do I do?’

    The answer is not so complex. It is fairly easy to extend native projects with managed code. In this article I’ll explain how.

    What NOT to do

    To quote Kate Gregory ‘In the name of all that is good and right: Do not set /CLR for the entire project’.

    /CLR tells the compiler to compile a source file as C++/CLI code, in which native and managed C++ code can be mixed. Theoretically you could specify this for the whole project, but this has some serious consequences.

    Apart from the fact that the compiled output size can explode, there can be significant problems with COM and CRT initialization. I am no interop expert like Marcus Heege or Kate, but I am willing to take their word for it.

    Furthermore, your existing project has possibly been validated by QA and unit tested six ways from Sunday. You don’t want to travel that road again if you can prevent it.

    What to do

    You want to add managed stuff to your project while impacting the rest of the code as little as humanly possible. To achieve this you have to compartmentalize the managed stuff and give it a native interface that you can use in the native parts of your project without any hassle.

    I think I can best explain this with a practical example.

    For the sake of this example I have a native DLL project that exports a function that returns an integer value. Perhaps this function got that integer value previously through an MFC interface, and now that value has to come from a Windows Forms interface.

    Add the windows form to your project

    This is the first and also the easiest step of the process. Right-click your project, select ‘Add…->New Item’ and then select ‘UI->Windows Form’ in the Visual C++ dialog.

    You will then get a message box, telling you that your project will be converted to a managed project. Here you click ‘yes’. Even though you don’t want to compile your project with /clr, Visual Studio itself needs to know that you are doing managed stuff in your project. This is not necessary for compiling your project, but if you want to have the benefit of working with the forms designer and other .NET related things, then it would be a good idea. And you cannot add the form if you choose ‘No’ so it’s not like you have much choice anyway.

    Now you have a .NET Windows Form in your code. You can verify the file specific settings if you want, but the cpp file will be compiled with the /clr flag set, and without the use of precompiled headers. Not using precompiled headers is important, because the default precompiled header is compiled without /clr set, and this would lead to conflicts.

    Of course, if you need to, you can create copies of StdAfx.h and StdAfx.cpp called ‘StdAfxClr.h’ and ‘StdAfxClr.cpp’, add them to your project, and configure them to create a managed precompiled header, which you could then use for all files that are compiled with /clr. This is not necessary for small projects, but it could be a useful optimization in large projects.

    The interface layer

    At this point you can compile and link the entire project, but your native code isn’t yet using the new managed functionality. And because the native code will not be compiled with /clr, it will never do so. In order to make that happen, you will need a thin layer between the native and managed code. This layer will have a native C or C++ interface which can be called by the native code, and a managed implementation that will do all the .NET stuff.

    In our example, the interface is 1 simple C style function in Interface.h:

    int __stdcall DoManagedStuff(void);

    And this function has a simple implementation in Interface.cpp:

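    #include "Interface.h"
    #include "DemoForm.h"   // assumed name of the header generated for the form

    // Managed implementation behind the native signature.
    // This file must be compiled with /clr.
    int __stdcall DoManagedStuff(void)
    {
          DemoForm ^df = gcnew DemoForm();
          df->ShowDialog();       // blocks until the form is closed
          return df->Value;
    }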

    Interface.h and Interface.cpp are 2 new files that you have to add to your project. You have to manually configure the cpp file to be compiled with /clr, and to not use precompiled headers.

    As you can see, the interface layer provides a clean native interface for the managed stuff that you want your code to perform. Of course, you are not limited to C style interfaces. You can also work with classes and make classes with a native interface and a managed implementation. But there are a couple of issues that you need to be aware of. If you want to go there, then it would be a good idea to read the paper written by Herb Sutter which explains the rationale behind C++/CLI, as well as some of the limitations and pitfalls.

    Actually, reading that paper is a good idea for anyone working with C++/CLI who also cares about understanding what is actually going on behind the scenes.
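    To give an idea of what a class with a native interface and a hidden implementation can look like, here is a minimal sketch of the pimpl idiom. It is plain standard C++ so that it compiles anywhere: the Impl struct is only a stand-in that returns a fixed value. In the real project, Impl would live in the one cpp file that is compiled with /clr and would hold the form through something like gcroot<DemoForm^>, because a native class cannot have a managed handle as a direct member. The names FormBridge, Impl and AskForValue are made up for this example and are not part of the demo project.

```cpp
#include <memory>

// --- FormBridge.h: what the native code sees, no managed types ---
class FormBridge
{
public:
    FormBridge();
    ~FormBridge();
    int AskForValue();            // would show the form and return its value
private:
    struct Impl;                  // only defined in the implementation file
    std::unique_ptr<Impl> impl_;  // native callers never see what is inside
};

// --- FormBridge.cpp: in a real project, the only file built with /clr ---
struct FormBridge::Impl
{
    // Stand-in for the managed side (assumption for this sketch).
    // With /clr this could hold gcroot<DemoForm^> and call ShowDialog().
    int value;
    Impl() : value(42) {}
    int ShowAndGetValue() { return value; }
};

FormBridge::FormBridge() : impl_(new Impl()) {}
FormBridge::~FormBridge() {}      // defined here, where Impl is complete
int FormBridge::AskForValue() { return impl_->ShowAndGetValue(); }
```

    Native code only includes the header part and works with FormBridge like any other C++ class. Because the destructor is defined next to Impl, the header never needs to know anything about the managed types.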

    Using the interface layer

    All the hard work is done. Now you can simply include Interface.h in your native code files, and call ‘DoManagedStuff’ where appropriate.

    #include "stdafx.h"
    #include "Interface.h"

    int __stdcall DoFoo(void)
    {
          return DoManagedStuff();
    }

    Conclusion

    As you can see, extending native code with managed code is not so hard. At least, it isn’t if you maintain a clean break between managed and native code. If you cannot do this, things might become more difficult. For example, using .NET controls on an MFC dialog, or using .NET remoting in a native COM project are things that can be much more tricky.

    Another thing you should be aware of is that not all compiler switches can be used in conjunction with the /clr switch. For example, /clr cannot be combined with the static runtime (/MT), with runtime checks (/RTC1) or with edit and continue (/ZI). The compiler will inform you if it detects this. The combinations which are not allowed are documented, and can be found in the documentation of the /clr switch itself.

    This post was not meant to be an exhaustive how-to for those complex scenarios, but instead an explanation of the basic ideas.

    The demo project for this article can be downloaded under the MIT license as usual.